git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 1/3] xdiff: -W: relax end-of-file function detection
@ 2017-01-13 16:15 Vegard Nossum
  2017-01-13 16:15 ` [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context Vegard Nossum
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Vegard Nossum @ 2017-01-13 16:15 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Vegard Nossum, René Scharfe

When adding a new function to the end of a file, it's enough to know
that 1) the addition is at the end of the file; and 2) there is a
function _somewhere_ in there.

If we had simply been changing the end of an existing function, then we
would also be deleting something from the old version.

This fixes the case where we add e.g.

	// Begin of dummy
	static int dummy(void)
	{
	}

to the end of the file.

Cc: René Scharfe <l.s.r@web.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
---
 xdiff/xemit.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index 7389ce4..8c88dbd 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -183,16 +183,14 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 
 				/*
 				 * We don't need additional context if
-				 * a whole function was added, possibly
-				 * starting with empty lines.
+				 * a whole function was added.
 				 */
-				while (i2 < xe->xdf2.nrec &&
-				       is_empty_rec(&xe->xdf2, i2))
+				while (i2 < xe->xdf2.nrec) {
+					if (match_func_rec(&xe->xdf2, xecfg, i2,
+						dummy, sizeof(dummy)) >= 0)
+						goto post_context_calculation;
 					i2++;
-				if (i2 < xe->xdf2.nrec &&
-				    match_func_rec(&xe->xdf2, xecfg, i2,
-						   dummy, sizeof(dummy)) >= 0)
-					goto post_context_calculation;
+				}
 
 				/*
 				 * Otherwise get more context from the
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-13 16:15 [PATCH 1/3] xdiff: -W: relax end-of-file function detection Vegard Nossum
@ 2017-01-13 16:15 ` Vegard Nossum
  2017-01-13 18:19   ` René Scharfe
  2017-01-13 16:15 ` [PATCH 3/3] t/t4051-diff-function-context: improve tests for new diff -W behaviour Vegard Nossum
  2017-01-13 17:49 ` [PATCH 1/3] xdiff: -W: relax end-of-file function detection René Scharfe
  2 siblings, 1 reply; 15+ messages in thread
From: Vegard Nossum @ 2017-01-13 16:15 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Vegard Nossum, René Scharfe

When using -W to include the whole function in the diff context, you
are typically doing this to be able to review the change in its entirety
within the context of the function. It is therefore almost always
desirable to include any comments that immediately precede the function.

This also the fixes the case for C where the declaration is split across
multiple lines (where the first line of the declaration would not be
included in the output), e.g.:

	void
	dummy(void)
	{
		...
	}

We can include these lines by simply scanning upwards from the place of
the detected function start until we hit the first non-blank line.

Cc: René Scharfe <l.s.r@web.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
---
 xdiff/xemit.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index 8c88dbd..3a3060d 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -200,6 +200,8 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 			}
 
 			fs1 = get_func_line(xe, xecfg, NULL, i1, -1);
+			while (fs1 > 0 && !is_empty_rec(&xe->xdf1, fs1 - 1))
+				fs1--;
 			if (fs1 < 0)
 				fs1 = 0;
 			if (fs1 < s1) {
@@ -220,6 +222,8 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 			long fe1 = get_func_line(xe, xecfg, NULL,
 						 xche->i1 + xche->chg1,
 						 xe->xdf1.nrec);
+			while (fe1 > 0 && !is_empty_rec(&xe->xdf1, fe1 - 1))
+				fe1--;
 			while (fe1 > 0 && is_empty_rec(&xe->xdf1, fe1 - 1))
 				fe1--;
 			if (fe1 < 0)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] t/t4051-diff-function-context: improve tests for new diff -W behaviour
  2017-01-13 16:15 [PATCH 1/3] xdiff: -W: relax end-of-file function detection Vegard Nossum
  2017-01-13 16:15 ` [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context Vegard Nossum
@ 2017-01-13 16:15 ` Vegard Nossum
  2017-01-13 17:49 ` [PATCH 1/3] xdiff: -W: relax end-of-file function detection René Scharfe
  2 siblings, 0 replies; 15+ messages in thread
From: Vegard Nossum @ 2017-01-13 16:15 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Vegard Nossum, René Scharfe

We now include non-empty lines immediately before (and after) a function
as belonging to the function.

We can test this new functionality by moving the "// Begin" markers on
each function to the previous line.

This commit is intentionally not part of the previous commits in order
to show that the tests do not break even when changing the behaviour of
'diff -W' in the previous commits.

Cc: René Scharfe <l.s.r@web.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
---
 t/t4051-diff-function-context.sh | 2 +-
 t/t4051/appended1.c              | 3 ++-
 t/t4051/dummy.c                  | 3 ++-
 t/t4051/hello.c                  | 3 ++-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/t/t4051-diff-function-context.sh b/t/t4051-diff-function-context.sh
index 6154acb..d1a646f 100755
--- a/t/t4051-diff-function-context.sh
+++ b/t/t4051-diff-function-context.sh
@@ -72,7 +72,7 @@ test_expect_success 'setup' '
 
 	# overlap function context of 1st change and -u context of 2nd change
 	grep -v "delete me from hello" <"$dir/hello.c" >file.c &&
-	sed 2p <"$dir/dummy.c" >>file.c &&
+	sed 3p <"$dir/dummy.c" >>file.c &&
 	commit_and_tag changed_hello_dummy file.c &&
 
 	git checkout initial &&
diff --git a/t/t4051/appended1.c b/t/t4051/appended1.c
index a9f56f1..8683983 100644
--- a/t/t4051/appended1.c
+++ b/t/t4051/appended1.c
@@ -1,5 +1,6 @@
 
-int appended(void) // Begin of first part
+// Begin of first part
+int appended(void)
 {
 	int i;
 	char *s = "a string";
diff --git a/t/t4051/dummy.c b/t/t4051/dummy.c
index a43016e..db227a8 100644
--- a/t/t4051/dummy.c
+++ b/t/t4051/dummy.c
@@ -1,5 +1,6 @@
 
-static int dummy(void)	// Begin of dummy
+// Begin of dummy
+static int dummy(void)
 {
 	int rc = 0;
 
diff --git a/t/t4051/hello.c b/t/t4051/hello.c
index 63b1a1e..75eac1d 100644
--- a/t/t4051/hello.c
+++ b/t/t4051/hello.c
@@ -1,5 +1,6 @@
 
-static void hello(void)	// Begin of hello
+// Begin of hello
+static void hello(void)
 {
 	/*
 	 * Classic.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] xdiff: -W: relax end-of-file function detection
  2017-01-13 16:15 [PATCH 1/3] xdiff: -W: relax end-of-file function detection Vegard Nossum
  2017-01-13 16:15 ` [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context Vegard Nossum
  2017-01-13 16:15 ` [PATCH 3/3] t/t4051-diff-function-context: improve tests for new diff -W behaviour Vegard Nossum
@ 2017-01-13 17:49 ` René Scharfe
  2 siblings, 0 replies; 15+ messages in thread
From: René Scharfe @ 2017-01-13 17:49 UTC (permalink / raw)
  To: Vegard Nossum, Junio C Hamano, git

Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
> When adding a new function to the end of a file, it's enough to know
> that 1) the addition is at the end of the file; and 2) there is a
> function _somewhere_ in there.
>
> If we had simply been changing the end of an existing function, then we
> would also be deleting something from the old version.

That makes sense, thanks.

> This fixes the case where we add e.g.
>
> 	// Begin of dummy
> 	static int dummy(void)
> 	{
> 	}
>
> to the end of the file.

Without this patch the unchanged function before the added lines is 
shown in its entirety as (uncalled for) context.

>
> Cc: René Scharfe <l.s.r@web.de>
> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
> ---
>  xdiff/xemit.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index 7389ce4..8c88dbd 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -183,16 +183,14 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>
>  				/*
>  				 * We don't need additional context if
> -				 * a whole function was added, possibly
> -				 * starting with empty lines.
> +				 * a whole function was added.
>  				 */
> -				while (i2 < xe->xdf2.nrec &&
> -				       is_empty_rec(&xe->xdf2, i2))
> +				while (i2 < xe->xdf2.nrec) {
> +					if (match_func_rec(&xe->xdf2, xecfg, i2,
> +						dummy, sizeof(dummy)) >= 0)

Nit: I don't like the indentation here.  Giving "dummy" its own line is 
also not exactly pretty, but at least would allow the parameters to be 
aligned on the opening parenthesis.

> +						goto post_context_calculation;
>  					i2++;
> -				if (i2 < xe->xdf2.nrec &&
> -				    match_func_rec(&xe->xdf2, xecfg, i2,
> -						   dummy, sizeof(dummy)) >= 0)
> -					goto post_context_calculation;
> +				}
>
>  				/*
>  				 * Otherwise get more context from the
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-13 16:15 ` [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context Vegard Nossum
@ 2017-01-13 18:19   ` René Scharfe
  2017-01-13 18:44     ` Stefan Beller
  2017-01-13 19:51     ` Junio C Hamano
  0 siblings, 2 replies; 15+ messages in thread
From: René Scharfe @ 2017-01-13 18:19 UTC (permalink / raw)
  To: Vegard Nossum, Junio C Hamano, git

Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
> When using -W to include the whole function in the diff context, you
> are typically doing this to be able to review the change in its entirety
> within the context of the function. It is therefore almost always
> desirable to include any comments that immediately precede the function.
>
> This also the fixes the case for C where the declaration is split across
> multiple lines (where the first line of the declaration would not be
> included in the output), e.g.:
>
> 	void
> 	dummy(void)
> 	{
> 		...
> 	}
>

That's true, but I'm not sure "non-empty line before function line" is 
good enough a definition for desirable lines.  It wouldn't work for 
people who don't believe in empty lines.  Or for those that put a blank 
line between comment and function.  (I have an opinion on such habits, 
but git diff should probably stay neutral.)  And that's just for C code; 
I have no idea how this heuristic would hold up for other file types 
like HTML.

We can identify function lines with arbitrary precision (with a 
xfuncname regex, if needed), but there is no accurate way to classify 
lines as comments, or as the end of functions.  Adding optional regexes 
for single- and multi-line comments would help, at least for C.

René

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-13 18:19   ` René Scharfe
@ 2017-01-13 18:44     ` Stefan Beller
  2017-01-13 19:51     ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Stefan Beller @ 2017-01-13 18:44 UTC (permalink / raw)
  To: René Scharfe, Michael Haggerty
  Cc: Vegard Nossum, Junio C Hamano, git@vger.kernel.org

On Fri, Jan 13, 2017 at 10:19 AM, René Scharfe <l.s.r@web.de> wrote:
> Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
>>
>> When using -W to include the whole function in the diff context, you
>> are typically doing this to be able to review the change in its entirety
>> within the context of the function. It is therefore almost always
>> desirable to include any comments that immediately precede the function.

Do we need a small comment in the actual code to hint at why we count
upwards there?

>>
>> This also the fixes the case for C where the declaration is split across
>> multiple lines (where the first line of the declaration would not be
>> included in the output), e.g.:
>>
>>         void
>>         dummy(void)
>>         {
>>                 ...
>>         }
>>
>
> That's true, but I'm not sure "non-empty line before function line" is good
> enough a definition for desirable lines.  It wouldn't work for people who
> don't believe in empty lines.  Or for those that put a blank line between
> comment and function.  (I have an opinion on such habits, but git diff
> should probably stay neutral.)  And that's just for C code; I have no idea
> how this heuristic would hold up for other file types like HTML.

I think empty lines are "good as a first approach", see e.g.
433860f3d0beb0c6 the "compaction" heuristic for a similar
thing (the compaction was introduced at d634d61ed), and then
we can build a more elaborate thing on top.

>
> We can identify function lines with arbitrary precision (with a xfuncname
> regex, if needed), but there is no accurate way to classify lines as
> comments, or as the end of functions.  Adding optional regexes for single-
> and multi-line comments would help, at least for C.

That would cover Java and whole lot of other C like languages. So a good
start as well IMHO.

>
> René

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-13 18:19   ` René Scharfe
  2017-01-13 18:44     ` Stefan Beller
@ 2017-01-13 19:51     ` Junio C Hamano
  2017-01-13 20:20       ` Vegard Nossum
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-01-13 19:51 UTC (permalink / raw)
  To: René Scharfe; +Cc: Vegard Nossum, git

René Scharfe <l.s.r@web.de> writes:

> Am 13.01.2017 um 17:15 schrieb Vegard Nossum:
>> When using -W to include the whole function in the diff context, you
>> are typically doing this to be able to review the change in its entirety
>> within the context of the function. It is therefore almost always
>> desirable to include any comments that immediately precede the function.
>>
>> This also the fixes the case for C where the declaration is split across
>> multiple lines (where the first line of the declaration would not be
>> included in the output), e.g.:
>>
>> 	void
>> 	dummy(void)
>> 	{
>> 		...
>> 	}
>>
>
> That's true, but I'm not sure "non-empty line before function line" is
> good enough a definition for desirable lines.  It wouldn't work for
> people who don't believe in empty lines.  Or for those that put a
> blank line between comment and function.  (I have an opinion on such
> habits, but git diff should probably stay neutral.)  And that's just
> for C code; I have no idea how this heuristic would hold up for other
> file types like HTML.

As you are, I am fairly negative on the heuristic based on the
"non-blank" thing.  We tried once with compaction-heuristics already
and it did not quite perform well.  Let's not hardcode another one.

> We can identify function lines with arbitrary precision (with a
> xfuncname regex, if needed), but there is no accurate way to classify
> lines as comments, or as the end of functions.  Adding optional
> regexes for single- and multi-line comments would help, at least for
> C.

The funcline regexp is used for two related but different purposes.
It identifies a single line to be placed on @@ ... @@ line before a
diff hunk.  This line however does not have to be at the beginning
of a function.  It has to be the line that conveys the most
significant information (e.g. the name of the function).

The way "diff -W" codepath used it as if it were always the very
first line of a function was bound to invite a patch like this, and
if we want to be extra elaborate, I agree that an extra mechanism to
say "the line the funcline regexp matches is not the beginning of a
function, but the beginning is a line that matches this other regexp
before that line" may help.

Do we really want to be that elaborate, though?  I dunno.

I wonder if it would be sufficient to make -W take an optional
number, e.g. "git show -W4", to add extre context lines before the
funcline.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-13 19:51     ` Junio C Hamano
@ 2017-01-13 20:20       ` Vegard Nossum
  2017-01-13 23:56         ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Vegard Nossum @ 2017-01-13 20:20 UTC (permalink / raw)
  To: Junio C Hamano, René Scharfe; +Cc: git

On 13/01/2017 20:51, Junio C Hamano wrote:
> René Scharfe <l.s.r@web.de> writes:
>> That's true, but I'm not sure "non-empty line before function line" is
>> good enough a definition for desirable lines.  It wouldn't work for
>> people who don't believe in empty lines.  Or for those that put a
>> blank line between comment and function.  (I have an opinion on such
>> habits, but git diff should probably stay neutral.)  And that's just
>> for C code; I have no idea how this heuristic would hold up for other
>> file types like HTML.
>
> As you are, I am fairly negative on the heuristic based on the
> "non-blank" thing.  We tried once with compaction-heuristics already
> and it did not quite perform well.  Let's not hardcode another one.

The patch will work as intended and as expected for 95% of the users out
there (javadoc, Doxygen, kerneldoc, etc. all have the comment
immediately preceding the function) and fixes a very real problem for me
(and I expect many others) _today_; for the remaining 5% (who put a
blank line between their comment and the start of the function) it will
revert back to the current behaviour, so there should be no regression
for them.

For the 0% who don't put even a single blank line between their
functions, it will probably not work as expected, but then again I have
never seen such a coding style in any language, so I am doubtful if this
is something that needs to be taken into account in the first place.

>> We can identify function lines with arbitrary precision (with a
>> xfuncname regex, if needed), but there is no accurate way to classify
>> lines as comments, or as the end of functions.  Adding optional
>> regexes for single- and multi-line comments would help, at least for
>> C.
>
> The funcline regexp is used for two related but different purposes.
> It identifies a single line to be placed on @@ ... @@ line before a
> diff hunk.  This line however does not have to be at the beginning
> of a function.  It has to be the line that conveys the most
> significant information (e.g. the name of the function).
>
> The way "diff -W" codepath used it as if it were always the very
> first line of a function was bound to invite a patch like this, and
> if we want to be extra elaborate, I agree that an extra mechanism to
> say "the line the funcline regexp matches is not the beginning of a
> function, but the beginning is a line that matches this other regexp
> before that line" may help.
>
> Do we really want to be that elaborate, though?  I dunno.

Adding a regex instead of the simple "blank line" test doesn't seem very
difficult to do, but I am doubtful that it will make any difference in
practice. But that can be done incrementally as well by the people who
would actually need it (who I strongly suspect do not exist in the first
place).

> I wonder if it would be sufficient to make -W take an optional
> number, e.g. "git show -W4", to add extre context lines before the
> funcline.
>

I don't like specifying a fixed number, that negates almost all the
reason for using -W in the first place; I would much prefer adding
a config variable to control the -W behaviour (or a new diff flag).


Vegard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-13 20:20       ` Vegard Nossum
@ 2017-01-13 23:56         ` Junio C Hamano
  2017-01-14 14:58           ` René Scharfe
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-01-13 23:56 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: René Scharfe, git

Vegard Nossum <vegard.nossum@oracle.com> writes:

> The patch will work as intended and as expected for 95% of the users out
> there (javadoc, Doxygen, kerneldoc, etc. all have the comment
> immediately preceding the function) and fixes a very real problem for me
> (and I expect many others) _today_; for the remaining 5% (who put a
> blank line between their comment and the start of the function) it will
> revert back to the current behaviour, so there should be no regression
> for them.

I notice your 95% are all programming languages, but I am more
worried about the contents written in non programming languages
(René gave HTML an an example--there may be other types of contents
that we programmer types do not deal with every day, but Git users
depend on).  

I am also more focused on keeping the codebase maintainable in good
health by making sure that we made an effort to find a solution that
is general-enough before solving a single specific problem you have
today.  We may end up deciding that a blank-line heuristics gives us
good enough tradeoff, but I do not want us to make a decision before
thinking.

>> The way "diff -W" codepath used it as if it were always the very
>> first line of a function was bound to invite a patch like this, and
>> if we want to be extra elaborate, I agree that an extra mechanism to
>> say "the line the funcline regexp matches is not the beginning of a
>> function, but the beginning is a line that matches this other regexp
>> before that line" may help.
>>
>> Do we really want to be that elaborate, though?  I dunno.
>
> Adding a regex instead of the simple "blank line" test doesn't seem very
> difficult to do, but I am doubtful that it will make any difference in
> practice. But that can be done incrementally as well by the people who
> would actually need it (who I strongly suspect do not exist in the first
> place).

At least, the damage can be limited to the cases we know would work
well if we go that way.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-13 23:56         ` Junio C Hamano
@ 2017-01-14 14:58           ` René Scharfe
  2017-01-15  2:39             ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: René Scharfe @ 2017-01-14 14:58 UTC (permalink / raw)
  To: Junio C Hamano, Vegard Nossum; +Cc: git

Am 14.01.2017 um 00:56 schrieb Junio C Hamano:
> Vegard Nossum <vegard.nossum@oracle.com> writes:
>
>> The patch will work as intended and as expected for 95% of the users out
>> there (javadoc, Doxygen, kerneldoc, etc. all have the comment
>> immediately preceding the function) and fixes a very real problem for me
>> (and I expect many others) _today_; for the remaining 5% (who put a
>> blank line between their comment and the start of the function) it will
>> revert back to the current behaviour, so there should be no regression
>> for them.
>
> I notice your 95% are all programming languages, but I am more
> worried about the contents written in non programming languages
> (René gave HTML an an example--there may be other types of contents
> that we programmer types do not deal with every day, but Git users
> depend on).
>
> I am also more focused on keeping the codebase maintainable in good
> health by making sure that we made an effort to find a solution that
> is general-enough before solving a single specific problem you have
> today.  We may end up deciding that a blank-line heuristics gives us
> good enough tradeoff, but I do not want us to make a decision before
> thinking.

How about extending the context upward only up to and excluding a line 
that is either empty *or* a function line?  That would limit the extra 
context to a single function in the worst case.

Reducing context at the bottom with the aim to remove comments for the 
next section is more tricky as it could remove part of the function that 
we'd like to show if we get the boundary wrong.  How bad would it be to 
keep the southern border unchanged?

René


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-14 14:58           ` René Scharfe
@ 2017-01-15  2:39             ` Junio C Hamano
  2017-01-15 10:06               ` Vegard Nossum
  2017-01-15 16:57               ` René Scharfe
  0 siblings, 2 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-01-15  2:39 UTC (permalink / raw)
  To: René Scharfe; +Cc: Vegard Nossum, git

René Scharfe <l.s.r@web.de> writes:

>> I am also more focused on keeping the codebase maintainable in good
>> health by making sure that we made an effort to find a solution that
>> is general-enough before solving a single specific problem you have
>> today.  We may end up deciding that a blank-line heuristics gives us
>> good enough tradeoff, but I do not want us to make a decision before
>> thinking.
>
> How about extending the context upward only up to and excluding a line
> that is either empty *or* a function line?  That would limit the extra
> context to a single function in the worst case.
>
> Reducing context at the bottom with the aim to remove comments for the
> next section is more tricky as it could remove part of the function
> that we'd like to show if we get the boundary wrong.  How bad would it
> be to keep the southern border unchanged?

I personally do not think there is any robust heuristic other than
Vegard's "a blank line may be a signal enough that lines before that
are not part of the beginning of the function", and I think your
"hence we look for a blank line but if there is a line that matches
the function header, stop there as we know we came too far back"
will be a good-enough safety measure.

I also agree with you that we probably do not want to futz with the
southern border.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-15  2:39             ` Junio C Hamano
@ 2017-01-15 10:06               ` Vegard Nossum
  2017-01-15 16:57                 ` René Scharfe
  2017-01-15 16:57               ` René Scharfe
  1 sibling, 1 reply; 15+ messages in thread
From: Vegard Nossum @ 2017-01-15 10:06 UTC (permalink / raw)
  To: Junio C Hamano, René Scharfe; +Cc: git

On 15/01/2017 03:39, Junio C Hamano wrote:
> René Scharfe <l.s.r@web.de> writes:
>
>>> I am also more focused on keeping the codebase maintainable in good
>>> health by making sure that we made an effort to find a solution that
>>> is general-enough before solving a single specific problem you have
>>> today.  We may end up deciding that a blank-line heuristics gives us
>>> good enough tradeoff, but I do not want us to make a decision before
>>> thinking.

You are right; I appreciate this approach.

>> How about extending the context upward only up to and excluding a line
>> that is either empty *or* a function line?  That would limit the extra
>> context to a single function in the worst case.
>>
>> Reducing context at the bottom with the aim to remove comments for the
>> next section is more tricky as it could remove part of the function
>> that we'd like to show if we get the boundary wrong.  How bad would it
>> be to keep the southern border unchanged?
>
> I personally do not think there is any robust heuristic other than
> Vegard's "a blank line may be a signal enough that lines before that
> are not part of the beginning of the function", and I think your
> "hence we look for a blank line but if there is a line that matches
> the function header, stop there as we know we came too far back"
> will be a good-enough safety measure.
>
> I also agree with you that we probably do not want to futz with the
> southern border.

You are right, trying to change the southern border in this way is not
quite reliable if there are no empty lines whatsoever and can
erroneously cause the function context to not include the bottom of the
function being changed.

I'm splitting the function boundary detection logic into separate
functions and trying to solve the above case without breaking the tests
(and adding a new test for the above case too).

I'll see if I can additionally provide some toggles (flags or config
variables) to control the new behaviour, what I had in mind was:

   -W[=preamble,=no-preamble]
   --function-context[=preamble,=no-preamble]
   diff.functionContextPreamble = <bool>

(where the new logic is controlled by the new config variable and
overridden by the presence of =preamble or =no-preamble).

Then it also shouldn't be too difficult to add

   diff.<driver>.preamble = <regex>
   diff.<driver>.xpreamble = <regex>

to override the heuristic used for function border detection in
exceptional cases.

You can argue about the naming now ;-) But I will use this for a start,
renaming/reworking it (or throwing it away) afterwards should be easy
once the code has been written.


Vegard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-15  2:39             ` Junio C Hamano
  2017-01-15 10:06               ` Vegard Nossum
@ 2017-01-15 16:57               ` René Scharfe
  1 sibling, 0 replies; 15+ messages in thread
From: René Scharfe @ 2017-01-15 16:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Vegard Nossum, git

Am 15.01.2017 um 03:39 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
> 
>>> I am also more focused on keeping the codebase maintainable in good
>>> health by making sure that we made an effort to find a solution that
>>> is general-enough before solving a single specific problem you have
>>> today.  We may end up deciding that a blank-line heuristics gives us
>>> good enough tradeoff, but I do not want us to make a decision before
>>> thinking.
>>
>> How about extending the context upward only up to and excluding a line
>> that is either empty *or* a function line?  That would limit the extra
>> context to a single function in the worst case.
>>
>> Reducing context at the bottom with the aim to remove comments for the
>> next section is more tricky as it could remove part of the function
>> that we'd like to show if we get the boundary wrong.  How bad would it
>> be to keep the southern border unchanged?
> 
> I personally do not think there is any robust heuristic other than
> Vegard's "a blank line may be a signal enough that lines before that
> are not part of the beginning of the function", and I think your
> "hence we look for a blank line but if there is a line that matches
> the function header, stop there as we know we came too far back"
> will be a good-enough safety measure.
> 
> I also agree with you that we probably do not want to futz with the
> southern border.

A replacement patch for 2/3 with these changes would look like this:

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index 8c88dbde38..9ed54cd318 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -174,11 +174,11 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		s2 = XDL_MAX(xch->i2 - xecfg->ctxlen, 0);
 
 		if (xecfg->flags & XDL_EMIT_FUNCCONTEXT) {
+			char dummy[1];
 			long fs1, i1 = xch->i1;
 
 			/* Appended chunk? */
 			if (i1 >= xe->xdf1.nrec) {
-				char dummy[1];
 				long i2 = xch->i2;
 
 				/*
@@ -200,6 +200,10 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 			}
 
 			fs1 = get_func_line(xe, xecfg, NULL, i1, -1);
+			while (fs1 > 0 && !is_empty_rec(&xe->xdf1, fs1 - 1) &&
+			       match_func_rec(&xe->xdf1, xecfg, fs1 - 1,
+					      dummy, sizeof(dummy)) < 0)
+				fs1--;
 			if (fs1 < 0)
 				fs1 = 0;
 			if (fs1 < s1) {


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-15 10:06               ` Vegard Nossum
@ 2017-01-15 16:57                 ` René Scharfe
  2017-01-15 23:28                   ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: René Scharfe @ 2017-01-15 16:57 UTC (permalink / raw)
  To: Vegard Nossum, Junio C Hamano; +Cc: git

Am 15.01.2017 um 11:06 schrieb Vegard Nossum:
> On 15/01/2017 03:39, Junio C Hamano wrote:
>> René Scharfe <l.s.r@web.de> writes:
>>> How about extending the context upward only up to and excluding a line
>>> that is either empty *or* a function line?  That would limit the extra
>>> context to a single function in the worst case.
>>>
>>> Reducing context at the bottom with the aim to remove comments for the
>>> next section is more tricky as it could remove part of the function
>>> that we'd like to show if we get the boundary wrong.  How bad would it
>>> be to keep the southern border unchanged?
>>
>> I personally do not think there is any robust heuristic other than
>> Vegard's "a blank line may be a signal enough that lines before that
>> are not part of the beginning of the function", and I think your
>> "hence we look for a blank line but if there is a line that matches
>> the function header, stop there as we know we came too far back"
>> will be a good-enough safety measure.
>>
>> I also agree with you that we probably do not want to futz with the
>> southern border.
>
> You are right, trying to change the southern border in this way is not
> quite reliable if there are no empty lines whatsoever and can
> erroneously cause the function context to not include the bottom of the
> function being changed.
>
> I'm splitting the function boundary detection logic into separate
> functions and trying to solve the above case without breaking the tests
> (and adding a new test for the above case too).
>
> I'll see if I can additionally provide some toggles (flags or config
> variables) to control the new behaviour, what I had in mind was:
>
>   -W[=preamble,=no-preamble]
>   --function-context[=preamble,=no-preamble]
>   diff.functionContextPreamble = <bool>
>
> (where the new logic is controlled by the new config variable and
> overridden by the presence of =preamble or =no-preamble).

Adding comments before a function is useful, removing comments after a 
function sounds to me as only nice to have (under the assumption that 
they belong to the next function[*]).  How bad would it be to only 
implement the first part (as in the patch I just sent) without adding 
new config settings or parameters?

Thanks,
René


[*] Silly counter-example (the #endif line):
#ifdef SOMETHING
int f(...) {
	// implementation for SOMETHING
}
#else
inf f(...) {
	// implementation without SOMETHING
}
#endif /* SOMETHING */


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context
  2017-01-15 16:57                 ` René Scharfe
@ 2017-01-15 23:28                   ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-01-15 23:28 UTC (permalink / raw)
  To: René Scharfe; +Cc: Vegard Nossum, git

René Scharfe <l.s.r@web.de> writes:

> ...  How bad would it be to only
> implement the first part (as in the patch I just sent) without adding
> new config settings or parameters?

That probably is a good place to stop ;-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-01-15 23:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-13 16:15 [PATCH 1/3] xdiff: -W: relax end-of-file function detection Vegard Nossum
2017-01-13 16:15 ` [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context Vegard Nossum
2017-01-13 18:19   ` René Scharfe
2017-01-13 18:44     ` Stefan Beller
2017-01-13 19:51     ` Junio C Hamano
2017-01-13 20:20       ` Vegard Nossum
2017-01-13 23:56         ` Junio C Hamano
2017-01-14 14:58           ` René Scharfe
2017-01-15  2:39             ` Junio C Hamano
2017-01-15 10:06               ` Vegard Nossum
2017-01-15 16:57                 ` René Scharfe
2017-01-15 23:28                   ` Junio C Hamano
2017-01-15 16:57               ` René Scharfe
2017-01-13 16:15 ` [PATCH 3/3] t/t4051-diff-function-context: improve tests for new diff -W behaviour Vegard Nossum
2017-01-13 17:49 ` [PATCH 1/3] xdiff: -W: relax end-of-file function detection René Scharfe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).