git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 0/2] builtin add -p: fix hunk splitting
@ 2021-12-20 14:32 Phillip Wood via GitGitGadget
  2021-12-20 14:32 ` [PATCH 1/2] t3701: clean up hunk splitting tests Phillip Wood via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-20 14:32 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, SZEDER Gábor, Phillip Wood

Fix a small regression in the hunk splitting of the builtin version compared
to the Perl version. Thanks to Szeder for the easy-to-follow bug report.

Phillip Wood (2):
  t3701: clean up hunk splitting tests
  builtin add -p: fix hunk splitting

 add-patch.c                |  7 ++++++
 t/t3701-add-interactive.sh | 48 ++++++++++++++++++++++++++++++++++----
 2 files changed, 50 insertions(+), 5 deletions(-)


base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1100%2Fphillipwood%2Fwip%2Fadd-p-fix-hunk-splitting-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1100/phillipwood/wip/add-p-fix-hunk-splitting-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1100
-- 
gitgitgadget


* [PATCH 1/2] t3701: clean up hunk splitting tests
  2021-12-20 14:32 [PATCH 0/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
@ 2021-12-20 14:32 ` Phillip Wood via GitGitGadget
  2021-12-20 21:09   ` Junio C Hamano
  2021-12-20 14:32 ` [PATCH 2/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
  2022-01-11 11:12 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
  2 siblings, 1 reply; 21+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-20 14:32 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, SZEDER Gábor, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Clean up some test constructs in preparation for extending the tests
in the next commit. There are three small changes, grouped together
because they are too small to be worth three separate commits.
 1 - "cat file | sed expression" is better written as
     "sed expression file".
 2 - Follow our usual practice of redirecting the output of git
     commands to a file rather than piping it into another command.
 3 - Use test_write_lines rather than 'printf "%s\n"'.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 t/t3701-add-interactive.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 207714655f2..77de0029ba5 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -347,7 +347,7 @@ test_expect_success 'setup patch' '
 # Expected output, diff is similar to the patch but w/ diff at the top
 test_expect_success 'setup expected' '
 	echo diff --git a/file b/file >expected &&
-	cat patch |sed "/^index/s/ 100644/ 100755/" >>expected &&
+	sed "/^index/s/ 100644/ 100755/" patch >>expected &&
 	cat >expected-output <<-\EOF
 	--- a/file
 	+++ b/file
@@ -373,9 +373,9 @@ test_expect_success 'setup expected' '
 test_expect_success 'add first line works' '
 	git commit -am "clear local changes" &&
 	git apply patch &&
-	printf "%s\n" s y y | git add -p file 2>error |
-		sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
-		       -e "/^[-+@ \\\\]"/p  >output &&
+	test_write_lines s y y | git add -p file 2>error >raw-output &&
+	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
+	       -e "/^[-+@ \\\\]"/p raw-output >output &&
 	test_must_be_empty error &&
 	git diff --cached >diff &&
 	diff_cmp expected diff &&
-- 
gitgitgadget



* [PATCH 2/2] builtin add -p: fix hunk splitting
  2021-12-20 14:32 [PATCH 0/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
  2021-12-20 14:32 ` [PATCH 1/2] t3701: clean up hunk splitting tests Phillip Wood via GitGitGadget
@ 2021-12-20 14:32 ` Phillip Wood via GitGitGadget
  2021-12-20 19:06   ` Ævar Arnfjörð Bjarmason
  2021-12-20 21:30   ` Junio C Hamano
  2022-01-11 11:12 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
  2 siblings, 2 replies; 21+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-20 14:32 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, SZEDER Gábor, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

To determine whether a hunk can be split, a counter is incremented each
time a context line follows an insertion or deletion. If, at the end of
the hunk, the value of this counter is greater than one, then the hunk
can be split into that number of smaller hunks. If the last hunk in a
file ends with an insertion or deletion, then there is no following
context line and the counter will not be incremented. This case is
already handled after the loop, where the counter is incremented if
the last hunk ended with an insertion or deletion. Unfortunately, there
is no similar check between files (likely because the Perl version
only ever parses one diff at a time). Fix this by checking whether the
last hunk ended with an insertion or deletion when we see the diff
header of a new file, and extend the existing regression test.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
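(Not part of the commit.) The counting rule described in the commit
message can be sketched as a small standalone C function. This is only
an illustration, not the code in add-patch.c: count_splittable(), its
parameters and the one-hunk-per-file simplification are assumptions made
for the example. The check_at_diff_header flag switches the check that
this patch adds on and off.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: for each file in a multi-file diff, count how many
 * pieces its hunk could be split into. Assumes one hunk per file.
 * Returns the number of files seen (at most max_files).
 */
size_t count_splittable(const char *const *lines, size_t nr_lines,
			int check_at_diff_header,
			int *counts, size_t max_files)
{
	size_t nr_files = 0, i;
	int in_hunk = 0;
	char marker = '\0';	/* first character of the previous hunk line */

	assert(lines && counts);

	for (i = 0; i < nr_lines; i++) {
		const char *p = lines[i];

		if (!strncmp(p, "diff ", 5)) {
			/* The previous file may have ended in '+' or '-'. */
			if (check_at_diff_header && nr_files &&
			    (marker == '-' || marker == '+'))
				counts[nr_files - 1]++;
			marker = '\0';
			in_hunk = 0;
			if (nr_files == max_files)
				break;
			counts[nr_files++] = 0;
		} else if (!strncmp(p, "@@ ", 3)) {
			in_hunk = 1;
			marker = '\0';
		} else if (in_hunk) {
			/* A context line right after a change ends a piece. */
			if ((marker == '-' || marker == '+') && *p == ' ')
				counts[nr_files - 1]++;
			/* "\ No newline at end of file" keeps the marker. */
			if (*p != '\\')
				marker = *p;
		}
	}

	/* The last file of the whole diff is handled after the loop. */
	if (nr_files && (marker == '-' || marker == '+'))
		counts[nr_files - 1]++;
	return nr_files;
}
```

For a two-file diff whose first hunk has the body "+firstline",
" content", "+lastline" and whose second hunk is the file2 hunk from the
test below, this sketch gives counts of 2 and 3 with the flag set, and
1 and 3 with it clear. In other words, without the check at the diff
header the hunk in the first file is not reported as splittable.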
 add-patch.c                |  7 ++++++
 t/t3701-add-interactive.sh | 46 ++++++++++++++++++++++++++++++++++----
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index 8c41cdfe39b..5cea70666e9 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -472,6 +472,13 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 			eol = pend;
 
 		if (starts_with(p, "diff ")) {
+			if (marker == '-' || marker == '+')
+				/*
+				 * Last hunk ended in non-context line (i.e. it
+				 * appended lines to the file, so there are no
+				 * trailing context lines).
+				 */
+				hunk->splittable_into++;
 			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
 				   file_diff_alloc);
 			file_diff = s->file_diff + s->file_diff_nr - 1;
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 77de0029ba5..94537a6b40a 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -326,7 +326,9 @@ test_expect_success 'correct message when there is nothing to do' '
 test_expect_success 'setup again' '
 	git reset --hard &&
 	test_chmod +x file &&
-	echo content >>file
+	echo content >>file &&
+	test_write_lines A B C D>file2 &&
+	git add file2
 '
 
 # Write the patch file with a new line at the top and bottom
@@ -341,13 +343,27 @@ test_expect_success 'setup patch' '
 	 content
 	+lastline
 	\ No newline at end of file
+	diff --git a/file2 b/file2
+	index 8422d40..35b930a 100644
+	--- a/file2
+	+++ b/file2
+	@@ -1,4 +1,5 @@
+	-A
+	+Z
+	 B
+	+Y
+	 C
+	-D
+	+X
 	EOF
 '
 
 # Expected output, diff is similar to the patch but w/ diff at the top
 test_expect_success 'setup expected' '
 	echo diff --git a/file b/file >expected &&
-	sed "/^index/s/ 100644/ 100755/" patch >>expected &&
+	sed -e "/^index 180b47c/s/ 100644/ 100755/" \
+	    -e /1,5/s//1,4/ \
+	    -e /Y/d patch >>expected &&
 	cat >expected-output <<-\EOF
 	--- a/file
 	+++ b/file
@@ -366,6 +382,28 @@ test_expect_success 'setup expected' '
 	 content
 	+lastline
 	\ No newline at end of file
+	--- a/file2
+	+++ b/file2
+	@@ -1,4 +1,5 @@
+	-A
+	+Z
+	 B
+	+Y
+	 C
+	-D
+	+X
+	@@ -1,2 +1,2 @@
+	-A
+	+Z
+	 B
+	@@ -2,2 +2,3 @@
+	 B
+	+Y
+	 C
+	@@ -3,2 +4,2 @@
+	 C
+	-D
+	+X
 	EOF
 '
 
@@ -373,8 +411,8 @@ test_expect_success 'setup expected' '
 test_expect_success 'add first line works' '
 	git commit -am "clear local changes" &&
 	git apply patch &&
-	test_write_lines s y y | git add -p file 2>error >raw-output &&
-	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
+	test_write_lines s y y s y n y | git add -p 2>error >raw-output &&
+	sed -n -e "s/^([1-9]\/[1-9]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
 	       -e "/^[-+@ \\\\]"/p raw-output >output &&
 	test_must_be_empty error &&
 	git diff --cached >diff &&
-- 
gitgitgadget


* Re: [PATCH 2/2] builtin add -p: fix hunk splitting
  2021-12-20 14:32 ` [PATCH 2/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
@ 2021-12-20 19:06   ` Ævar Arnfjörð Bjarmason
  2022-01-11 11:13     ` Phillip Wood
  2021-12-20 21:30   ` Junio C Hamano
  1 sibling, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-20 19:06 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Johannes Schindelin, SZEDER Gábor, Phillip Wood


On Mon, Dec 20 2021, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> To determine whether a hunk can be split a counter is incremented each
> time a context line follows an insertion or deletion. If at the end of
> the hunk the value of this counter is greater than one then the hunk
> can be split into that number of smaller hunks. If the last hunk in a
> file ends with an insertion or deletion then there is no following
> context line and the counter will not be incremented. This case is
> already handled at the end of the loop where counter is incremented if
> the last hunk ended with an insertion or deletion. Unfortunately there
> is no similar check between files (likely because the perl version
> only ever parses one diff at a time). Fix this by checking if the last
> hunk ended with an insertion or deletion when we see the diff header
> of a new file and extend the existing regression test.
>
> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  add-patch.c                |  7 ++++++
>  t/t3701-add-interactive.sh | 46 ++++++++++++++++++++++++++++++++++----
>  2 files changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/add-patch.c b/add-patch.c
> index 8c41cdfe39b..5cea70666e9 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -472,6 +472,13 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>  			eol = pend;
>  
>  		if (starts_with(p, "diff ")) {
> +			if (marker == '-' || marker == '+')
> +				/*
> +				 * Last hunk ended in non-context line (i.e. it
> +				 * appended lines to the file, so there are no
> +				 * trailing context lines).
> +				 */
> +				hunk->splittable_into++;

I wondered if factoring out these several "marker == '-' || marker ==
'+'" cases in parse_diff() into a "is_plus_minus(marker)" was worth it,
but probably not.

>  			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
>  				   file_diff_alloc);
>  			file_diff = s->file_diff + s->file_diff_nr - 1;
> diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
> index 77de0029ba5..94537a6b40a 100755
> --- a/t/t3701-add-interactive.sh
> +++ b/t/t3701-add-interactive.sh
> @@ -326,7 +326,9 @@ test_expect_success 'correct message when there is nothing to do' '
>  test_expect_success 'setup again' '
>  	git reset --hard &&
>  	test_chmod +x file &&
> -	echo content >>file
> +	echo content >>file &&
> +	test_write_lines A B C D>file2 &&

style nit: "cmd args >file2" not "cmd args>file2"

> @@ -373,8 +411,8 @@ test_expect_success 'setup expected' '
>  test_expect_success 'add first line works' '
>  	git commit -am "clear local changes" &&
>  	git apply patch &&
> -	test_write_lines s y y | git add -p file 2>error >raw-output &&
> -	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
> +	test_write_lines s y y s y n y | git add -p 2>error >raw-output &&
> +	sed -n -e "s/^([1-9]\/[1-9]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
>  	       -e "/^[-+@ \\\\]"/p raw-output >output &&
>  	test_must_be_empty error &&
>  	git diff --cached >diff &&

style/diff nit: it may be worth doing some version of this in 1/2:

    test_write_lines ... >lines &&
    git ... <lines .. &&
    ...
    sed -n \
        -e ... \
        -e ... \
        >output

Just to make the diff smaller, i.e. just the "test_write_lines" line
would be modified here.

The changes themselves & this series LGTM.


* Re: [PATCH 1/2] t3701: clean up hunk splitting tests
  2021-12-20 14:32 ` [PATCH 1/2] t3701: clean up hunk splitting tests Phillip Wood via GitGitGadget
@ 2021-12-20 21:09   ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2021-12-20 21:09 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Johannes Schindelin, SZEDER Gábor, Phillip Wood

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Clean up some test constructs in preparation for extending the tests
> in the next commit. There are three small changes, I've grouped them
> together as they're so small it didn't seem worth creating three
> separate commits.
>  1 - "cat file | sed expression" is better written as
>      "sed expression file".
>  2 - Follow our usual practice of redirecting the output of git
>      commands to a file rather than piping it into another command.
>  3 - Use test_write_lines rather than 'printf "%s\n"'.

All good points.  Somehow people seem to forget "do not cat a single
file into a pipe".

> @@ -373,9 +373,9 @@ test_expect_success 'setup expected' '
>  test_expect_success 'add first line works' '
>  	git commit -am "clear local changes" &&
>  	git apply patch &&
> -	printf "%s\n" s y y | git add -p file 2>error |
> -		sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
> -		       -e "/^[-+@ \\\\]"/p  >output &&
> +	test_write_lines s y y | git add -p file 2>error >raw-output &&
> +	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
> +	       -e "/^[-+@ \\\\]"/p raw-output >output &&

Looks good.  Thanks.


* Re: [PATCH 2/2] builtin add -p: fix hunk splitting
  2021-12-20 14:32 ` [PATCH 2/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
  2021-12-20 19:06   ` Ævar Arnfjörð Bjarmason
@ 2021-12-20 21:30   ` Junio C Hamano
  1 sibling, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2021-12-20 21:30 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Johannes Schindelin, SZEDER Gábor, Phillip Wood

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> To determine whether a hunk can be split a counter is incremented each
> time a context line follows an insertion or deletion. If at the end of
> the hunk the value of this counter is greater than one then the hunk
> can be split into that number of smaller hunks. If the last hunk in a
> file ends with an insertion or deletion then there is no following
> context line and the counter will not be incremented. This case is
> already handled at the end of the loop where counter is incremented if
> the last hunk ended with an insertion or deletion. Unfortunately there
> is no similar check between files (likely because the perl version
> only ever parses one diff at a time).

In other words, the original laid out the code in such a way that
such a bug will be impossible, and the rewrite broke it because it
rolled both "next file" and "next hunk" into the same loop?

> Fix this by checking if the last
> hunk ended with an insertion or deletion when we see the diff header
> of a new file and extend the existing regression test.

You should be able to explain what end-user visible bug is in a
simple single sentence before all of the above.

"The C reimplementation of 'add -p' fails to split a hunk when the
hunk ends with addition or deletion without post context line." or
something like that.

>  		if (starts_with(p, "diff ")) {
> +			if (marker == '-' || marker == '+')
> +				/*
> +				 * Last hunk ended in non-context line (i.e. it
> +				 * appended lines to the file, so there are no
> +				 * trailing context lines).
> +				 */
> +				hunk->splittable_into++;

This looks correct but unsatisfactory.  We have the same processing
immediately after the loop.  What the two have in common is that each
"concludes" the hunks for the file whose patch we have been reading.

Can we at least make a helper function that identifies what it does
clearly by its name, and use it here and after the loop, to clarify
what is going on?  Then you do not need the 5-line comment there.

		if (starts_with(p, "diff ")) {
+			conclude_file(hunk, marker);

or something like that, perhaps.

Thanks.


* [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2021-12-20 14:32 [PATCH 0/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
  2021-12-20 14:32 ` [PATCH 1/2] t3701: clean up hunk splitting tests Phillip Wood via GitGitGadget
  2021-12-20 14:32 ` [PATCH 2/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
@ 2022-01-11 11:12 ` Phillip Wood via GitGitGadget
  2022-01-11 11:12   ` [PATCH v2 1/2] t3701: clean up hunk splitting tests Phillip Wood via GitGitGadget
                     ` (3 more replies)
  2 siblings, 4 replies; 21+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-01-11 11:12 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin, SZEDER Gábor,
	Ævar Arnfjörð Bjarmason, Phillip Wood

Thanks to Junio and Ævar for their comments on V1. I've updated the commit
message and added a helper function as suggested.

V1 Cover Letter: Fix a small regression in the hunk splitting of the builtin
version compared to the Perl version. Thanks to Szeder for the easy-to-follow
bug report.

Phillip Wood (2):
  t3701: clean up hunk splitting tests
  builtin add -p: fix hunk splitting

 add-patch.c                | 20 ++++++++++------
 t/t3701-add-interactive.sh | 48 ++++++++++++++++++++++++++++++++++----
 2 files changed, 56 insertions(+), 12 deletions(-)


base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1100%2Fphillipwood%2Fwip%2Fadd-p-fix-hunk-splitting-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1100/phillipwood/wip/add-p-fix-hunk-splitting-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1100

Range-diff vs v1:

 1:  cc8639fc29d = 1:  cc8639fc29d t3701: clean up hunk splitting tests
 2:  5d5639c2b04 ! 2:  b698989e265 builtin add -p: fix hunk splitting
     @@ Metadata
       ## Commit message ##
          builtin add -p: fix hunk splitting
      
     +    The C reimplementation of "add -p" fails to split the last hunk in a
     +    file if that hunk ends with an addition or deletion without any trailing
     +    context line, unless that file is the last one to be processed.
     +
          To determine whether a hunk can be split a counter is incremented each
          time a context line follows an insertion or deletion. If at the end of
          the hunk the value of this counter is greater than one then the hunk
     @@ Commit message
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
       ## add-patch.c ##
     +@@ add-patch.c: static int is_octal(const char *p, size_t len)
     + 	return 1;
     + }
     + 
     ++static void complete_file(char marker, struct hunk *hunk)
     ++{
     ++	if (marker == '-' || marker == '+')
     ++		/*
     ++		 * Last hunk ended in non-context line (i.e. it
     ++		 * appended lines to the file, so there are no
     ++		 * trailing context lines).
     ++		 */
     ++		hunk->splittable_into++;
     ++}
     ++
     + static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
     + {
     + 	struct strvec args = STRVEC_INIT;
      @@ add-patch.c: static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
       			eol = pend;
       
       		if (starts_with(p, "diff ")) {
     -+			if (marker == '-' || marker == '+')
     -+				/*
     -+				 * Last hunk ended in non-context line (i.e. it
     -+				 * appended lines to the file, so there are no
     -+				 * trailing context lines).
     -+				 */
     -+				hunk->splittable_into++;
     ++			complete_file(marker, hunk);
       			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
       				   file_diff_alloc);
       			file_diff = s->file_diff + s->file_diff_nr - 1;
     +@@ add-patch.c: static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
     + 				file_diff->hunk->colored_end = hunk->colored_end;
     + 		}
     + 	}
     +-
     +-	if (marker == '-' || marker == '+')
     +-		/*
     +-		 * Last hunk ended in non-context line (i.e. it appended lines
     +-		 * to the file, so there are no trailing context lines).
     +-		 */
     +-		hunk->splittable_into++;
     ++	complete_file(marker, hunk);
     + 
     + 	/* non-colored shorter than colored? */
     + 	if (colored_p != colored_pend) {
      
       ## t/t3701-add-interactive.sh ##
      @@ t/t3701-add-interactive.sh: test_expect_success 'correct message when there is nothing to do' '

-- 
gitgitgadget


* [PATCH v2 1/2] t3701: clean up hunk splitting tests
  2022-01-11 11:12 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
@ 2022-01-11 11:12   ` Phillip Wood via GitGitGadget
  2022-01-11 11:12   ` [PATCH v2 2/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-01-11 11:12 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin, SZEDER Gábor,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Clean up some test constructs in preparation for extending the tests
in the next commit. There are three small changes, grouped together
because they are too small to be worth three separate commits.
 1 - "cat file | sed expression" is better written as
     "sed expression file".
 2 - Follow our usual practice of redirecting the output of git
     commands to a file rather than piping it into another command.
 3 - Use test_write_lines rather than 'printf "%s\n"'.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 t/t3701-add-interactive.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 207714655f2..77de0029ba5 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -347,7 +347,7 @@ test_expect_success 'setup patch' '
 # Expected output, diff is similar to the patch but w/ diff at the top
 test_expect_success 'setup expected' '
 	echo diff --git a/file b/file >expected &&
-	cat patch |sed "/^index/s/ 100644/ 100755/" >>expected &&
+	sed "/^index/s/ 100644/ 100755/" patch >>expected &&
 	cat >expected-output <<-\EOF
 	--- a/file
 	+++ b/file
@@ -373,9 +373,9 @@ test_expect_success 'setup expected' '
 test_expect_success 'add first line works' '
 	git commit -am "clear local changes" &&
 	git apply patch &&
-	printf "%s\n" s y y | git add -p file 2>error |
-		sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
-		       -e "/^[-+@ \\\\]"/p  >output &&
+	test_write_lines s y y | git add -p file 2>error >raw-output &&
+	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
+	       -e "/^[-+@ \\\\]"/p raw-output >output &&
 	test_must_be_empty error &&
 	git diff --cached >diff &&
 	diff_cmp expected diff &&
-- 
gitgitgadget



* [PATCH v2 2/2] builtin add -p: fix hunk splitting
  2022-01-11 11:12 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
  2022-01-11 11:12   ` [PATCH v2 1/2] t3701: clean up hunk splitting tests Phillip Wood via GitGitGadget
@ 2022-01-11 11:12   ` Phillip Wood via GitGitGadget
  2022-01-11 12:07   ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
  2022-01-12 18:34   ` Junio C Hamano
  3 siblings, 0 replies; 21+ messages in thread
From: Phillip Wood via GitGitGadget @ 2022-01-11 11:12 UTC (permalink / raw)
  To: git
  Cc: Johannes Schindelin, SZEDER Gábor,
	Ævar Arnfjörð Bjarmason, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

The C reimplementation of "add -p" fails to split the last hunk in a
file if that hunk ends with an addition or deletion without any trailing
context line, unless that file is the last one to be processed.

To determine whether a hunk can be split, a counter is incremented each
time a context line follows an insertion or deletion. If, at the end of
the hunk, the value of this counter is greater than one, then the hunk
can be split into that number of smaller hunks. If the last hunk in a
file ends with an insertion or deletion, then there is no following
context line and the counter will not be incremented. This case is
already handled after the loop, where the counter is incremented if
the last hunk ended with an insertion or deletion. Unfortunately, there
is no similar check between files (likely because the Perl version
only ever parses one diff at a time). Fix this by checking whether the
last hunk ended with an insertion or deletion when we see the diff
header of a new file, and extend the existing regression test.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
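(Not part of the commit.) The expected output in the test below shows
the "@@ -1,4 +1,5 @@" hunk of file2 split into three pieces. The header
arithmetic behind that can be sketched as follows. This is a simplified
illustration rather than the splitting code in add-patch.c: struct range
and split_ranges() are hypothetical names, and the sketch assumes that
the context run between two groups of changes is shared by both
neighbouring pieces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: the four numbers of a "@@ -a,b +c,d @@" header. */
struct range {
	long old_start, old_count, new_start, new_count;
};

/*
 * Hypothetical helper: split one hunk body into pieces at each context
 * run that follows a change, and record the header range of every piece.
 * Returns the number of ranges written to out (at most max_out).
 */
size_t split_ranges(const char *const *body, size_t nr_lines,
		    long old_start, long new_start,
		    struct range *out, size_t max_out)
{
	struct range cur = { old_start, 0, new_start, 0 };
	long old_pos = old_start, new_pos = new_start;
	long run_old = 0, run_new = 0, run_len = 0;
	int seen_change = 0, in_trailing = 0;
	size_t nr_out = 0, i;

	assert(body && out);

	for (i = 0; i < nr_lines; i++) {
		char c = body[i][0];

		if (c == ' ') {
			if (seen_change && !in_trailing) {
				/* First context line after a change. */
				in_trailing = 1;
				run_old = old_pos;
				run_new = new_pos;
				run_len = 0;
			}
			if (in_trailing)
				run_len++;
			cur.old_count++;
			cur.new_count++;
			old_pos++;
			new_pos++;
		} else if (c == '-' || c == '+') {
			if (in_trailing) {
				/* Close this piece; reuse its trailing
				 * context as the next leading context. */
				if (nr_out < max_out)
					out[nr_out++] = cur;
				cur.old_start = run_old;
				cur.new_start = run_new;
				cur.old_count = cur.new_count = run_len;
				in_trailing = 0;
			}
			seen_change = 1;
			if (c == '-') {
				cur.old_count++;
				old_pos++;
			} else {
				cur.new_count++;
				new_pos++;
			}
		}
		/* Anything else, e.g. a "\ No newline" line, is skipped. */
	}

	if (nr_lines && nr_out < max_out)
		out[nr_out++] = cur;
	return nr_out;
}
```

Fed the seven body lines of that hunk with both start lines set to 1,
the sketch yields the ranges -1,2 +1,2, then -2,2 +2,3, then -3,2 +4,2,
which are the three split headers listed in expected-output.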
 add-patch.c                | 20 +++++++++++------
 t/t3701-add-interactive.sh | 46 ++++++++++++++++++++++++++++++++++----
 2 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index 8c41cdfe39b..89ffda32b26 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -383,6 +383,17 @@ static int is_octal(const char *p, size_t len)
 	return 1;
 }
 
+static void complete_file(char marker, struct hunk *hunk)
+{
+	if (marker == '-' || marker == '+')
+		/*
+		 * Last hunk ended in non-context line (i.e. it
+		 * appended lines to the file, so there are no
+		 * trailing context lines).
+		 */
+		hunk->splittable_into++;
+}
+
 static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 {
 	struct strvec args = STRVEC_INIT;
@@ -472,6 +483,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 			eol = pend;
 
 		if (starts_with(p, "diff ")) {
+			complete_file(marker, hunk);
 			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
 				   file_diff_alloc);
 			file_diff = s->file_diff + s->file_diff_nr - 1;
@@ -598,13 +610,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 				file_diff->hunk->colored_end = hunk->colored_end;
 		}
 	}
-
-	if (marker == '-' || marker == '+')
-		/*
-		 * Last hunk ended in non-context line (i.e. it appended lines
-		 * to the file, so there are no trailing context lines).
-		 */
-		hunk->splittable_into++;
+	complete_file(marker, hunk);
 
 	/* non-colored shorter than colored? */
 	if (colored_p != colored_pend) {
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 77de0029ba5..94537a6b40a 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -326,7 +326,9 @@ test_expect_success 'correct message when there is nothing to do' '
 test_expect_success 'setup again' '
 	git reset --hard &&
 	test_chmod +x file &&
-	echo content >>file
+	echo content >>file &&
+	test_write_lines A B C D>file2 &&
+	git add file2
 '
 
 # Write the patch file with a new line at the top and bottom
@@ -341,13 +343,27 @@ test_expect_success 'setup patch' '
 	 content
 	+lastline
 	\ No newline at end of file
+	diff --git a/file2 b/file2
+	index 8422d40..35b930a 100644
+	--- a/file2
+	+++ b/file2
+	@@ -1,4 +1,5 @@
+	-A
+	+Z
+	 B
+	+Y
+	 C
+	-D
+	+X
 	EOF
 '
 
 # Expected output, diff is similar to the patch but w/ diff at the top
 test_expect_success 'setup expected' '
 	echo diff --git a/file b/file >expected &&
-	sed "/^index/s/ 100644/ 100755/" patch >>expected &&
+	sed -e "/^index 180b47c/s/ 100644/ 100755/" \
+	    -e /1,5/s//1,4/ \
+	    -e /Y/d patch >>expected &&
 	cat >expected-output <<-\EOF
 	--- a/file
 	+++ b/file
@@ -366,6 +382,28 @@ test_expect_success 'setup expected' '
 	 content
 	+lastline
 	\ No newline at end of file
+	--- a/file2
+	+++ b/file2
+	@@ -1,4 +1,5 @@
+	-A
+	+Z
+	 B
+	+Y
+	 C
+	-D
+	+X
+	@@ -1,2 +1,2 @@
+	-A
+	+Z
+	 B
+	@@ -2,2 +2,3 @@
+	 B
+	+Y
+	 C
+	@@ -3,2 +4,2 @@
+	 C
+	-D
+	+X
 	EOF
 '
 
@@ -373,8 +411,8 @@ test_expect_success 'setup expected' '
 test_expect_success 'add first line works' '
 	git commit -am "clear local changes" &&
 	git apply patch &&
-	test_write_lines s y y | git add -p file 2>error >raw-output &&
-	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
+	test_write_lines s y y s y n y | git add -p 2>error >raw-output &&
+	sed -n -e "s/^([1-9]\/[1-9]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
 	       -e "/^[-+@ \\\\]"/p raw-output >output &&
 	test_must_be_empty error &&
 	git diff --cached >diff &&
-- 
gitgitgadget


* Re: [PATCH 2/2] builtin add -p: fix hunk splitting
  2021-12-20 19:06   ` Ævar Arnfjörð Bjarmason
@ 2022-01-11 11:13     ` Phillip Wood
  2022-01-11 11:44       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 21+ messages in thread
From: Phillip Wood @ 2022-01-11 11:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Phillip Wood via GitGitGadget
  Cc: git, Johannes Schindelin, SZEDER Gábor, Phillip Wood

Hi Ævar

On 20/12/2021 19:06, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Phillip Wood via GitGitGadget wrote:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> To determine whether a hunk can be split a counter is incremented each
>> time a context line follows an insertion or deletion. If at the end of
>> the hunk the value of this counter is greater than one then the hunk
>> can be split into that number of smaller hunks. If the last hunk in a
>> file ends with an insertion or deletion then there is no following
>> context line and the counter will not be incremented. This case is
>> already handled at the end of the loop where counter is incremented if
>> the last hunk ended with an insertion or deletion. Unfortunately there
>> is no similar check between files (likely because the perl version
>> only ever parses one diff at a time). Fix this by checking if the last
>> hunk ended with an insertion or deletion when we see the diff header
>> of a new file and extend the existing regression test.
>>
>> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>   add-patch.c                |  7 ++++++
>>   t/t3701-add-interactive.sh | 46 ++++++++++++++++++++++++++++++++++----
>>   2 files changed, 49 insertions(+), 4 deletions(-)
>>
>> diff --git a/add-patch.c b/add-patch.c
>> index 8c41cdfe39b..5cea70666e9 100644
>> --- a/add-patch.c
>> +++ b/add-patch.c
>> @@ -472,6 +472,13 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>>   			eol = pend;
>>   
>>   		if (starts_with(p, "diff ")) {
>> +			if (marker == '-' || marker == '+')
>> +				/*
>> +				 * Last hunk ended in non-context line (i.e. it
>> +				 * appended lines to the file, so there are no
>> +				 * trailing context lines).
>> +				 */
>> +				hunk->splittable_into++;
> 
> I wondered if factoring out these several "marker == '-' || marker ==
> '+'" cases in parse_diff() into a "is_plus_minus(marker)" was worth it,
> but probably not.

Yeah, in the end I just factored this hunk out into a new function, but
I didn't add a function for "marker == '-' || marker == '+'".

>>   			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
>>   				   file_diff_alloc);
>>   			file_diff = s->file_diff + s->file_diff_nr - 1;
>> diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
>> index 77de0029ba5..94537a6b40a 100755
>> --- a/t/t3701-add-interactive.sh
>> +++ b/t/t3701-add-interactive.sh
>> @@ -326,7 +326,9 @@ test_expect_success 'correct message when there is nothing to do' '
>>   test_expect_success 'setup again' '
>>   	git reset --hard &&
>>   	test_chmod +x file &&
>> -	echo content >>file
>> +	echo content >>file &&
>> +	test_write_lines A B C D>file2 &&
> 
> style nit: "cmd args >file2" not "cmd args>file2"
> 
>> @@ -373,8 +411,8 @@ test_expect_success 'setup expected' '
>>   test_expect_success 'add first line works' '
>>   	git commit -am "clear local changes" &&
>>   	git apply patch &&
>> -	test_write_lines s y y | git add -p file 2>error >raw-output &&
>> -	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
>> +	test_write_lines s y y s y n y | git add -p 2>error >raw-output &&
>> +	sed -n -e "s/^([1-9]\/[1-9]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
>>   	       -e "/^[-+@ \\\\]"/p raw-output >output &&
>>   	test_must_be_empty error &&
>>   	git diff --cached >diff &&
> 
> style/diff nit: maybe worth it to in 1/2 do some version of:
> 
>      test_write_lines ... >lines &&
>      git ... <lines .. &&
>      ...
>      sed -n \
>      	-e ... \
>          -e ... \
>          >output
> 
> Just to make the diff smaller, i.e. just the "test_write_lines" line
> would be modified here.

In the end I decided to leave this as is. While refactoring slightly
simplifies this patch, it makes the previous one bigger and means that it
would need to be reviewed again.


> The changes themselves & this series LGTM.

Thanks

Best Wishes

Phillip



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] builtin add -p: fix hunk splitting
  2022-01-11 11:13     ` Phillip Wood
@ 2022-01-11 11:44       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-01-11 11:44 UTC (permalink / raw)
  To: phillip.wood
  Cc: Phillip Wood via GitGitGadget, git, Johannes Schindelin,
	SZEDER Gábor


On Tue, Jan 11 2022, Phillip Wood wrote:

> Hi Ævar
>
> On 20/12/2021 19:06, Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Dec 20 2021, Phillip Wood via GitGitGadget wrote:
>> 
>>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>>
>>> To determine whether a hunk can be split a counter is incremented each
>>> time a context line follows an insertion or deletion. If at the end of
>>> the hunk the value of this counter is greater than one then the hunk
>>> can be split into that number of smaller hunks. If the last hunk in a
>>> file ends with an insertion or deletion then there is no following
>>> context line and the counter will not be incremented. This case is
>>> already handled at the end of the loop where counter is incremented if
>>> the last hunk ended with an insertion or deletion. Unfortunately there
>>> is no similar check between files (likely because the perl version
>>> only ever parses one diff at a time). Fix this by checking if the last
>>> hunk ended with an insertion or deletion when we see the diff header
>>> of a new file and extend the existing regression test.
>>>
>>> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
>>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>>> ---
>>>   add-patch.c                |  7 ++++++
>>>   t/t3701-add-interactive.sh | 46 ++++++++++++++++++++++++++++++++++----
>>>   2 files changed, 49 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/add-patch.c b/add-patch.c
>>> index 8c41cdfe39b..5cea70666e9 100644
>>> --- a/add-patch.c
>>> +++ b/add-patch.c
>>> @@ -472,6 +472,13 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>>>   			eol = pend;
>>>     		if (starts_with(p, "diff ")) {
>>> +			if (marker == '-' || marker == '+')
>>> +				/*
>>> +				 * Last hunk ended in non-context line (i.e. it
>>> +				 * appended lines to the file, so there are no
>>> +				 * trailing context lines).
>>> +				 */
>>> +				hunk->splittable_into++;
>> I wondered if factoring out these several "marker == '-' || marker ==
>> '+'" cases in parse_diff() into a "is_plus_minus(marker)" was worth it,
>> but probably not.
>
> Yeah, in the end I just factored out this hunk into a new function, but
> I didn't add a function for "marker == '-' || marker == '+'".
>
>>>   			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
>>>   				   file_diff_alloc);
>>>   			file_diff = s->file_diff + s->file_diff_nr - 1;
>>> diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
>>> index 77de0029ba5..94537a6b40a 100755
>>> --- a/t/t3701-add-interactive.sh
>>> +++ b/t/t3701-add-interactive.sh
>>> @@ -326,7 +326,9 @@ test_expect_success 'correct message when there is nothing to do' '
>>>   test_expect_success 'setup again' '
>>>   	git reset --hard &&
>>>   	test_chmod +x file &&
>>> -	echo content >>file
>>> +	echo content >>file &&
>>> +	test_write_lines A B C D>file2 &&
>> style nit: "cmd args >file2" not "cmd args>file2"
>> 
>>> @@ -373,8 +411,8 @@ test_expect_success 'setup expected' '
>>>   test_expect_success 'add first line works' '
>>>   	git commit -am "clear local changes" &&
>>>   	git apply patch &&
>>> -	test_write_lines s y y | git add -p file 2>error >raw-output &&
>>> -	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
>>> +	test_write_lines s y y s y n y | git add -p 2>error >raw-output &&
>>> +	sed -n -e "s/^([1-9]\/[1-9]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
>>>   	       -e "/^[-+@ \\\\]"/p raw-output >output &&
>>>   	test_must_be_empty error &&
>>>   	git diff --cached >diff &&
>> style/diff nit: maybe worth it to in 1/2 do some version of:
>>      test_write_lines ... >lines &&
>>      git ... <lines .. &&
>>      ...
>>      sed -n \
>>      	-e ... \
>>          -e ... \
>>          >output
>> Just to make the diff smaller, i.e. just the "test_write_lines" line
>> would be modified here.
>
> In the end I decided to leave this as is. While refactoring slightly
> simplifies this patch, it makes the previous one bigger and means that it
> would need to be reviewed again.

All sounds good to me. Just stuff I thought I'd point out in case you
thought it made sense. Going with it as-is is fine too.

>> The changes themselves & this series LGTM.
>
> Thanks
>
> Best Wishes
>
> Phillip


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2022-01-11 11:12 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
  2022-01-11 11:12   ` [PATCH v2 1/2] t3701: clean up hunk splitting tests Phillip Wood via GitGitGadget
  2022-01-11 11:12   ` [PATCH v2 2/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
@ 2022-01-11 12:07   ` Ævar Arnfjörð Bjarmason
  2022-01-11 18:57     ` Phillip Wood
  2022-01-12 18:34   ` Junio C Hamano
  3 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-01-11 12:07 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Johannes Schindelin, SZEDER Gábor, Phillip Wood


On Tue, Jan 11 2022, Phillip Wood via GitGitGadget wrote:

> Thanks to Junio and Ævar for their comments on V1. I've updated the commit
> message and added a helper function as suggested.

This v2 LGTM as far as the functionality of the end-state is concerned.

As a remaining nit the complete_file() helper you introduce in 2/2
changes 2/4 places that increment "hunk->splittable_into".

I grabbed this PR and came up with this amendment to it which adds a 2/3
step that converts 3/3 of them, followed by adding the 4th user in your
2/2 (now patch 3/3):
https://github.com/git/git/compare/master...avar:phillipwood-avar/wip/add-p-fix-hunk-splitting-v2.1

It changes nothing as far as the end-state is concerned, but I think it
makes this easier to read & follow. The actual behavior change becomes a
one-line addition to add-patch.c, instead of being mixed up with the
refactoring of adding the new helper.

If you'd like to pick that up & run with it as a v3 that's fine by me,
and if not that's also fine :) Just a suggestion.

A range-diff between your v2 here and that linked-to
phillipwood-avar/wip/add-p-fix-hunk-splitting-v2.1:

1:  cc8639fc29d = 1:  34392397f04 t3701: clean up hunk splitting tests
-:  ----------- > 2:  c082176f8c5 add-file.c: use static helper to check marker == +|-
2:  b698989e265 ! 3:  defca0baba4 builtin add -p: fix hunk splitting
    @@ Commit message
         Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
     
      ## add-patch.c ##
    -@@ add-patch.c: static int is_octal(const char *p, size_t len)
    - 	return 1;
    - }
    - 
    -+static void complete_file(char marker, struct hunk *hunk)
    -+{
    -+	if (marker == '-' || marker == '+')
    -+		/*
    -+		 * Last hunk ended in non-context line (i.e. it
    -+		 * appended lines to the file, so there are no
    -+		 * trailing context lines).
    -+		 */
    -+		hunk->splittable_into++;
    -+}
    -+
    - static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
    - {
    - 	struct strvec args = STRVEC_INIT;
     @@ add-patch.c: static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
      			eol = pend;
      
      		if (starts_with(p, "diff ")) {
    -+			complete_file(marker, hunk);
    ++			complete_file(marker, &hunk->splittable_into);
      			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
      				   file_diff_alloc);
      			file_diff = s->file_diff + s->file_diff_nr - 1;
    -@@ add-patch.c: static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
    - 				file_diff->hunk->colored_end = hunk->colored_end;
    - 		}
    - 	}
    --
    --	if (marker == '-' || marker == '+')
    --		/*
    --		 * Last hunk ended in non-context line (i.e. it appended lines
    --		 * to the file, so there are no trailing context lines).
    --		 */
    --		hunk->splittable_into++;
    -+	complete_file(marker, hunk);
    - 
    - 	/* non-colored shorter than colored? */
    - 	if (colored_p != colored_pend) {
     
      ## t/t3701-add-interactive.sh ##
     @@ t/t3701-add-interactive.sh: test_expect_success 'correct message when there is nothing to do' '

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2022-01-11 12:07   ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
@ 2022-01-11 18:57     ` Phillip Wood
  2022-01-12 18:51       ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Phillip Wood @ 2022-01-11 18:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Phillip Wood via GitGitGadget
  Cc: git, Johannes Schindelin, SZEDER Gábor, Phillip Wood

Hi Ævar

On 11/01/2022 12:07, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Jan 11 2022, Phillip Wood via GitGitGadget wrote:
> 
>> Thanks to Junio and Ævar for their comments on V1. I've updated the commit
>> message and added a helper function as suggested.
> 
> This v2 LGTM as far as the functionality of the end-state is concerned.

Thanks for taking a look

> As a remaining nit the complete_file() helper you introduce in 2/2
> changes 2/4 places that increment "hunk->splittable_into".
> 
> I grabbed this PR and came up with this amendment to it which adds a 2/3
> step that converts 3/3 of them, followed by adding the 4th user in your
> 2/2 (now patch 3/3):
> https://github.com/git/git/compare/master...avar:phillipwood-avar/wip/add-p-fix-hunk-splitting-v2.1
> 
> It changes nothing as far as the end-state is concerned, but I think it
> makes this easier to read & follow. The actual behavior change becomes a
> one-line addition to add-patch.c, instead of being mixed up with the
> refactoring of adding the new helper.
> 
> If you'd like to pick that up & run with it as a v3 that's fine by me,
> and if not that's also fine :) Just a suggestion.

I'm not sure I want to go with your extra changes. I've left some
comments on them below

> @@ -488,12 +499,12 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>  	else if (starts_with(p, "@@ ") ||
>  		 (hunk == &file_diff->head &&
>  		  (skip_prefix(p, "deleted file", &deleted)))) {
> -		if (marker == '-' || marker == '+')
> -			/*
> -			 * Should not happen; previous hunk did not end
> -			 * in a context line? Handle it anyway.
> -			 */
> -			hunk->splittable_into++;
> +		/*
> +		 * Should not increment "splittable_into";
> +		 * previous hunk did not end in a context
> +		 * line? Handle it anyway.
> +		 */
> +		complete_file(marker, &hunk->splittable_into);
>  
>  		ALLOC_GROW_BY(file_diff->hunk, file_diff->hunk_nr, 1,
>  			   file_diff->hunk_alloc);

I deliberately left this alone as I think we should probably make this
BUG() out instead of silently accepting an invalid diff.

> @@ -566,8 +577,8 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>  			    (int)(eol - (plain->buf + file_diff->head.start)),
>  			    plain->buf + file_diff->head.start);
>  
> -		if ((marker == '-' || marker == '+') && *p == ' ')
> -			hunk->splittable_into++;
> +		if (*p == ' ')
> +			complete_file(marker, &hunk->splittable_into);
>  		if (marker && *p != '\\')
>  			marker = *p;
  
Here you are calling complete_file() which has the following comment

      /*
       * Last hunk ended in non-context line (i.e. it
       * appended lines to the file, so there are no
       * trailing context lines).
       */

for all context lines so the function name and comment would need
updating.

Best Wishes

Phillip

> A range-diff between your v2 here and that linked-to
> phillipwood-avar/wip/add-p-fix-hunk-splitting-v2.1:
> 
> 1:  cc8639fc29d = 1:  34392397f04 t3701: clean up hunk splitting tests
> -:  ----------- > 2:  c082176f8c5 add-file.c: use static helper to check marker == +|-
> 2:  b698989e265 ! 3:  defca0baba4 builtin add -p: fix hunk splitting
>      @@ Commit message
>           Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>       
>        ## add-patch.c ##
>      -@@ add-patch.c: static int is_octal(const char *p, size_t len)
>      - 	return 1;
>      - }
>      -
>      -+static void complete_file(char marker, struct hunk *hunk)
>      -+{
>      -+	if (marker == '-' || marker == '+')
>      -+		/*
>      -+		 * Last hunk ended in non-context line (i.e. it
>      -+		 * appended lines to the file, so there are no
>      -+		 * trailing context lines).
>      -+		 */
>      -+		hunk->splittable_into++;
>      -+}
>      -+
>      - static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>      - {
>      - 	struct strvec args = STRVEC_INIT;
>       @@ add-patch.c: static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>        			eol = pend;
>        
>        		if (starts_with(p, "diff ")) {
>      -+			complete_file(marker, hunk);
>      ++			complete_file(marker, &hunk->splittable_into);
>        			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
>        				   file_diff_alloc);
>        			file_diff = s->file_diff + s->file_diff_nr - 1;
>      -@@ add-patch.c: static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>      - 				file_diff->hunk->colored_end = hunk->colored_end;
>      - 		}
>      - 	}
>      --
>      --	if (marker == '-' || marker == '+')
>      --		/*
>      --		 * Last hunk ended in non-context line (i.e. it appended lines
>      --		 * to the file, so there are no trailing context lines).
>      --		 */
>      --		hunk->splittable_into++;
>      -+	complete_file(marker, hunk);
>      -
>      - 	/* non-colored shorter than colored? */
>      - 	if (colored_p != colored_pend) {
>       
>        ## t/t3701-add-interactive.sh ##
>       @@ t/t3701-add-interactive.sh: test_expect_success 'correct message when there is nothing to do' '


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2022-01-11 11:12 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-01-11 12:07   ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
@ 2022-01-12 18:34   ` Junio C Hamano
  3 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2022-01-12 18:34 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Johannes Schindelin, SZEDER Gábor,
	Ævar Arnfjörð Bjarmason, Phillip Wood

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Thanks to Junio and Ævar for their comments on V1. I've updated the commit
> message and added a helper function as suggested.
>
> V1 Cover Letter: Fix a small regression in the hunk splitting of the builtin
> version compared to the perl version. Thanks to Szeder for the easy to
> follow bug report.

Looking good.  After comparing the output from 

    $ git grep -e 'finalize[_a-z]*(' -e 'complete[_a-z]*(' \*.c

I would have called the helper "finalize_file()", but in the context
of this file, the name complete_file() is not misleading enough to
require renaming.

Will queue.  Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2022-01-11 18:57     ` Phillip Wood
@ 2022-01-12 18:51       ` Junio C Hamano
  2022-01-19 20:01         ` Phillip Wood
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2022-01-12 18:51 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Ævar Arnfjörð Bjarmason,
	Phillip Wood via GitGitGadget, git, Johannes Schindelin,
	SZEDER Gábor, Phillip Wood

Phillip Wood <phillip.wood123@gmail.com> writes:

> I'm not sure I want to go with your extra changes. I've left some
> comments on them below
>
>> @@ -488,12 +499,12 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>>  	else if (starts_with(p, "@@ ") ||
>>  		 (hunk == &file_diff->head &&
>>  		  (skip_prefix(p, "deleted file", &deleted)))) {
>> -		if (marker == '-' || marker == '+')
>> -			/*
>> -			 * Should not happen; previous hunk did not end
>> -			 * in a context line? Handle it anyway.
>> -			 */
>> -			hunk->splittable_into++;
>> +		/*
>> +		 * Should not increment "splittable_into";
>> +		 * previous hunk did not end in a context
>> +		 * line? Handle it anyway.
>> +		 */
>> +		complete_file(marker, &hunk->splittable_into);
>>   		ALLOC_GROW_BY(file_diff->hunk, file_diff->hunk_nr, 1,
>>  			   file_diff->hunk_alloc);
>
> I deliberately left this alone as I think we should probably make this
> BUG() out instead of silently accepting an invalid diff.

As we are reading our own output, I agree that such a data error is
a BUG().

In any case, a helper to see if the file ended without post-context
is one thing, and a helper that specifies what happens after we are
done with a single file, before we move on to the next file or
after processing the last file, is another thing.  The latter may be
able to make use of the former, but the latter may want to do more
than that in the future.

As complete_file() is about finalizing the processing we have done
to the current file, it should be used for that purpose and nothing
else.  I think the hunk I see at
https://github.com/git/git/commit/c082176f8c5a1fc1c8b2a93991ca28fd63aae73a
(reproduced below) is simply nonsense.

Stepping back a bit, though, is this helper really finalizing the
current file, or is it finalizing the current hunk?  If it were the
latter, then its use in the hunk I called "nonsense" above actually
makes perfect sense.  There may not be anything other than finalizing
the last hunk when we see the end of a file right now, so we may not
need to add a finalize_file() helper right now, and when we need to
do something more than finalizing the last hunk, we may need to capture
the distinction by adding one.

diff --git i/add-patch.c w/add-patch.c
index 89ffda32b2..6094290c86 100644
--- i/add-patch.c
+++ w/add-patch.c
@@ -578,8 +578,8 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 			    (int)(eol - (plain->buf + file_diff->head.start)),
 			    plain->buf + file_diff->head.start);
 
-		if ((marker == '-' || marker == '+') && *p == ' ')
-			hunk->splittable_into++;
+		if (*p == ' ')
+			complete_file(marker, &hunk->splittable_into);
 		if (marker && *p != '\\')
 			marker = *p;
 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2022-01-12 18:51       ` Junio C Hamano
@ 2022-01-19 20:01         ` Phillip Wood
  2022-01-20  5:02           ` Junio C Hamano
  2022-01-22  9:05           ` Johannes Schindelin
  0 siblings, 2 replies; 21+ messages in thread
From: Phillip Wood @ 2022-01-19 20:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason,
	Phillip Wood via GitGitGadget, git, Johannes Schindelin,
	SZEDER Gábor, Phillip Wood

Hi Junio

On 12/01/2022 18:51, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> I'm not sure I want to go with your extra changes. I've left some
>> comments on them below
>>
>>> @@ -488,12 +499,12 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>>>   	else if (starts_with(p, "@@ ") ||
>>>   		 (hunk == &file_diff->head &&
>>>   		  (skip_prefix(p, "deleted file", &deleted)))) {
>>> -		if (marker == '-' || marker == '+')
>>> -			/*
>>> -			 * Should not happen; previous hunk did not end
>>> -			 * in a context line? Handle it anyway.
>>> -			 */
>>> -			hunk->splittable_into++;
>>> +		/*
>>> +		 * Should not increment "splittable_into";
>>> +		 * previous hunk did not end in a context
>>> +		 * line? Handle it anyway.
>>> +		 */
>>> +		complete_file(marker, &hunk->splittable_into);
>>>    		ALLOC_GROW_BY(file_diff->hunk, file_diff->hunk_nr, 1,
>>>   			   file_diff->hunk_alloc);
>>
>> I deliberately left this alone as I think we should probably make this
>> BUG() out instead of silently accepting an invalid diff.
> 
> As we are reading our own output, I agree that such a data error is
> a BUG().
> 
> In any case, a helper to see if the file ended without post-context
> is one thing, and a helper that specifies what happens after we are
> done with a single file, before we move on to the next file or
> after processing the last file, is another thing.  The latter may be
> able to make use of the former, but the latter may want to do more
> than that in the future.
> 
> As complete_file() is about finalizing the processing we have done
> to the current file, it should be used for that purpose and nothing
> else.  I think the hunk I see at
> https://github.com/git/git/commit/c082176f8c5a1fc1c8b2a93991ca28fd63aae73a
> (reproduced below) is simply nonsense.
> 
> Stepping back a bit, though, is this helper really finalizing the
> current file, or is it finalizing the current hunk?  If it were the
> latter, then its use in the hunk I called "nonsense" above actually
> makes perfect sense.

Even if the helper is finalizing the current hunk then I think that 
"nonsense" hunk would still be wrong as it would be calling finalize_hunk() 
on _every_ context line in the hunk rather than just being called once 
to finalize the hunk. We could call the function something like 
update_splittable() but then we'd need to explain why we were calling 
that function at the start of a diff and at the end of the loop.

> There may not be anything other than finalizing
> the last hunk when we see the end of a file right now, so we may not
> need to add a finalize_file() helper right now, and when we need to
> do something more than finalizing the last hunk, we may need to capture
> the distinction by adding one.

Yes, if you're happy let's leave this series as it is.

Best Wishes

Phillip

> diff --git i/add-patch.c w/add-patch.c
> index 89ffda32b2..6094290c86 100644
> --- i/add-patch.c
> +++ w/add-patch.c
> @@ -578,8 +578,8 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>   			    (int)(eol - (plain->buf + file_diff->head.start)),
>   			    plain->buf + file_diff->head.start);
>   
> -		if ((marker == '-' || marker == '+') && *p == ' ')
> -			hunk->splittable_into++;
> +		if (*p == ' ')
> +			complete_file(marker, &hunk->splittable_into);
>   		if (marker && *p != '\\')
>   			marker = *p;
>   
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2022-01-19 20:01         ` Phillip Wood
@ 2022-01-20  5:02           ` Junio C Hamano
  2022-01-20  8:42             ` Ævar Arnfjörð Bjarmason
  2022-01-22  9:05           ` Johannes Schindelin
  1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2022-01-20  5:02 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Ævar Arnfjörð Bjarmason,
	Phillip Wood via GitGitGadget, git, Johannes Schindelin,
	SZEDER Gábor, Phillip Wood

Phillip Wood <phillip.wood123@gmail.com> writes:

> Even if the helper is finalizing the current hunk then I think that
> "nonsense" hunk would still be wrong as it would be calling
> finalize_hunk() on _every_ context line in the hunk rather than just
> being called once to finalize the hunk.

True; this triggers every time we finish reading the common context
lines and not at the end of hunk.  In any case, I think what we
queued looks good for 'next'.

>> -		if ((marker == '-' || marker == '+') && *p == ' ')
>> -			hunk->splittable_into++;
>> +		if (*p == ' ')
>> +			complete_file(marker, &hunk->splittable_into);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2022-01-20  5:02           ` Junio C Hamano
@ 2022-01-20  8:42             ` Ævar Arnfjörð Bjarmason
  2022-01-20 19:13               ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-01-20  8:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Phillip Wood, Phillip Wood via GitGitGadget, git,
	Johannes Schindelin, SZEDER Gábor, Phillip Wood


On Wed, Jan 19 2022, Junio C Hamano wrote:

> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>> Even if the helper is finalizing the current hunk then I think that
>> "nonsense" hunk would still be wrong as it would be calling
>> finalize_hunk() on _every_ context line in the hunk rather than just
>> being called once to finalize the hunk.
>
> True; this triggers every time we finish reading the common context
> lines and not at the end of hunk.  In any case, I think what we
> queued looks good for 'next'.

For what it's worth (and as the person who started this side-thread) I
agree. This looks good as-is, thanks both!

>>> -		if ((marker == '-' || marker == '+') && *p == ' ')
>>> -			hunk->splittable_into++;
>>> +		if (*p == ' ')
>>> +			complete_file(marker, &hunk->splittable_into);


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2022-01-20  8:42             ` Ævar Arnfjörð Bjarmason
@ 2022-01-20 19:13               ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2022-01-20 19:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Phillip Wood, Phillip Wood via GitGitGadget, git,
	Johannes Schindelin, SZEDER Gábor, Phillip Wood

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Jan 19 2022, Junio C Hamano wrote:
>
>> Phillip Wood <phillip.wood123@gmail.com> writes:
>>
>>> Even if the helper is finalizing the current hunk then I think that
>>> "nonsense" hunk would still be wrong as it would be calling
>>> finalize_hunk() on _every_ context line in the hunk rather than just
>>> being called once to finalize the hunk.
>>
>> True; this triggers every time we finish reading the common context
>> lines and not at the end of hunk.  In any case, I think what we
>> queued looks good for 'next'.
>
> For what it's worth (and as the person who started this side-thread) I
> agree. This looks good as-is, thanks both!
>
>>>> -		if ((marker == '-' || marker == '+') && *p == ' ')
>>>> -			hunk->splittable_into++;
>>>> +		if (*p == ' ')
>>>> +			complete_file(marker, &hunk->splittable_into);

Yup, thanks all.  The fix is now in 'next' and I expect we can
safely merge it down as part of the first batch next cycle.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2022-01-19 20:01         ` Phillip Wood
  2022-01-20  5:02           ` Junio C Hamano
@ 2022-01-22  9:05           ` Johannes Schindelin
  2022-01-24 11:10             ` Phillip Wood
  1 sibling, 1 reply; 21+ messages in thread
From: Johannes Schindelin @ 2022-01-22  9:05 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Phillip Wood via GitGitGadget, git, SZEDER Gábor

Hi Phillip,

first of all: thank you for these patches. I read over them and they have
my ACK.

On Wed, 19 Jan 2022, Phillip Wood wrote:

> On 12/01/2022 18:51, Junio C Hamano wrote:
> > Phillip Wood <phillip.wood123@gmail.com> writes:
> >
> > > I'm not sure I want to go with your extra changes. I've left some
> > > comments on them below
> > >
> > > > @@ -488,12 +499,12 @@ static int parse_diff(struct add_p_state *s, const
> > > > struct pathspec *ps)
> > > >    else if (starts_with(p, "@@ ") ||
> > > >      (hunk == &file_diff->head &&
> > > >   		  (skip_prefix(p, "deleted file", &deleted)))) {
> > > > -		if (marker == '-' || marker == '+')
> > > > -			/*
> > > > -			 * Should not happen; previous hunk did not end
> > > > -			 * in a context line? Handle it anyway.
> > > > -			 */
> > > > -			hunk->splittable_into++;
> > > > +		/*
> > > > +		 * Should not increment "splittable_into";
> > > > +		 * previous hunk did not end in a context
> > > > +		 * line? Handle it anyway.
> > > > +		 */
> > > > +		complete_file(marker, &hunk->splittable_into);
> > > >      ALLOC_GROW_BY(file_diff->hunk, file_diff->hunk_nr, 1,
> > > >         file_diff->hunk_alloc);
> > >
> > > I deliberately left this alone as I think we should probably make
> > > this BUG() out instead of silently accepting an invalid diff.

FWIW this was overzealous defensive programming on my part. More on that
below.

> > As we are reading our own output, I agree that such a data error is
> > a BUG().

Indeed. I was less worried about the output format changing, and more
concerned with bugs in my parser ;-)

Although, having said that, I had meant to verify that `git add -p` cannot
be asked to produce and consume diffs with `-U0` when I wrote that
comment. Now I did that, and I am now confident that there is no way to ask
`git add -p` to generate and use context line-free diffs: we neither add
`-U<n>` in https://github.com/git/git/blob/v2.34.1/add-patch.c#L398-L417
nor do we call the user-facing `git diff` command that would interpret
`diff.context`, but instead we use `git diff-index` and `git diff-files`
(which ignore that config setting).

> > In any case, a helper to see if the file ended without post-context
> > is one thing, and a helper that specifies what happens after we are
> > done with a single file, before we move on to the next file or
> > after processing the last file, is another thing.  The latter may be
> > able to make use of the former, but the latter may want to do more
> > than that in the future.

If you are concerned about the name of the function: maybe a better name
would be `maybe_increment_splittable_hunk_count(marker)`.

> >
> > As complete_file() is about finalizing the processing we have done
> > to the current file, it should be used for that purpose and nothing
> > else.  I think the hunk I see at
> > https://github.com/git/git/commit/c082176f8c5a1fc1c8b2a93991ca28fd63aae73a
> > (reproduced below) is simply nonsense.
> >
> > Stepping back a bit, though, is this helper really finalizing the
> > current file, or is it finalizing the current hunk?  If it were the
> > latter, then its use in the hunk I called "nonsense" above actually
> > makes perfect sense.
>
> Even if the helper is finalizing the current hunk then I think that "nonsense"
> hunk would still be wrong as it would be calling finalize_hunk() on _every_
> context line in the hunk rather than just being called once to finalize the
> hunk. We could call the function something like update_splittable() but then
> we'd need to explain why we were calling that function at the start of a diff
> and at the end of the loop.

Right. The point of this check is to see whether we missed counting a
splittable hunk. Then it makes more sense to call it at the beginning of a
file, at the end of a file _and_ at a context line.

Having said all that, I am really fine with what landed in `next`.

Thank you,
Dscho

* Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting
  2022-01-22  9:05           ` Johannes Schindelin
@ 2022-01-24 11:10             ` Phillip Wood
  0 siblings, 0 replies; 21+ messages in thread
From: Phillip Wood @ 2022-01-24 11:10 UTC (permalink / raw)
  To: Johannes Schindelin, Phillip Wood
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Phillip Wood via GitGitGadget, git, SZEDER Gábor

Hi Dscho

On 22/01/2022 09:05, Johannes Schindelin wrote:
> Hi Phillip,
> 
> first of all: thank you for these patches. I read over them and they have
> my ACK.

Thanks

> On Wed, 19 Jan 2022, Phillip Wood wrote:
> 
>> On 12/01/2022 18:51, Junio C Hamano wrote:
>>> Phillip Wood <phillip.wood123@gmail.com> writes:
>>>
>>>> I'm not sure I want to go with your extra changes. I've left some
>>>> comments on them below
>>>>
>>>>> @@ -488,12 +499,12 @@ static int parse_diff(struct add_p_state *s, const
>>>>> struct pathspec *ps)
>>>>>     else if (starts_with(p, "@@ ") ||
>>>>>       (hunk == &file_diff->head &&
>>>>>    		  (skip_prefix(p, "deleted file", &deleted)))) {
>>>>> -		if (marker == '-' || marker == '+')
>>>>> -			/*
>>>>> -			 * Should not happen; previous hunk did not end
>>>>> -			 * in a context line? Handle it anyway.
>>>>> -			 */
>>>>> -			hunk->splittable_into++;
>>>>> +		/*
>>>>> +		 * Should not increment "splittable_into";
>>>>> +		 * previous hunk did not end in a context
>>>>> +		 * line? Handle it anyway.
>>>>> +		 */
>>>>> +		complete_file(marker, &hunk->splittable_into);
>>>>>       ALLOC_GROW_BY(file_diff->hunk, file_diff->hunk_nr, 1,
>>>>>          file_diff->hunk_alloc);
>>>>
>>>> I deliberately left this alone as I think we should probably make
>>>> this BUG() out instead of silently accepting an invalid diff.
> 
> FWIW this was overzealous defensive programming on my part. More on that
> below.
> 
>>> As we are reading our own output, I agree that such a data error is
>>> a BUG().
> 
> Indeed. I was less worried about the output format changing, and more
> concerned with bugs in my parser ;-)
> 
> Although, having said that, I had meant to verify that `git add -p` cannot
> be asked to produce and consume diffs with `-U0` when I wrote that
> comment. I have now done that, and I am confident that there is no way to ask
> `git add -p` to generate and use context line-free diffs: we neither add
> `-U<n>` in https://github.com/git/git/blob/v2.34.1/add-patch.c#L398-L417
> nor do we call the user-facing `git diff` command that would interpret
> `diff.context`, but instead we use `git diff-index` and `git diff-files`
> (which ignore that config setting).

I did think about zero context diffs but realized that they can never be 
split so we don't need to worry about incrementing hunk->splittable_into 
in that case. It does mean that hunk->splittable_into will be zero in 
the -U0 case rather than one but I don't think that matters as we only
care if it is >1 for splitting.

Best Wishes

Phillip

>>> In any case, a helper to see if the file ended without post-context
>>> is one thing, and a helper that specifies what happens after we are
>>> done with a single file, before we move on to the next file or
>>> after processing the last file, is another thing.  The latter may be
>>> able to make use of the former, but the latter may want to do more
>>> than that in the future.
> 
> If you are concerned about the name of the function: maybe a better name
> would be `maybe_increment_splittable_hunk_count(marker)`.
> 
>>>
>>> As complete_file() is about finalizing the processing we have done
>>> to the current file, it should be used for that purpose, and nothing
>>> else, I think the hunk I see at
>>> https://github.com/git/git/commit/c082176f8c5a1fc1c8b2a93991ca28fd63aae73a
>>> (reproduced below) is simply a nonsense.
>>>
>>> Stepping back a bit, though, is this helper really finalizing the
>>> current file, or is it finalizing the current hunk?  If it were the
>>> latter, then its use in the hunk I called "nonsense" above actually
>>> makes perfect sense.
>>
>> Even if the helper is finalizing the current hunk then I think that "nonsense"
>> hunk would still be wrong as it would be calling finalize_hunk() on _every_
>> context line in the hunk rather than just being called once to finalize the
>> hunk. We could call the function something like update_splittable() but then
>> we'd need to explain why we were calling that function at the start of a diff
>> and at the end of the loop.
> 
> Right. The point of this check is to see whether we missed counting a
> splittable hunk. Then it makes more sense to call it at the beginning of a
> file, at the end of a file _and_ at a context line.
> 
> Having said all that, I am really fine with what landed in `next`.
> 
> Thank you,
> Dscho

Thread overview: 21+ messages
2021-12-20 14:32 [PATCH 0/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
2021-12-20 14:32 ` [PATCH 1/2] t3701: clean up hunk splitting tests Phillip Wood via GitGitGadget
2021-12-20 21:09   ` Junio C Hamano
2021-12-20 14:32 ` [PATCH 2/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
2021-12-20 19:06   ` Ævar Arnfjörð Bjarmason
2022-01-11 11:13     ` Phillip Wood
2022-01-11 11:44       ` Ævar Arnfjörð Bjarmason
2021-12-20 21:30   ` Junio C Hamano
2022-01-11 11:12 ` [PATCH v2 0/2] " Phillip Wood via GitGitGadget
2022-01-11 11:12   ` [PATCH v2 1/2] t3701: clean up hunk splitting tests Phillip Wood via GitGitGadget
2022-01-11 11:12   ` [PATCH v2 2/2] builtin add -p: fix hunk splitting Phillip Wood via GitGitGadget
2022-01-11 12:07   ` [PATCH v2 0/2] " Ævar Arnfjörð Bjarmason
2022-01-11 18:57     ` Phillip Wood
2022-01-12 18:51       ` Junio C Hamano
2022-01-19 20:01         ` Phillip Wood
2022-01-20  5:02           ` Junio C Hamano
2022-01-20  8:42             ` Ævar Arnfjörð Bjarmason
2022-01-20 19:13               ` Junio C Hamano
2022-01-22  9:05           ` Johannes Schindelin
2022-01-24 11:10             ` Phillip Wood
2022-01-12 18:34   ` Junio C Hamano
