git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Iffy output given git diff --unified=2147483647
       [not found] <xXWgbH3mlNEvFcdGLqBHwcclZoeZNPoLg8Hr6YCipHXvS5eKaHeTppzFM-l_wyB46BB1R1T0j6g_jWRXIj7-GRJh1LPxi1ta3GkQ5t8F4-0=@proton.me>
@ 2025-03-12 15:51 ` Jason Cho
  2025-03-14 22:00   ` [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() René Scharfe
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Cho @ 2025-03-12 15:51 UTC (permalink / raw)
  To: git@vger.kernel.org

> $ git --versiongit version 2.47.0.windows.2
> 
> $ git diff --unified=2147483647 1.txt 2.txt
> diff --git a/1.txt b/2.txt
> index e53eaa1..1130481 100644
> --- a/1.txt
> +++ b/2.txt
> @@ -1,10 +1,10 @@
>  1
> -2
> +a
>  3
>  4
>  5
>  6
>  7
>  8
>  a
>  0
> @@ -1,10 +1,10 @@
>  1
>  a
>  3
>  4
>  5
>  6
>  7
>  8
> -9
> +a
>  0
> 
> $ git diff --unified=3 1.txt 2.txt
> diff --git a/1.txt b/2.txt
> index e53eaa1..1130481 100644
> --- a/1.txt
> +++ b/2.txt
> @@ -1,10 +1,10 @@
>  1
> -2
> +a
>  3
>  4
>  5
>  6
>  7
>  8
> -9
> +a
>  0
> 
> $ diff --version
> diff (GNU diffutils) 3.10
> Copyright (C) 2023 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> 
> Written by Paul Eggert, Mike Haertel, David Hayes,
> Richard Stallman, and Len Tower.
> 
> $ diff  --unified=2147483647 1.txt 2.txt
> --- 1.txt       2025-03-12 16:04:06.947099900 +0100
> +++ 2.txt       2025-03-12 16:04:27.131732400 +0100
> @@ -1,10 +1,10 @@
>  1
> -2
> +a
>  3
>  4
>  5
>  6
>  7
>  8
> -9
> +a
>  0


Please see the above command line output. I run this on Windows with git for windows, but the problem should apply for other platforms. The version of my git is 2.47.

I prepare two files, I run GNU diff  --unified=2147483647 1.txt 2.txt, the output is correct. Then I run git diff with --unified=2147483647, the context of the second hunk is repeated, which is unexpected.

I investigated it and found the repetition is due to an overflow issue in   xdiff/xemit.c.


> xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg){
> xdchange_t *xch, *xchp, *lxch;
> long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen;  <- ----
> ...
> }


The documentation https://git-scm.com/docs/git-diff doesn't say the range of --unified. Even if its max value is INT_MAX, 2147483647 is in the range.

Can you guys clarify the correct range of --unified? If my value 2147483647 is in range, git diff should output a diff without the strange repetition. Please fix it.





^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk()
  2025-03-12 15:51 ` Iffy output given git diff --unified=2147483647 Jason Cho
@ 2025-03-14 22:00   ` René Scharfe
  2025-03-14 22:28     ` Junio C Hamano
  2025-03-14 23:14     ` Jason Cho
  0 siblings, 2 replies; 7+ messages in thread
From: René Scharfe @ 2025-03-14 22:00 UTC (permalink / raw)
  To: Jason Cho, git@vger.kernel.org

xdl_get_hunk() calculates the maximum number of common lines between two
changes that would fit into the same hunk for the given context options.
It involves doubling and addition and thus can overflow if the terms are
huge.

The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while
the type of the corresponding context and interhunkcontext in struct
diff_options is int.  On many platforms longs are bigger that ints,
which prevents the overflow.  On Windows they have the same range and
the overflow manifests as hunks that are split erroneously and lines
being repeated between them.

Fix the overflow by checking and not going beyond LONG_MAX.  This allows
specifying a huge context line count and getting all lines of a changed
files in a single hunk, as expected.

Reported-by: Jason Cho <jason11choca@proton.me>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t4055-diff-context.sh | 10 ++++++++++
 xdiff/xemit.c           |  8 +++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/t/t4055-diff-context.sh b/t/t4055-diff-context.sh
index f7ff234cf9..ec2804eea6 100755
--- a/t/t4055-diff-context.sh
+++ b/t/t4055-diff-context.sh
@@ -89,4 +89,14 @@ test_expect_success '-U0 is valid, so is diff.context=0' '
 	grep "^+MODIFIED" output
 '

+test_expect_success '-U2147483647 works' '
+	echo APPENDED >>x &&
+	test_line_count = 16 x &&
+	git diff -U2147483647 >output &&
+	test_line_count = 22 output &&
+	grep "^-ADDED" output &&
+	grep "^+MODIFIED" output &&
+	grep "^+APPENDED" output
+'
+
 test_done
diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index f8e3f25b03..1d40c9cb40 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -43,6 +43,10 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
 	return 0;
 }

+static long saturating_add(long a, long b)
+{
+	return signed_add_overflows(a, b) ? LONG_MAX : a + b;
+}

 /*
  * Starting at the passed change atom, find the latest change atom to be included
@@ -52,7 +56,9 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
 xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg)
 {
 	xdchange_t *xch, *xchp, *lxch;
-	long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen;
+	long max_common = saturating_add(saturating_add(xecfg->ctxlen,
+							xecfg->ctxlen),
+					 xecfg->interhunkctxlen);
 	long max_ignorable = xecfg->ctxlen;
 	long ignored = 0; /* number of ignored blank lines */

--
2.48.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk()
  2025-03-14 22:00   ` [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() René Scharfe
@ 2025-03-14 22:28     ` Junio C Hamano
  2025-03-15  6:38       ` René Scharfe
  2025-03-14 23:14     ` Jason Cho
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2025-03-14 22:28 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jason Cho, git@vger.kernel.org

René Scharfe <l.s.r@web.de> writes:

> xdl_get_hunk() calculates the maximum number of common lines between two
> changes that would fit into the same hunk for the given context options.
> It involves doubling and addition and thus can overflow if the terms are
> huge.
>
> The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while
> the type of the corresponding context and interhunkcontext in struct
> diff_options is int.  On many platforms longs are bigger that ints,
> which prevents the overflow.  On Windows they have the same range and
> the overflow manifests as hunks that are split erroneously and lines
> being repeated between them.
>
> Fix the overflow by checking and not going beyond LONG_MAX.  This allows
> specifying a huge context line count and getting all lines of a changed
> files in a single hunk, as expected.
>
> Reported-by: Jason Cho <jason11choca@proton.me>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  t/t4055-diff-context.sh | 10 ++++++++++
>  xdiff/xemit.c           |  8 +++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)

Oh, I love a patch like this that is well thought out to carefully
check the bounds, instead of blindly say "ah, counting number of
things in size_t solves everything" ;-)

> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index f8e3f25b03..1d40c9cb40 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -43,6 +43,10 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
>  	return 0;
>  }
>
> +static long saturating_add(long a, long b)
> +{
> +	return signed_add_overflows(a, b) ? LONG_MAX : a + b;
> +}
>
>  /*
>   * Starting at the passed change atom, find the latest change atom to be included
> @@ -52,7 +56,9 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
>  xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg)
>  {
>  	xdchange_t *xch, *xchp, *lxch;
> -	long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen;
> +	long max_common = saturating_add(saturating_add(xecfg->ctxlen,
> +							xecfg->ctxlen),
> +					 xecfg->interhunkctxlen);

Looking good.

Thanks.  Will queue.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk()
  2025-03-14 22:00   ` [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() René Scharfe
  2025-03-14 22:28     ` Junio C Hamano
@ 2025-03-14 23:14     ` Jason Cho
  1 sibling, 0 replies; 7+ messages in thread
From: Jason Cho @ 2025-03-14 23:14 UTC (permalink / raw)
  To: René Scharfe; +Cc: git@vger.kernel.org

Dear René,

What a thorough analysis. I didn't know this bug is only on Windows.

Thank you for submitting the fix. I am excited to see it in a new release.

For now I will cherrypick your patch to my fork of git.

Best regards,
Jason Cho


On Friday, March 14th, 2025 at 11:00 PM, René Scharfe <l.s.r@web.de> wrote:

> xdl_get_hunk() calculates the maximum number of common lines between two
> changes that would fit into the same hunk for the given context options.
> It involves doubling and addition and thus can overflow if the terms are
> huge.
> 
> The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while
> the type of the corresponding context and interhunkcontext in struct
> diff_options is int. On many platforms longs are bigger that ints,
> which prevents the overflow. On Windows they have the same range and
> the overflow manifests as hunks that are split erroneously and lines
> being repeated between them.
> 
> Fix the overflow by checking and not going beyond LONG_MAX. This allows
> specifying a huge context line count and getting all lines of a changed
> files in a single hunk, as expected.
> 
> Reported-by: Jason Cho jason11choca@proton.me
> 
> Signed-off-by: René Scharfe l.s.r@web.de
> 
> ---
> t/t4055-diff-context.sh | 10 ++++++++++
> xdiff/xemit.c | 8 +++++++-
> 2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/t/t4055-diff-context.sh b/t/t4055-diff-context.sh
> index f7ff234cf9..ec2804eea6 100755
> --- a/t/t4055-diff-context.sh
> +++ b/t/t4055-diff-context.sh
> @@ -89,4 +89,14 @@ test_expect_success '-U0 is valid, so is diff.context=0' '
> grep "^+MODIFIED" output
> '
> 
> +test_expect_success '-U2147483647 works' '
> + echo APPENDED >>x &&
> 
> + test_line_count = 16 x &&
> + git diff -U2147483647 >output &&
> 
> + test_line_count = 22 output &&
> + grep "^-ADDED" output &&
> + grep "^+MODIFIED" output &&
> + grep "^+APPENDED" output
> +'
> +
> test_done
> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index f8e3f25b03..1d40c9cb40 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -43,6 +43,10 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const pre, xdemitcb_t *
> return 0;
> }
> 
> +static long saturating_add(long a, long b)
> +{
> + return signed_add_overflows(a, b) ? LONG_MAX : a + b;
> +}
> 
> /
> * Starting at the passed change atom, find the latest change atom to be included
> @@ -52,7 +56,9 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
> xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg)
> {
> xdchange_t *xch, *xchp, *lxch;
> - long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen;
> 
> + long max_common = saturating_add(saturating_add(xecfg->ctxlen,
> 
> + xecfg->ctxlen),
> 
> + xecfg->interhunkctxlen);
> 
> long max_ignorable = xecfg->ctxlen;
> 
> long ignored = 0; /* number of ignored blank lines */
> 
> --
> 2.48.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk()
  2025-03-14 22:28     ` Junio C Hamano
@ 2025-03-15  6:38       ` René Scharfe
  2025-03-16 19:53         ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: René Scharfe @ 2025-03-15  6:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jason Cho, git@vger.kernel.org

Am 14.03.25 um 23:28 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>  t/t4055-diff-context.sh | 10 ++++++++++
>>  xdiff/xemit.c           |  8 +++++++-
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> Oh, I love a patch like this that is well thought out to carefully
> check the bounds, instead of blindly say "ah, counting number of
> things in size_t solves everything" ;-)

Converting xdiff from long to size_t is still a good idea, I think, but
would be lot more effort and thus more risky.  Comparisons to upstream
would become a lot more noisy as well.

René



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk()
  2025-03-15  6:38       ` René Scharfe
@ 2025-03-16 19:53         ` Junio C Hamano
  2025-03-17 16:50           ` René Scharfe
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2025-03-16 19:53 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jason Cho, git@vger.kernel.org

René Scharfe <l.s.r@web.de> writes:

> Am 14.03.25 um 23:28 schrieb Junio C Hamano:
>> René Scharfe <l.s.r@web.de> writes:
>>
>>>  t/t4055-diff-context.sh | 10 ++++++++++
>>>  xdiff/xemit.c           |  8 +++++++-
>>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> Oh, I love a patch like this that is well thought out to carefully
>> check the bounds, instead of blindly say "ah, counting number of
>> things in size_t solves everything" ;-)
>
> Converting xdiff from long to size_t is still a good idea, I think, ...

Oh, no question about that, especially if the number of counted
things are somehow proportionally related to the size of in-core
memory regions in any way.

What I do *not* like is the recent trend in the patches I see.  They
stop thinking there once they blindly replace int or unsigned or
whatever with size_t and the compiler stops warning.  Even though
compiler warnings can be a useful tool when there is very little
false positives, they are merely tools to improve the code, but I
see more and more confused patches that seem to think squelching
warnings is the goal in itself, without thinking if the resulting
code is actually improved.

And I didn't see that in this patch.  The patch was actually written
with real goal of improving the code in mind.

> but
> would be lot more effort and thus more risky.  

Perhaps, perhaps not.

> Comparisons to upstream
> would become a lot more noisy as well.

I am not sure how much of that matters these days, though.  Are they
still active, or is the code perfect and pretty much done?  I somehow
had the impression it has been the latter for a long time...

Thanks.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk()
  2025-03-16 19:53         ` Junio C Hamano
@ 2025-03-17 16:50           ` René Scharfe
  0 siblings, 0 replies; 7+ messages in thread
From: René Scharfe @ 2025-03-17 16:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jason Cho, git@vger.kernel.org

Am 16.03.25 um 20:53 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Comparisons to upstream
>> would become a lot more noisy as well.
>
> I am not sure how much of that matters these days, though.  Are they
> still active, or is the code perfect and pretty much done?  I somehow
> had the impression it has been the latter for a long time...

http://www.xmailserver.org/xdiff-lib.html offers libxdiff-0.23.tar.gz,
whose entries bear the timestamp 2008-11-12.  Solid.

René



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-03-17 16:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <xXWgbH3mlNEvFcdGLqBHwcclZoeZNPoLg8Hr6YCipHXvS5eKaHeTppzFM-l_wyB46BB1R1T0j6g_jWRXIj7-GRJh1LPxi1ta3GkQ5t8F4-0=@proton.me>
2025-03-12 15:51 ` Iffy output given git diff --unified=2147483647 Jason Cho
2025-03-14 22:00   ` [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() René Scharfe
2025-03-14 22:28     ` Junio C Hamano
2025-03-15  6:38       ` René Scharfe
2025-03-16 19:53         ` Junio C Hamano
2025-03-17 16:50           ` René Scharfe
2025-03-14 23:14     ` Jason Cho

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).