git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] Fix delta integer overflows
@ 2017-08-07 18:10 Martin Koegler
  2017-08-07 18:36 ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Koegler @ 2017-08-07 18:10 UTC (permalink / raw
  To: git, gitster; +Cc: Martin Koegler

From: Martin Koegler <martin.koegler@chello.at>

The current delta code produces incorrect pack objects for files > 4GB.

Signed-off-by: Martin Koegler <martin.koegler@chello.at>
---
 diff-delta.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

Just pass any file > 4 GB to the delta-compression [by increasing the delta limits].
As file size, a truncated 32bit value will be encoded, leading to broken pack files.

diff --git a/diff-delta.c b/diff-delta.c
index 3797ce6..13e5a01 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -319,7 +319,8 @@ create_delta(const struct delta_index *index,
 	     const void *trg_buf, unsigned long trg_size,
 	     unsigned long *delta_size, unsigned long max_size)
 {
-	unsigned int i, outpos, outsize, moff, msize, val;
+	unsigned int i, val;
+	unsigned long l, outpos, outsize, moff, msize;
 	int inscnt;
 	const unsigned char *ref_data, *ref_top, *data, *top;
 	unsigned char *out;
@@ -336,20 +337,20 @@ create_delta(const struct delta_index *index,
 		return NULL;
 
 	/* store reference buffer size */
-	i = index->src_size;
-	while (i >= 0x80) {
-		out[outpos++] = i | 0x80;
-		i >>= 7;
+	l = index->src_size;
+	while (l >= 0x80) {
+		out[outpos++] = l | 0x80;
+		l >>= 7;
 	}
-	out[outpos++] = i;
+	out[outpos++] = l;
 
 	/* store target buffer size */
-	i = trg_size;
-	while (i >= 0x80) {
-		out[outpos++] = i | 0x80;
-		i >>= 7;
+	l = trg_size;
+	while (l >= 0x80) {
+		out[outpos++] = l | 0x80;
+		l >>= 7;
 	}
-	out[outpos++] = i;
+	out[outpos++] = l;
 
 	ref_data = index->src_buf;
 	ref_top = ref_data + index->src_size;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix delta integer overflows
  2017-08-07 18:10 [PATCH] Fix delta integer overflows Martin Koegler
@ 2017-08-07 18:36 ` Junio C Hamano
  2017-08-07 19:39   ` Johannes Schindelin
  2017-08-08  1:44   ` Junio C Hamano
  0 siblings, 2 replies; 11+ messages in thread
From: Junio C Hamano @ 2017-08-07 18:36 UTC (permalink / raw
  To: Martin Koegler; +Cc: git

Martin Koegler <martin.koegler@chello.at> writes:

> From: Martin Koegler <martin.koegler@chello.at>
>
> The current delta code produces incorrect pack objects for files > 4GB.
>
> Signed-off-by: Martin Koegler <martin.koegler@chello.at>
> ---
>  diff-delta.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
>
> Just pass any file > 4 GB to the delta-compression [by increasing the delta limits].
> As file size, a truncated 32bit value will be encoded, leading to broken pack files.

The patch obviously makes the code better and self consistent in
that "struct delta_index" has src_size as ulong, and this function
takes trg_size as ulong, and it was plain wrong for the code to
assume that "i", which is uint, can receive it safely.

In the longer term we might want to move to size_t or even
uintmax_t, as the ulong on a platform may not be long enough in
order to express the largest file size the platform can have, but
this patch (1) is good even without such a change, and (2) gives a
good foundation to build on if we want such a change on top.

Thanks.  Will queue.

>
> diff --git a/diff-delta.c b/diff-delta.c
> index 3797ce6..13e5a01 100644
> --- a/diff-delta.c
> +++ b/diff-delta.c
> @@ -319,7 +319,8 @@ create_delta(const struct delta_index *index,
>  	     const void *trg_buf, unsigned long trg_size,
>  	     unsigned long *delta_size, unsigned long max_size)
>  {
> -	unsigned int i, outpos, outsize, moff, msize, val;
> +	unsigned int i, val;
> +	unsigned long l, outpos, outsize, moff, msize;
>  	int inscnt;
>  	const unsigned char *ref_data, *ref_top, *data, *top;
>  	unsigned char *out;
> @@ -336,20 +337,20 @@ create_delta(const struct delta_index *index,
>  		return NULL;
>  
>  	/* store reference buffer size */
> -	i = index->src_size;
> -	while (i >= 0x80) {
> -		out[outpos++] = i | 0x80;
> -		i >>= 7;
> +	l = index->src_size;
> +	while (l >= 0x80) {
> +		out[outpos++] = l | 0x80;
> +		l >>= 7;
>  	}
> -	out[outpos++] = i;
> +	out[outpos++] = l;
>  
>  	/* store target buffer size */
> -	i = trg_size;
> -	while (i >= 0x80) {
> -		out[outpos++] = i | 0x80;
> -		i >>= 7;
> +	l = trg_size;
> +	while (l >= 0x80) {
> +		out[outpos++] = l | 0x80;
> +		l >>= 7;
>  	}
> -	out[outpos++] = i;
> +	out[outpos++] = l;
>  
>  	ref_data = index->src_buf;
>  	ref_top = ref_data + index->src_size;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix delta integer overflows
  2017-08-07 18:36 ` Junio C Hamano
@ 2017-08-07 19:39   ` Johannes Schindelin
  2017-08-07 19:48     ` Junio C Hamano
  2017-08-08  6:20     ` Martin Koegler
  2017-08-08  1:44   ` Junio C Hamano
  1 sibling, 2 replies; 11+ messages in thread
From: Johannes Schindelin @ 2017-08-07 19:39 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Martin Koegler, git

Hi,

On Mon, 7 Aug 2017, Junio C Hamano wrote:

> Martin Koegler <martin.koegler@chello.at> writes:
> 
> > From: Martin Koegler <martin.koegler@chello.at>
> >
> > The current delta code produces incorrect pack objects for files > 4GB.
> >
> > Signed-off-by: Martin Koegler <martin.koegler@chello.at>
> > ---
> >  diff-delta.c | 23 ++++++++++++-----------
> >  1 file changed, 12 insertions(+), 11 deletions(-)
> >
> > Just pass any file > 4 GB to the delta-compression [by increasing the delta limits].
> > As file size, a truncated 32bit value will be encoded, leading to broken pack files.
> 
> The patch obviously makes the code better and self consistent in
> that "struct delta_index" has src_size as ulong, and this function
> takes trg_size as ulong, and it was plain wrong for the code to
> assume that "i", which is uint, can receive it safely.
> 
> In the longer term we might want to move to size_t or even
> uintmax_t, as the ulong on a platform may not be long enough in
> order to express the largest file size the platform can have, but
> this patch (1) is good even without such a change, and (2) gives a
> good foundation to build on if we want such a change on top.
> 
> Thanks.  Will queue.

This is sad. There is no "may not be long enough". We already know a
platform where unsigned long is not long enough, don't we? Why leave this
patch in this intermediate state?

If you want to work on data in memory, then size_t is the appropriate data
type. We already use it elsewhere. Let's use it here, too, without the
intermediate bump from the incorrect `int` to the equally incorrect
`long`.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix delta integer overflows
  2017-08-07 19:39   ` Johannes Schindelin
@ 2017-08-07 19:48     ` Junio C Hamano
  2017-08-07 21:10       ` Johannes Schindelin
  2017-08-08  6:20     ` Martin Koegler
  1 sibling, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2017-08-07 19:48 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Martin Koegler, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> The patch obviously makes the code better and self consistent in
>> that "struct delta_index" has src_size as ulong, and this function
>> takes trg_size as ulong, and it was plain wrong for the code to
>> assume that "i", which is uint, can receive it safely.
>> 
>> In the longer term we might want to move to size_t or even
>> uintmax_t, as the ulong on a platform may not be long enough in
>> order to express the largest file size the platform can have, but
>> this patch (1) is good even without such a change, and (2) gives a
>> good foundation to build on if we want such a change on top.
>> 
>> Thanks.  Will queue.
>
> This is sad. There is no "may not be long enough". We already know a
> platform where unsigned long is not long enough, don't we? Why leave this
> patch in this intermediate state?

This is a good foundation to build on, and I never said no further
update on top of this patch is desired.  Look for "(2)" in what you
quoted.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix delta integer overflows
  2017-08-07 19:48     ` Junio C Hamano
@ 2017-08-07 21:10       ` Johannes Schindelin
  2017-08-07 21:36         ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Schindelin @ 2017-08-07 21:10 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Martin Koegler, git

Hi Junio,

On Mon, 7 Aug 2017, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >> The patch obviously makes the code better and self consistent in
> >> that "struct delta_index" has src_size as ulong, and this function
> >> takes trg_size as ulong, and it was plain wrong for the code to
> >> assume that "i", which is uint, can receive it safely.
> >> 
> >> In the longer term we might want to move to size_t or even
> >> uintmax_t, as the ulong on a platform may not be long enough in
> >> order to express the largest file size the platform can have, but
> >> this patch (1) is good even without such a change, and (2) gives a
> >> good foundation to build on if we want such a change on top.
> >> 
> >> Thanks.  Will queue.
> >
> > This is sad. There is no "may not be long enough". We already know a
> > platform where unsigned long is not long enough, don't we? Why leave this
> > patch in this intermediate state?
> 
> This is a good foundation to build on, and I never said no further
> update on top of this patch is desired.  Look for "(2)" in what you
> quoted.

So are you saying that starting with v2.14.0, you accept patches into `pu`
for which you would previously have required multiple iterations before
even considering it for `pu`?

Frankly, I am a bit surprised that this obvious change from `unsigned
long` to `size_t` is not required in this case before queuing, but if the
rules have changed to lower the bar for patch submissions, I am all for
it. I always felt that we are wasting contributors' time a little too
freely and too deliberately.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix delta integer overflows
  2017-08-07 21:10       ` Johannes Schindelin
@ 2017-08-07 21:36         ` Junio C Hamano
  2017-08-07 23:02           ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2017-08-07 21:36 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Martin Koegler, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> So are you saying that starting with v2.14.0, you accept patches into `pu`
> for which you would previously have required multiple iterations before
> even considering it for `pu`?
>
> Frankly, I am a bit surprised that this obvious change from `unsigned
> long` to `size_t` is not required in this case before queuing, but if the
> rules have changed to lower the bar for patch submissions, I am all for
> it. I always felt that we are wasting contributors' time a little too
> freely and too deliberately.

This is not about where the bar is set.  It is about expectation.  I
do not expect much from occasional (or new) contributors and I try
not to demand too much from them.  The consequence is that as long
as a small patch makes things better without making anything worse,
I'd want to be inclusive to encourage them to build obvious
improvements on top.  Maybe they just want a single patch landed to
fix their immediate needs (which may be generic enough and expected
to be shared with many other people) without going further, so I may
end up queuing something that only helps 40% of people until follow
up patches are done to cover the remaining 60% of people, but that
is fine as long as the patch does not make things worse (it is not
like a patch that helps 40% while hurting the remaining 60% until
a follow-up happens).

I would expect a lot more from experienced contributors, when I know
they are capable of helping the remaining 60% inside the same series
and without requiring too much hand-holding from me.  The same thing
I cannot say to a occasional (or new) contributor---they may not be
coorperative, or they may be willing to do more but may require more
help polishing their work than the bandwidth I can afford.

So if you are volunteering to help by either guiding Martin to aim
higher and make this a series with a larger and more complete scope,
I'd very much appreciate it.  Or you can do a follow-up series on
top of what Martin posted.  Either is fine by me.  Just do not step
on each others' toes ;-)

I avoided saying all of the above because I didn't want my word
taken out of context later to make it sound as if I were belittling
the competence of Martin, but you seem to want to force me to say
this, so here it is.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix delta integer overflows
  2017-08-07 21:36         ` Junio C Hamano
@ 2017-08-07 23:02           ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2017-08-07 23:02 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Martin Koegler, git

Junio C Hamano <gitster@pobox.com> writes:

> This is not about where the bar is set.  It is about expectation....

After having thought about this a bit more, I think in the message I
am responding to I mischaracterised the aspect of a patch that
influences the "expectation".  It is much less about who the
contributor is but more about what the patch does.  

If the patch in question were from a more experienced contributor
(like you or Peff), my internal reaction would have been "gee, the
submitter should have known better that a more complete fix should
involve a larger integral type, not stopping at matching the largest
type that happens to be used in the interface without updating the
interface".

But I still would have said that the patch is an improvement--as it
indeed is; it does not make things worse anywhere and brings in a
more consistency.  And I still would have mentioned the same "in the
longer term, we would want to use size_t or uintmax_t here, not just
ulong".

The only thing I would have done differently if the submission were
by a more experienced contributor is that I probably would have
added "yes this may be an improvement, but I expected you should
know better to at least mention the longer term direction to use
size_t or uintmax_t in the log message, even if you didn't
immediately extend this patch into a more complete series".

That one is a difference of expectation between an occasional
contributor and an experienced one.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix delta integer overflows
  2017-08-07 18:36 ` Junio C Hamano
  2017-08-07 19:39   ` Johannes Schindelin
@ 2017-08-08  1:44   ` Junio C Hamano
  2017-08-08  6:25     ` Martin Koegler
  1 sibling, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2017-08-08  1:44 UTC (permalink / raw
  To: Martin Koegler; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Martin Koegler <martin.koegler@chello.at> writes:
>
>> From: Martin Koegler <martin.koegler@chello.at>
>>
>> The current delta code produces incorrect pack objects for files > 4GB.
>>
>> Signed-off-by: Martin Koegler <martin.koegler@chello.at>
>> ---
>>  diff-delta.c | 23 ++++++++++++-----------
>>  1 file changed, 12 insertions(+), 11 deletions(-)
>>
>> Just pass any file > 4 GB to the delta-compression [by increasing the delta limits].
>> As file size, a truncated 32bit value will be encoded, leading to broken pack files.
>
> The patch obviously makes the code better and self consistent in
> that "struct delta_index" has src_size as ulong, and this function
> takes trg_size as ulong, and it was plain wrong for the code to
> assume that "i", which is uint, can receive it safely.
>
> In the longer term we might want to move to size_t or even
> uintmax_t, as the ulong on a platform may not be long enough in
> order to express the largest file size the platform can have, but
> this patch (1) is good even without such a change, and (2) gives a
> good foundation to build on if we want such a change on top.
>
> Thanks.  Will queue.

Having said that, I am a bit curious how you came to this patch.
Was the issue found by code inspection, or did you actually have a
real life use case to raise the core.bigFileThreshold configuration
to a value above 4GB?

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix delta integer overflows
  2017-08-07 19:39   ` Johannes Schindelin
  2017-08-07 19:48     ` Junio C Hamano
@ 2017-08-08  6:20     ` Martin Koegler
  2017-08-08 11:38       ` Johannes Schindelin
  1 sibling, 1 reply; 11+ messages in thread
From: Martin Koegler @ 2017-08-08  6:20 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Junio C Hamano, Martin Koegler, git

[-- Attachment #1: Type: text/plain, Size: 811 bytes --]

On Mon, Aug 07, 2017 at 09:39:12PM +0200, Johannes Schindelin wrote:
> If you want to work on data in memory, then size_t is the appropriate data
> type. We already use it elsewhere. Let's use it here, too, without the
> intermediate bump from the incorrect `int` to the equally incorrect
> `long`.

I disagree with "We already use it elsewhere.". The whole delta code uses "unsigned long" -
look at delta.h. Look at unpack-objects.c. Or cache.h. Or pack-objects.c. Or index-pack.c.

Other possible cases:
git grep "unsigned long" |grep size

So the codebase still suggests, that "unsigned long" is the data type for storing object sizes.

I would be fine with resubmitting a patch using size_t/off_t for the touched parts - changing the whole
core code is a too invasive change for a bug fix.

Regards,
Martin

[-- Attachment #2: 0001-Fix-delta-integer-overflows.patch --]
[-- Type: text/x-diff, Size: 1599 bytes --]

From d97a7810ff679dd939972c151bcf23c122cdc570 Mon Sep 17 00:00:00 2001
From: Martin Koegler <martin.koegler@chello.at>
Date: Mon, 7 Aug 2017 20:00:30 +0200
Subject: [PATCH] Fix delta integer overflows

The current delta code produces incorrect pack objects for files > 4GB.

Signed-off-by: Martin Koegler <martin.koegler@chello.at>
---
 diff-delta.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/diff-delta.c b/diff-delta.c
index 3797ce6..cd238c8 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -319,7 +319,9 @@ create_delta(const struct delta_index *index,
 	     const void *trg_buf, unsigned long trg_size,
 	     unsigned long *delta_size, unsigned long max_size)
 {
-	unsigned int i, outpos, outsize, moff, msize, val;
+	unsigned int i, val;
+	off_t outpos, moff;
+	size_t l, outsize, msize;
 	int inscnt;
 	const unsigned char *ref_data, *ref_top, *data, *top;
 	unsigned char *out;
@@ -336,20 +338,20 @@ create_delta(const struct delta_index *index,
 		return NULL;
 
 	/* store reference buffer size */
-	i = index->src_size;
-	while (i >= 0x80) {
-		out[outpos++] = i | 0x80;
-		i >>= 7;
+	l = index->src_size;
+	while (l >= 0x80) {
+		out[outpos++] = l | 0x80;
+		l >>= 7;
 	}
-	out[outpos++] = i;
+	out[outpos++] = l;
 
 	/* store target buffer size */
-	i = trg_size;
-	while (i >= 0x80) {
-		out[outpos++] = i | 0x80;
-		i >>= 7;
+	l = trg_size;
+	while (l >= 0x80) {
+		out[outpos++] = l | 0x80;
+		l >>= 7;
 	}
-	out[outpos++] = i;
+	out[outpos++] = l;
 
 	ref_data = index->src_buf;
 	ref_top = ref_data + index->src_size;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix delta integer overflows
  2017-08-08  1:44   ` Junio C Hamano
@ 2017-08-08  6:25     ` Martin Koegler
  0 siblings, 0 replies; 11+ messages in thread
From: Martin Koegler @ 2017-08-08  6:25 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Martin Koegler, git

On Mon, Aug 07, 2017 at 06:44:16PM -0700, Junio C Hamano wrote:
> Having said that, I am a bit curious how you came to this patch.
> Was the issue found by code inspection, or did you actually have a
> real life use case to raise the core.bigFileThreshold configuration
> to a value above 4GB?

Real life use - tracking changes in larger files.

Raising the limit above 4GB suddenly resulted in a broken pack files in the repository and
aborts of various git commands.

Data is still recoverable with all size sanity checks disabled.

Regards,
Martin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fix delta integer overflows
  2017-08-08  6:20     ` Martin Koegler
@ 2017-08-08 11:38       ` Johannes Schindelin
  0 siblings, 0 replies; 11+ messages in thread
From: Johannes Schindelin @ 2017-08-08 11:38 UTC (permalink / raw
  To: Martin Koegler; +Cc: Junio C Hamano, git

Hi Martin,

On Tue, 8 Aug 2017, Martin Koegler wrote:

> On Mon, Aug 07, 2017 at 09:39:12PM +0200, Johannes Schindelin wrote:
> > If you want to work on data in memory, then size_t is the appropriate data
> > type. We already use it elsewhere. Let's use it here, too, without the
> > intermediate bump from the incorrect `int` to the equally incorrect
> > `long`.
> 
> I disagree with "We already use it elsewhere.".

By "it" I meant "size_t".

> The whole delta code uses "unsigned long" - look at delta.h. Look at
> unpack-objects.c. Or cache.h. Or pack-objects.c. Or index-pack.c.

I know that. It is a major bug in the source code.

> Other possible cases:
> git grep "unsigned long" |grep size

Yes, even more bugs.

> So the codebase still suggests, that "unsigned long" is the data type
> for storing object sizes.

And it would be wrong, and we know it already. Most importantly, Junio
knows it already.

> I would be fine with resubmitting a patch using size_t/off_t for the
> touched parts - changing the whole core code is a too invasive change
> for a bug fix.

Sorry, my mistake: I never meant to burden you with the invasive change.
I only wanted you to change the `int` to `size_t` right away.

Thanks,
Johannes

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-08-08 11:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-07 18:10 [PATCH] Fix delta integer overflows Martin Koegler
2017-08-07 18:36 ` Junio C Hamano
2017-08-07 19:39   ` Johannes Schindelin
2017-08-07 19:48     ` Junio C Hamano
2017-08-07 21:10       ` Johannes Schindelin
2017-08-07 21:36         ` Junio C Hamano
2017-08-07 23:02           ` Junio C Hamano
2017-08-08  6:20     ` Martin Koegler
2017-08-08 11:38       ` Johannes Schindelin
2017-08-08  1:44   ` Junio C Hamano
2017-08-08  6:25     ` Martin Koegler

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).