git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] Avoid infinite loop in malformed packfiles
@ 2020-08-23  0:52 Ori Bernstein
  2020-08-23  2:52 ` ori
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Ori Bernstein @ 2020-08-23  0:52 UTC (permalink / raw)
  To: git; +Cc: Ori Bernstein

In packfile.c:1680, there's an infinite loop that tries to get
to the base of a packfile. With offset deltas, the offset needs
to be greater than 0, so it's always walking backwards, and the
search is guaranteed to terminate.

With reference deltas, there's no check for a cycle in the
references, so a cyclic reference will cause git to loop
infinitely, growing the delta_stack infinitely, which will
cause it to consume all available memory as as a full CPU
core.

This change puts an arbitrary limit of 10,000 on the number
of iterations we make when chasing down a base commit, to
prevent looping forever, using all available memory growing
the delta stack.
---
 packfile.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/packfile.c b/packfile.c
index 6ab5233613..321e002c50 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1633,6 +1633,7 @@ static void write_pack_access_log(struct packed_git *p, off_t obj_offset)
 
 int do_check_packed_object_crc;
 
+#define UNPACK_ENTRY_STACK_LIMIT 10000
 #define UNPACK_ENTRY_STACK_PREALLOC 64
 struct unpack_entry_stack_ent {
 	off_t obj_offset;
@@ -1715,6 +1716,12 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			break;
 		}
 
+		if (delta_stack_nr > UNPACK_ENTRY_STACK_LIMIT) {
+			error("overlong delta chain at offset %jd from %s",
+			      (uintmax_t)curpos, p->pack_name);
+			goto out;
+		}
+
 		/* push object, proceed to base */
 		if (delta_stack_nr >= delta_stack_alloc
 		    && delta_stack == small_delta_stack) {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-23  0:52 [PATCH] Avoid infinite loop in malformed packfiles Ori Bernstein
@ 2020-08-23  2:52 ` ori
  2020-08-23  3:08 ` Eric Sunshine
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: ori @ 2020-08-23  2:52 UTC (permalink / raw)
  To: ori, git

> In packfile.c:1680, there's an infinite loop that tries to get
> to the base of a packfile. With offset deltas, the offset needs
> to be greater than 0, so it's always walking backwards, and the
> search is guaranteed to terminate.
> 
> With reference deltas, there's no check for a cycle in the
> references, so a cyclic reference will cause git to loop
> infinitely, growing the delta_stack infinitely, which will
> cause it to consume all available memory as as a full CPU
> core.
> 
> This change puts an arbitrary limit of 10,000 on the number
> of iterations we make when chasing down a base commit, to
> prevent looping forever, using all available memory growing
> the delta stack.

For context, I discovered this accidentally when I
introduced a bug in pack deltification in git9 (my
implementation of git for plan 9). An example of a
packfile and index that will reproduce this issue
is available here:

https://eigenstate.org/tmp/95a0f4f3f3f21d723d501552eaf22ff4055e13a4.pack
https://eigenstate.org/tmp/95a0f4f3f3f21d723d501552eaf22ff4055e13a4.idx

The suggestion to just cap the depth instead of
doing full cycle detection came from Jeff King
(peff@peff.net)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-23  0:52 [PATCH] Avoid infinite loop in malformed packfiles Ori Bernstein
  2020-08-23  2:52 ` ori
@ 2020-08-23  3:08 ` Eric Sunshine
  2020-08-23  3:11 ` Ori Bernstein
  2020-08-24 20:30 ` Junio C Hamano
  3 siblings, 0 replies; 20+ messages in thread
From: Eric Sunshine @ 2020-08-23  3:08 UTC (permalink / raw)
  To: Ori Bernstein; +Cc: Git List

On Sat, Aug 22, 2020 at 8:59 PM Ori Bernstein <ori@eigenstate.org> wrote:
> In packfile.c:1680, there's an infinite loop that tries to get
> to the base of a packfile. With offset deltas, the offset needs
> to be greater than 0, so it's always walking backwards, and the
> search is guaranteed to terminate.
>
> With reference deltas, there's no check for a cycle in the
> references, so a cyclic reference will cause git to loop
> infinitely, growing the delta_stack infinitely, which will
> cause it to consume all available memory as as a full CPU
> core.
>
> This change puts an arbitrary limit of 10,000 on the number
> of iterations we make when chasing down a base commit, to
> prevent looping forever, using all available memory growing
> the delta stack.
> ---

Missing sign-off.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-23  0:52 [PATCH] Avoid infinite loop in malformed packfiles Ori Bernstein
  2020-08-23  2:52 ` ori
  2020-08-23  3:08 ` Eric Sunshine
@ 2020-08-23  3:11 ` Ori Bernstein
  2020-08-23  6:26   ` René Scharfe
  2020-08-24 17:33   ` Junio C Hamano
  2020-08-24 20:30 ` Junio C Hamano
  3 siblings, 2 replies; 20+ messages in thread
From: Ori Bernstein @ 2020-08-23  3:11 UTC (permalink / raw)
  To: git; +Cc: Ori Bernstein

In packfile.c:1680, there's an infinite loop that tries to get
to the base of a packfile. With offset deltas, the offset needs
to be greater than 0, so it's always walking backwards, and the
search is guaranteed to terminate.

With reference deltas, there's no check for a cycle in the
references, so a cyclic reference will cause git to loop
infinitely, growing the delta_stack infinitely, which will
cause it to consume all available memory as as a full CPU
core.

This change puts an arbitrary limit of 10,000 on the number
of iterations we make when chasing down a base commit, to
prevent looping forever, using all available memory growing
the delta stack.

Signed-off-by: Ori Bernstein <ori@eigenstate.org>
---
 packfile.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/packfile.c b/packfile.c
index 6ab5233613..321e002c50 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1633,6 +1633,7 @@ static void write_pack_access_log(struct packed_git *p, off_t obj_offset)
 
 int do_check_packed_object_crc;
 
+#define UNPACK_ENTRY_STACK_LIMIT 10000
 #define UNPACK_ENTRY_STACK_PREALLOC 64
 struct unpack_entry_stack_ent {
 	off_t obj_offset;
@@ -1715,6 +1716,12 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			break;
 		}
 
+		if (delta_stack_nr > UNPACK_ENTRY_STACK_LIMIT) {
+			error("overlong delta chain at offset %jd from %s",
+			      (uintmax_t)curpos, p->pack_name);
+			goto out;
+		}
+
 		/* push object, proceed to base */
 		if (delta_stack_nr >= delta_stack_alloc
 		    && delta_stack == small_delta_stack) {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-23  3:11 ` Ori Bernstein
@ 2020-08-23  6:26   ` René Scharfe
  2020-08-23 20:41     ` Ori Bernstein
  2020-08-24 17:33   ` Junio C Hamano
  1 sibling, 1 reply; 20+ messages in thread
From: René Scharfe @ 2020-08-23  6:26 UTC (permalink / raw)
  To: Ori Bernstein, git

Am 23.08.20 um 05:11 schrieb Ori Bernstein:
> In packfile.c:1680, there's an infinite loop that tries to get
> to the base of a packfile. With offset deltas, the offset needs
> to be greater than 0, so it's always walking backwards, and the
> search is guaranteed to terminate.
>
> With reference deltas, there's no check for a cycle in the
> references, so a cyclic reference will cause git to loop
> infinitely, growing the delta_stack infinitely, which will
> cause it to consume all available memory as as a full CPU
> core.

"as as"?  Perhaps "and"?

> This change puts an arbitrary limit of 10,000 on the number
> of iterations we make when chasing down a base commit, to
> prevent looping forever, using all available memory growing
> the delta stack.
>
> Signed-off-by: Ori Bernstein <ori@eigenstate.org>
> ---
>  packfile.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/packfile.c b/packfile.c
> index 6ab5233613..321e002c50 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1633,6 +1633,7 @@ static void write_pack_access_log(struct packed_git *p, off_t obj_offset)
>
>  int do_check_packed_object_crc;
>
> +#define UNPACK_ENTRY_STACK_LIMIT 10000

b5c0cbd8083 (pack-objects: use bitfield for object_entry::depth,
2018-04-14) limited the delta depth for new packs to 4095, so 10000
seems reasonable.  Users with unreasonable packs would need to repack
them with an older version of Git, though.  Not sure if that would
affect anyone in practice.

>  #define UNPACK_ENTRY_STACK_PREALLOC 64

Hmm, setting a hard limit may allow to allocate the whole stack on the,
ehm, stack.  That would get rid of the hybrid stack/heap allocation and
thus simplify the code a bit.  10000 entries with 24 bytes each would be
quite big, though, but that might be OK without recursion.  (And not in
this patch anyway, of course.)

>  struct unpack_entry_stack_ent {
>  	off_t obj_offset;
> @@ -1715,6 +1716,12 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>  			break;
>  		}
>
> +		if (delta_stack_nr > UNPACK_ENTRY_STACK_LIMIT) {
> +			error("overlong delta chain at offset %jd from %s",
> +			      (uintmax_t)curpos, p->pack_name);
> +			goto out;
> +		}

Other error handlers in this loop set data to NULL.  That's actually
unnecessary because it's NULL to begin with and the loop is exited after
setting it to some other value.  So not doing it here is fine.  (And a
separate cleanup patch could remove the dead stores in the other
handlers.)

> +
>  		/* push object, proceed to base */
>  		if (delta_stack_nr >= delta_stack_alloc
>  		    && delta_stack == small_delta_stack) {
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-23  6:26   ` René Scharfe
@ 2020-08-23 20:41     ` Ori Bernstein
  2020-08-24 16:06       ` René Scharfe
  0 siblings, 1 reply; 20+ messages in thread
From: Ori Bernstein @ 2020-08-23 20:41 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Sun, 23 Aug 2020 08:26:14 +0200, René Scharfe <l.s.r@web.de> wrote:

> Am 23.08.20 um 05:11 schrieb Ori Bernstein:
> > In packfile.c:1680, there's an infinite loop that tries to get
> > to the base of a packfile. With offset deltas, the offset needs
> > to be greater than 0, so it's always walking backwards, and the
> > search is guaranteed to terminate.
> >
> > With reference deltas, there's no check for a cycle in the
> > references, so a cyclic reference will cause git to loop
> > infinitely, growing the delta_stack infinitely, which will
> > cause it to consume all available memory as as a full CPU
> > core.
> 
> "as as"?  Perhaps "and"?

I think I meant 'As well as' -- will fix.
 
> 
> b5c0cbd8083 (pack-objects: use bitfield for object_entry::depth,
> 2018-04-14) limited the delta depth for new packs to 4095, so 10000
> seems reasonable.  Users with unreasonable packs would need to repack
> them with an older version of Git, though.  Not sure if that would
> affect anyone in practice.
> 
> >  #define UNPACK_ENTRY_STACK_PREALLOC 64
> 
> Hmm, setting a hard limit may allow to allocate the whole stack on the,
> ehm, stack.  That would get rid of the hybrid stack/heap allocation and
> thus simplify the code a bit.  10000 entries with 24 bytes each would be
> quite big, though, but that might be OK without recursion.  (And not in
> this patch anyway, of course.)
> 
> >  struct unpack_entry_stack_ent {
> >  	off_t obj_offset;
> > @@ -1715,6 +1716,12 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> >  			break;
> >  		}
> >
> > +		if (delta_stack_nr > UNPACK_ENTRY_STACK_LIMIT) {
> > +			error("overlong delta chain at offset %jd from %s",
> > +			      (uintmax_t)curpos, p->pack_name);
> > +			goto out;
> > +		}
> 
> Other error handlers in this loop set data to NULL.  That's actually
> unnecessary because it's NULL to begin with and the loop is exited after
> setting it to some other value.  So not doing it here is fine.  (And a
> separate cleanup patch could remove the dead stores in the other
> handlers.)

Is there anything you'd like me to do in this patch, other than fixing
the typo?

-- 
    Ori Bernstein

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-23 20:41     ` Ori Bernstein
@ 2020-08-24 16:06       ` René Scharfe
  2020-08-24 20:12         ` Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: René Scharfe @ 2020-08-24 16:06 UTC (permalink / raw)
  To: Ori Bernstein; +Cc: git

Am 23.08.20 um 22:41 schrieb Ori Bernstein:
> On Sun, 23 Aug 2020 08:26:14 +0200, René Scharfe <l.s.r@web.de> wrote:
>
>> Am 23.08.20 um 05:11 schrieb Ori Bernstein:
>>> In packfile.c:1680, there's an infinite loop that tries to get
>>> to the base of a packfile. With offset deltas, the offset needs
>>> to be greater than 0, so it's always walking backwards, and the
>>> search is guaranteed to terminate.
>>>
>>> With reference deltas, there's no check for a cycle in the
>>> references, so a cyclic reference will cause git to loop
>>> infinitely, growing the delta_stack infinitely, which will
>>> cause it to consume all available memory as as a full CPU
>>> core.
>>
>> "as as"?  Perhaps "and"?
>
> I think I meant 'As well as' -- will fix.
>
>>
>> b5c0cbd8083 (pack-objects: use bitfield for object_entry::depth,
>> 2018-04-14) limited the delta depth for new packs to 4095, so 10000
>> seems reasonable.  Users with unreasonable packs would need to repack
>> them with an older version of Git, though.  Not sure if that would
>> affect anyone in practice.

> Is there anything you'd like me to do in this patch, other than fixing
> the typo?

Please explain in the commit message why 10000 is a good choice for that
new limit, and what users who happen to exceed it can do to regain
access to their packed data.

René

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-23  3:11 ` Ori Bernstein
  2020-08-23  6:26   ` René Scharfe
@ 2020-08-24 17:33   ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2020-08-24 17:33 UTC (permalink / raw)
  To: Ori Bernstein; +Cc: git

Ori Bernstein <ori@eigenstate.org> writes:

> diff --git a/packfile.c b/packfile.c
> index 6ab5233613..321e002c50 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1715,6 +1716,12 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
>  			break;
>  		}
>  
> +		if (delta_stack_nr > UNPACK_ENTRY_STACK_LIMIT) {
> +			error("overlong delta chain at offset %jd from %s",
> +			      (uintmax_t)curpos, p->pack_name);

The "j" length field is not used anywhere in the codebase for
portability concerns, I think.  "d" is for signed, but curpos
is an unsigned off_t.  I think

	"... %"PRIuMAX" from %s", (uintmax_t)curpos, ...

would match how we write this kind of thing everywhere else in the
code, e.g. showing obj_offset in packed_to_object_type() in the same
file in an error message.

> @@ -1633,6 +1633,7 @@ static void write_pack_access_log(struct packed_git *p, off_t obj_offset)
>  
>  int do_check_packed_object_crc;
>  
> +#define UNPACK_ENTRY_STACK_LIMIT 10000
>  #define UNPACK_ENTRY_STACK_PREALLOC 64
>  struct unpack_entry_stack_ent {
>  	off_t obj_offset;

What escape hatch would the end-users have when they have a
legitimate packfile that has a truly deep delta chain, by the way?

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-24 16:06       ` René Scharfe
@ 2020-08-24 20:12         ` Jeff King
  2020-08-24 20:38           ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2020-08-24 20:12 UTC (permalink / raw)
  To: René Scharfe; +Cc: Ori Bernstein, git

On Mon, Aug 24, 2020 at 06:06:27PM +0200, René Scharfe wrote:

> > Is there anything you'd like me to do in this patch, other than fixing
> > the typo?
> 
> Please explain in the commit message why 10000 is a good choice for that
> new limit, and what users who happen to exceed it can do to regain
> access to their packed data.

I think it may be worth making this a configurable value
(core.maxDeltaDepth or something). Nobody would generally need to tweak
it, but it would give an escape hatch for getting people out of a broken
situation ("git -c core.maxDeltaDepth=50000 repack" or similar).

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-23  0:52 [PATCH] Avoid infinite loop in malformed packfiles Ori Bernstein
                   ` (2 preceding siblings ...)
  2020-08-23  3:11 ` Ori Bernstein
@ 2020-08-24 20:30 ` Junio C Hamano
  3 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2020-08-24 20:30 UTC (permalink / raw)
  To: Ori Bernstein; +Cc: git

Ori Bernstein <ori@eigenstate.org> writes:

> Subject: Re: [PATCH] Avoid infinite loop in malformed packfiles

Documentation/SubmittingPatches[[summary-section]]. Perhaps

    packfile: avoid infinite loop with malformed packfiles

> In packfile.c:1680, there's an infinite loop that tries to get

The line numbers can easily change.  "In packfile.c::unpack_entry(),
there is..." may be more change-resistant.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-24 20:12         ` Jeff King
@ 2020-08-24 20:38           ` Junio C Hamano
  2020-08-24 20:52             ` Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2020-08-24 20:38 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Ori Bernstein, git

Jeff King <peff@peff.net> writes:

> I think it may be worth making this a configurable value
> (core.maxDeltaDepth or something). Nobody would generally need to tweak
> it, but it would give an escape hatch for getting people out of a broken
> situation ("git -c core.maxDeltaDepth=50000 repack" or similar).

... meaning "the pack I have has overlong delta chains to read, and
I am running repack to cut these chains down to more manageable
level"?  Makes sense.

As it may be a bit tricky to figure out where we should read such a
configuration for those who are new to our codebase, here is an
illustration to give a starting point.  Docs and tests are probably
needed, too.

 cache.h       | 1 +
 config.c      | 5 +++++
 environment.c | 1 +
 packfile.c    | 6 ++++++
 4 files changed, 13 insertions(+)

diff --git a/cache.h b/cache.h
index 0290849c19..b59d43f0ec 100644
--- a/cache.h
+++ b/cache.h
@@ -919,6 +919,7 @@ extern int minimum_abbrev, default_abbrev;
 extern int ignore_case;
 extern int assume_unchanged;
 extern int prefer_symlink_refs;
+extern int max_allowed_delta_depth;
 extern int warn_ambiguous_refs;
 extern int warn_on_object_refname_ambiguity;
 extern const char *apply_default_whitespace;
diff --git a/config.c b/config.c
index 2b79fe76ad..5f9114f847 100644
--- a/config.c
+++ b/config.c
@@ -1197,6 +1197,11 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.maxalloweddeltadepth")) {
+		max_allowed_delta_depth = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.logallrefupdates")) {
 		if (value && !strcasecmp(value, "always"))
 			log_all_ref_updates = LOG_REFS_ALWAYS;
diff --git a/environment.c b/environment.c
index 52e0c979ba..d3f9a10799 100644
--- a/environment.c
+++ b/environment.c
@@ -27,6 +27,7 @@ int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
+int max_allowed_delta_depth = 10000;
 int is_bare_repository_cfg = -1; /* unspecified */
 int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
diff --git a/packfile.c b/packfile.c
index 6ab5233613..2ea24a19dd 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1715,6 +1715,12 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			break;
 		}
 
+		if (max_allowed_delta_depth < delta_stack_nr) {
+			error("overlong delta chain at offset %"PRIuMAX" from %s",
+			      (uintmax_t)curpos, p->pack_name);
+			goto out;
+		}
+
 		/* push object, proceed to base */
 		if (delta_stack_nr >= delta_stack_alloc
 		    && delta_stack == small_delta_stack) {






^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-24 20:38           ` Junio C Hamano
@ 2020-08-24 20:52             ` Jeff King
  2020-08-24 21:22               ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2020-08-24 20:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, Ori Bernstein, git

On Mon, Aug 24, 2020 at 01:38:35PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I think it may be worth making this a configurable value
> > (core.maxDeltaDepth or something). Nobody would generally need to tweak
> > it, but it would give an escape hatch for getting people out of a broken
> > situation ("git -c core.maxDeltaDepth=50000 repack" or similar).
> 
> ... meaning "the pack I have has overlong delta chains to read, and
> I am running repack to cut these chains down to more manageable
> level"?  Makes sense.

Exactly.

> As it may be a bit tricky to figure out where we should read such a
> configuration for those who are new to our codebase, here is an
> illustration to give a starting point.  Docs and tests are probably
> needed, too.

It may be hard to test, as I suspect modern versions of Git are not
happy to create such a deep chain. We could test with a lowered value of
the config option, though.

It may also be worth introducing a true cycle using non-git commands.
There's some coverage there in t/t5309-pack-delta-cycles.sh. I think we
were mainly concerned there with how index-pack treats them, and it
would be nice to see how other commands react. Though I guess that
creates another testing difficulty: those other commands would need a
pack index, and we'd refuse to create one. :) So I think it would
require adding code to manually create a bogus idx file (or I guess
shipping one as a fixture).

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-24 20:52             ` Jeff King
@ 2020-08-24 21:22               ` Junio C Hamano
  2020-08-30  3:33                 ` ori
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2020-08-24 21:22 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Ori Bernstein, git

Jeff King <peff@peff.net> writes:

> It may be hard to test, as I suspect modern versions of Git are not
> happy to create such a deep chain. We could test with a lowered value of
> the config option, though.

Yes, that was what I meant.  Start from a 1KB text, create 50
revisions of the file by adding a single line at its end at a time,
pack with depth limit of 100, and then see "git log -p" die when the
allowed max lowered to 10, or something like that.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-24 21:22               ` Junio C Hamano
@ 2020-08-30  3:33                 ` ori
  2020-08-30 10:56                   ` René Scharfe
  0 siblings, 1 reply; 20+ messages in thread
From: ori @ 2020-08-30  3:33 UTC (permalink / raw)
  To: gitster, peff; +Cc: l.s.r, ori, git

> Jeff King <peff@peff.net> writes:
> 
>> It may be hard to test, as I suspect modern versions of Git are not
>> happy to create such a deep chain. We could test with a lowered value of
>> the config option, though.
> 
> Yes, that was what I meant.  Start from a 1KB text, create 50
> revisions of the file by adding a single line at its end at a time,
> pack with depth limit of 100, and then see "git log -p" die when the
> allowed max lowered to 10, or something like that.

Sorry about the delay -- most of my time to poke at this is over the weekend.

Will that work? I'd expect that modern pack files end up being
offset deltas, rather than reference deltas.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-30  3:33                 ` ori
@ 2020-08-30 10:56                   ` René Scharfe
  2020-08-30 16:15                     ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: René Scharfe @ 2020-08-30 10:56 UTC (permalink / raw)
  To: ori, gitster, peff; +Cc: git

Am 30.08.20 um 05:33 schrieb ori@eigenstate.org:
>> Jeff King <peff@peff.net> writes:
>>
>>> It may be hard to test, as I suspect modern versions of Git are not
>>> happy to create such a deep chain. We could test with a lowered value of
>>> the config option, though.
>>
>> Yes, that was what I meant.  Start from a 1KB text, create 50
>> revisions of the file by adding a single line at its end at a time,
>> pack with depth limit of 100, and then see "git log -p" die when the
>> allowed max lowered to 10, or something like that.
>
> Sorry about the delay -- most of my time to poke at this is over the weekend.
>
> Will that work? I'd expect that modern pack files end up being
> offset deltas, rather than reference deltas.

True, but going down all the way would work:

diff --git a/t/t5316-pack-delta-depth.sh b/t/t5316-pack-delta-depth.sh
index 0f06c40eb1..7fd21cd3ce 100755
--- a/t/t5316-pack-delta-depth.sh
+++ b/t/t5316-pack-delta-depth.sh
@@ -94,4 +94,15 @@ test_expect_success '--depth limits depth' '
 	test_i18ncmp expect actual
 '

+test_expect_success 'maxAllowedDeltaDepth is respected' '
+	git clone . clone1 &&
+	(
+		cd clone1 &&
+		git repack -a -d &&
+		test_config core.maxAllowedDeltaDepth 0 &&
+		test_must_fail git fsck 2>err &&
+		test_i18ngrep "overlong delta chain" err
+	)
+'
+
 test_done



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-30 10:56                   ` René Scharfe
@ 2020-08-30 16:15                     ` Junio C Hamano
  2020-08-31  9:29                       ` Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2020-08-30 16:15 UTC (permalink / raw)
  To: René Scharfe; +Cc: ori, peff, git

René Scharfe <l.s.r@web.de> writes:

>> Will that work? I'd expect that modern pack files end up being
>> offset deltas, rather than reference deltas.
>
> True, but going down all the way would work:

Perhaps, but I'd rather use pack-objects to prepare the repository
with no-delta-base-offset to force ref deltas.

> diff --git a/t/t5316-pack-delta-depth.sh b/t/t5316-pack-delta-depth.sh
> index 0f06c40eb1..7fd21cd3ce 100755
> --- a/t/t5316-pack-delta-depth.sh
> +++ b/t/t5316-pack-delta-depth.sh
> @@ -94,4 +94,15 @@ test_expect_success '--depth limits depth' '
>  	test_i18ncmp expect actual
>  '
>
> +test_expect_success 'maxAllowedDeltaDepth is respected' '
> +	git clone . clone1 &&
> +	(
> +		cd clone1 &&
> +		git repack -a -d &&
> +		test_config core.maxAllowedDeltaDepth 0 &&
> +		test_must_fail git fsck 2>err &&
> +		test_i18ngrep "overlong delta chain" err
> +	)
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-30 16:15                     ` Junio C Hamano
@ 2020-08-31  9:29                       ` Jeff King
  2020-08-31 16:32                         ` Junio C Hamano
  2020-08-31 16:50                         ` ori
  0 siblings, 2 replies; 20+ messages in thread
From: Jeff King @ 2020-08-31  9:29 UTC (permalink / raw)
  To: ori; +Cc: Junio C Hamano, René Scharfe, git

On Sun, Aug 30, 2020 at 09:15:10AM -0700, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
> 
> >> Will that work? I'd expect that modern pack files end up being
> >> offset deltas, rather than reference deltas.
> >
> > True, but going down all the way would work:
> 
> Perhaps, but I'd rather use pack-objects to prepare the repository
> with no-delta-base-offset to force ref deltas.

Yeah, that seems like a much better test setup.

It does raise an interesting question, though. I had imagined we would
limit the depth of all delta chains here, not just ref-deltas. But it is
true that ofs deltas can't cycle. Without cycles, neither type can go on
indefinitely (they are limited by the number of entries in the
packfile). I could see arguments going either way:

  - ofs deltas cannot cycle, so we do not need a counter that limits
    them (and which _could_ find a false positive). So we should not
    limit them.

  - a counter is preventing us from following cycles indefinitely, but
    also hardening us against misbehavior due to bugs or insanely large
    delta chains (intentional or not). So we should include ofs deltas
    in our limit.

A related point is that delta chains might be composed of both types. If
we don't differentiate between the two types, then the limit is clearly
total chain length. If we do, then is the limit the total number of
ref-deltas found in the current lookup, or is it the number of
consecutive ref-deltas? I guess it would have to be the former if our
goal is to catch cycles (since a cycle could include an ofs-delta, as
long as a ref-delta is the part that forms the loop).

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-31  9:29                       ` Jeff King
@ 2020-08-31 16:32                         ` Junio C Hamano
  2020-08-31 19:23                           ` Jeff King
  2020-08-31 16:50                         ` ori
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2020-08-31 16:32 UTC (permalink / raw)
  To: Jeff King; +Cc: ori, René Scharfe, git

Jeff King <peff@peff.net> writes:

> It does raise an interesting question, though. I had imagined we would
> limit the depth of all delta chains here, not just ref-deltas. But it is
> true that ofs deltas can't cycle. Without cycles, neither type can go on
> indefinitely (they are limited by the number of entries in the
> packfile). I could see arguments going either way:
>
>   - ofs deltas cannot cycle, so we do not need a counter that limits
>     them (and which _could_ find a false positive). So we should not
>     limit them.
>
>   - a counter is preventing us from following cycles indefinitely, but
>     also hardening us against misbehavior due to bugs or insanely large
>     delta chains (intentional or not). So we should include ofs deltas
>     in our limit.

A chain can have both types, so I am fuzzy how the counting would
go.  We just do not count ofs_delta at all and only count ref_delta
we've seen during the recursion?

> A related point is that delta chains might be composed of both types. If
> we don't differentiate between the two types, then the limit is clearly
> total chain length. If we do, then is the limit the total number of
> ref-deltas found in the current lookup, or is it the number of
> consecutive ref-deltas? I guess it would have to be the former if our
> goal is to catch cycles (since a cycle could include an ofs-delta, as
> long as a ref-delta is the part that forms the loop).

Ah, OK, you've thought about it already.

I wonder we can just count both and limit the chain length to the
total number of objects in the pack we are currently looking at?  It
guarantees to catch any cycle as long as pack is not thin, but is
that too lenient and likely to bust the stack while counting?  On
the other side of the coin, we saw 10000 as a hard-coded limit in
the patch, but do we know 10000 is low enough that most boxes have
no trouble recursing that deep?

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-31  9:29                       ` Jeff King
  2020-08-31 16:32                         ` Junio C Hamano
@ 2020-08-31 16:50                         ` ori
  1 sibling, 0 replies; 20+ messages in thread
From: ori @ 2020-08-31 16:50 UTC (permalink / raw)
  To: peff, ori; +Cc: gitster, l.s.r, git

> On Sun, Aug 30, 2020 at 09:15:10AM -0700, Junio C Hamano wrote:
> 
>> René Scharfe <l.s.r@web.de> writes:
>> 
>> >> Will that work? I'd expect that modern pack files end up being
>> >> offset deltas, rather than reference deltas.
>> >
>> > True, but going down all the way would work:
>> 
>> Perhaps, but I'd rather use pack-objects to prepare the repository
>> with no-delta-base-offset to force ref deltas.
> 
> Yeah, that seems like a much better test setup.
> 
> It does raise an interesting question, though. I had imagined we would
> limit the depth of all delta chains here, not just ref-deltas. But it is
> true that ofs deltas can't cycle. Without cycles, neither type can go on
> indefinitely (they are limited by the number of entries in the
> packfile). I could see arguments going either way:

Yeah -- that's what I'd implemented. I was just thinking that I'd want to
test the issue that caused the problem in the first place, but it's the
same code path either way.

I like the idea of limiting to the total number of objects in the
pack. If we do that, we don't need a knob at all, since if we need
more objects in the stack than are in the pack, it's obviously
invalid.

That does eliminate an obvious way to test things, and we'd need
to provide in an invalid pack file.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Avoid infinite loop in malformed packfiles
  2020-08-31 16:32                         ` Junio C Hamano
@ 2020-08-31 19:23                           ` Jeff King
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2020-08-31 19:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ori, René Scharfe, git

On Mon, Aug 31, 2020 at 09:32:27AM -0700, Junio C Hamano wrote:

> > A related point is that delta chains might be composed of both types. If
> > we don't differentiate between the two types, then the limit is clearly
> > total chain length. If we do, then is the limit the total number of
> > ref-deltas found in the current lookup, or is it the number of
> > consecutive ref-deltas? I guess it would have to be the former if our
> > goal is to catch cycles (since a cycle could include an ofs-delta, as
> > long as a ref-delta is the part that forms the loop).
> 
> Ah, OK, you've thought about it already.
> 
> I wonder we can just count both and limit the chain length to the
> total number of objects in the pack we are currently looking at? 

That's an interesting suggestion. Within a single pack, it does prevent
cycles, and it does so without needing a separate knob, which is nice.

As you note, it only works as long as packs aren't thin. That shouldn't
matter for the current scheme (where all on-disk packs are
self-contained with respect to deltas), but I do wonder if we'll
eventually want to support on-disk thin packs (coupled with a
multi-pack-index, that eliminates most of the reason that one needs
repack existing objects; it's probably a necessary step in scaling to
repos with hundreds of millions of objects). We could still auto-bound
it with the total number of packed objects in the repository, though.

> It
> guarantees to catch any cycle as long as pack is not thin, but is
> that too lenient and likely to bust the stack while counting?  On
> the other side of the coin, we saw 10000 as a hard-coded limit in
> the patch, but do we know 10000 is low enough that most boxes have
> no trouble recursing that deep?

I don't think we have to worry about stack size. We already ran into
stack-busting problems with non-broken cases. ;) That led to 790d96c023
(sha1_file: remove recursion in packed_object_info, 2013-03-25) using
its own stack.

I do wonder about CPU, though. We might have tens of millions of objects
in a single pack file. How long does it take to convince ourselves we're
cycling (even if the cycle itself might only involve a handful of
objects)? I'm not sure we care too much about this being a fast
operation (after all, the point is that it should never happen and we're
just trying not to spin forever). But if it takes 60 minutes to detect
the cycle, from a user's perspective that might not be any different
than an infinite loop.

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-08-31 19:23 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-23  0:52 [PATCH] Avoid infinite loop in malformed packfiles Ori Bernstein
2020-08-23  2:52 ` ori
2020-08-23  3:08 ` Eric Sunshine
2020-08-23  3:11 ` Ori Bernstein
2020-08-23  6:26   ` René Scharfe
2020-08-23 20:41     ` Ori Bernstein
2020-08-24 16:06       ` René Scharfe
2020-08-24 20:12         ` Jeff King
2020-08-24 20:38           ` Junio C Hamano
2020-08-24 20:52             ` Jeff King
2020-08-24 21:22               ` Junio C Hamano
2020-08-30  3:33                 ` ori
2020-08-30 10:56                   ` René Scharfe
2020-08-30 16:15                     ` Junio C Hamano
2020-08-31  9:29                       ` Jeff King
2020-08-31 16:32                         ` Junio C Hamano
2020-08-31 19:23                           ` Jeff King
2020-08-31 16:50                         ` ori
2020-08-24 17:33   ` Junio C Hamano
2020-08-24 20:30 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).