git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: ori@eigenstate.org, "René Scharfe" <l.s.r@web.de>, git@vger.kernel.org
Subject: Re: [PATCH] Avoid infinite loop in malformed packfiles
Date: Mon, 31 Aug 2020 15:23:02 -0400
Message-ID: <20200831192302.GA2819760@coredump.intra.peff.net>
In-Reply-To: <xmqqk0xehj38.fsf@gitster.c.googlers.com>

On Mon, Aug 31, 2020 at 09:32:27AM -0700, Junio C Hamano wrote:

> > A related point is that delta chains might be composed of both types. If
> > we don't differentiate between the two types, then the limit is clearly
> > total chain length. If we do, then is the limit the total number of
> > ref-deltas found in the current lookup, or is it the number of
> > consecutive ref-deltas? I guess it would have to be the former if our
> > goal is to catch cycles (since a cycle could include an ofs-delta, as
> > long as a ref-delta is the part that forms the loop).
> 
> Ah, OK, you've thought about it already.
> 
> I wonder if we can just count both and limit the chain length to the
> total number of objects in the pack we are currently looking at?

That's an interesting suggestion. Within a single pack, it does prevent
cycles, and it does so without needing a separate knob, which is nice.

As you note, it only works as long as packs aren't thin. That shouldn't
matter for the current scheme (where all on-disk packs are
self-contained with respect to deltas), but I do wonder if we'll
eventually want to support on-disk thin packs (coupled with a
multi-pack-index, that eliminates most of the reason that one needs to
repack existing objects; it's probably a necessary step in scaling to
repos with hundreds of millions of objects). We could still auto-bound
it with the total number of packed objects in the repository, though.
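
To make the bounding idea concrete, here is a minimal sketch, not git's
actual code. It assumes a toy model of a self-contained pack in which
base[i] holds the index of the object that object i is a delta against
(NO_BASE for a non-delta object); the names walk_delta_chain, base and
NO_BASE are hypothetical. A valid chain in a pack of nr_objects objects
can take at most nr_objects - 1 hops, so reaching nr_objects hops means
some object was visited twice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "this object is not a delta". */
#define NO_BASE SIZE_MAX

/*
 * Follow the delta chain starting at `start` in a toy pack model.
 * Returns the number of delta hops on success, or -1 once the walk
 * has taken nr_objects hops, which is only possible if the chain
 * revisits an object (i.e. contains a cycle).
 *
 * Assumes every base index lies inside the pack (no thin packs).
 */
static long walk_delta_chain(const size_t *base, size_t nr_objects,
			     size_t start)
{
	size_t cur = start;
	size_t hops = 0;

	while (base[cur] != NO_BASE) {
		if (++hops >= nr_objects)
			return -1; /* pigeonhole: an object was seen twice */
		cur = base[cur];
	}
	return (long)hops;
}
```

Note that the walk does not distinguish between the two delta types; it
only counts hops, so no separate limit needs to be configured. In this
model a cycle is reported only after nr_objects hops, however small the
cycle itself is.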

> It
> guarantees to catch any cycle as long as the pack is not thin, but is
> that too lenient and likely to bust the stack while counting?  On
> the other side of the coin, we saw 10000 as a hard-coded limit in
> the patch, but do we know 10000 is low enough that most boxes have
> no trouble recursing that deep?

I don't think we have to worry about stack size. We already ran into
stack-busting problems with non-broken cases. ;) That led to 790d96c023
(sha1_file: remove recursion in packed_object_info, 2013-03-25) using
its own stack.
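
As an illustration of what "using its own stack" means in general, here
is a rough sketch under stated assumptions; it is not the code from that
commit. It uses a hypothetical toy model (struct toy_object,
resolve_object) in which a base object carries a full value and a delta
carries an increment to apply on top of its base. The chain is recorded
on a heap-allocated array rather than on the call stack, so its depth is
limited by available memory and not by the C stack size.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical marker for "this object is not a delta". */
#define NO_BASE ((size_t)-1)

struct toy_object {
	size_t base;  /* index of the delta base, or NO_BASE */
	long payload; /* full value for a base, increment for a delta */
};

/*
 * Resolve object `start` without recursion. Returns 0 and stores the
 * value in *result on success, or -1 if memory allocation fails.
 * This sketch has no cycle guard; it assumes the chain is acyclic.
 */
static int resolve_object(const struct toy_object *objs, size_t start,
			  long *result)
{
	size_t *stack = NULL;
	size_t nr = 0, alloc = 0;
	size_t cur = start;
	long value;

	/* Phase 1: walk toward the base, remembering each delta. */
	while (objs[cur].base != NO_BASE) {
		if (nr == alloc) {
			size_t new_alloc = alloc ? alloc * 2 : 64;
			size_t *tmp = realloc(stack,
					      new_alloc * sizeof(*stack));
			if (!tmp) {
				free(stack);
				return -1;
			}
			stack = tmp;
			alloc = new_alloc;
		}
		stack[nr++] = cur;
		cur = objs[cur].base;
	}

	/* Phase 2: start from the base, apply deltas in reverse order. */
	value = objs[cur].payload;
	while (nr > 0)
		value += objs[stack[--nr]].payload;

	free(stack);
	*result = value;
	return 0;
}
```

With objects { base, +5 on object 0, -2 on object 1 } and a base value
of 10, resolving the last object walks two deltas down and then applies
them back up, giving 13.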

I do wonder about CPU, though. We might have tens of millions of objects
in a single pack file. How long does it take to convince ourselves we're
cycling (even if the cycle itself might only involve a handful of
objects)? I'm not sure we care too much about this being a fast
operation (after all, the point is that it should never happen and we're
just trying not to spin forever). But if it takes 60 minutes to detect
the cycle, from a user's perspective that might not be any different
than an infinite loop.

-Peff