git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jiang Xin <worldhello.net@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Han Xin" <chiyutianyi@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Git List" <git@vger.kernel.org>,
	"Han Xin" <hanxin.hx@alibaba-inc.com>
Subject: Re: [PATCH v2] receive-pack: not receive pack file with large object
Date: Fri, 1 Oct 2021 02:17:30 -0400	[thread overview]
Message-ID: <YVan+n1surlXfiEw@coredump.intra.peff.net> (raw)
In-Reply-To: <CANYiYbHNBcDaoF+QE_+62EXUZD_caaJDFmt7v1_BddQfpdVcvg@mail.gmail.com>

On Fri, Oct 01, 2021 at 10:30:24AM +0800, Jiang Xin wrote:

> > Maybe an unfair knee-jerk reaction: I think we should really be pushing
> > this sort of thing into pre-receive hooks and/or the proc-receive hook,
> > i.e. see 15d3af5e22e (receive-pack: add new proc-receive hook,
> > 2020-08-27).
> 
> Last week, one user complained that he cannot push to his repo in our
> server, and later Han Xin discovered the user was trying to push a
> very big blob object over 10GB. For this case, the "pre-receive" hook
> had no change to execute because "git-receive-pack" died early because
> of OOM.  The function "unpack_non_delta_entry()" in
> "builtin/unpack-objects.c" will try to allocate memory for the whole
> 10GB blob but no lucky.
> 
> Han Xin is preparing another patch to resolve the OOM issue found in
> "unpack_non_delta_entry()". But we think it is reasonable to prevent
> such a big blob in a pack to git-receive-pack, because it will be
> slower to check objects from pack and loose objects in the quarantine
> using pre-receive hook.

I think index-pack handles this case correctly, at least for base
objects. In unpack_entry_data(), it will stream anything bigger than
big_file_threshold. Probably unpack-objects needs to learn the same
trick.

In general, the code in index-pack has gotten a _lot_ more attention
over the years than unpack-objects. I'd trust its security a lot more,
and it has extra performance enhancements (like multithreading and
streaming). At GitHub, we always run index-pack on incoming packs, and
never unpack-objects. I'm tempted to say we should stop using
unpack-objects entirely for incoming packs, and then either:

  - just keep packs for incoming objects. It's not really any worse of a
    state than loose objects, and it all eventually gets rolled up by gc
    anyway. (The exception is that thin packs may duplicate base objects
    until that rollup).

  - teach index-pack an "--unpack" option. I actually wrote some patches
    for this a while back. It's not very much code, but there were some
    rough edges and I never came back to them. I'm happy to share if
    anybody's interested.

Though I would note that for deltas, even index-pack will not stream the
contents. You generally shouldn't have very large deltas, as clients
will also avoid computing them (because that also involves putting the
whole object in memory). But the notion of "large" is not necessarily
the same between client and server. And of course somebody can
maliciously make a giant delta.

-Peff

  reply	other threads:[~2021-10-01  6:17 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-30 12:10 [PATCH] receive-pack: allow a maximum input object size specified Han Xin
2021-09-30 13:20 ` [PATCH v2] receive-pack: not receive pack file with large object Han Xin
2021-09-30 13:42   ` Ævar Arnfjörð Bjarmason
2021-10-01  2:30     ` Jiang Xin
2021-10-01  6:17       ` Jeff King [this message]
2021-10-01  6:55     ` Jeff King
2021-10-01 18:43       ` Junio C Hamano
2021-09-30 16:49   ` Junio C Hamano
2021-10-01  2:52     ` Jiang Xin
2021-10-01  6:24       ` Jeff King
  -- strict thread matches above, loose matches on Subject: below --
2021-10-01  9:16 [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-11-11  3:03 ` [PATCH v2] receive-pack: not receive pack file with large object Han Xin
2021-11-11 18:35   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YVan+n1surlXfiEw@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=chiyutianyi@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hanxin.hx@alibaba-inc.com \
    --cc=worldhello.net@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).