git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Han Xin <chiyutianyi@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git List <git@vger.kernel.org>,
	Han Xin <hanxin.hx@alibaba-inc.com>
Subject: Re: [PATCH v2] receive-pack: not receive pack file with large object
Date: Thu, 30 Sep 2021 15:42:29 +0200	[thread overview]
Message-ID: <87pmsqtb2p.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20210930132004.16075-1-chiyutianyi@gmail.com>


On Thu, Sep 30 2021, Han Xin wrote:

> From: Han Xin <hanxin.hx@alibaba-inc.com>
>
> In addition to using 'receive.maxInputSize' to limit the overall size
> of the received packfile, a new config variable
> 'receive.maxInputObjectSize' is added to limit the push of a single
> object larger than this threshold.

Maybe an unfair knee-jerk reaction: I think we should really be pushing
this sort of thing into pre-receive hooks and/or the proc-receive hook,
i.e. see 15d3af5e22e (receive-pack: add new proc-receive hook,
2020-08-27).

The latter pre-dates c08db5a2d0d (receive-pack: allow a maximum input
size to be specified, 2016-08-24), which may or may not be relevant
here.

Anyway, I think there may be dragons here that you haven't
considered. Is the "size" here the absolute size on disk, or the delta
size (I'm offhand not familiar enough with unpack-objects.c to
know). Does this have the same semantics no matter the
transfer.unpackLimit?

Either way, if it's the absolute size you may have a 100MB object that's
a fantastic delta candidate, so it may only add a few bytes to your
repo, or it's /dev/urandom output and really is adding 100MB.

If we're relying on deltas there's no guarantee that what the client is
sending is delta-ing against something we can delta against, although
admittedly this is also an issue with receive.maxInputSize.

Anyway, all of this is stuff that people "in the wild" don't consider
already, so maybe I'm being too much of a curmudgeon here :)

At an ex-job I re-wrote some "big push" blocking hook that had had N
iterations of other implementations getting it subtly wrong. I.e. if you
define "size" the wrong way you end up blocking things like the revert
that reverts "master" to yesterday's commit.

That's somehing that takes git almost no size at all to store, but which
depending on the stupidity of the hook that's on the other end may be
blocked as a "big push".

So I think if we're going to expand on the existing
"receive.maxInputSize" we should really be considering the edge cases
carefully.

But even then I'm somewhat skeptical of the benefit of doing this in
git's own guts v.s. a hook, or expanding the hook infrastructure to
accommodate it, we have "receive.maxInputSize", now maybe
"receive.maxInputObjectSize", then after that perhaps
"receive.maxInputBinaryObjectSize",
"receive.maxInputObjectSizeGlob=*.mp3" etc.

I'm not saying our hook infrastructure is good enough for this right
now, but surely anything that would be wanted as a check in this area
could be done by something that "git cat-file --batch"-style feeds OIDs
to a new hook over stdin from "index-pack" and "unpack-objects"? With
that hook aware of the temporary staged objects in the incoming-* area
of the store?. I.e. if the performance aspect of not waiting until
"pre-receive" time is a concern...

  reply	other threads:[~2021-09-30 14:04 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-30 12:10 [PATCH] receive-pack: allow a maximum input object size specified Han Xin
2021-09-30 13:20 ` [PATCH v2] receive-pack: not receive pack file with large object Han Xin
2021-09-30 13:42   ` Ævar Arnfjörð Bjarmason [this message]
2021-10-01  2:30     ` Jiang Xin
2021-10-01  6:17       ` Jeff King
2021-10-01  6:55     ` Jeff King
2021-10-01 18:43       ` Junio C Hamano
2021-09-30 16:49   ` Junio C Hamano
2021-10-01  2:52     ` Jiang Xin
2021-10-01  6:24       ` Jeff King
  -- strict thread matches above, loose matches on Subject: below --
2021-10-01  9:16 [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-11-11  3:03 ` [PATCH v2] receive-pack: not receive pack file with large object Han Xin
2021-11-11 18:35   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pmsqtb2p.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=chiyutianyi@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hanxin.hx@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).