git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>, Constantine <hi-angel@yandex.ru>
Cc: Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	Mike Hommey <mh@glandium.org>, git <git@vger.kernel.org>
Subject: Re: git-clone causes out of memory
Date: Fri, 13 Oct 2017 09:15:53 -0400
Message-ID: <f35d03b5-a525-87b3-a426-bd892edf0c36@gmail.com> (raw)
In-Reply-To: <20171013124456.qsbaol7txdgdb6wq@sigill.intra.peff.net>

On 10/13/2017 8:44 AM, Jeff King wrote:
> On Fri, Oct 13, 2017 at 03:12:43PM +0300, Constantine wrote:
>
>> On 13.10.2017 15:04, Junio C Hamano wrote:
>>> Christian Couder <christian.couder@gmail.com> writes:
>>>
>>>> Yeah, but perhaps Git could be smarter when rev-listing too and avoid
>>>> processing files or directories it has already seen?
>>> Aren't you suggesting to optimize for a wrong case?
>>>
>> Anything that is possible with a software should be considered as a possible
>> usecase. It's in fact a DoS attack. Imagine there's a server that using git
>> to process something, and now there's a way to knock down this server. It's
>> also bad from a promotional stand point.
> But the point is that you'd have the same problem with any repository
> that had 10^7 files in it. Yes, it's convenient for the attacker that
> there are only 9 objects, but fundamentally it's pretty easy for an
> attacker to construct repositories that have large trees (or very deep
> trees -- that's what causes stack exhaustion in some cases).
>
> Note too that this attack almost always comes down to the diff code
> (which is why it kicks in for pathspec limiting) which has to actually
> expand the tree. Most "normal" server-side operations (like accepting
> pushes or serving fetches) operate only on the object graph and _do_
> avoid processing already-seen objects.
>
> As soon as servers start trying to checkout or diff, though, the attack
> surface gets quite large. And you really need to start thinking about
> having resource limits and quotas for CPU and memory use of each process
> (and group by requesting user, IP, repo, etc).
>
> I think the main thing Git could be doing here is to limit the size of
> the tree (both width and depth). But arbitrary limits like that have a
> way of being annoying, and I think it just pushes resource-exhaustion
> attacks off a little (e.g., can you construct a blob that behaves badly
> with the "--patch"?).
>
> -Peff

I'm particularly interested in why `git rev-list HEAD -- [path]` gets 
slower in this case, because I wrote the history algorithm used by VSTS. 
In our algorithm, we only walk the list of objects from commit to the 
tree containing the path item. For example, in the path d0/d0/d0, we 
would only walk:

     commit --root--> tree --d0--> tree --d0--> tree [parse oid for d0 
entry]

 From this, we can determine the TREESAME relationship by parsing four 
objects without parsing all contents below d0/d0/d0.

The reason we have exponential behavior in `git rev-list` is because we 
are calling diff_tree_oid() in tree-diff.c recursively without 
short-circuiting on equal OIDs.

I will prepare a patch that adds this OID-equal short-circuit to avoid 
this exponential behavior. I'll model my patch against a similar patch 
in master:

     commit d12a8cf0af18804c2000efc7a0224da631e04cd1 unpack-trees: avoid 
duplicate ODB lookups during checkout

It will also significantly speed up rev-list calls for short paths in 
deep repositories. It will not be very measurable in the git or Linux 
repos because their shallow folder structure.

Thanks,
-Stolee

  reply index

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-13  9:51 Constantine
2017-10-13 10:06 ` Mike Hommey
2017-10-13 10:26   ` Christian Couder
2017-10-13 10:37     ` Mike Hommey
2017-10-13 10:44       ` Christian Couder
2017-10-13 12:04         ` Junio C Hamano
2017-10-13 12:12           ` Constantine
2017-10-13 12:44             ` Jeff King
2017-10-13 13:15               ` Derrick Stolee [this message]
2017-10-13 13:39                 ` Derrick Stolee
2017-10-13 13:50                   ` Jeff King
2017-10-13 13:55                     ` Derrick Stolee
2017-10-13 13:56                       ` Jeff King
2017-10-13 14:10                         ` Jeff King
2017-10-13 14:20                           ` Jeff King
2017-10-13 14:25                             ` Derrick Stolee
2017-10-13 14:26                               ` Jeff King
2017-10-13 14:30                                 ` Derrick Stolee
2017-10-13 15:27                                 ` [PATCH] revision: quit pruning diff more quickly when possible Jeff King
2017-10-13 15:37                                   ` Derrick Stolee
2017-10-13 15:44                                     ` Jeff King
2017-10-14  2:43                                   ` Junio C Hamano
2017-10-13 12:35   ` git-clone causes out of memory Jeff King

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f35d03b5-a525-87b3-a426-bd892edf0c36@gmail.com \
    --to=stolee@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hi-angel@yandex.ru \
    --cc=mh@glandium.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox