list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <>
To: Jeff King <>, Constantine <>
Cc: Junio C Hamano <>,
	Christian Couder <>,
	Mike Hommey <>, git <>
Subject: Re: git-clone causes out of memory
Date: Fri, 13 Oct 2017 09:39:14 -0400
Message-ID: <> (raw)
In-Reply-To: <>

On 10/13/2017 9:15 AM, Derrick Stolee wrote:
> On 10/13/2017 8:44 AM, Jeff King wrote:
>> On Fri, Oct 13, 2017 at 03:12:43PM +0300, Constantine wrote:
>>> On 13.10.2017 15:04, Junio C Hamano wrote:
>>>> Christian Couder <> writes:
>>>>> Yeah, but perhaps Git could be smarter when rev-listing too and avoid
>>>>> processing files or directories it has already seen?
>>>> Aren't you suggesting to optimize for a wrong case?
>>> Anything that is possible with a software should be considered as a 
>>> possible
>>> usecase. It's in fact a DoS attack. Imagine there's a server that 
>>> using git
>>> to process something, and now there's a way to knock down this 
>>> server. It's
>>> also bad from a promotional stand point.
>> But the point is that you'd have the same problem with any repository
>> that had 10^7 files in it. Yes, it's convenient for the attacker that
>> there are only 9 objects, but fundamentally it's pretty easy for an
>> attacker to construct repositories that have large trees (or very deep
>> trees -- that's what causes stack exhaustion in some cases).
>> Note too that this attack almost always comes down to the diff code
>> (which is why it kicks in for pathspec limiting) which has to actually
>> expand the tree. Most "normal" server-side operations (like accepting
>> pushes or serving fetches) operate only on the object graph and _do_
>> avoid processing already-seen objects.
>> As soon as servers start trying to checkout or diff, though, the attack
>> surface gets quite large. And you really need to start thinking about
>> having resource limits and quotas for CPU and memory use of each process
>> (and group by requesting user, IP, repo, etc).
>> I think the main thing Git could be doing here is to limit the size of
>> the tree (both width and depth). But arbitrary limits like that have a
>> way of being annoying, and I think it just pushes resource-exhaustion
>> attacks off a little (e.g., can you construct a blob that behaves badly
>> with the "--patch"?).
>> -Peff
> I'm particularly interested in why `git rev-list HEAD -- [path]` gets 
> slower in this case, because I wrote the history algorithm used by 
> VSTS. In our algorithm, we only walk the list of objects from commit 
> to the tree containing the path item. For example, in the path 
> d0/d0/d0, we would only walk:
>     commit --root--> tree --d0--> tree --d0--> tree [parse oid for d0 
> entry]
> From this, we can determine the TREESAME relationship by parsing four 
> objects without parsing all contents below d0/d0/d0.
> The reason we have exponential behavior in `git rev-list` is because 
> we are calling diff_tree_oid() in tree-diff.c recursively without 
> short-circuiting on equal OIDs.
> I will prepare a patch that adds this OID-equal short-circuit to avoid 
> this exponential behavior. I'll model my patch against a similar patch 
> in master:
>     commit d12a8cf0af18804c2000efc7a0224da631e04cd1 unpack-trees: 
> avoid duplicate ODB lookups during checkout
> It will also significantly speed up rev-list calls for short paths in 
> deep repositories. It will not be very measurable in the git or Linux 
> repos because their shallow folder structure.
> Thanks,
> -Stolee

Since I don't understand enough about the consumers to diff_tree_oid() 
(and the fact that the recursive behavior may be wanted in some cases), 
I think we can fix this in builtin/rev-list.c with this simple diff:


diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ded1577424..b2e8e02cc8 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -285,6 +285,9 @@ int cmd_rev_list(int argc, const char **argv, const 
char *prefix)

         git_config(git_default_config, NULL);
         init_revisions(&revs, prefix);
+       revs.pruning.flags = revs.pruning.flags & ~DIFF_OPT_RECURSIVE;
         revs.abbrev = DEFAULT_ABBREV;
         revs.commit_format = CMIT_FMT_UNSPECIFIED;
         argc = setup_revisions(argc, argv, &revs, NULL);

  reply	other threads:[~2017-10-13 13:39 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-13  9:51 Constantine
2017-10-13 10:06 ` Mike Hommey
2017-10-13 10:26   ` Christian Couder
2017-10-13 10:37     ` Mike Hommey
2017-10-13 10:44       ` Christian Couder
2017-10-13 12:04         ` Junio C Hamano
2017-10-13 12:12           ` Constantine
2017-10-13 12:44             ` Jeff King
2017-10-13 13:15               ` Derrick Stolee
2017-10-13 13:39                 ` Derrick Stolee [this message]
2017-10-13 13:50                   ` Jeff King
2017-10-13 13:55                     ` Derrick Stolee
2017-10-13 13:56                       ` Jeff King
2017-10-13 14:10                         ` Jeff King
2017-10-13 14:20                           ` Jeff King
2017-10-13 14:25                             ` Derrick Stolee
2017-10-13 14:26                               ` Jeff King
2017-10-13 14:30                                 ` Derrick Stolee
2017-10-13 15:27                                 ` [PATCH] revision: quit pruning diff more quickly when possible Jeff King
2017-10-13 15:37                                   ` Derrick Stolee
2017-10-13 15:44                                     ` Jeff King
2017-10-14  2:43                                   ` Junio C Hamano
2017-10-13 12:35   ` git-clone causes out of memory Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

  List information:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \
    --subject='Re: git-clone causes out of memory' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ \
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
 note: .onion URLs require Tor:

code repositories for project(s) associated with this inbox:

AGPL code for this site: git clone