git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Rafael Silva <rafaeloliveira.cs@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: rather slow 'git repack' in 'blob:none' partial clones
Date: Tue, 13 Apr 2021 20:05:52 +0200	[thread overview]
Message-ID: <20210413180552.GI2947267@szeder.dev> (raw)
In-Reply-To: <YHTcHY+P7RuZJGab@coredump.intra.peff.net>

On Mon, Apr 12, 2021 at 07:47:41PM -0400, Jeff King wrote:
> On Mon, Apr 12, 2021 at 11:36:53PM +0200, SZEDER Gábor wrote:
> 
> > All what you wrote above makes sense to me, but I've never looked at
> > how partial clones work, so it doesn't mean anything...  In any case
> > your patch brings great speedups, but, unfortunately, the memory usage
> > remains as high as it was:
> > 
> >   $ /usr/bin/time --format=elapsed: %E  max RSS: %Mk /home/szeder/src/git/bin-wrappers/git -C git-partial.git/ gc
> >   Enumerating objects: 188450, done.
> >   Counting objects: 100% (188450/188450), done.
> >   Delta compression using up to 4 threads
> >   Compressing objects: 100% (66623/66623), done.
> >   Writing objects: 100% (188450/188450), done.
> >   Total 188450 (delta 120109), reused 188450 (delta 120109), pack-reused 0
> >   elapsed: 0:15.18  max RSS: 1888332k
> > 
> > And git.git is not all that large, I wonder how much memory would be
> > necessary to 'gc' a 'blob:none' clone of e.g. chromium?! :)
> > 
> > BTW, this high memory usage in a partial clone is not specific to
> > 'repack', 'fsck' suffers just as much:
> 
> I think the issue is in the exclude-promisor-object code paths of the
> traversal. Try this:
> 
>   [two clones of git.git, one full and one partial]
>   $ git clone --no-local --bare /path/to/git full.git
>   $ git clone --no-local --bare --filter=blob:none /path/to/git partial.git
> 
>   [full clone is quick to traverse all objects]
>   $ time git -C full.git rev-list --count --all
>   63215
>   real	0m0.369s
>   user	0m0.365s
>   sys	0m0.004s
> 
>   [partial is, too; it's the same amount of work because we're just
>    looking at the commits here]
>   $ time git -C partial.git rev-list --count --all
>   63215
>   real	0m0.373s
>   user	0m0.364s
>   sys	0m0.009s
> 
>   [but now ask it to exclude promisor objects, and it's much slower;
>   this is because is_promisor_object() opens up each tree in the pack in
>   order to see which "promised" objects it mentions]

I don't understand this: 'git rev-list --count --all' only counts
commit objects, so why should it open any trees at all?

>   $ time git -C partial.git rev-list --exclude-promisor-objects --count --all
>   0
>   real	0m11.723s
>   user	0m11.354s
>   sys	0m0.369s
> 
> And I think that is the source for the memory use, too. Without that
> option, we peak at 11MB heap. With it, 1.6GB. Oops. Looks like there
> might be some "leaks" when we parse tree objects and then leave the
> buffers connected to the structs (technically not a leak because we
> still have the pointers, but obviously having every tree in memory at
> once is not good).
> 
> The patch below drops the peak heap to 165MB. Still quite a bit more,
> but I think it's a combination of delta-base cache (96MB) plus extra
> structs for all the non-commit objects whose flags we marked.
> 
> It does seem like there's probably a good space/time tradeoff to be made
> here (e.g., caching the list of "promisor" object ids for the pack
> instead of inflating and reading all of the trees on the fly).
> 
> diff --git a/packfile.c b/packfile.c
> index 8668345d93..b79cbc8cd4 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -2247,6 +2247,7 @@ static int add_promisor_object(const struct object_id *oid,
>  			return 0;
>  		while (tree_entry_gently(&desc, &entry))
>  			oidset_insert(set, &entry.oid);
> +		free_tree_buffer(tree);
>  	} else if (obj->type == OBJ_COMMIT) {
>  		struct commit *commit = (struct commit *) obj;
>  		struct commit_list *parents = commit->parents;
> diff --git a/revision.c b/revision.c
> index 553c0faa9b..fac2577748 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3271,8 +3271,15 @@ static int mark_uninteresting(const struct object_id *oid,
>  			      void *cb)
>  {
>  	struct rev_info *revs = cb;
> +	/*
> +	 * yikes, do we really need to parse here? maybe

Heh, a "yikes" here and a "yuck" in your previous patch...  This issue
was worth reporting :)

> +	 * lookup_unknown_object() would be sufficient, or
> +	 * even oid_object_info() followed by the correct type
> +	 */
>  	struct object *o = parse_object(revs->repo, oid);
>  	o->flags |= UNINTERESTING | SEEN;
> +	if (o->type == OBJ_TREE)
> +		free_tree_buffer((struct tree *)o);
>  	return 0;
>  }
>  
> 
> -Peff

  parent reply	other threads:[~2021-04-13 18:06 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-03  9:04 rather slow 'git repack' in 'blob:none' partial clones SZEDER Gábor
2021-04-05  1:02 ` Rafael Silva
2021-04-07 21:17   ` Jeff King
2021-04-08  0:02     ` Jonathan Tan
2021-04-08  0:35       ` Jeff King
2021-04-12  7:09     ` Rafael Silva
2021-04-12 21:36     ` SZEDER Gábor
2021-04-12 21:49       ` Bryan Turner
2021-04-12 23:51         ` Jeff King
2021-04-12 23:47       ` Jeff King
2021-04-13  7:12         ` [PATCH 0/3] low-hanging performance fruit with promisor packs Jeff King
2021-04-13  7:15           ` [PATCH 1/3] is_promisor_object(): free tree buffer after parsing Jeff King
2021-04-13 20:17             ` Junio C Hamano
2021-04-14  5:18               ` Jeff King
2021-04-13  7:16           ` [PATCH 2/3] lookup_unknown_object(): take a repository argument Jeff King
2021-04-13  7:17           ` [PATCH 3/3] revision: avoid parsing with --exclude-promisor-objects Jeff King
2021-04-13 20:22             ` Junio C Hamano
2021-04-13 18:10           ` [PATCH 0/3] low-hanging performance fruit with promisor packs SZEDER Gábor
2021-04-14 17:14           ` Jonathan Tan
2021-04-14 19:22           ` Rafael Silva
2021-04-13 18:05         ` SZEDER Gábor [this message]
2021-04-14  5:14           ` rather slow 'git repack' in 'blob:none' partial clones Jeff King
2021-04-11 10:59   ` SZEDER Gábor
2021-04-12  7:53     ` Rafael Silva
2021-04-14 19:14 ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Rafael Silva
2021-04-14 19:14   ` [PATCH 1/2] repack: teach --no-prune-packed to skip `git prune-packed` Rafael Silva
2021-04-14 23:50     ` Jonathan Tan
2021-04-18 14:15       ` Rafael Silva
2021-04-14 19:14   ` [PATCH 2/2] repack: avoid loosening promisor pack objects in partial clones Rafael Silva
2021-04-15  1:04     ` Jonathan Tan
2021-04-15  3:51       ` Junio C Hamano
2021-04-15  9:03         ` Jeff King
2021-04-15  9:05       ` Jeff King
2021-04-18  7:12       ` Rafael Silva
2021-04-15 18:06     ` Junio C Hamano
2021-04-18  8:40       ` Rafael Silva
2021-04-14 22:10   ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Junio C Hamano
2021-04-15  9:15   ` Jeff King
2021-04-18  8:20     ` Rafael Silva
2021-04-18 13:57   ` [PATCH v2 0/1] " Rafael Silva
2021-04-18 13:57     ` [PATCH v2 1/1] repack: avoid loosening promisor objects in partial clones Rafael Silva
2021-04-19 19:15       ` Jonathan Tan
2021-04-21 18:54         ` Rafael Silva
2021-04-19 23:09       ` Junio C Hamano
2021-04-21 19:25         ` Rafael Silva
2021-04-21 19:32     ` [PATCH v3] " Rafael Silva

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210413180552.GI2947267@szeder.dev \
    --to=szeder.dev@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    --cc=rafaeloliveira.cs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).