git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Rafael Silva <rafaeloliveira.cs@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, szeder.dev@gmail.com, peff@peff.net,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/2] repack: avoid loosening promisor pack objects in partial clones
Date: Sun, 18 Apr 2021 09:12:55 +0200	[thread overview]
Message-ID: <gohp6ktuo4m6zt.fsf@cpm12071.fritz.box> (raw)
In-Reply-To: <20210415010454.4077355-1-jonathantanmy@google.com>


Jonathan Tan <jonathantanmy@google.com> writes:

>> For a partial
>> clone, that contains unreferenced objects, this results in unpacking
>> all "promisor" objects and deleting them right after, which
>> unnecessarily increases the `repack` execution time and disk usage
>> during the unpacking of the objects.
>
> I think that the commit message also needs to explain that we're
> deleting the promisor objects immediately because they happen to be in a
> promisor pack. So perhaps this whole part could be written as follows:
>
>   When "git repack -Ad" is run in a partial clone, "pack-objects" is
>   invoked twice: once to repack all promisor objects, and once to repack
>   all non-promisor objects. The latter "pack-objects" invocation is with
>   --exclude-promisor-objects and --unpack-unreachable, which loosens all
>   unused objects. Unfortunately, this includes promisor objects.
>
>   Because the "-d" argument to "git repack" subsequently deletes all
>   loose objects also in packs, these just-loosened promisor objects will
>   be immediately deleted. But this extra disk churn is unnecessary in
>   the first place.
>

Thanks for suggesting this message, obviously it's much better than
mine specially because removes the confusion that I made when writing
the message about the `pack-objects` and `prune-packed`.

I'll update patch's message on the next revision.

>> For instance, a partially cloned repository that filters all the blob
>> objects (e.g. "--filter=blob:none"), `repack` ends up unpacking all
>> blobs into the filesystem that, depending on the repo size, makes
>> nearly impossible to repack the operation before running out of disk.
>> 
>> For a partial clone, `git repack` calls `git pack-objects` twice: (1)
>> for handle the "promisor" objects and (2) for performing the repack
>> with --exclude-promisor-objects option, that results in unpacking and
>> deleting of the objects. Given that we actually should keep the
>> promisor objects, let's teach `repack` to tell `pack-objects` to
>> --keep the old "promisor" pack file.
>
> It's not this call (2) that results in any deleting of the objects, but
> the later call to prune_packed_objects(). Also, promisor objects are
> kept regardless of what we pass to "pack-objects" here (the keeping is
> done separately). Maybe write (continuation from my suggestion above):
>
>   In order to avoid this extra disk churn, pass the names of the
>   promisor packfiles as "--keep-pack" arguments to this
>   second invocation of "pack-objects". This informs "pack-objects" that
>   the promisor objects are already in a safe packfile and, therefore, do
>   not need to be loosened.
>

You're right, the second call only loosens the object and the deleting
comes after it, which makes my message misleading. 

The suggested paragraph explain better the situation, thanks for
suggesting it and will replace mine in the next revision.

>> @@ -533,7 +533,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>>  	}
>>  
>>  	packdir = mkpathdup("%s/pack", get_object_directory());
>> -	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
>> +	packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
>> +	packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
>>  
>>  	sigchain_push_common(remove_pack_on_signal);
>>  
>
> Normally I would be concerned that packtmp_name is not freed, but in
> this case, it's a static variable (same as packtmp).
>

Indeed.

>> @@ -576,6 +577,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>>  		repack_promisor_objects(&po_args, &names);
>>  
>>  		if (existing_packs.nr && delete_redundant) {
>> +			for_each_string_list_item(item, &names) {
>> +				strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack",
>> +					     packtmp_name, item->string);
>> +			}
>
> Git style is to not have braces for single-statement loops.
>

I preferred to keep the braces simply because the
for_each_string_list_item() is macro instead of a normal for-loop
statement. 

>> +test_expect_success 'repack does not loose all objects' '
>> +	rm -rf client &&
>> +	git clone --bare --filter=blob:none "file://$(pwd)/srv.bare" client &&
>> +	test_when_finished "rm -rf client" &&
>> +	git -C client repack -A -l -d --no-prune-packed &&
>> +	git -C client count-objects -v >object-count &&
>> +	grep "^prune-packable: 0" object-count
>> +'
>
> s/loose all objects/loosen promisor objects/
>
> Also, add a comment describing why we have "--no-prune-packed" there
> (probably something about not pruning any loose objects that are already
> in packs, so that we can verify that no redundant loose objects are
> being created in the first place).

Thanks for the reviewing and helping clarify the patch intentions. I'll
address this comments on the next revision.

-- 
Thanks
Rafael

  parent reply	other threads:[~2021-04-18  7:13 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-03  9:04 rather slow 'git repack' in 'blob:none' partial clones SZEDER Gábor
2021-04-05  1:02 ` Rafael Silva
2021-04-07 21:17   ` Jeff King
2021-04-08  0:02     ` Jonathan Tan
2021-04-08  0:35       ` Jeff King
2021-04-12  7:09     ` Rafael Silva
2021-04-12 21:36     ` SZEDER Gábor
2021-04-12 21:49       ` Bryan Turner
2021-04-12 23:51         ` Jeff King
2021-04-12 23:47       ` Jeff King
2021-04-13  7:12         ` [PATCH 0/3] low-hanging performance fruit with promisor packs Jeff King
2021-04-13  7:15           ` [PATCH 1/3] is_promisor_object(): free tree buffer after parsing Jeff King
2021-04-13 20:17             ` Junio C Hamano
2021-04-14  5:18               ` Jeff King
2021-04-13  7:16           ` [PATCH 2/3] lookup_unknown_object(): take a repository argument Jeff King
2021-04-13  7:17           ` [PATCH 3/3] revision: avoid parsing with --exclude-promisor-objects Jeff King
2021-04-13 20:22             ` Junio C Hamano
2021-04-13 18:10           ` [PATCH 0/3] low-hanging performance fruit with promisor packs SZEDER Gábor
2021-04-14 17:14           ` Jonathan Tan
2021-04-14 19:22           ` Rafael Silva
2021-04-13 18:05         ` rather slow 'git repack' in 'blob:none' partial clones SZEDER Gábor
2021-04-14  5:14           ` Jeff King
2021-04-11 10:59   ` SZEDER Gábor
2021-04-12  7:53     ` Rafael Silva
2021-04-14 19:14 ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Rafael Silva
2021-04-14 19:14   ` [PATCH 1/2] repack: teach --no-prune-packed to skip `git prune-packed` Rafael Silva
2021-04-14 23:50     ` Jonathan Tan
2021-04-18 14:15       ` Rafael Silva
2021-04-14 19:14   ` [PATCH 2/2] repack: avoid loosening promisor pack objects in partial clones Rafael Silva
2021-04-15  1:04     ` Jonathan Tan
2021-04-15  3:51       ` Junio C Hamano
2021-04-15  9:03         ` Jeff King
2021-04-15  9:05       ` Jeff King
2021-04-18  7:12       ` Rafael Silva [this message]
2021-04-15 18:06     ` Junio C Hamano
2021-04-18  8:40       ` Rafael Silva
2021-04-14 22:10   ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Junio C Hamano
2021-04-15  9:15   ` Jeff King
2021-04-18  8:20     ` Rafael Silva
2021-04-18 13:57   ` [PATCH v2 0/1] " Rafael Silva
2021-04-18 13:57     ` [PATCH v2 1/1] repack: avoid loosening promisor objects in partial clones Rafael Silva
2021-04-19 19:15       ` Jonathan Tan
2021-04-21 18:54         ` Rafael Silva
2021-04-19 23:09       ` Junio C Hamano
2021-04-21 19:25         ` Rafael Silva
2021-04-21 19:32     ` [PATCH v3] " Rafael Silva

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=gohp6ktuo4m6zt.fsf@cpm12071.fritz.box \
    --to=rafaeloliveira.cs@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).