From: Jonathan Tan <jonathantanmy@google.com>
To: gitster@pobox.com
Cc: jonathantanmy@google.com, git@vger.kernel.org, sluongng@gmail.com
Subject: Re: [PATCH 2/2] pack-objects: prefetch objects to be packed
Date: Tue, 21 Jul 2020 09:37:36 -0700
Message-ID: <20200721163736.69610-1-jonathantanmy@google.com>
In-Reply-To: <xmqqd04p8ywt.fsf@gitster.c.googlers.com>

> Hmph, the resulting codeflow structure feels somewhat iffy. Perhaps
> I am not reading the code correctly, but
>
>  * There is a loop that scans from 0..to_pack.nr_objects and calls
>    check_object() for each and every one of them;
>
>  * The called check_object(), when it notices that a missing and
>    promised (i.e. to be lazily fetched) object is in the to_pack
>    array, asks prefetch_to_pack() to scan from that point to the end
>    of that array and grabs all of them that are missing.
>
> It feels like it would be a lot easier to see what is going on in the
> resulting code if, instead of the way the new "loop" was added, a
> new loop were added _before_ the loop that calls check_object() on all
> objects in the to_pack array, as a pre-processing phase when there is a
> promisor remote. That is, after reverting all the changes this patch
> makes to check_object(), add a new loop in get_object_details() that
> looks more or less like so:
>
>      QSORT(sorted_by_offset, to_pack.nr_objects, pack_offset_sort);
>
> +    if (has_promisor_remote())
> +            prefetch_to_pack(0);
> +
>      for (i = 0; i < to_pack.nr_objects; i++) {
>
> Was the patch done this way because scanning the entire array twice
> is expensive?

Yes. If we called prefetch_to_pack(0) first (as you suggest), that first
scan would check the existence of every object in the array, which I
expected to be expensive. (Checking the existence of an object probably
brings the corresponding pack index into the disk cache on platforms
like Linux, so 2 object reads might not take much more time than 1
object read, but I didn't want to rely on this when I could just avoid
the extra read.)

> The optimization makes sense to me if certain
> conditions are met, like...
>
>  - Most of the time there is no missing object due to promisor, even
>    if has_promisor_remote() is true;

I think optimizing for this condition makes sense: most pushes, I would
expect, send objects that we created locally, so no objects are
missing.

>  - When there are missing objects due to promisor, pack_offset_sort
>    will keep them near the end of the array; and
>
>  - Given the oid, oid_object_info_extended() on it with
>    OBJECT_INFO_FOR_PREFETCH is expensive.

I consider this expensive because it involves checking whether the
object exists.

> Only when all these conditions are met would it avoid unnecessary
> overhead, scanning only the tail end of the array by delaying
> the point in the array where prefetch_to_pack() starts scanning.

Yes. And when there are no missing objects at all, nothing is scanned
twice.
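
To make the trade-off discussed in this message concrete, here is a
minimal sketch in C. It is a toy model, not the pack-objects code:
struct entry, object_exists(), prefetch_from(), pack_eager() and
pack_lazy() are hypothetical names, and object_exists() merely stands in
for an existence probe such as oid_object_info_extended(). The sketch
assumes one probe per object in the main loop and one probe per object
scanned by the prefetch, and it does not model the retry of the object
that triggered the prefetch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the to_pack array. */
struct entry {
	bool present; /* false: missing locally, promised by the remote */
};

static size_t existence_checks; /* probes of the (simulated) object store */
static size_t batch_fetches;    /* bulk fetches issued */

/* Stand-in for an existence probe; every call is counted. */
static bool object_exists(const struct entry *e)
{
	existence_checks++;
	return e->present;
}

/* Scan [start, nr) and "fetch" every missing entry in one batch. */
static void prefetch_from(struct entry *objs, size_t nr, size_t start)
{
	size_t i, missing = 0;

	assert(start <= nr);
	for (i = start; i < nr; i++) {
		if (!object_exists(&objs[i])) {
			objs[i].present = true; /* simulate delivery by the batch */
			missing++;
		}
	}
	if (missing)
		batch_fetches++;
}

/* Eager ordering: prefetch over the whole array, then the main loop. */
static size_t pack_eager(struct entry *objs, size_t nr)
{
	size_t i;

	existence_checks = batch_fetches = 0;
	prefetch_from(objs, nr, 0);
	for (i = 0; i < nr; i++)
		object_exists(&objs[i]);
	return existence_checks;
}

/* Lazy ordering: the first miss in the main loop prefetches the tail. */
static size_t pack_lazy(struct entry *objs, size_t nr)
{
	size_t i;

	existence_checks = batch_fetches = 0;
	for (i = 0; i < nr; i++) {
		if (!object_exists(&objs[i]))
			prefetch_from(objs, nr, i);
	}
	return existence_checks;
}
```

In this model, with 8 entries and nothing missing, pack_lazy() performs
8 probes against 16 for pack_eager(). With only the last two entries
missing, pack_lazy() performs 10 probes and issues a single batch fetch,
while pack_eager() still performs 16. The lazy ordering only loses its
advantage when a missing object sorts near the start of the array.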