From: Junio C Hamano <gitster@pobox.com>
To: Takuto Ikuta <tikuta@chromium.org>
Cc: git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH] fetch-pack.c: use oidset to check existence of loose object
Date: Thu, 08 Mar 2018 10:42:10 -0800 [thread overview]
Message-ID: <xmqqr2ouwgsd.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <20180308120639.109438-1-tikuta@chromium.org> (Takuto Ikuta's message of "Thu, 8 Mar 2018 21:06:39 +0900")
Takuto Ikuta <tikuta@chromium.org> writes:
> In repository having large number of refs, lstat for non-existing loose
> objects makes `git fetch` slow.
It is not immediately clear how "large number of refs" and "lstat
for loose objects" interact with each other to create a problem.
"In repository having large number of refs, because such and such
processing needs to do this and that, 'git fetch' ends up doing a
lot of lstat(2) calls to see if many objects exist in the loose
form, which makes it slow". Please fill in the blanks.
> This patch stores existing loose objects in hashmap beforehand and use
> it to check existence instead of using lstat.
>
> With this patch, the number of lstat calls in `git fetch` is reduced
> from 411412 to 13794 for chromium repository.
>
> I took time stat of `git fetch` disabling quickfetch for chromium
> repository 3 time on linux with SSD.
Now you drop a clue that would help to fill in the blanks above, but
I am not sure what the significance of your having to disable
quickfetch in order to take measurements---it makes it sound as if
it is an articificial problem that does not exist in real life
(i.e. when quickfetch is not disabled), but I am getting the feeling
that it is not what you wanted to say here.
In any case, do_fetch_pack() tries to see if all of the tip commits
we are going to fetch exist locally, so when you are trying a fetch
that grabs huge number of refs (by the way, it means that the first
sentence of the proposed log message is not quite true---it is "When
fetching a large number of refs", as it does not matter how many
refs _we_ have, no?), everything_local() ends up making repeated
calls to has_object_file_with_flags() to all of the refs.
I like the idea---this turns "for each of these many things, check
if it exists with lstat(2)" into "enumerate what exists with
lstat(2), and then use that for the existence test"; if you need to
try N objects for existence, and you only have M objects loose where
N is vastly larger than M, it will be a huge win. If you have very
many loose objects and checking only a handful of objects for
existence check, you would lose big, though, no?
> diff --git a/fetch-pack.c b/fetch-pack.c
> index d97461296..1658487f7 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -711,6 +711,15 @@ static void mark_alternate_complete(struct object *obj)
> mark_complete(&obj->oid);
> }
>
> +static int add_loose_objects_to_set(const struct object_id *oid,
> + const char *path,
> + void *data)
> +{
> + struct oidset* set = (struct oidset*)(data);
Style: in our codebase, asterisk does not stick to the type.
struct oidset *set = (struct oidset *)(data);
> @@ -719,16 +728,21 @@ static int everything_local(struct fetch_pack_args *args,
> int retval;
> int old_save_commit_buffer = save_commit_buffer;
> timestamp_t cutoff = 0;
> + struct oidset loose_oid_set = OIDSET_INIT;
> +
> + for_each_loose_object(add_loose_objects_to_set, &loose_oid_set, 0);
OK, so this is the "enumerate all loose objects" phase.
> save_commit_buffer = 0;
>
> for (ref = *refs; ref; ref = ref->next) {
> struct object *o;
> + unsigned int flag = OBJECT_INFO_QUICK;
Hmm, OBJECT_INFO_QUICK optimization was added in dfdd4afc
("sha1_file: teach sha1_object_info_extended more flags",
2017-06-21), but since 8b4c0103 ("sha1_file: support lazily fetching
missing objects", 2017-12-08) it appears that passing
OBJECT_INFO_QUICK down the codepath does not do anything
interesting. Jonathan (cc'ed), are all remaining hits from "git
grep OBJECT_INFO_QUICK" all dead no-ops these days?
>
> - if (!has_object_file_with_flags(&ref->old_oid,
> - OBJECT_INFO_QUICK))
> - continue;
> + if (!oidset_contains(&loose_oid_set, &ref->old_oid))
> + flag |= OBJECT_INFO_SKIP_LOOSE;
>
> + if (!has_object_file_with_flags(&ref->old_oid, flag))
> + continue;
Here, you want a way to say "I know this does not exist in the loose
form, so check if it exists in a non-loose form", and that is why
you invented the new flag.
> diff --git a/sha1_file.c b/sha1_file.c
> index 1b94f39c4..c903cbcec 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1262,6 +1262,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
> if (find_pack_entry(real, &e))
> break;
>
> + if (flags & OBJECT_INFO_SKIP_LOOSE)
> + return -1;
> +
I cannot quite convince myself that this is done at the right layer;
it smells to be at a bit too low a layer. This change makes sense
only to a caller that is interested in the existence test. If the
flag is named after what it does, i.e. "ignore loose object", then
it does sort-of make sense, though.
> /* Most likely it's a loose object. */
> if (!sha1_loose_object_info(real, oi, flags))
> return 0;
next prev parent reply other threads:[~2018-03-08 18:42 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-08 12:06 [PATCH] fetch-pack.c: use oidset to check existence of loose object Takuto Ikuta
2018-03-08 17:19 ` René Scharfe
2018-03-09 13:42 ` Takuto Ikuta
2018-03-08 18:42 ` Junio C Hamano [this message]
2018-03-09 13:11 ` [PATCH v2 0/1] " Takuto Ikuta
2018-03-09 13:11 ` [PATCH v2 1/1] " Takuto Ikuta
2018-03-09 13:26 ` [PATCH v3] " Takuto Ikuta
2018-03-09 19:54 ` Junio C Hamano
2018-03-10 13:19 ` Takuto Ikuta
2018-03-13 17:53 ` Junio C Hamano
2018-03-14 6:26 ` Takuto Ikuta
2018-03-10 12:34 ` [PATCH v4] " Takuto Ikuta
2018-03-10 12:46 ` [PATCH v5] " Takuto Ikuta
2018-03-13 19:04 ` Junio C Hamano
2018-03-14 6:05 ` [PATCH v6] " Takuto Ikuta
2018-03-14 6:32 ` [PATCH v7] " Takuto Ikuta
2018-03-09 14:12 ` [PATCH] " Takuto Ikuta
2018-03-09 18:00 ` Junio C Hamano
2018-03-09 19:41 ` Junio C Hamano
2018-03-13 15:30 ` [PATCH] sha1_file: restore OBJECT_INFO_QUICK functionality Jonathan Tan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqr2ouwgsd.fsf@gitster-ct.c.googlers.com \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
--cc=tikuta@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).