git@vger.kernel.org mailing list mirror (one of many)
From: Takuto Ikuta <tikuta@chromium.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH] fetch-pack.c: use oidset to check existence of loose object
Date: Fri, 9 Mar 2018 23:12:06 +0900
Message-ID: <CALNjmMq9gvRzkoYCfXppTVTR5UtvmBZ_4hVuBLB0t7YzR36Wbg@mail.gmail.com>
In-Reply-To: <xmqqr2ouwgsd.fsf@gitster-ct.c.googlers.com>

2018-03-09 3:42 GMT+09:00 Junio C Hamano <gitster@pobox.com>:
> Takuto Ikuta <tikuta@chromium.org> writes:
>> This patch stores existing loose objects in a hashmap beforehand and uses
>> it to check existence instead of using lstat.
>>
>> With this patch, the number of lstat calls in `git fetch` is reduced
>> from 411412 to 13794 for chromium repository.
>>
>> I timed `git fetch` with quickfetch disabled for the chromium
>> repository 3 times on Linux with an SSD.
>
> Now you drop a clue that would help to fill in the blanks above, but
> I am not sure what the significance is of your having to disable
> quickfetch in order to take measurements---it makes it sound as if
> it is an artificial problem that does not exist in real life
> (i.e. when quickfetch is not disabled), but I am getting the feeling
> that it is not what you wanted to say here.
>

Yes, I just wanted to say that 'git fetch' invokes fetch-pack.
When git fetch is run repeatedly while the remote has no updates,
quickfetch skips fetch-pack, so I disabled quickfetch to measure the
performance of fetch-pack itself. In the chromium repository, the master
branch is updated several times an hour, so git fetch invokes fetch-pack
at that frequency.

> In any case, do_fetch_pack() tries to see if all of the tip commits
> we are going to fetch exist locally, so when you are trying a fetch
> that grabs huge number of refs (by the way, it means that the first
> sentence of the proposed log message is not quite true---it is "When
> fetching a large number of refs", as it does not matter how many
> refs _we_ have, no?), everything_local() ends up making repeated
> calls to has_object_file_with_flags() to all of the refs.
>

I fixed the description by changing 'refs' to 'remote refs'. As I understand it,
git checks the existence of remote refs even if we won't fetch those refs.

> I like the idea---this turns "for each of these many things, check
> if it exists with lstat(2)" into "enumerate what exists with
> lstat(2), and then use that for the existence test"; if you need to
> try N objects for existence, and you only have M objects loose where
> N is vastly larger than M, it will be a huge win.  If you have very
> many loose objects and are checking only a handful of objects for
> existence, you would lose big, though, no?
>

Yes. But I think the default limit on the number of loose objects, 7000,
keeps the overhead small when we enumerate all of them.
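
To make the trade-off concrete, here is a minimal sketch of the
"enumerate once, then test membership" pattern. It is only an
illustration: toy_idset and its functions are hypothetical names, the
ids are plain 64-bit integers rather than object ids, and none of this
is git's oidset API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a set of object ids (hypothetical, not git's oidset).
 * Open addressing with linear probing; 0 marks an empty slot, so ids
 * must be nonzero. The table never grows in this sketch. */
struct toy_idset {
	uint64_t *slots;
	size_t cap;	/* always a power of two */
	size_t len;
};

static size_t toy_slot(uint64_t id, size_t cap)
{
	/* Mix the bits, then mask down to a slot index. */
	id ^= id >> 33;
	id *= 0xff51afd7ed558ccdULL;
	id ^= id >> 33;
	return (size_t)(id & (cap - 1));
}

/* Returns 0 on success, -1 on allocation failure. */
int toy_idset_init(struct toy_idset *s, size_t expected)
{
	size_t cap = 16;

	while (cap < expected * 2)
		cap <<= 1;
	s->slots = calloc(cap, sizeof(*s->slots));
	s->cap = cap;
	s->len = 0;
	return s->slots ? 0 : -1;
}

/* Returns 1 if newly added, 0 if already present, -1 if rejected. */
int toy_idset_insert(struct toy_idset *s, uint64_t id)
{
	size_t i;

	assert(s->slots);
	if (!id)
		return -1;	/* 0 is reserved for "empty" */
	i = toy_slot(id, s->cap);
	while (s->slots[i]) {
		if (s->slots[i] == id)
			return 0;
		i = (i + 1) & (s->cap - 1);
	}
	if (s->len + 1 >= s->cap)
		return -1;	/* keep one slot empty so probing terminates */
	s->slots[i] = id;
	s->len++;
	return 1;
}

/* The existence test: memory lookups only, no filesystem access. */
int toy_idset_contains(const struct toy_idset *s, uint64_t id)
{
	size_t i;

	assert(s->slots);
	i = toy_slot(id, s->cap);
	while (s->slots[i]) {
		if (s->slots[i] == id)
			return 1;
		i = (i + 1) & (s->cap - 1);
	}
	return 0;
}

void toy_idset_free(struct toy_idset *s)
{
	free(s->slots);
	s->slots = NULL;
	s->cap = s->len = 0;
}
```

With M loose objects and N ids to test, filling the set costs on the
order of M filesystem operations once, and each of the N tests is then a
lookup in memory. That is why the approach wins when N is much larger
than M and loses when M is large and N is small.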

>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index d97461296..1658487f7 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -711,6 +711,15 @@ static void mark_alternate_complete(struct object *obj)
>>       mark_complete(&obj->oid);
>>  }
>>
>> +static int add_loose_objects_to_set(const struct object_id *oid,
>> +                                 const char *path,
>> +                                 void *data)
>> +{
>> +     struct oidset* set = (struct oidset*)(data);
>
> Style: in our codebase, asterisk does not stick to the type.
>
>         struct oidset *set = (struct oidset *)(data);
>
>> @@ -719,16 +728,21 @@ static int everything_local(struct fetch_pack_args *args,
>>       int retval;
>>       int old_save_commit_buffer = save_commit_buffer;
>>       timestamp_t cutoff = 0;
>> +     struct oidset loose_oid_set = OIDSET_INIT;
>> +
>> +     for_each_loose_object(add_loose_objects_to_set, &loose_oid_set, 0);
>
> OK, so this is the "enumerate all loose objects" phase.
>
>>       save_commit_buffer = 0;
>>
>>       for (ref = *refs; ref; ref = ref->next) {
>>               struct object *o;
>> +             unsigned int flag = OBJECT_INFO_QUICK;
>
> Hmm, OBJECT_INFO_QUICK optimization was added in dfdd4afc
> ("sha1_file: teach sha1_object_info_extended more flags",
> 2017-06-21), but since 8b4c0103 ("sha1_file: support lazily fetching
> missing objects", 2017-12-08) it appears that passing
> OBJECT_INFO_QUICK down the codepath does not do anything
> interesting.  Jonathan (cc'ed), are the remaining hits from "git
> grep OBJECT_INFO_QUICK" all dead no-ops these days?
>

Yes, the flag is a no-op now, but let me leave it untouched in this patch.

>> diff --git a/sha1_file.c b/sha1_file.c
>> index 1b94f39c4..c903cbcec 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1262,6 +1262,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
>>               if (find_pack_entry(real, &e))
>>                       break;
>>
>> +             if (flags & OBJECT_INFO_SKIP_LOOSE)
>> +                     return -1;
>> +
>
> I cannot quite convince myself that this is done at the right layer;
> it smells to be at a bit too low a layer.  This change makes sense
> only to a caller that is interested in the existence test.  If the
> flag is named after what it does, i.e. "ignore loose object", then
> it does sort-of make sense, though.
>

I couldn't come up with a good alternative for this, so I followed your
flag name suggestion in patch v3.
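
For readers following along, the effect of such a flag can be sketched
with a toy model. Everything here is hypothetical (toy_store,
toy_object_info, TOY_INFO_IGNORE_LOOSE); it only mirrors the order shown
in the hunk above, where packs are consulted first and the flag returns
early before any loose-object lookup. I am assuming the caller sets the
flag only when it already knows the object is not loose.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INFO_IGNORE_LOOSE (1u << 0)	/* hypothetical flag name */

/* Toy object store: two id lists plus a counter standing in for the
 * filesystem probes (lstat calls) a loose-object lookup would cost. */
struct toy_store {
	const unsigned *packed;
	size_t nr_packed;
	const unsigned *loose;
	size_t nr_loose;
	unsigned loose_probes;
};

static int toy_in(const unsigned *ids, size_t nr, unsigned id)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (ids[i] == id)
			return 1;
	return 0;
}

/* Returns 0 if the object is found, -1 otherwise. */
int toy_object_info(struct toy_store *st, unsigned id, unsigned flags)
{
	/* Packed objects are checked first, regardless of the flag. */
	if (toy_in(st->packed, st->nr_packed, id))
		return 0;

	/* Caller asked to ignore loose objects: report "missing"
	 * without touching the filesystem. */
	if (flags & TOY_INFO_IGNORE_LOOSE)
		return -1;

	st->loose_probes++;	/* one simulated lstat */
	return toy_in(st->loose, st->nr_loose, id) ? 0 : -1;
}
```

In this model a loose object looked up with the flag set is reported as
missing, so the flag is only safe for a caller that has already ruled
out the loose case by other means.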

Takuto

Thread overview: 20+ messages
2018-03-08 12:06 [PATCH] fetch-pack.c: use oidset to check existence of loose object Takuto Ikuta
2018-03-08 17:19 ` René Scharfe
2018-03-09 13:42   ` Takuto Ikuta
2018-03-08 18:42 ` Junio C Hamano
2018-03-09 13:11   ` [PATCH v2 0/1] " Takuto Ikuta
2018-03-09 13:11     ` [PATCH v2 1/1] " Takuto Ikuta
2018-03-09 13:26       ` [PATCH v3] " Takuto Ikuta
2018-03-09 19:54         ` Junio C Hamano
2018-03-10 13:19           ` Takuto Ikuta
2018-03-13 17:53             ` Junio C Hamano
2018-03-14  6:26               ` Takuto Ikuta
2018-03-10 12:34         ` [PATCH v4] " Takuto Ikuta
2018-03-10 12:46           ` [PATCH v5] " Takuto Ikuta
2018-03-13 19:04             ` Junio C Hamano
2018-03-14  6:05           ` [PATCH v6] " Takuto Ikuta
2018-03-14  6:32             ` [PATCH v7] " Takuto Ikuta
2018-03-09 14:12   ` Takuto Ikuta [this message]
2018-03-09 18:00     ` [PATCH] " Junio C Hamano
2018-03-09 19:41       ` Junio C Hamano
2018-03-13 15:30   ` [PATCH] sha1_file: restore OBJECT_INFO_QUICK functionality Jonathan Tan
