git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Takuto Ikuta <tikuta@chromium.org>
To: René Scharfe <l.s.r@web.de>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] fetch-pack.c: use oidset to check existence of loose object
Date: Fri, 9 Mar 2018 22:42:03 +0900
Message-ID: <CALNjmMofu9yupivCo7gHmOTVoFurgyp9CfR1uD-=cLWzq1hVtA@mail.gmail.com> (raw)
In-Reply-To: <c97772f1-dc36-3f2c-b328-6d8bd480ff79@web.de>

2018-03-09 2:19 GMT+09:00 René Scharfe <l.s.r@web.de>:
> Am 08.03.2018 um 13:06 schrieb Takuto Ikuta:
>> +static int add_loose_objects_to_set(const struct object_id *oid,
>> +                                 const char *path,
>> +                                 void *data)
>> +{
>> +     struct oidset* set = (struct oidset*)(data);
>
> This cast is not needed (unlike in C++).  And the asterisk should be stuck
> to the variable, not the type (see Documentation/CodingGuidelines).
>
>> +     oidset_insert(set, oid);
>
> In fact, you could just put "data" in here instead of "set" (without a
> cast), with no loss in readability or safety.
>

Thank you for review, changed to use data directly in v2.

>> +     return 0;
>> +}
>> +
>>   static int everything_local(struct fetch_pack_args *args,
>>                           struct ref **refs,
>>                           struct ref **sought, int nr_sought)
>> @@ -719,16 +728,21 @@ static int everything_local(struct fetch_pack_args *args,
>>       int retval;
>>       int old_save_commit_buffer = save_commit_buffer;
>>       timestamp_t cutoff = 0;
>> +     struct oidset loose_oid_set = OIDSET_INIT;
>> +
>> +     for_each_loose_object(add_loose_objects_to_set, &loose_oid_set, 0);
>>
>>       save_commit_buffer = 0;
>>
>>       for (ref = *refs; ref; ref = ref->next) {
>>               struct object *o;
>> +             unsigned int flag = OBJECT_INFO_QUICK;
>>
>> -             if (!has_object_file_with_flags(&ref->old_oid,
>> -                                             OBJECT_INFO_QUICK))
>> -                     continue;
>> +             if (!oidset_contains(&loose_oid_set, &ref->old_oid))
>> +                     flag |= OBJECT_INFO_SKIP_LOOSE;
>>
>> +             if (!has_object_file_with_flags(&ref->old_oid, flag))
>> +                     continue;
>>               o = parse_object(&ref->old_oid);
>>               if (!o)
>>                       continue;
>> @@ -744,6 +758,8 @@ static int everything_local(struct fetch_pack_args *args,
>>               }
>>       }
>>
>> +     oidset_clear(&loose_oid_set);
>> +
>
> This part looks fine to me.  (Except perhaps call the variable "flags"
> because you sometimes have two?)
>

Changed flag names.

> Why not include packed objects as well?  Probably because packs have
> indexes which can queried quickly to determine object existence, and
> because there are only few loose objects in typical repositories,
> right?
>

Correct. In my target repository, chromium, fetch-pack's slowness
comes from many lstat
to non-existing loose objects for remote refs. I focus on to remove such lstat.

> A similar cache was introduced by cc817ca3ef (sha1_name: cache
> readdir(3) results in find_short_object_filename()) to speed up
> finding unambiguous shorts object hashes.  I wonder if it could be
> used here as well, but I don't see an easy way.
>
>>       if (!args->no_dependents) {
>>               if (!args->deepen) {
>>                       for_each_ref(mark_complete_oid, NULL);
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 1b94f39c4..c903cbcec 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1262,6 +1262,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
>>               if (find_pack_entry(real, &e))
>>                       break;
>>
>> +             if (flags & OBJECT_INFO_SKIP_LOOSE)
>> +                     return -1;
>> +
>>               /* Most likely it's a loose object. */
>>               if (!sha1_loose_object_info(real, oi, flags))
>>                       return 0;
>>
>
> This early return doesn't just skip checking loose objects.  It
> also skips reloading packs and fetching missing objects for
> partial clones.  That may not be a problem for fetch-pack, but
> it means the flag has a misleading name.  Do you get the same
> performance improvement if you make it only skip that
> sha1_loose_object_info() call?
>

I changed the flag name following Junio's suggestion. If we skip
sha1_loose_object_info call,
reprepare_packed_git(...) runs for every non-existing refs and git
fetch becomes slower.

Takuto

  reply index

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-08 12:06 Takuto Ikuta
2018-03-08 17:19 ` René Scharfe
2018-03-09 13:42   ` Takuto Ikuta [this message]
2018-03-08 18:42 ` Junio C Hamano
2018-03-09 13:11   ` [PATCH v2 0/1] " Takuto Ikuta
2018-03-09 13:11     ` [PATCH v2 1/1] " Takuto Ikuta
2018-03-09 13:26       ` [PATCH v3] " Takuto Ikuta
2018-03-09 19:54         ` Junio C Hamano
2018-03-10 13:19           ` Takuto Ikuta
2018-03-13 17:53             ` Junio C Hamano
2018-03-14  6:26               ` Takuto Ikuta
2018-03-10 12:34         ` [PATCH v4] " Takuto Ikuta
2018-03-10 12:46           ` [PATCH v5] " Takuto Ikuta
2018-03-13 19:04             ` Junio C Hamano
2018-03-14  6:05           ` [PATCH v6] " Takuto Ikuta
2018-03-14  6:32             ` [PATCH v7] " Takuto Ikuta
2018-03-09 14:12   ` [PATCH] " Takuto Ikuta
2018-03-09 18:00     ` Junio C Hamano
2018-03-09 19:41       ` Junio C Hamano
2018-03-13 15:30   ` [PATCH] sha1_file: restore OBJECT_INFO_QUICK functionality Jonathan Tan

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALNjmMofu9yupivCo7gHmOTVoFurgyp9CfR1uD-=cLWzq1hVtA@mail.gmail.com' \
    --to=tikuta@chromium.org \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox