git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Takuto Ikuta <tikuta@chromium.org>
Cc: git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH] fetch-pack.c: use oidset to check existence of loose object
Date: Fri, 09 Mar 2018 10:00:12 -0800	[thread overview]
Message-ID: <xmqqd10d6ser.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <CALNjmMq9gvRzkoYCfXppTVTR5UtvmBZ_4hVuBLB0t7YzR36Wbg@mail.gmail.com> (Takuto Ikuta's message of "Fri, 9 Mar 2018 23:12:06 +0900")

Takuto Ikuta <tikuta@chromium.org> writes:

> Yes, I just wanted to say 'git fetch' invokes fetch-pack.
> fetch-pack is skipped when running git fetch repeatedly while
> remote has no update by quickfetch. So I disabled it to see the
> performance of fetch-pack. In chromium repository, master branch
> is updated several times in an hour, so git fetch invokes fetch-pack
> in such frequency.

I do understand that if you run "git fetch" against the same place
in quick succession three times, the first one may cost a lot (not
just that it has to do the everything_local() computation that you
care about, it also has to actually do the network transfer), while
the second and third ones that are done before anything new happens
on the other end will not involve everything_local() overhead thanks
to quickfetch.

A "fetch" that is run against a remote that has nothing new, but
still triggers everything_local() only because quickfetch is
disabled, is an artificial case that has no relevance to the real
world, I suspect, because the quickfetch optimization is to solve
the "there is nothing to be done, still do_fetch_pack() spends so
much cycles only to realize that it does not have anything to do,
why?" issue.

Isn't running the "git fetch" command with the "--dry-run" option
many times in quick succession a lot closer to what you really want
to measure, I wonder?  That way, your first fetch won't be touching
the state of the local side to affect your second and subsequent
fetches.

>> In any case, do_fetch_pack() tries to see if all of the tip commits
>> we are going to fetch exist locally, so when you are trying a fetch
>> that grabs huge number of refs (by the way, it means that the first
>> sentence of the proposed log message is not quite true---it is "When
>> fetching a large number of refs", as it does not matter how many
>> refs _we_ have, no?), everything_local() ends up making repeated
>> calls to has_object_file_with_flags() to all of the refs.
>
> I fixed description by changing 'refs' to 'remote refs'. In my understanding,
> git tries to check existence of remote refs even if we won't fetch such refs.

During fetch, everything_local() tries to mark common part by
walking the refs the other side advertised upon the first contact,
so it is correct that the number of checks is not reduced in a fetch
that does not fetch many refs, but the number of remote-tracking refs
you have has no effect, so I doubt such a rephrasing would make the
description more understandable.  "When fetching from a repository
with large number of refs" is probably what you want to say, no?

> Yes. But I think the default limit for the number of loose objects, 7000,
> gives us small overhead when we do enumeration of all objects.

Hmph, I didn't see the code that does the estimation of loose object
count before starting to enumerate, though.

>> Hmm, OBJECT_INFO_QUICK optimization was added in dfdd4afc
>> ("sha1_file: teach sha1_object_info_extended more flags",
>> 2017-06-21), but since 8b4c0103 ("sha1_file: support lazily fetching
>> missing objects", 2017-12-08) it appears that passing
>> OBJECT_INFO_QUICK down the codepath does not do anything
>> interesting.  Jonathan (cc'ed), are all remaining hits from "git
>> grep OBJECT_INFO_QUICK" all dead no-ops these days?
>
> Yes the flag is no-op now, but let me untouched the flag in this patch.

Yeah, I do not want you to be touching that in this change.  It is
an independent/orthogonal clean-up.

Thanks.

  reply	other threads:[~2018-03-09 18:00 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-08 12:06 [PATCH] fetch-pack.c: use oidset to check existence of loose object Takuto Ikuta
2018-03-08 17:19 ` René Scharfe
2018-03-09 13:42   ` Takuto Ikuta
2018-03-08 18:42 ` Junio C Hamano
2018-03-09 13:11   ` [PATCH v2 0/1] " Takuto Ikuta
2018-03-09 13:11     ` [PATCH v2 1/1] " Takuto Ikuta
2018-03-09 13:26       ` [PATCH v3] " Takuto Ikuta
2018-03-09 19:54         ` Junio C Hamano
2018-03-10 13:19           ` Takuto Ikuta
2018-03-13 17:53             ` Junio C Hamano
2018-03-14  6:26               ` Takuto Ikuta
2018-03-10 12:34         ` [PATCH v4] " Takuto Ikuta
2018-03-10 12:46           ` [PATCH v5] " Takuto Ikuta
2018-03-13 19:04             ` Junio C Hamano
2018-03-14  6:05           ` [PATCH v6] " Takuto Ikuta
2018-03-14  6:32             ` [PATCH v7] " Takuto Ikuta
2018-03-09 14:12   ` [PATCH] " Takuto Ikuta
2018-03-09 18:00     ` Junio C Hamano [this message]
2018-03-09 19:41       ` Junio C Hamano
2018-03-13 15:30   ` [PATCH] sha1_file: restore OBJECT_INFO_QUICK functionality Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqd10d6ser.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=tikuta@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).