git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Kousik Sanagavarapu <five231003@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH v2] index-pack: remove fetch_if_missing=0
Date: Fri, 10 Mar 2023 13:41:13 -0800	[thread overview]
Message-ID: <xmqqttys4746.fsf@gitster.g> (raw)
In-Reply-To: <20230310211321.4135748-1-jonathantanmy@google.com> (Jonathan Tan's message of "Fri, 10 Mar 2023 13:13:21 -0800")

Jonathan Tan <jonathantanmy@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>> > Hence, use has_object() to check for the existence of an object, which
>> > has the default behavior of not lazy-fetching in a partial clone. It is
>> > worth mentioning that this is the only place where there is potential for
>> > lazy-fetching and all other cases are properly handled, making it safe to
>> > remove this global here.
>> 
>> This paragraph is very well explained.
>
> It might be good if the "all other cases" were enumerated here in the
> commit message (since the consequence of missing a case might be an
> infinite loop of fetching).
>
>> OK.  The comment describes the design choice we made to flip the
>> fetch_if_missing flag off.  The old world-view was that we would
>> notice a breakage by non-functioning index-pack when a lazy clone is
>> missing objects that we need by disabling auto-fetching, and we
>> instead explicitly handle any missing and necessary objects by lazy
>> fetching (like "when we lack REF_DELTA bases").  It does sound like
>> a conservative thing to do, compared to the opposite approach we are
>> taking with this patch, i.e. we would not fail if we tried to access
>> objects we do not need to, because we have lazy fetching enabled,
>> and we just ended up with bloated object store nobody may notice.
>> 
>> To protect us from future breakage that can come from the new
>> approach, it is a very good thing that you added new tests to ensure
>> no unnecessary lazy fetching is done (I am not offhand sure if that
>> test is sufficient, though).
>
> I don't think the test is sufficient - I'll explain that below.

I admit I haven't thought about it any longer than anybody who
touched this topic, but should "fetch_if_missing=0" really be
treated as "it was a dirty hack in the past, now we do not need it,
as all callers into the object layer avoids lazy fetching when they
do not have to, so let's remove it"?  It looks to me more and more
that the old world-view to disable lazy fetching by default and have
individual calls to the object layer opt into fetching as needed may
give us a better resulting code, or is it just me?  The possible
error modes in new code that fails to follow the world-view with and
without this change are:

 * If the lazy fetching is disabled by default (i.e. without this
   patch), a new code can by mistake call has_object(), which does
   not lazy fetch, when it does need to have the object and should
   be using something like has_object_file_with_flags(), and dies
   loudly.

 * If the lazy fetching is enabled by default, on the other hand, a
   new code can by mistake call has_object_file_with_flags(), which
   does lazy fetch, when it does not need to have the object.  It
   does not die, it just lazily fetches objects it does not need.
   The (performance) "bug" will stay hidden until somebody complains.

In short, the world-view of the current code seems to give us
tighter control over what gets lazy fetched, simply because we do
not allow lazy fetching without thinking.

Do we have other uses of fetch_if_missing (i.e. disable lazy
fetching)?

    $ git grep -l fetch_if_missing
    Documentation/technical/partial-clone.txt
    builtin/fetch-pack.c
    builtin/fsck.c
    builtin/pack-objects.c
    builtin/prune.c
    builtin/rev-list.c
    cache.h
    midx.c
    object-file.c
    revision.c

As the default is 1, all these hits (outside the header, doc, and
object-file.c) are to disable lazy fetching.  Judging from the list
of "family" that want tighter control over what gets fetched, I have
a feeling that pack-index may want to stay to be in the family.

Or am I missing some big picture goal to eventually getting rid of
this mechanism and always allowing lazy fetching?

Thanks.


  reply	other threads:[~2023-03-10 21:41 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-25  5:24 [PATCH] index-pack: remove fetch_if_missing=0 Kousik Sanagavarapu
2023-02-27 16:56 ` Kousik Sanagavarapu
2023-02-27 22:14 ` Jonathan Tan
2023-02-28  3:54   ` Kousik Sanagavarapu
2023-03-10 18:30 ` [PATCH v2] " Kousik Sanagavarapu
2023-03-10 20:30   ` Junio C Hamano
2023-03-10 21:13     ` Jonathan Tan
2023-03-10 21:41       ` Junio C Hamano [this message]
2023-03-11  2:59         ` Jonathan Tan
2023-03-12 17:16         ` Kousik Sanagavarapu
2023-03-11  6:22       ` [PATCH] " Kousik Sanagavarapu
2023-03-11  6:00     ` Kousik Sanagavarapu
2023-03-13 18:15   ` [PATCH v3] " Kousik Sanagavarapu
2023-03-13 19:17     ` Junio C Hamano
2023-03-13 19:18     ` Junio C Hamano
2023-03-17 17:56     ` [PATCH v4] " Kousik Sanagavarapu
2023-03-17 22:58       ` Junio C Hamano
2023-03-19  6:17         ` Kousik Sanagavarapu
2023-03-11 20:01 ` [PATCH] " Sean Allred
2023-03-11 20:37   ` Junio C Hamano
     [not found] <--in-reply-to=20230225052439.27096-1-five231003@gmail.com>
2023-03-10 18:20 ` [PATCH v2] " Kousik Sanagavarapu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqttys4746.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=five231003@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).