git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Kousik Sanagavarapu <five231003@gmail.com>
To: gitster@pobox.com
Cc: git@vger.kernel.org, jonathantanmy@google.com
Subject: Re: [PATCH v2] index-pack: remove fetch_if_missing=0
Date: Sun, 12 Mar 2023 22:46:13 +0530	[thread overview]
Message-ID: <20230312171613.6968-1-five231003@gmail.com> (raw)
In-Reply-To: <xmqqttys4746.fsf@gitster.g>

On Sat, 11 Mar 2023 at 03:11, Junio C Hamano <gitster@pobox.com> wrote:
>
> I admit I haven't thought about it any longer than anybody who
> touched this topic, but should "fetch_if_missing=0" really be
> treated as "it was a dirty hack in the past, now we do not need it,
> as all callers into the object layer avoids lazy fetching when they
> do not have to, so let's remove it"?  It looks to me more and more
> that the old world-view to disable lazy fetching by default and have
> individual calls to the object layer opt into fetching as needed may
> give us a better resulting code, or is it just me?

I think having a single function to check for object existence, which is
compatible with partial clones is better, because the end goal is to
completely integrate the concept of partial clones with git's codebase
and not have code that worries "Oh, there maybe bugs here because
what if the user has a partial clone", everytime the code does an object
existence check (that is, a call to has_object_file() or any of its
related functions) and just have fetch_if_missing set to 1 or 0, according
to the particular command.

It is also true that has_object() is not that "single function", because
in cases where we are missing an object in partial clone and want it,
has_object() has no way of fetching it. With or without flags (it only
supports one flag which, when set, rechecks packed storage), it does not
lazy-fetch in a partial clone. But there are cases where we need such
objects, such as the commands that come into "family (b)" [1].

So, why not use oid_object_info_extended() directly, instead of wrapping
it with some other function, whenever we are checking for an object's
existence. We can skip lazy-fetches whenever we want with
OBJECT_INFO_SKIP_FETCH_OBJECT and can also prefetch with
OBJECT_INFO_FOR_PREFETCH [2].

[1] https://lore.kernel.org/git/20230311025906.4170554-1-jonathantanmy@google.com/

[2] pack-index itself is one example of where this is done.

    When we don't have REF_DELTA bases, we bulk prefetch them

	    if (has_promisor_remote()) {
                     /*
                      * Prefetch the delta bases.
                      */
                     struct oid_array to_fetch = OID_ARRAY_INIT;
		     for (i = 0; i < nr_ref_deltas; i++) {
                             struct ref_delta_entry *d = sorted_by_pos[i];
                             if (!oid_object_info_extended(the_repository, &d->oid,
                                                           NULL,
                                                           OBJECT_INFO_FOR_PREFETCH))
                                     continue;
                             oid_array_append(&to_fetch, &d->oid);
                     }
                     promisor_remote_get_direct(the_repository,
                                                to_fetch.oid, to_fetch.nr);
                     oid_array_clear(&to_fetch);
             }

    Instead of going object-by-object, which is basically like an
    infinite loop in large repos and partial clones are widely used
    in large repos.

  parent reply	other threads:[~2023-03-12 17:16 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-25  5:24 [PATCH] index-pack: remove fetch_if_missing=0 Kousik Sanagavarapu
2023-02-27 16:56 ` Kousik Sanagavarapu
2023-02-27 22:14 ` Jonathan Tan
2023-02-28  3:54   ` Kousik Sanagavarapu
2023-03-10 18:30 ` [PATCH v2] " Kousik Sanagavarapu
2023-03-10 20:30   ` Junio C Hamano
2023-03-10 21:13     ` Jonathan Tan
2023-03-10 21:41       ` Junio C Hamano
2023-03-11  2:59         ` Jonathan Tan
2023-03-12 17:16         ` Kousik Sanagavarapu [this message]
2023-03-11  6:22       ` [PATCH] " Kousik Sanagavarapu
2023-03-11  6:00     ` Kousik Sanagavarapu
2023-03-13 18:15   ` [PATCH v3] " Kousik Sanagavarapu
2023-03-13 19:17     ` Junio C Hamano
2023-03-13 19:18     ` Junio C Hamano
2023-03-17 17:56     ` [PATCH v4] " Kousik Sanagavarapu
2023-03-17 22:58       ` Junio C Hamano
2023-03-19  6:17         ` Kousik Sanagavarapu
2023-03-11 20:01 ` [PATCH] " Sean Allred
2023-03-11 20:37   ` Junio C Hamano
     [not found] <--in-reply-to=20230225052439.27096-1-five231003@gmail.com>
2023-03-10 18:20 ` [PATCH v2] " Kousik Sanagavarapu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230312171613.6968-1-five231003@gmail.com \
    --to=five231003@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).