git@vger.kernel.org mailing list mirror (one of many)
From: Bruno Albuquerque <bga@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] object-info: support for retrieving object info
Date: Thu, 15 Apr 2021 16:06:40 -0700
Message-ID: <CAPeR6H5wx6F9rAYoRC-1GwHDujBFyjMTUPdHkQAuZh-eYAoRsg@mail.gmail.com>
In-Reply-To: <xmqq4kg7p63j.fsf@gitster.g>

On Thu, Apr 15, 2021 at 2:53 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Is the assumption that such an implementation of VFS would fetch
> individual tree object with the existing "fetch this single object
> by the object name" interface?

That is the general idea, yes.

> What I am wondering is, as an ingredient for implementing VFS layer,
> if this is a bit too low level.  To respond to "ls -l" when you only
> have a tree object name, you'd need two roundtrips, one to retrieve
> the tree, and then after parsing the tree to find out what objects
> the tree refers to with what pathname component, you'd issue the
> object-info for all of them in a single request.

Yes, as it is designed you would need to do that. There are a few
reasons for this implementation to do it the way it does:

1 - Although it currently only returns object sizes, it was designed
so that it can be extended if we need to return other object
metadata.
2 - Doing it like this is a fully backwards-compatible change, and
older clients would still work without changes (they would just not
make use of this).
3 - In a real filesystem, it is common to have multiple directories
under a tree, so in practice you can optimize requests (if needed) by
retrieving several tree objects and doing a single object-info request
for all objects in those trees.

Note I am not saying it could not be done in a different way, but it
does look to me like this change strikes a good balance between what
it provides and its cost.
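
To illustrate the batching in point 3, here is a minimal Python sketch
(not part of the patch). It parses the bodies of already-fetched SHA-1
tree objects and builds one object-info request covering every object
they reference. The helper names are hypothetical, and the wire layout
(a command line, a delim-pkt, a "size" argument, "oid <oid>" lines and
a flush-pkt) reflects my reading of protocol v2 and should be checked
against the protocol documentation. Capability lines are omitted.

```python
def parse_tree(raw: bytes):
    """Parse the body of a SHA-1 tree object into (mode, name, oid) tuples.

    Each entry is "<mode> <name>\\0" followed by a 20-byte binary object id.
    """
    entries = []
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)
        nul = raw.index(b"\0", space)
        mode = raw[pos:space].decode()
        name = raw[space + 1:nul].decode("utf-8", "surrogateescape")
        oid = raw[nul + 1:nul + 21].hex()
        entries.append((mode, name, oid))
        pos = nul + 21
    return entries


def pkt_line(payload: str) -> bytes:
    """Frame one line as a pkt-line: 4 hex digits of total length, then data."""
    data = payload.encode() + b"\n"
    return b"%04x" % (len(data) + 4) + data


def object_info_request(oids) -> bytes:
    """Build one object-info request asking for the size of every oid.

    Assumed layout; hypothetical helper, not taken from the patch.
    """
    out = [pkt_line("command=object-info"), b"0001", pkt_line("size")]
    # dict.fromkeys drops duplicate oids while keeping their order.
    out += [pkt_line("oid " + oid) for oid in dict.fromkeys(oids)]
    out.append(b"0000")  # flush-pkt ends the request
    return b"".join(out)


def batched_request(tree_bodies) -> bytes:
    """Collect the entries of several trees into a single request."""
    oids = [oid for body in tree_bodies for _, _, oid in parse_tree(body)]
    return object_info_request(oids)
```

With this shape, listing N directories costs N tree fetches (which can
themselves be batched) plus one object-info request, rather than one
request per directory.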

> If a request takes a single (or multiple) tree object name, and lets
> you retrieve _both_ the tree object itself _and_ object-info for the
> objects the tree refers to, you can build "ls -l" with a single
> roundtrip instead.

That is true. But it seems to me that the existing protocol currently
has no good place to fit size information (and maybe other metadata
eventually). It is not impossible that I am simply missing something,
though.

> I do not know how much the latency matters (or more importantly, how
> much a naïve counter-proposal like the above would help), but it is
> what immediately came to my mind.

I do not expect the latency of the object-info request to be an issue
especially because fetching the information can be done in batches and
also prefetched in some cases (as we do not need to download the
objects, we do not need to worry about downloading possibly gigabytes
of data while just iterating through directories). But, of course,
assuming there is a clean way to make things even better, I am all for
it.
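
For the client side of a batched request, a minimal Python sketch (an
illustration only, not code from the patch) of decoding the reply into
a size table could look like the following. It assumes the response is
a pkt-line stream whose first line lists the returned attributes
("size") and whose remaining lines are "<oid> <size>", terminated by a
flush-pkt. The function names are hypothetical.

```python
def read_pkt_lines(data: bytes):
    """Yield decoded payloads from a pkt-line stream until a flush-pkt."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4].decode(), 16)
        if length == 0:  # flush-pkt: end of the response
            return
        if length < 4:  # other special packets carry no payload
            pos += 4
            continue
        yield data[pos + 4:pos + length].rstrip(b"\n").decode()
        pos += length


def parse_object_info(data: bytes) -> dict:
    """Return {oid: size} from an object-info response (assumed layout)."""
    lines = read_pkt_lines(data)
    attrs = next(lines).split()
    if attrs != ["size"]:
        raise ValueError("unexpected attribute list: %r" % attrs)
    sizes = {}
    for line in lines:
        oid, size = line.split(" ")
        sizes[oid] = int(size)
    return sizes
```

A VFS layer could keep the resulting table as a cache keyed by object
id, so that a later stat() on a prefetched path needs no further
request.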

> Assuming that we are good with an interface that needs two requests
> to obtain "object contents" and "object info" separately, I find
> what is in this patch quite reasonable, though (admittedly, I've
> already read this patch during internal review a number of times).


FWIW, I gave this a lot of thought and for the purposes of doing a
remote file system (and one can even optimize git ls-tree to be a lot
faster for partial clones using this), I feel confident about the
change (modulo missing something, of course).

Thanks for your comments.

