git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Tan <jonathantanmy@google.com>
To: matvore@google.com
Cc: git@vger.kernel.org, jrn@google.com, jonathantanmy@google.com
Subject: Re: Proposal: object negotiation for partial clones
Date: Mon,  6 May 2019 12:28:00 -0700
Message-ID: <20190506192800.213716-1-jonathantanmy@google.com>
In-Reply-To: <CAMfpvhKYRVwTVNLfRJYcjhHtg=FNLNPbnw8xtY93nJu228v6=g@mail.gmail.com>

> I'm considering implementing a feature in the Git protocol which would
> enable efficient and accurate object negotiation when the client is a
> partial clone. I'd like to refine and get some validation of my
> approach before I start to write any code, so I've written a proposal
> for anyone interested to review. Your comments would be appreciated.

Thanks. Let me try to summarize: The issue is that, during a fetch,
normally the client can say "have" to inform the server that it has a
commit and all its referenced objects (barring shallow lines), but we
can't do the same if the client is a partial clone (because having a
commit doesn't necessarily mean that we have all referenced objects).
And not doing this means that the server sends a lot of unnecessary
objects in the sent packfile. The solution is to do the fetch in 2
parts: one to get the list of objects that would be sent, and after the
client filters that, one to get the objects themselves.
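To make the two-part flow concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not Git's implementation: object IDs are plain strings, the object graph is an in-memory dict mapping each object to the objects it references, and the helper names (`server_list_objects`, `client_filter`, `server_send_objects`) are hypothetical.

```python
"""Toy model of a two-part fetch for a partial clone (hypothetical sketch)."""
from typing import Dict, List, Set

# Assumption: each object maps to the objects it references.
Graph = Dict[str, List[str]]


def reachable(graph: Graph, tips: Set[str]) -> Set[str]:
    """Return every object reachable from `tips`, including the tips."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, []))
    return seen


def server_list_objects(graph: Graph, wants: Set[str]) -> Set[str]:
    """Part 1: reply with object names only, no contents.

    No "have" lines are used: in a partial clone, having a commit does
    not imply having everything it references.
    """
    return reachable(graph, wants)


def client_filter(listed: Set[str], local: Set[str]) -> Set[str]:
    """The client drops every listed object it already has locally."""
    return listed - local


def server_send_objects(graph: Graph, wants: Set[str]) -> Dict[str, str]:
    """Part 2: send exactly the requested objects, skipping unknown names."""
    return {oid: f"<contents of {oid}>" for oid in wants if oid in graph}
```

For example, with `graph = {"c1": ["t1"], "t1": ["b1", "b2"], "b1": [], "b2": []}` and a client that already has `c1`, `t1` and `b1`, part 1 lists all four objects, the client filters that down to `{"b2"}`, and part 2 sends only `b2`.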

It was unclear to me whether this is meant for (1) user-initiated fetches
of commits (e.g. "git fetch origin", reusing the configured
"core.partialclonefilter"), (2) lazy fetching of missing objects, or
both. My assumption is that it is only for (2).

My main question is: we can already get the same list of objects (in the
form of tree objects) by fetching with a "blob:none" filter. Admittedly,
we would get extra data (file names, etc.); if the extra bandwidth saving
is necessary, this should be called out. (And some of that saving would
be offset by the fact that we will actually need some of those tree
objects anyway.)
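As an illustration of the "blob:none" alternative: once the trees are local, the client can compute the list of blobs it is missing by itself. This is a hypothetical sketch, not Git code; it assumes the fetched trees are available as an in-memory dict of entries shaped like `git ls-tree` output, and `TreeEntry` and `blobs_needed` are made-up names.

```python
from typing import Dict, List, NamedTuple, Set


class TreeEntry(NamedTuple):
    """One entry of a tree object (hypothetical in-memory form)."""
    kind: str  # "blob" or "tree"; gitlinks are ignored in this sketch
    oid: str
    name: str  # the file name: the "extra data" a blob:none fetch carries


def blobs_needed(trees: Dict[str, List[TreeEntry]], root: str,
                 local_blobs: Set[str]) -> Set[str]:
    """Walk trees fetched with blob:none and return the missing blob IDs."""
    needed: Set[str] = set()
    visited: Set[str] = set()
    stack = [root]
    while stack:
        tree_oid = stack.pop()
        if tree_oid in visited:
            continue
        visited.add(tree_oid)
        for entry in trees[tree_oid]:
            if entry.kind == "tree":
                stack.append(entry.oid)
            elif entry.kind == "blob" and entry.oid not in local_blobs:
                needed.add(entry.oid)
    return needed
```

The file names in each entry are exactly the overhead the proposal would avoid; the walk itself never needs them.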

Assuming that we do need that bandwidth saving, here's my review of that
document.

The document describes the 1st request exactly as I envision it: a
specific parameter sent by the client, to which the server responds with
a list of object names.

For the 2nd request, the document describes repeating the original query
of the 1st request while also giving the full list of wanted objects as
"choose-refs". I'm still not convinced that repeating the original query
is necessary; I would just send the list of objects as wants. The
rationale given for repeating the original query is:

> The original query is helpful because it means the server only needs
> to do a single reachability check, rather than many separate ones.

But this omits two facts: done the document's way, the server needs to
perform an object walk in addition to the "single reachability check";
and done my way, "many separate ones" are not needed, because the server
can check the reachability of all the objects at once.
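A minimal sketch of checking reachability of many objects in one pass, assuming an in-memory object graph (the function name `all_reachable` is hypothetical, and this is not how Git's server code is structured): a single walk from the tips marks objects as it goes, and the check succeeds once every requested object has been marked.

```python
from typing import Dict, List, Set


def all_reachable(graph: Dict[str, List[str]], tips: List[str],
                  requested: Set[str]) -> bool:
    """Check in ONE walk whether every requested object is reachable from tips.

    The walk stops early once all requested objects have been seen, so the
    cost is at most one traversal of the graph, not one per requested object.
    """
    remaining = set(requested)
    seen: Set[str] = set()
    stack = list(tips)
    while stack and remaining:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        remaining.discard(oid)
        stack.extend(graph.get(oid, []))
    return not remaining
```

The design point is that the set of requested objects is checked against a shared `seen` set, so adding more wants does not add more walks.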

Also, with my approach, supporting the 2nd request requires no code or
protocol change; it already works today. Assuming we follow my approach,
the discussion thus centers on supporting the 1st request.
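For illustration, the 2nd request as plain wants might look like this on the wire in protocol v2. This is a sketch that builds only the bytes: it assumes SHA-1 hex object names, omits optional capability lines such as `agent=`, and the helper names are hypothetical. Each pkt-line is prefixed with four hex digits giving its total length (prefix included); `0001` is the delimiter packet and `0000` the flush packet.

```python
from typing import Iterable

FLUSH_PKT = b"0000"
DELIM_PKT = b"0001"
MAX_PKT_LEN = 65520  # largest pkt-line length, including the 4-byte prefix


def pkt_line(payload: str) -> bytes:
    """Encode one pkt-line: 4 hex digits of total length, then the payload."""
    data = payload.encode("utf-8")
    length = len(data) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("pkt-line too long")
    return f"{length:04x}".encode("ascii") + data


def fetch_wants_request(oids: Iterable[str]) -> bytes:
    """Build a v2 'fetch' request that simply wants the given objects."""
    out = [pkt_line("command=fetch\n"), DELIM_PKT]
    out += [pkt_line(f"want {oid}\n") for oid in oids]
    out.append(pkt_line("done\n"))  # no haves: ask for the pack right away
    out.append(FLUSH_PKT)
    return b"".join(out)
```

A single 40-hex-digit want encodes as a `0032want ...` line, which matches the familiar length prefix seen in packet traces.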

Some more thoughts:

- Changes in server and client scalability: Currently, the server checks
  reachability of all wants, then enumerates, then sends all objects.
  With this change, the server checks reachability of all wants, then
  enumerates, then sends an object list, then checks reachability of all
  objects in the filtered list, then sends some objects. There is
  additional overhead from the extra reachability check and from lists of
  objects being sent twice (once by the server and once by the client),
  but sending fewer objects means that I/O (server, network, client) and
  disk space usage (client) are reduced.

- Usefulness outside partial clone: If the user ever wants a list of
  objects referenced by an object but without their file names, the user
  could use this, but I can't think of such a scenario.
