git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: matvore@comcast.net
Cc: jonathantanmy@google.com, matvore@google.com,
	git@vger.kernel.org, jrn@google.com
Subject: Re: Proposal: object negotiation for partial clones
Date: Thu,  9 May 2019 11:00:22 -0700	[thread overview]
Message-ID: <20190509180022.91700-1-jonathantanmy@google.com> (raw)
In-Reply-To: <A0ADEE11-E3E1-4DE0-81BA-40771C783E4E@comcast.net>

> > On 2019/05/07, at 11:34, Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > To get an enumeration of available objects, don't you need to use only
> > "blob:none"? Combining filters (once that's implemented) will get all
> > objects only up to a certain depth.
> >
> > Combining "tree:<n>" and "blob:none" would allow us to reduce the number
> > of trees transmitted, but I would imagine that the savings would be
> > significant only for very large repositories. Do you have a specific use
> > case in mind that isn't solved by "blob:none"?
> 
> I am interested in supporting large repositories. The savings seem to be larger than one may expect. I tried the following command on two huge repos to find out how much it costs to fetch “blob:none” for a single commit:
> 
> $ git rev-list --objects --filter=blob:none HEAD: | xargs -n 2 bash -c 'git cat-file -s $1' | awk '{ total += $1; print total }'
> 
> Note the “:” after HEAD - this limits it to the current commit.
> 
> And the results were:
>  - Linux: 2 684 054 bytes
>  - Chromium: > 16 139 570 bytes (then I got tired of waiting for it to finish)

Thanks for the numbers. Let me think about it some more, but I'm still
reluctant to introduce multiple filter support in the protocol and the
implementation for the following reasons:

- For large projects like Linux and Chromium, it may be reasonable to
  expect that an infrequent checkout would result in a few-megabyte
  download.
- (After some in-office discussion) It may be possible to mitigate much
  of that by sending root trees that we have as "have" (e.g. by
  consulting the reflog), and that wouldn't need any protocol change.
- Supporting any combination of filter means that we have more to
  implement and test, especially if we want to support more filters in
  the future. In particular, the different filters (e.g. blob, tree)
  have different code paths now in Git. One way to solve it would be to
  combine everything into one monolith, but I would like to avoid it if
  possible (after having to deal with revision walking a few times...)

  reply	other threads:[~2019-05-09 18:00 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-28 15:55 Proposal: object negotiation for partial clones Matthew DeVore
2019-05-06 18:25 ` Jonathan Nieder
2019-05-06 19:28 ` Jonathan Tan
2019-05-06 19:46   ` Jonathan Nieder
2019-05-06 23:20     ` Matthew DeVore
2019-05-07  0:02       ` Jonathan Nieder
2019-05-06 22:47   ` Matthew DeVore
2019-05-07 18:34     ` Jonathan Tan
2019-05-07 21:57       ` Matthew DeVore
2019-05-09 18:00         ` Jonathan Tan [this message]
2019-05-14  0:09           ` Matthew DeVore
2019-05-14  0:16             ` Jonathan Nieder
2019-05-16 18:56               ` [RFC PATCH 0/3] implement composite filters Matthew DeVore
2019-05-16 18:56                 ` [RFC PATCH 1/3] list-objects-filter: refactor into a context struct Matthew DeVore
2019-05-16 18:56                 ` [RFC PATCH 2/3] list-objects-filter-options: error is localizeable Matthew DeVore
2019-05-16 18:56                 ` [RFC PATCH 3/3] list-objects-filter: implement composite filters Matthew DeVore
2019-05-17  3:25                   ` Junio C Hamano
2019-05-17 13:17                     ` Matthew DeVore
2019-05-19  1:12                       ` Junio C Hamano
2019-05-20 18:24                       ` Matthew DeVore
2019-05-20 18:28                       ` Matthew DeVore
2019-05-16 22:41                 ` [RFC PATCH 0/3] " Jonathan Tan
2019-05-17  0:01                   ` Matthew DeVore

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190509180022.91700-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=jrn@google.com \
    --cc=matvore@comcast.net \
    --cc=matvore@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).