git@vger.kernel.org mailing list mirror (one of many)
From: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, christian.couder@gmail.com,
	marc.strapetz@syntevo.com, me@ttaylorr.com
Subject: Re: Questions about partial clone with '--filter=tree:0'
Date: Mon, 26 Oct 2020 21:08:54 +0100	[thread overview]
Message-ID: <6a09c0cd-8e88-1f53-72ca-bc6f9182b517@syntevo.com> (raw)
In-Reply-To: <20201026194635.2119420-1-jonathantanmy@google.com>

On 26.10.2020 20:46, Jonathan Tan wrote:
 > No - I did talk about prefetching earlier, but here I mean having
 > Git on the server perform the "blame" computation itself.

Oh! That's an interesting twist. Unfortunately for us, we are
implementing our own Blame logic. Come to think of it, I'm becoming
more convinced that graph walking could be the best solution for us,
because it allows arbitrary logic, including custom file rename detection.
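To illustrate why a client-side graph walk leaves room for custom logic,
here is a minimal Python sketch (not Git code). It follows one path back
through first-parent history in a hypothetical in-memory model (`parent`,
`files`) and delegates rename detection to a pluggable callback.
`trace_path` and `exact_rename` are made-up names, and matching identical
blob OIDs is just one simple rename heuristic assumed for the example.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

# Hypothetical callback: (path in child, child files, parent files) -> the
# path's name in the parent, or None if the file did not exist there.
RenameDetector = Callable[[str, Dict[str, str], Dict[str, str]], Optional[str]]


def trace_path(head: str, path: str,
               parent: Dict[str, Optional[str]],
               files: Dict[str, Dict[str, str]],
               detect_rename: RenameDetector) -> Iterator[Tuple[str, str]]:
    """Yield (commit, path) pairs, following `path` back via first parents."""
    commit: Optional[str] = head
    current: Optional[str] = path
    while commit is not None and current is not None:
        yield commit, current
        par = parent.get(commit)
        if par is None:
            break  # reached a root commit
        if current not in files[par]:
            # Path is missing in the parent: let the custom logic decide.
            current = detect_rename(current, files[commit], files[par])
        commit = par


def exact_rename(path: str, child: Dict[str, str],
                 par: Dict[str, str]) -> Optional[str]:
    """Toy heuristic: the old name is the parent path with the same blob OID."""
    oid = child[path]
    for old_path, old_oid in par.items():
        if old_oid == oid:
            return old_path
    return None


# Toy history: c2 renamed old.txt -> new.txt without changing it.
parent = {"c3": "c2", "c2": "c1", "c1": None}
files = {
    "c3": {"new.txt": "blob2"},
    "c2": {"new.txt": "blob1"},
    "c1": {"old.txt": "blob1"},
}
print(list(trace_path("c3", "new.txt", parent, files, exact_rename)))
# [('c3', 'new.txt'), ('c2', 'new.txt'), ('c1', 'old.txt')]
```

Swapping `exact_rename` for a similarity-based detector would not change
the walk itself, which is the point: the policy lives entirely on our side.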

 > For example, let's say I want to run "blame" on foo.txt at HEAD. HEAD
 > and HEAD^ are commits that only the local client has, whereas HEAD^^ was
 > fetched from the remote. By comparing HEAD, HEAD^, and HEAD^^, Git knows
 > which lines come from HEAD and HEAD^. For the rest, Git would make a
 > request to the server, passing the commit ID and the path, and would get
 > back a list of line numbers and commits.
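A rough sketch of how the client side of such a scheme might merge
results, assuming a hypothetical remote call that takes (commit, path)
and returns {line number: commit}. Nothing like this exists in Git's
protocol today; `Attributed`, `Unresolved`, `finish_blame` and
`remote_blame` are illustrative names only. Local blame attributes some
lines to local-only commits (HEAD, HEAD^) and maps the rest to line
numbers in the fetched base commit (HEAD^^), which the server resolves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class Attributed:
    commit: str      # line introduced by a local-only commit (e.g. HEAD, HEAD^)


@dataclass(frozen=True)
class Unresolved:
    base_line: int   # same line's number in the fetched base commit (e.g. HEAD^^)


# Hypothetical server call: (commit, path) -> {line number: introducing commit}.
RemoteBlame = Callable[[str, str], Dict[int, str]]


def finish_blame(local: List[Union[Attributed, Unresolved]], base_commit: str,
                 path: str, remote_blame: RemoteBlame) -> List[str]:
    """Combine local attributions with one remote lookup for the rest."""
    needs_remote = any(isinstance(e, Unresolved) for e in local)
    # Only ask the server if some lines predate the local-only commits.
    remote = remote_blame(base_commit, path) if needs_remote else {}
    return [e.commit if isinstance(e, Attributed) else remote[e.base_line]
            for e in local]
```

A single round trip per blamed file is assumed here; a real design would
also have to handle the server not knowing the path at that commit.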

Sounds quite involved indeed! It's curious how, once partial clones
are involved, Git shifts towards classic server-side VCSes such as SVN.

 > Yes, prefetching will require graph walking with large OID requests but
 > will not require protocol changes, as you say. I'm not too worried about
 > the large numbers of OIDs - Git servers already have to support
 > relatively large numbers of OIDs to support the bulk prefetch we do
 > during things like checkout and diff.

Hmm, let's take the Linux repository to put some numbers on this. It
has ~1M commits. For a typical Blame (without rename detection), each
request descends one level deeper into the trees. With a single file
blamed, that means 1 or 0 new trees per commit at each level
(depending on whether the commit modified that tree). The first
request, which discovers the root trees, is the largest: it asks for
1 OID per commit, i.e. ~1M OIDs in the worst case. Subsequent requests
are probably ~0.1M OIDs each, and there is one request per path
component of the blamed path.
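The estimate can be made concrete with a small Python model
(hypothetical, not Git code). It assumes we already know, for every
commit, the OID at each depth along the blamed path (root tree first,
blob last) and that the path never moves. A real client only learns
depth k+1 after fetching depth k, but the request sizes come out the
same: each request holds the distinct OIDs at one depth, so commits
that didn't touch a given tree add nothing new. `request_sizes` is an
illustrative name.

```python
from typing import List, Sequence


def request_sizes(oids_per_commit: Sequence[Sequence[str]]) -> List[int]:
    """Number of OIDs in each level-by-level request.

    oids_per_commit[c][k] is the OID at depth k along the blamed path in
    commit c: k == 0 is the root tree, the last entry is the file's blob.
    Assumes the path exists at the same depth in every commit (no renames).
    """
    depth = len(oids_per_commit[0]) if oids_per_commit else 0
    sizes = []
    for k in range(depth):
        # Unchanged subtrees keep their OID, so a commit that did not touch
        # this level adds nothing new: request only the distinct OIDs.
        sizes.append(len({oids[k] for oids in oids_per_commit}))
    return sizes


# Toy history of "a/f.txt": 4 commits, 2 distinct versions of the file.
history = [
    ["root1", "treeA1", "blob1"],
    ["root2", "treeA1", "blob1"],
    ["root3", "treeA2", "blob2"],
    ["root4", "treeA2", "blob2"],
]
print(request_sizes(history))  # [4, 2, 2]
```

The root-tree request always scales with the commit count, while deeper
requests shrink to the number of distinct versions; for Linux that is the
~1M versus ~0.1M guess above.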

So the question is: will a Git server (or Git hosting) be upset by
requests for 1M OIDs? I've never actually measured the cost of such a
request; what do you think?


Thread overview: 9+ messages
2020-10-20 17:09 Questions about partial clone with '--filter=tree:0' Alexandr Miloslavskiy
2020-10-20 22:29 ` Taylor Blau
2020-10-21 17:10   ` Alexandr Miloslavskiy
2020-10-21 17:31     ` Taylor Blau
2020-10-21 17:46       ` Alexandr Miloslavskiy
2020-10-26 18:24 ` Jonathan Tan
2020-10-26 18:44   ` Alexandr Miloslavskiy
2020-10-26 19:46     ` Jonathan Tan
2020-10-26 20:08       ` Alexandr Miloslavskiy [this message]
