git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Jonathan Tan <jonathantanmy@google.com>,
	Derrick Stolee <dstolee@gmail.com>, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH v3 0/5] Optimization batch 13: partial clone optimizations for merge-ort
Date: Tue, 22 Jun 2021 22:14:03 -0400	[thread overview]
Message-ID: <8f1edc60-f754-541b-1d66-7f5ec49eff55@gmail.com> (raw)
In-Reply-To: <CABPp-BH9vy3otHvAxR2T6JmVKtH2+EKj-A7NxGsuoqnZA_Bykg@mail.gmail.com>

On 6/22/2021 2:45 PM, Elijah Newren wrote:
> On Tue, Jun 22, 2021 at 9:10 AM Derrick Stolee <stolee@gmail.com> wrote:

I want to focus on this item:

>> 2. I watched for the partial clone logic to kick in and download blobs.
>>    Some of these were inevitable: we need the blobs to resolve edit/edit
>>    conflicts. Most cases none were downloaded at all, so this series is
>>    working as advertised. There _was_ a case where the inexact rename
>>    detection requested a large list of files (~2900 in three batches) but
>>    _then_ said "inexact rename detection was skipped due to too many
>>    files". This is a case that would be nice to resolve in this series. I
>>    will try to find exactly where in the code this is being triggered and
>>    report back.
> 
> This suggests perhaps that EITHER there was a real modify/delete
> conflict (because you have to do full rename detection to rule out
> that the modify/delete was part of some rename), OR that there was a
> renamed file modified on both sides that did not keep its original
> basename (because that combination is needed to bypass the various
> optimizations and make it fall back to full inexact rename detection).
> Further, in either case, there were enough adds/deletes that full
> inexact detection is still a bit expensive.  It'd be interesting to
> know which case it was.  What happens if you set merge.renameLimit to
> something higher (the default is surprisingly small)?

The behavior I'd like to see is that the partial clone logic is not
run if we are going to download more than merge.renameLimit files.
Whatever is getting these missing blobs is earlier than the limit
check, but it should be after instead.

It's particularly problematic that Git does all the work to get the
blobs, but then gives up and doesn't even use them for rename
detection.

I'm happy that we download necessary blobs when there are a few
dozen files that need inexact renames. When it gets into the
thousands, then we jump into a different category of user experience.

Having a stop-gap of rename detection limits is an important way to
avoid huge amounts of file downloads in these huge repo cases. Users
can always opt into a larger limit if they really do want that rename
detection to work at such a large scale, but we still need protections
for the vast majority of cases where a user isn't willing to pay the
cost of downloading these blobs.

Thanks,
-Stolee

  reply	other threads:[~2021-06-23  2:14 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-05  1:27 [PATCH 0/5] Optimization batch 13: partial clone optimizations for merge-ort Elijah Newren via GitGitGadget
2021-06-05  1:28 ` [PATCH 1/5] promisor-remote: output trace2 statistics for number of objects fetched Elijah Newren via GitGitGadget
2021-06-09 21:12   ` Derrick Stolee
2021-06-05  1:28 ` [PATCH 2/5] t6421: add tests checking for excessive object downloads during merge Elijah Newren via GitGitGadget
2021-06-09 21:16   ` Derrick Stolee
2021-06-05  1:28 ` [PATCH 3/5] diffcore-rename: allow different missing_object_cb functions Elijah Newren via GitGitGadget
2021-06-05  1:28 ` [PATCH 4/5] diffcore-rename: use a different prefetch for basename comparisons Elijah Newren via GitGitGadget
2021-06-05  1:28 ` [PATCH 5/5] merge-ort: add prefetching for content merges Elijah Newren via GitGitGadget
2021-06-15 22:41 ` [PATCH v2 0/5] Optimization batch 13: partial clone optimizations for merge-ort Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 1/5] promisor-remote: output trace2 statistics for number of objects fetched Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 2/5] t6421: add tests checking for excessive object downloads during merge Elijah Newren via GitGitGadget
2021-06-17  4:49     ` Junio C Hamano
2021-06-15 22:41   ` [PATCH v2 3/5] diffcore-rename: allow different missing_object_cb functions Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 4/5] diffcore-rename: use a different prefetch for basename comparisons Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 5/5] merge-ort: add prefetching for content merges Elijah Newren via GitGitGadget
2021-06-17  5:04     ` Junio C Hamano
2021-06-22  8:02       ` Elijah Newren
2021-06-16 17:54   ` [PATCH v2 0/5] Optimization batch 13: partial clone optimizations for merge-ort Derrick Stolee
2021-06-17  5:05   ` Junio C Hamano
2021-06-22  8:04   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 1/5] promisor-remote: output trace2 statistics for number of objects fetched Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 2/5] t6421: add tests checking for excessive object downloads during merge Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 3/5] diffcore-rename: allow different missing_object_cb functions Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 4/5] diffcore-rename: use a different prefetch for basename comparisons Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 5/5] merge-ort: add prefetching for content merges Elijah Newren via GitGitGadget
2021-06-22 16:10     ` [PATCH v3 0/5] Optimization batch 13: partial clone optimizations for merge-ort Derrick Stolee
2021-06-22 18:45       ` Elijah Newren
2021-06-23  2:14         ` Derrick Stolee [this message]
2021-06-23  8:11           ` Elijah Newren
2021-06-23 17:31             ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f1edc60-f754-541b-1d66-7f5ec49eff55@gmail.com \
    --to=stolee@gmail.com \
    --cc=dstolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).