git@vger.kernel.org mailing list mirror (one of many)
* Partial Clone, and a strange slow rev-list call on fetch
@ 2021-06-02  4:56 Tao Klerks
  2021-06-02 11:18 ` Derrick Stolee
  0 siblings, 1 reply; 5+ messages in thread
From: Tao Klerks @ 2021-06-02  4:56 UTC (permalink / raw)
  To: git

Hi folks,

I'm learning to use Partial Clone, and finding a behavior that I don't
know how to interpret or investigate:

Under some circumstances, doing a plain "git fetch <remote>" on a
filtered repo results in a very long (6-30 min?) wait, during which I
can see the following command being executed in the background:

/usr/libexec/git-core/git rev-list --objects --stdin
--exclude-promisor-objects --not --all --quiet --alternate-refs

So far, I have noted this happening under two distinct circumstances:
* Anytime I try to fetch on a filtered repo with a git 2.23 client -
shorter pause
* When I try to fetch with a recent (2.31) client in a repo where one
large packfile has no *.promisor file (but the others do, and the
remote I am fetching from has promisor=true) - looong pause

Can anyone explain what this rev-list call intends, and/or any hints
as to how I could see what the stdin content being fed to it from the
parent process actually is?

For background, I ended up in the "missing promisor file" situation by
trying to be (too?) clever about the blobs present in my clone: I
cloned unfiltered shallow to a certain depth with certain refspecs,
then added the promisor and filter config, and finally fetched with
"--unshallow". This produced exactly the blob-population state I
intended, but meant the original first packfile had no ".promisor"
file.

Creating an empty promisor file for that packfile *appears* to fix the
issue, and hasn't produced any weird side-effects that I've noted, and
from the "removing partial clone filtering" description from gitlab at
https://docs.gitlab.com/ee/topics/git/partial_clone.html#remove-partial-clone-filtering,
appears to be a reasonable thing to do (the implication there is that
a promisor packfile with no missing objects has exactly the same
structure as a non-promisor packfile), but of course I would welcome
any validation or correction to that assumption.
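To check whether a repository is in this state, something like the following could be used. It is only a sketch: it assumes the usual layout in which each `pack-*.pack` under `.git/objects/pack` may have a sibling `pack-*.promisor` file, and the function name `list_non_promisor_packs` is made up for illustration.

```shell
#!/bin/sh
# List packfiles that have no sibling ".promisor" marker.
# Usage: list_non_promisor_packs <pack-dir>   (e.g. .git/objects/pack)
# The function name is illustrative only; it is not a git command.
list_non_promisor_packs() {
    pack_dir=$1
    for pack in "$pack_dir"/pack-*.pack; do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -e "$pack" ] || continue
        marker="${pack%.pack}.promisor"
        [ -e "$marker" ] || printf '%s\n' "$pack"
    done
}
```

Running `list_non_promisor_packs "$TESTFOLDER/.git/objects/pack"` prints nothing when every pack already has a marker.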

Thanks for any info,
Tao Klerks

* Re: Partial Clone, and a strange slow rev-list call on fetch
  2021-06-02  4:56 Partial Clone, and a strange slow rev-list call on fetch Tao Klerks
@ 2021-06-02 11:18 ` Derrick Stolee
  2021-06-03 21:10   ` Tao Klerks
  0 siblings, 1 reply; 5+ messages in thread
From: Derrick Stolee @ 2021-06-02 11:18 UTC (permalink / raw)
  To: Tao Klerks, git

On 6/2/21 12:56 AM, Tao Klerks wrote:
> Hi folks,
> 
> I'm learning to use Partial Clone, and finding a behavior that I don't
> know how to interpret or investigate:
> 
> Under some circumstances, doing a plain "git fetch <remote>" on a
> filtered repo results in a very long (6-30 min?) wait, during which I
> can see the following command being executed in the background:
> 
> /usr/libexec/git-core/git rev-list --objects --stdin
> --exclude-promisor-objects --not --all --quiet --alternate-refs
> 
> So far, I have noted this happening under two distinct circumstances:
> * Anytime I try to fetch on a filtered repo with a git 2.23 client -
> shorter pause
> * When I try to fetch with a recent (2.31) client in a repo where one
> large packfile has no *.promisor file (but the others do, and the
> remote I am fetching from has promisor=true) - looong pause

This makes me think that there was a bug fix for this situation
but the fix requires doing extra work. To help track this down,
could you re-run the scenario with GIT_TRACE2_PERF=1 which will
give the full Git process stack as we reach that rev-list call.
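One way to capture that trace, as a rough sketch: besides `1`, GIT_TRACE2_PERF can be set to an absolute path, in which case git and its child processes append their trace output to that file. The helper name `trace_fetch` and the choice to filter on the string "rev-list" are assumptions for illustration only.

```shell
#!/bin/sh
# Run "git fetch" with the trace2 perf target pointed at a file, then show
# the trace lines that mention rev-list.
# trace_fetch is a hypothetical helper name. The trace file must be given
# as an absolute path so that git treats it as a file target.
trace_fetch() {
    repo=$1
    trace_file=$2
    shift 2
    GIT_TRACE2_PERF="$trace_file" git -C "$repo" fetch "$@"
    status=$?
    # Print matching lines, if any; finding nothing is not an error here.
    grep 'rev-list' "$trace_file" || true
    return "$status"
}
```

For example, `trace_fetch "$PWD/myrepo" "$PWD/fetch.perf" origin` leaves the full trace in `fetch.perf` for closer inspection.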

> Can anyone explain what this rev-list call intends, and/or any hints
> as to how I could see what the stdin content being fed to it from the
> parent process actually is?
> 
> For background, I ended up in the "missing promisor file" situation by
> trying to be (too?) clever about the blobs present in my clone: I
> cloned unfiltered shallow to a certain depth with certain refspecs,
> then added the promisor and filter config, and finally fetched with
> "--unshallow". This produced exactly the blob-population state I
> intended, but meant the original first packfile had no ".promisor"
> file.

This is the critical point: you first cloned without a filter,
and then converted the remote to a promisor remote without
marking the pack-files you received from that remote as promisor
pack-files. That means that Git needs to do some work to discover
which objects are reachable from promisor packs or not, and that
extra work is slowing you down.

Partial clone is designed to work where every remote is a
promisor remote, and always has been so. Any deviation from that
norm is venturing into uncharted territory and will have friction
like this. Another similar issue comes when you have multiple
remotes and one of them is a promisor remote and another is not.

The general advice right now is to use partial clone only if you
will use it for all remotes across the entire existence of the
repo.
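A quick way to see whether a repository mixes promisor and non-promisor remotes is to print the relevant config for each remote. This is a minimal sketch; `report_promisor_remotes` is a hypothetical helper name, and it only reads the `remote.<name>.promisor` and `remote.<name>.partialclonefilter` settings.

```shell
#!/bin/sh
# Print each configured remote with its promisor flag and partial clone filter.
# report_promisor_remotes is a hypothetical helper name.
report_promisor_remotes() {
    repo=$1
    git -C "$repo" remote | while read -r remote; do
        # "git config --get" exits non-zero when the key is unset.
        promisor=$(git -C "$repo" config --get "remote.$remote.promisor" || echo false)
        filter=$(git -C "$repo" config --get "remote.$remote.partialclonefilter" || echo none)
        printf '%s promisor=%s filter=%s\n' "$remote" "$promisor" "$filter"
    done
}
```

Any line showing `promisor=false` next to another showing `promisor=true` indicates the mixed setup described above.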

Part of the difficulty here is that once you download that first
pack-file from the remote, Git has no way of knowing whether the
pack came from that source or was created in another way. We
have no way to be sure that we can "upgrade" the remote in an
automated process.

This does make me wonder what happens when Git repacks objects
created locally and then starts fetching from a promisor remote.

There are some challenges here, for sure. Most likely also some
potential gains, but it is unlikely to create a seamless
experience for what you are trying to do.

Thanks,
-Stolee

* Re: Partial Clone, and a strange slow rev-list call on fetch
  2021-06-02 11:18 ` Derrick Stolee
@ 2021-06-03 21:10   ` Tao Klerks
  2021-06-04 13:21     ` Derrick Stolee
  0 siblings, 1 reply; 5+ messages in thread
From: Tao Klerks @ 2021-06-03 21:10 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

On Wed, Jun 2, 2021 at 1:18 PM Derrick Stolee <stolee@gmail.com> wrote:

> could you re-run the scenario with GIT_TRACE2_PERF=1 which will
> give the full Git process stack as we reach that rev-list call.

Sorry about the delay, I've been trying to reproduce... reproducibly :)

I now have a whole file of examples and observations, attached (I
assume text attachments are allowed on this mailing list?), which
should be reproducible for anyone as I was able to use the linux
kernel repo to illustrate all cases.

My observations probably use incorrect terminology and/or illustrate a
lack of understanding of the underlying intended behaviors, but here
are the most surprising (not-intentional-seeming) ones:
 * the "rev-list" calls are normal behavior for non-promisor remotes;
they normally complete very fast (with perhaps very little stdin
input??)
 * they happen in "fetch.c", right after the "remote_refs" labelled
code and within the "consume_refs" labelled code
 * for promisor remotes, these rev-list calls are normally
(intentionally?) skipped. They are run only if there is a non-promisor
packfile which contains a commit that is at the "tip" of one of the
(promisor) remote's refs
 * when these rev-list calls happen, they are incredibly, strangely
expensive (a full "git rev-list --all --objects" completes *much*
faster)
 * even weirder, the cost of these calls appears proportional to the
*promisor packfile sizes* - if you specify a more permissive filter at
clone-time, and therefore have more objects actually there (not
missing/skipped) in your promisor packfiles, then it ends up taking
even (much) longer
 * maybe weirdest - all of this goes away when the non-promisor
packfile gets "hidden" behind a more recent promisor packfile;
whatever this check is supposed to do, I don't believe it is
sound/correct.

> This is the critical point: you first cloned without a filter,
> and then converted the remote to a promisor remote without
> marking the pack-files you received from that remote as promisor
> pack-files. That means that Git needs to do some work to discover
> which objects are reachable from promisor packs or not, and that
> extra work is slowing you down.

As noted above, I don't think that work is, in the version I was
testing at least (2.31.1), correct. That said, I may well be
misunderstanding its intent.

> This does make me wonder what happens when Git repacks objects
> created locally and then starts fetching from a promisor remote.

I can confirm that *if* the locally-created non-promisor packfile
contains a commit that is the "tip" of a (promisor) remote branch,
then this will trigger the strange/pathological fetch performance
issue. As soon as you add a "promisor" marker for the packfile, or as
soon as someone else pushes new commits to the branch that then get
fetched as new promisor packfiles in your repo, you're golden, the
fetch process stops doing its "panic - non-promisor packfile found"
behavior... Even though a non-promisor packfile *is still in scope!*
(just not at the tip of a ref for that remote)
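To check which packfile holds a given tip commit, and whether that pack carries a marker, a sketch along these lines might help. It relies on `git show-index`, which lists the object IDs recorded in a pack index; the helper name `packs_containing` is hypothetical, and the object ID is expected in full hex form (for example from `git rev-parse origin/master`).

```shell
#!/bin/sh
# Print the packfile(s) in a repository that contain a given object ID, and
# whether each of those packs has a ".promisor" marker.
# packs_containing is a hypothetical helper name.
packs_containing() {
    repo=$1
    oid=$2
    git_dir=$(git -C "$repo" rev-parse --absolute-git-dir) || return 1
    for idx in "$git_dir"/objects/pack/pack-*.idx; do
        [ -e "$idx" ] || continue
        # "git show-index" reads a pack index on stdin and lists its objects.
        if git show-index < "$idx" | grep -q -- "$oid"; then
            if [ -e "${idx%.idx}.promisor" ]; then
                printf '%s promisor\n' "${idx%.idx}.pack"
            else
                printf '%s non-promisor\n' "${idx%.idx}.pack"
            fi
        fi
    done
}
```

A tip commit reported as living only in a "non-promisor" pack would match the triggering condition described above.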

Related to this, we have the other notable observation that "git
repack" (without asking to do anything with promisors) ends up doing
the same kind of work as the "rev-list" calls noted above, also
proportional to the size and/or number of objects present in promisor
packfiles; which is a little frustrating when the simplest cure for the
*other* performance issue, above, is to make all packfiles promisor
ones.

> There are some challenges here, for sure. Most likely also some
> potential gains, but it is unlikely to create a seamless
> experience for what you are trying to do.

There's one other crazy finding I should note explicitly here, which
is that force-pushing a branch can, for reasons I cannot explain,
cause you to redownload all the repo's commits & trees again (during
the forced push, as a just-in-time fetch). And again (if the branch
changes and you force-push again). And again for as long as you're
overwriting others' changes. For the linux kernel repo without blobs,
that's 1.09GB a pop, of purely duplicated promisor packfiles
containing the whole (no-blob-filtered) repo. Interestingly, the
behavior is the same regardless of your configured filter - so the
"blob:none" behavior must be hardcoded or implied somehow in the
codepath that produces this weird outcome.
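To spot these duplicate downloads without eyeballing a directory listing, the packfiles can be counted and sized before and after each forced push. This is a small sketch; `pack_report` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the size in bytes and the name of every packfile in a pack
# directory, followed by a count, so repeated duplicate downloads are
# easy to spot. pack_report is a hypothetical helper name.
pack_report() {
    pack_dir=$1
    count=0
    for pack in "$pack_dir"/pack-*.pack; do
        [ -e "$pack" ] || continue
        # Arithmetic expansion strips the padding some wc versions emit.
        size=$(($(wc -c < "$pack")))
        printf '%s %s\n' "$size" "${pack##*/}"
        count=$((count + 1))
    done
    printf 'packfiles: %s\n' "$count"
}
```

Calling `pack_report "$TESTFOLDER/.git/objects/pack"` after each `push -f` would show one more similarly sized pack each time if the re-download occurs.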

That said, so far I think everything I'm finding is manageable:
* To avoid "non-promisor packfile" issues in the initial dual-stage
clone, we just need to add extra promisor marker files when we
"upgrade" the remote to a promisor remote.
* Git repack performance will (presumably?) only affect background GC jobs
* Any repack-originated non-promisor packfiles can be cleaned up
(made promisor packfiles) by our tooling
* Any weird re-downloads of the commits+trees during force-pushes
should be OK, they would "only" cost 400MB in our repo
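The marker-file cleanup from the list above could look roughly like this in tooling. It simply mirrors the manual workaround described in this thread (an empty `.promisor` file next to each pack); it is not a documented conversion procedure, and `mark_all_packs_promisor` is a hypothetical helper name.

```shell
#!/bin/sh
# Create an empty ".promisor" marker next to every packfile that lacks one.
# mark_all_packs_promisor is a hypothetical helper name; existing markers
# are left untouched.
mark_all_packs_promisor() {
    pack_dir=$1
    for pack in "$pack_dir"/pack-*.pack; do
        [ -e "$pack" ] || continue
        marker="${pack%.pack}.promisor"
        if [ ! -e "$marker" ]; then
            : > "$marker"
            printf 'marked %s\n' "${pack##*/}"
        fi
    done
}
```

This only makes sense when every object in the affected packs really did come from the promisor remote, or has since been pushed to it.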

If these are issues that someone is interested in looking into, I'd be
very happy to work with them, but my understanding of the codebase
(and even the language) is... poor.

Thanks,
Tao

[-- Attachment #2: 2021-06-03 Partial Clone Strangeness Investigations.txt --]
[-- Type: text/plain, Size: 16086 bytes --]


# 
# This file is a log of investigations into non-obvious behaviors of "partial clone" repositories,
# using a (writable, throwaway) fork of the linux kernel as test repository.
# 
# The OUTCOMES described here are for git 2.31.1 on fedora linux 33 as of 2021-06-03
# 


# -----
# SETUP
# -----

# Enable perf tracing, including child process details
export GIT_TRACE2_PERF=1

# Set up a suitable read & writable linux kernel remote URL
export REMOTE_URL="https://github.com/TaoK/linux.git"



# -------
# OBSERVATIONS: 
#  - repos without promisor packfiles run a "git rev-list" process as part of "fetch.c", right after the "remote_refs" and within the "consume_refs" labelled code areas
#  - these processes are generally very fast
#  - these processes *don't run* if the packfiles containing the "tips"/refs shared/fetched from the remote are all promisor packfiles
#  - these processes *run differently* if *any* of the packfiles containing the "tips"/refs shared/fetched from the remote are not promisor packfiles, but the remote is a promisor
#    - Specifically, the same child processes do run, but an extra "--exclude-promisor-objects" parameter is added.
#  - these child processes, running with a promisor remote, run very fast if promisor packfiles are small
#  - these same child processes run very *slow* if promisor packfiles are large
#  - that slowness is not *proportional* to packfile size, but it is related; this relationship appears to be non-linear
#  - that slowness is substantially increased, for example, by using a "loose" filter, and ending up with many or all blobs *not missing* from the promisor packfile
#  - that slowness seems to depend exclusively on the size/content of the promisor packfiles; non-promisor packfiles "trigger" the behavior but do not appear to impact its speed
# -------


# Regular single-branch quite-shallow clone without checkout, followed by a regular fetch
# -> in the fetch, the "remote_refs" and "consume_refs" labelled areas spawn "git rev-list --objects --stdin --not --all --quiet --alternate-refs" processes
# -> these processes complete instantaneously in this context
# -> an equivalent process (same args) is invoked during clone, and labels progress as "Checking connectivity"
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout --shallow-since=2018-01-01 &&
  git -C $TESTFOLDER fetch


# Fully blob-filtered single-branch quite-shallow clone without checkout, followed by a regular fetch
# -> "remote_refs" and "consume_refs" don't call "rev-list" child processes at all
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout --shallow-since=2018-01-01 --filter=blob:none &&
  git -C $TESTFOLDER fetch

# Regular single-branch quite-shallow clone, with a blob-filtered "tip"
#  (another repo is created just to move the tip for the promisor packfile to be created)
# -> "remote_refs" and "consume_refs" don't call "rev-list" child processes at all
TESTFOLDER="linuxtest_$RANDOM" &&
  TIP_MOVER_FOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout --shallow-since=2018-01-01 &&
  git clone $REMOTE_URL $TIP_MOVER_FOLDER --depth=1 && 
  echo "Something $RANDOM" > "$TIP_MOVER_FOLDER/test_file_$RANDOM" &&
  git -C $TIP_MOVER_FOLDER add -A &&
  git -C $TIP_MOVER_FOLDER commit --no-gpg-sign -m "test file commit" &&
  git -C $TIP_MOVER_FOLDER push origin HEAD &&
  git -C $TESTFOLDER config remote.origin.promisor true &&
  git -C $TESTFOLDER config remote.origin.partialclonefilter blob:none &&
  git -C $TESTFOLDER fetch &&
  git -C $TESTFOLDER fetch

# Regular single-branch quite-shallow clone, upgraded to a promisor remote
# -> "remote_refs" and "consume_refs" *do* call "rev-list" child processes, with an extra parameter
# -> these rev-list child processes complete very fast (presumably their speed relates to the volume of promisor packfiles in some way)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout --shallow-since=2018-01-01 &&
  git -C $TESTFOLDER config remote.origin.promisor true &&
  git -C $TESTFOLDER config remote.origin.partialclonefilter blob:none &&
  git -C $TESTFOLDER fetch

# Regular single-branch quite-shallow clone, with *short* blob-filtered "roots" added later
#  (the deepening is a little chaotic, with lots of blobs being retrieved, presumably from the different shallow roots)
# -> the presence of small promisor packfiles (NOT at the tip) makes no difference; non-promisor packfiles at the tip trigger the rev-list call, it completes fast
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout --shallow-since=2018-01-01 &&
  git -C $TESTFOLDER config remote.origin.promisor true &&
  git -C $TESTFOLDER config remote.origin.partialclonefilter blob:none &&
  git -C $TESTFOLDER fetch --shallow-since=2017-12-30 &&
  git -C $TESTFOLDER fetch


# Regular single-branch quite-shallow clone, with both a blob-filtered "tip" and a non-promisor-packfile "tip"
#  (another repo is created just to move the tip for the promisor packfile to be created)
# -> the presence of small promisor packfiles (at the tip) makes no difference; non-promisor packfiles at the tip trigger the rev-list call, it completes fast
TESTFOLDER="linuxtest_$RANDOM" &&
  TIP_MOVER_FOLDER="linuxtest_$RANDOM" &&
  ORIGINAL_TIP_BRANCH="testbranch_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout --shallow-since=2018-01-01 &&
  git clone $REMOTE_URL $TIP_MOVER_FOLDER --depth=1 && 
  echo "Something $RANDOM" > "$TIP_MOVER_FOLDER/test_file_$RANDOM" &&
  git -C $TIP_MOVER_FOLDER add -A &&
  git -C $TIP_MOVER_FOLDER commit --no-gpg-sign -m "test file commit" &&
  git -C $TIP_MOVER_FOLDER push origin HEAD &&
  git -C $TESTFOLDER config --add remote.origin.fetch "+refs/heads/$ORIGINAL_TIP_BRANCH:refs/remotes/origin/$ORIGINAL_TIP_BRANCH" &&
  git -C $TESTFOLDER push origin "HEAD:refs/heads/$ORIGINAL_TIP_BRANCH" &&
  git -C $TESTFOLDER config remote.origin.promisor true &&
  git -C $TESTFOLDER config remote.origin.partialclonefilter blob:none &&
  git -C $TESTFOLDER fetch &&
  git -C $TESTFOLDER fetch


# Regular single-branch very-shallow clone, with full blob-filtered history added later
# -> the presence of *large* promisor packfiles (with non-promisor packfiles at the tip) means the rev-list call runs, and runs long...
# (size: 205MB + 1.09GB, clone & unshallow: 60s + 314s, fetch: 324s made up of 2X 162s rev-list)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout --depth=1 &&
  git -C $TESTFOLDER config remote.origin.promisor true &&
  git -C $TESTFOLDER config remote.origin.partialclonefilter blob:none &&
  git -C $TESTFOLDER fetch --unshallow &&
  git -C $TESTFOLDER fetch


# Regular single-branch very-shallow clone, with full barely-filtered history added later
# -> the presence of *huge* promisor packfiles (with non-promisor packfiles at the tip) means the rev-list call runs even longer
# (size: 205MB + 2.73GB, clone & unshallow: 60s + 1420s, fetch: 1900s made up of 2X 940s rev-list)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout --depth=1 &&
  git -C $TESTFOLDER config remote.origin.promisor true &&
  git -C $TESTFOLDER config remote.origin.partialclonefilter blob:limit=1m &&
  git -C $TESTFOLDER fetch --unshallow &&
  git -C $TESTFOLDER fetch


# Regular single-branch "deep" clone *of a very old ref*, tag "v2.6.13", and then full blob-filtered history *since then* added later
# -> the presence of *large* promisor packfiles (with non-promisor packfiles at the root) has exactly the same impact as with non-promisors at tip
# (size: 76MB + 1.09GB, clone & fetch: 9s + 265s, fetch: 320s made up of 2X 160s rev-list)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout -b v2.6.13 --single-branch &&
  git -C $TESTFOLDER config --add remote.origin.fetch "+refs/heads/master:refs/remotes/origin/master" &&
  git -C $TESTFOLDER config remote.origin.promisor true &&
  git -C $TESTFOLDER config remote.origin.partialclonefilter blob:none &&
  git -C $TESTFOLDER fetch &&
  git -C $TESTFOLDER fetch




# -------
# OBSERVATIONS: 
#  - running git repack on a fully-promisored repo can/will yield non-promisor packfiles (if there are loose objects from initially-local commits)
#  - if/when such a locally-packed non-promisor packfile contains any "tip" commits for the promisor remote, then:
#    - fetch will be slow (depending on the size of promisor packfiles)
#    - this will continue until you manually mark the pack file as ".promisor" OR you get a later tip for the affected branch(es) from a promisor remote
# -------

#
# (clone: 180s, add: 70s warmup, fetch: 1s, repack: 173s, fetch: 340s from 2X rev-list at 170s)
# -> after repack, fetch is *very* slow until one of two corrective circumstances arise:
#  1. you make the stray packfile a promisor packfile, or 
#  2. you fetch another change (to that branch) from the promisor remote / the tip of the branch "naturally" becomes a promisor again
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --single-branch --filter=blob:none &&
  echo "Something $RANDOM" > "$TESTFOLDER/test_file_$RANDOM" &&
  git -C $TESTFOLDER add -A &&
  git -C $TESTFOLDER commit --no-gpg-sign -m "test file commit" &&
  git -C $TESTFOLDER push &&
  git -C $TESTFOLDER fetch &&
  git -C $TESTFOLDER repack &&
  git -C $TESTFOLDER fetch
FIXUPFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $FIXUPFOLDER --depth=1 &&
  echo "Something $RANDOM" > "$FIXUPFOLDER/test_file_$RANDOM" &&
  git -C $FIXUPFOLDER add -A &&
  git -C $FIXUPFOLDER commit --no-gpg-sign -m "test file commit" &&
  git -C $FIXUPFOLDER push
git -C $TESTFOLDER fetch





# -------
# OBSERVATIONS: 
#  - Under some specific circumstances, force-pushing a branch from a partial clone causes this repo 
#       to *re-fetch the repo's commits & trees* into another new promisor packfile. 
#    - This can be repeated any number of times, yielding effectively identical (large) duplicate packfiles
#    - The preconditions appear to be that the remote and the local repo each have mutually unknown commits at the tip...
#  - The behavior is to re-download the commits & trees only, even if the filter settings on the repo are "lax" (e.g. "blob:limit=1M")
# -------


# Get a fully filtered full clone, commit to the branch from elsewhere, and force the originally cloned branch state back
# -> during the forced push we randomly, strangely, re-retrieve the whole filtered clone data - commits + trees
TESTFOLDER="linuxtest_$RANDOM" &&
  INTERFERINGFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $INTERFERINGFOLDER --depth=1 &&
  git clone $REMOTE_URL $TESTFOLDER --filter=blob:none &&
  git -C $INTERFERINGFOLDER commit --allow-empty --no-gpg-sign -m "test file commit" &&
  git -C $INTERFERINGFOLDER push &&
  git -C $TESTFOLDER push -f

# Confirm - there are two 1.09-GB packfiles
ls -l "$TESTFOLDER/.git/objects/pack"

# This can be repeated any number of times, each time creating yet another 1.09GB promisor packfile...
git -C $INTERFERINGFOLDER commit --allow-empty --no-gpg-sign -m "test file commit" &&
  git -C $INTERFERINGFOLDER push -f &&
  git -C $TESTFOLDER push -f 

# Confirm - there are three 1.09-GB packfiles
ls -l "$TESTFOLDER/.git/objects/pack"

# This can be repeated any number of times, each time creating yet another 1.09GB promisor packfile...
git -C $INTERFERINGFOLDER commit --allow-empty --no-gpg-sign -m "test file commit" &&
  git -C $INTERFERINGFOLDER push -f &&
  git -C $TESTFOLDER push -f 

# Confirm - there are four 1.09-GB packfiles
ls -l "$TESTFOLDER/.git/objects/pack"



# Again with a "loosely" filtered clone
# -> Even though the original clone size is 3GB, the later "duplicated" download is once again 1.09 GB - commits + trees only.
TESTFOLDER="linuxtest_$RANDOM" &&
  INTERFERINGFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $INTERFERINGFOLDER --depth=1 &&
  git clone $REMOTE_URL $TESTFOLDER --filter=blob:limit=1M &&
  git -C $INTERFERINGFOLDER commit --allow-empty --no-gpg-sign -m "test file commit" &&
  git -C $INTERFERINGFOLDER push &&
  git -C $TESTFOLDER push -f





# ----
# OBSERVATIONS: 
#  - git repack is slow on filtered repos (repos with promisor packfiles)
#  - git repack's speed is *directly related to the size of the promisor packfiles*
#     (but given that repos with a mix of promisor and non-promisor packfiles misbehave in other ways, this conclusion is of limited value)
# ----

# Baseline - regular full clone
# (size = 3.11GB, clone = 235s, repack = 10s)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER &&
  git -C $TESTFOLDER repack

# Baseline - very-shallow single-branch clone
# (size = 203MB, clone = 74s, repack = 0s)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --depth=1 &&
  git -C $TESTFOLDER repack

# Demo - fully blob-filtered full clone, normal checkout
# (size = 1.09GB + 203MB, clone = 90s + 100s, repack = 180s)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --filter=blob:none &&
  git -C $TESTFOLDER repack

# Demo - fully blob-filtered full clone, no checkout
# (size = 1.09GB, clone = 70s, repack = 160s)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --filter=blob:none --no-checkout &&
  git -C $TESTFOLDER repack

# Demo - fully blob-filtered very-shallow single-branch clone, normal checkout
# (size = 203MB, clone = 150s, repack = 8s)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --filter=blob:none --depth=1 &&
  git -C $TESTFOLDER repack

# Demo - fully blob-filtered very-shallow single-branch clone, no checkout
# (size = 2MB, clone = 2s, repack = 0s)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --filter=blob:none --no-checkout --depth=1 &&
  git -C $TESTFOLDER repack

# Demo - barely blob-filtered single-branch clone, no checkout
# (size = 3.1GB, clone = 200s, repack = 810s)
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --filter=blob:limit=10m --single-branch --no-checkout &&
  git -C $TESTFOLDER repack

# Demo - medium-deep shallow single-branch clone (no checkout), followed by fully filtered unshallow
# (promisor packfile "in the past" of the single tip ref, with most history in the "tip" non-promisor packfile)
# (size = 2.63GB + 200MB, clone = 337s + 108s, repack = 28s)
#  -> Repack time is directly related to promisor packfile size/content/scope when further back in history
TESTFOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout --shallow-since=2010-01-01 --single-branch && 
  git -C $TESTFOLDER config remote.origin.promisor true &&
  git -C $TESTFOLDER config remote.origin.partialclonefilter blob:none &&
  git -C $TESTFOLDER fetch --unshallow &&
  git -C $TESTFOLDER repack

# Demo - reasonably shallow single-branch clone (no checkout), followed by extra commit appearing at tip on the remote, 
#  followed by filtered fetch of that new tip
# (promisor packfile "at the tip", with most of the history in a non-promisor packfile "behind the tip")
# (size = 981MB + , clone = 2s, repack = 3s)
# (TIP_MOVER clone time ignored)
#  -> Repack time is directly related to promisor packfile size/content/scope when at tip
TESTFOLDER="linuxtest_$RANDOM" &&
  TIP_MOVER_FOLDER="linuxtest_$RANDOM" &&
  git clone $REMOTE_URL $TESTFOLDER --no-checkout --shallow-since=2018-01-01 --single-branch && 
  git clone $REMOTE_URL $TIP_MOVER_FOLDER --depth=1 && 
  echo "Something $RANDOM" > "$TIP_MOVER_FOLDER/test_file_$RANDOM" &&
  git -C $TIP_MOVER_FOLDER add -A &&
  git -C $TIP_MOVER_FOLDER commit --no-gpg-sign -m "test file commit" &&
  git -C $TIP_MOVER_FOLDER push origin HEAD &&
  git -C $TESTFOLDER config remote.origin.promisor true &&
  git -C $TESTFOLDER config remote.origin.partialclonefilter blob:none &&
  git -C $TESTFOLDER fetch &&
  git -C $TESTFOLDER repack



# -------
# CLEANUP
# -------

# Delete all the weird repos created.
rm -rf linuxtest_*



* Re: Partial Clone, and a strange slow rev-list call on fetch
  2021-06-03 21:10   ` Tao Klerks
@ 2021-06-04 13:21     ` Derrick Stolee
  2021-06-05  6:35       ` Tao Klerks
  0 siblings, 1 reply; 5+ messages in thread
From: Derrick Stolee @ 2021-06-04 13:21 UTC (permalink / raw)
  To: Tao Klerks; +Cc: git

On 6/3/2021 5:10 PM, Tao Klerks wrote:
> On Wed, Jun 2, 2021 at 1:18 PM Derrick Stolee <stolee@gmail.com> wrote:
> 
>> could you re-run the scenario with GIT_TRACE2_PERF=1 which will
>> give the full Git process stack as we reach that rev-list call.
> 
> Sorry about the delay, I've been trying to reproduce... reproducibly :)
> 
> I now have a whole file of examples and observations, attached (I
> assume text attachments are allowed on this mailing list?), which
> should be reproducible for anyone as I was able to use the linux
> kernel repo to illustrate all cases.

I appreciate that you took so much time to investigate here. You
have convinced me that there are deeper things going on than just
the "unshallow, but with filters this time" situation.

I have created an internal issue for my team to investigate this
when we have capacity for it. I don't think it will happen this
month, so if anyone else has the time now then don't wait for us.

Thanks,
-Stolee

* Re: Partial Clone, and a strange slow rev-list call on fetch
  2021-06-04 13:21     ` Derrick Stolee
@ 2021-06-05  6:35       ` Tao Klerks
  0 siblings, 0 replies; 5+ messages in thread
From: Tao Klerks @ 2021-06-05  6:35 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

On Fri, Jun 4, 2021 at 3:21 PM Derrick Stolee <stolee@gmail.com> wrote:
> You
> have convinced me that there are deeper things going on than just
> the "unshallow, but with filters this time" situation.
>
> I have created an internal issue for my team to investigate this
> when we have capacity for it.

Awesome, thanks!

One more comment on this: As I was trawling through release notes
looking for the right minimum git client version to target/support, I
discovered that the whole "fetch shallow, then unshallow filtered"
flow, which I might have thought I came up with on my own, was
intended to be supported by a change Xin Li made about a year ago,
which landed in 2.28.0:
https://github.com/git/git/commit/01bbbbd9daaa277a95ae46e5a32f6fba026610ac.

As far as I can tell this implementation/behavior does *not* attempt
to convert existing non-promisor packfiles to promisor packfiles, so
either this behavior should be adjusted to do so, or we should
consider this flow (and the presence of non-promisor packfiles at the
"tip") supported...?
