git@vger.kernel.org mailing list mirror (one of many)
From: Tao Klerks <tao@klerks.biz>
To: git@vger.kernel.org
Subject: Removing Partial Clone / Filtered Clone on a repo
Date: Tue, 1 Jun 2021 12:24:23 +0200
Message-ID: <CAPMMpoim38J3=4pd0_fM2h=DN_PrEE_Osg2duU5Ur8WUZ5S1Pg@mail.gmail.com>

Hi folks,

I'm trying to deepen my understanding of the Partial Clone
functionality for a possible deployment at scale (with a large-ish
13GB project where we are using date-based shallow clones for the time
being), and one thing that I can't get my head around yet is how you
"unfilter" an existing filtered clone.

The gitlab intro document
(https://docs.gitlab.com/ee/topics/git/partial_clone.html#remove-partial-clone-filtering)
suggests that you need to get the full list of missing blobs, and pass
that into a fetch...:

  git fetch origin $(git rev-list --objects --all --missing=print | grep -oP '^\?\K\w+')

In my project's case, that would be millions of blob IDs! I tested
this with a path-based filter to rev-list, to see what getting 30,000
blobs might look like, and it took a looong while... I don't
understand much about the negotiation process, but I have to assume
there is a fixed per-blob cost in this scenario which is *much* higher
than in a "regular" fetch or clone.
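The GitLab command extracts the missing IDs with `grep -oP`, which relies on PCRE support that not every `grep` has (BSD grep typically does not). Below is a minimal portable sketch of the same extraction, assuming the `?<oid>` line format that `git rev-list --missing=print` uses to mark missing objects. It also batches the fetch through `xargs`, so that millions of IDs do not overflow the shell's argument-length limit. The function name `missing_oids` and the batch size of 1000 are illustrative choices only, and batching does nothing about the per-object cost described above.

```shell
#!/bin/sh
# Print the IDs of missing objects, one per line.
# Assumes `git rev-list --missing=print` prefixes each missing object
# with '?'; present objects print as "<oid> [<path>]" and are skipped.
# Anything after the hex ID on a '?' line is discarded.
missing_oids() {
  sed -n 's/^?\([0-9a-f]*\).*/\1/p'
}

# Hypothetical usage inside a partial clone: fetch the missing objects
# from origin, passing at most 1000 object IDs per `git fetch` call.
# git rev-list --objects --all --missing=print | missing_oids |
#   xargs -n 1000 git fetch origin
```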

Obviously one answer is to throw away the repo and start again with a
clean unfiltered clone... But between repo-local config, project
settings in IDEs / external tools, and unpushed local branches, this
is an awkward thing to ask people to do.

I initially thought it might be possible to add an extra remote
(without filter / promisor settings), mess with the negotiation
settings to make the new remote not know anything about what's local,
and then get a full set of refs and their blobs from that remote...
but I must have misunderstood how the negotiation-tip stuff works
because I can't get that to do anything (it always "sees" my existing
refs and I just get the new remote's refs "for free" without object
transfer).

The official doc at https://git-scm.com/docs/partial-clone makes no
mention of plans or goals (or non-goals) related to this "unfiltering".
Is it something that we should expect a story to emerge around?

Thanks,
Tao Klerks
