git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: git@vger.kernel.org, John Cai <johncai86@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 0/3] Implement filtering repacks
Date: Fri, 14 Oct 2022 09:46:21 -0700	[thread overview]
Message-ID: <xmqqilkm9wv6.fsf@gitster.g> (raw)
In-Reply-To: <20221012135114.294680-1-christian.couder@gmail.com> (Christian Couder's message of "Wed, 12 Oct 2022 15:51:11 +0200")

Christian Couder <christian.couder@gmail.com> writes:

> For example one might want to clone with a filter to avoid too many
> space to be taken by some large blobs, and one might realize after
> some time that a number of the large blobs have still be downloaded
> because some old branches referencing them were checked out. In this
> case a filtering repack could remove some of those large blobs.
>
> Some of the comments on the patch series that John sent were related
> to the possible data loss and repo corruption that a filtering repack
> could cause. It's indeed true that it could be very dangerous, and we
> agree that improvements were needed in this area.

The wish is understandable, but I do not think this gives a good UI.

This feature is, from an end-user's point of view, very similar to
"git prune-packed", in that we prune data that is not necessary due
to redundancy.  Nobody runs "prune-packed" directly; most people are
even unaware of it being run on their behalf when they run "git gc".

Reusing pack-objects as an underlying mechanism is OK, but this
needs to be plumbed through to "git gc" for a more consistent
experience for the end users.

Is there a way to check if the "promisor remote" is still willing to
keep the previous promise it made, so that the users do not have to
see "we may corrupt the repository as the result of this operation,
you have been warned", by the way?  Possibly with a protocol
extension?

In a sense, once you made a partial clone, your repository is at the
mercy of the remote.  They can disappear any time with majority of
the data you depend on, leaving only what you created locally and
haven't pruned away, in a repository that may technically pass
"fsck", because the things that are supposed to exist locally
exists, but may not be usable in practice.  So from that point of
view, a simple check that asks "I earlier fetched from you with this
filter and these tips of histories; are you still willing to support
me?" and gets yes/no answer might be sufficient.  A remote that is
not trustworthy can say "yes" and still change their mind later, so
such a check may not even be needed.

The above two big paragraphs is a way to say that I am not all that
worried about losing objects that we should be able to refetch again
by adding this feature.  The perceived need for "--force" or "must
run from terminal" may be overblown.  I do not think this negatively
affects correctness or robustness at all (as long as the pruning is
not buggy, of course).

HOWEVER

Unlike prune-packed, pruning objects that could be refetched has
negative performance impact.  So adding an option to enable/disable
such pruning is needed.  If the frontmost UI entry point were "gc",
it needs an opt-in option to invoke the "filtering repack", in other
words.  "pack-objects" should not need any more work than what you
have here (with the "terminal" and "force" discarded), as "--filter"
is such an opt-in option already.


  parent reply	other threads:[~2022-10-14 16:47 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-12 13:51 [PATCH 0/3] Implement filtering repacks Christian Couder
2022-10-12 13:51 ` [PATCH 1/3] pack-objects: allow --filter without --stdout Christian Couder
2022-10-12 13:51 ` [PATCH 2/3] repack: add --filter=<filter-spec> option Christian Couder
2022-10-12 13:51 ` [PATCH 3/3] repack: introduce --force to force filtering Christian Couder
2022-10-14 16:46 ` Junio C Hamano [this message]
2022-10-20 11:23   ` [PATCH 0/3] Implement filtering repacks Christian Couder
2022-10-28 19:49     ` Taylor Blau
2022-10-28 20:26       ` Junio C Hamano
2022-11-07  9:12         ` Christian Couder
2022-11-07  9:00       ` Christian Couder
2022-10-25 12:28 ` [PATCH v2 0/2] " Christian Couder
2022-10-25 12:28   ` [PATCH v2 1/2] pack-objects: allow --filter without --stdout Christian Couder
2022-10-25 12:28   ` [PATCH v2 2/2] repack: add --filter=<filter-spec> option Christian Couder
2022-10-28 19:54   ` [PATCH v2 0/2] Implement filtering repacks Taylor Blau
2022-11-07  9:29     ` Christian Couder
2022-11-22 17:51   ` [PATCH v3 " Christian Couder
2022-11-22 17:51     ` [PATCH v3 1/2] pack-objects: allow --filter without --stdout Christian Couder
2022-11-22 17:51     ` [PATCH v3 2/2] repack: add --filter=<filter-spec> option Christian Couder
2022-11-23  0:31     ` [PATCH v3 0/2] Implement filtering repacks Junio C Hamano
2022-12-21  3:53       ` Christian Couder
2022-11-23  0:35     ` Junio C Hamano
2022-12-21  4:04     ` [PATCH v4 0/3] " Christian Couder
2022-12-21  4:04       ` [PATCH v4 1/3] pack-objects: allow --filter without --stdout Christian Couder
2023-01-04 14:56         ` Patrick Steinhardt
2022-12-21  4:04       ` [PATCH v4 2/3] repack: add --filter=<filter-spec> option Christian Couder
2023-01-04 14:56         ` Patrick Steinhardt
2023-01-05  1:39           ` Junio C Hamano
2022-12-21  4:04       ` [PATCH v4 3/3] gc: add gc.repackFilter config option Christian Couder
2023-01-04 14:57         ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqilkm9wv6.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=johncai86@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).