From: Christian Couder <christian.couder@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org, John Cai <johncai86@gmail.com>,
Jonathan Tan <jonathantanmy@google.com>,
Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [PATCH 0/3] Implement filtering repacks
Date: Mon, 7 Nov 2022 10:00:52 +0100
Message-ID: <CAP8UFD31vHzV6fmvXLPadVFk3a_-MHgfQGUnO6tAftQxu7KE0w@mail.gmail.com>
In-Reply-To: <Y1wyVpHprWGxEDi/@nand.local>

On Fri, Oct 28, 2022 at 9:49 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Thu, Oct 20, 2022 at 01:23:02PM +0200, Christian Couder wrote:
> > On Fri, Oct 14, 2022 at 6:46 PM Junio C Hamano <gitster@pobox.com> wrote:
> > >
> > > Christian Couder <christian.couder@gmail.com> writes:
> > >
> > > > For example one might want to clone with a filter to avoid too much
> > > > space being taken by some large blobs, and one might realize after
> > > > some time that a number of the large blobs have still been downloaded
> > > > because some old branches referencing them were checked out. In this
> > > > case a filtering repack could remove some of those large blobs.
> > > >
> > > > Some of the comments on the patch series that John sent were related
> > > > to the possible data loss and repo corruption that a filtering repack
> > > > could cause. It's indeed true that it could be very dangerous, and we
> > > > agree that improvements were needed in this area.
> > >
> > > The wish is understandable, but I do not think this gives a good UI.
> > >
> > > This feature is, from an end-user's point of view, very similar to
> > > "git prune-packed", in that we prune data that is not necessary due
> > > to redundancy. Nobody runs "prune-packed" directly; most people are
> > > even unaware of it being run on their behalf when they run "git gc".
> >
> > I am Ok with adding the --filter option to `git gc`, or a config
> > option with a similar effect. I wonder how `git gc` should implement
> > that option though.
> >
> > If we implement a new command called for example `git filter-packed`,
> > similar to `git prune-packed`, then this new command will call `git
> > pack-objects --filter=...`.
>
> Conceptually, yes, the two are similar. Though `prune-filtered` is
> necessarily going to differ in implementation from `prune-packed`, since
> we will have to write new pack(s), not just delete loose objects which
> appear in packs already.

Yeah, that's why I say `prune-filtered` will call `git pack-objects
--filter=...`.

> So it's really not just a matter of purely deleting redundant loose
> copies of objects like in the case of prune-packed. Here we really do
> care about potentially writing a new set of packs to satisfy the new
> filter constraint.

Yeah, I agree.

> Presumably that tool would implement creating the new packs according to
> the given --filter, and would similarly delete existing packs. That is
> basically what your implementation in repack already does, so I am not
> sure what the difference would be.

Indeed, there wouldn't be much difference implementation-wise between
a new `git filter-packed` command like Junio suggested and the current
implementation I sent, which implements the feature in `git repack`. (A
new `git filter-packed` would just duplicate the repack features that
are needed and just call `git pack-objects --filter=...`.) That's why
I don't really see the point of a new `git filter-packed` command, and
the version 2 I sent still implements the feature in `git repack`.

So I have a hard time understanding your comment unless you just agree with me.

> > Yeah. So to sum up, it looks like you are Ok with `git gc
> > --filter=...` which is fine for me, even if I wonder if `git repack
> > --filter=...` could be a good first step as it is less likely to be
> > used automatically (so safer in a way) and it might be better for
> > implementation related performance reasons.
>
> If we don't intend to have `git repack --filter` part of our backwards
> compatibility guarantee, then I would prefer to see the implementation
> just live in git-gc from start to finish.

About the implementation living in `git gc` I wrote the following:

>>> `git gc` is already running `git repack` under the hood in a number of
>>> cases though. So running `git gc --filter=...` would in many cases
>>> call `git pack-objects` twice, as it would call it once through git
>>> repack and once through `git filter-packed`. Or am I missing something
>>> here?

Even if we don't have a `git filter-packed` command, if the feature is
implemented in `git gc` (but not in `git repack`) it would just call
`git pack-objects --filter=...` from there, which means that `git
pack-objects` would be called twice (once through `git repack` and
once for this new feature) by `git gc` in some cases, instead of just
once if the feature was implemented in `git repack`, as `git gc` could
then just call `git repack ... --filter=...` once.

That's why I think it's better for performance reasons if the feature
is implemented in `git repack`. If for some reason you don't want
`git repack --filter=...` to be part of our backwards compatibility
guarantee, then --filter can be a hidden and undocumented option in
`git repack`. Or maybe we could use a new env variable to instruct
`git repack` to pass some --filter option to `git pack-objects`, but
my opinion is that it's much simpler to just accept --filter as a
regular, though dangerous, `git repack` option, and then add --filter
to `git gc`.

I am also Ok with adding --filter to `git gc` in this patch series and
having the doc say that it's better to use `git gc --filter` instead of
`git repack --filter`, so that users could learn right away to use the
feature through `git gc` instead of through `git repack`.
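
To make the call-count argument above concrete, here is a toy Python
model. It is only a sketch, not Git code: the function names are
hypothetical, and it assumes the case where `git gc` does run
`git repack`. Each function just records the `git pack-objects`
invocations that the corresponding design would trigger.

```python
# Toy model only: hypothetical stand-ins, not Git's implementation.
# Assumption: we are in a case where `git gc` runs `git repack`.

def repack(log, filter_spec=None):
    """Model `git repack`: one `git pack-objects` run, optionally filtered."""
    cmd = ["git", "pack-objects"]
    if filter_spec is not None:
        cmd.append(f"--filter={filter_spec}")
    log.append(cmd)


def gc_with_filter_in_gc(filter_spec):
    """Feature lives only in `git gc`: a regular repack plus a separate filtered run."""
    log = []
    repack(log)  # usual repack, unaware of the filter
    log.append(["git", "pack-objects", f"--filter={filter_spec}"])
    return log


def gc_with_filter_in_repack(filter_spec):
    """Feature lives in `git repack`: gc passes the filter down in a single run."""
    log = []
    repack(log, filter_spec)
    return log


in_gc = gc_with_filter_in_gc("blob:limit=1m")
in_repack = gc_with_filter_in_repack("blob:limit=1m")
print(len(in_gc), "pack-objects run(s) with the feature only in gc")
print(len(in_repack), "pack-objects run(s) with the feature in repack")
```

The model ignores everything `git repack` does besides spawning
`git pack-objects`; it only counts how many times that process would
be started in each design.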