git@vger.kernel.org mailing list mirror (one of many)
From: Roberto Tyley <roberto.tyley@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: New command/tool: git filter-repo
Date: Thu, 31 Jan 2019 23:36:12 +0000
Message-ID: <CAFY1edZ6hS5Dz9z5ZAhU59he9SjxetvfTN+ndzUZkjxhsuwEZA@mail.gmail.com>
In-Reply-To: <CABPp-BH==w5APkz9cvUYq7m4qieJ3LWCsYySevgJuZ8bi2RzjQ@mail.gmail.com>

On Thu, 31 Jan 2019 at 22:37, Elijah Newren <newren@gmail.com> wrote:
> On Thu, Jan 31, 2019 at 8:09 PM Junio C Hamano <gitster@pobox.com> wrote:
> > Elijah Newren <newren@gmail.com> writes:
> >
> > > git-filter-repo[1], a filter-branch-like tool for rewriting repository
> > > history, is ready for more widespread testing and feedback.  The rough
> > > edges I previously mentioned have been fixed, and it has several useful
> > > features already, though more development work is ongoing (docs are a
> > > bit sparse right now, though -h provides some help).
> > >
> > > Why filter-repo vs. filter-branch?

I like the name! I think a lot of users are interested in filtering
their entire repo, rather than rewriting a single branch.

> > How does it compare with bfg-repo-cleaner?  Somehow I was led to
> > believe that all serious users of filter-branch like functionality
> > are using bfg-repo-cleaner instead.
>
> No, bfg-repo-cleaner only covers an important subset of the usecases.

That's true - the focus with BFG Repo-Cleaner is on removing unwanted
data - completely eradicating it from a repo's history. There are some
mistakes in history that repo owners just really *do not* want to
share (i.e. large files, private data/credentials), and they can be a
critical blocker to sharing or working with a Git repo. In terms of
rewriting history, my internal criterion for what features I really
want to be in the BFG is: is this unwanted data completely stopping
many users from sharing their code or doing their work?

I understand that when it comes to rewriting history, there are loads
of other operations that people sometimes want to perform, beyond
removing unwanted data - merging/splitting of history,
anonymization/renaming of committers, etc. Some of those might be nice
to add to the BFG - but as with many OSS maintainers, I have limited
time, and a life to balance outside of software...!

> bfg-repo-cleaner does a really good job if your goal is to remove a
> few big files and/or to remove some sensitive text (matched via
> regexes) from all blobs.  It was designed for that specific role and
> has more options in this area than filter-repo currently has.  But
> even within this design space it was optimized for, it is missing two
> things that I really want:
>
>   * pruning of commits which become empty due to filtering

There certainly have been several users asking for this feature on the
BFG, and even a kindly contributed PR for the functionality which I've
yet to merge. As it doesn't actually stop users from doing work - so
far as I can see - it's something that I've done a poor job of
following up on.

>   * providing a way for the user to know what needs to be cleaned up.
> It has options like --strip-blobs-bigger-than <size> or
> --strip-biggest-blobs <NUM>, but no way for the user to figure out
> what <size> or <NUM> should be.

For users of GitHub, it's normally 100MB with
--strip-blobs-bigger-than <size> :-)
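
For anyone who does want to see what a given <size> would catch before
picking one, here is a minimal sketch (not part of the BFG or of
filter-repo). It assumes the common idiom of piping
`git rev-list --objects --all` into `git cat-file --batch-check`; the
function names and the threshold value are hypothetical, chosen only
for illustration.

```python
import subprocess


def parse_batch_check(lines):
    """Yield (size_in_bytes, path) for every blob line.

    Each line is expected to look like "<type> <size> <path>", which is
    what the --batch-check format used in list_repo_objects() prints.
    """
    for line in lines:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) < 2 or parts[0] != "blob":
            continue  # skip commits, trees, tags and malformed lines
        path = parts[2] if len(parts) == 3 else ""
        yield int(parts[1]), path


def blobs_bigger_than(lines, threshold_bytes):
    """Return blobs above the threshold, largest first."""
    big = [b for b in parse_batch_check(lines) if b[0] > threshold_bytes]
    return sorted(big, reverse=True)


def list_repo_objects(repo="."):
    """Run git and return "<type> <size> <path>" lines for all objects."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return checked.splitlines()


# Example use (assumption: the 100MB limit is read as 100 * 1024 * 1024 bytes):
#   for size, path in blobs_bigger_than(list_repo_objects("."), 100 * 1024 * 1024):
#       print(size, path)
```

The sizes reported are uncompressed object sizes, so this is only a
rough guide to what a size-based option would strip.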

> Also, since it just focuses on really
> big blobs, it misses cases like someone checking in directories with a
> huge number of small-to-moderately sized files (e.g. bower_components/
> or node_modules/, though these could also contain a few big blobs

For those use-cases, it might be that BFG's --delete-folders flag is
useful, especially given the protected-head-commit feature of the BFG.
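
To judge which folder names are worth passing to such a flag, one
rough approach is to total blob sizes per folder name across history.
The sketch below is an illustration only, not something the BFG
provides. It assumes input lines of the form "<type> <size> <path>",
as produced by piping `git rev-list --objects --all` into
`git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)'`,
and the function name is hypothetical. Totals are keyed by folder
name rather than full path, on the assumption that folder deletion
matches on names such as node_modules wherever they occur.

```python
from collections import Counter


def size_by_folder_name(lines):
    """Total uncompressed blob bytes under each folder name, largest first.

    A blob at web/node_modules/y/z.js counts towards "web",
    "node_modules" and "y". Files in the repository root count
    towards no folder.
    """
    totals = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) != 3 or parts[0] != "blob":
            continue  # only blobs that carry a path are of interest
        size = int(parts[1])
        # Drop the file name; use a set so a repeated name in one path
        # is not counted twice for the same blob.
        folders = set(parts[2].split("/")[:-1])
        for name in folders:
            totals[name] += size
    return totals.most_common()
```

Folder names near the top of the result with many small files beneath
them are the candidates that a size-based option would miss.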


It's getting late for me, must be even later in Brussels - I wish I
could have made it there to join in! Merry Git Merge to you all, and
good luck to you Elijah with git-filter-repo.

Roberto
