git@vger.kernel.org list mirror (unofficial, one of many)
* RFC: Proposing git-filter-repo for inclusion in git.git
@ 2019-08-22 18:26 Elijah Newren
  2019-08-22 20:23 ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-22 18:26 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

Hi everyone,

git-filter-repo[1] is a tool for rewriting or restructuring git
history.  The recent Git Rev News article[2] on filter-repo introduced
it and explained how it is much more versatile than git-filter-branch
or BFG Repo Cleaner, and with speed comparable to BFG.  (I probably
also should have gone into more detail on how it is also much safer
than filter-branch).

I propose merging git-filter-repo.git into git.git.  There was some
previous discussion at [3], but the tool was incomplete and still
under heavy development back then.

Questions, comments, or concerns with this proposal?  Alternative
proposals?  If inclusion is acceptable, are there any other tasks that
need to be completed first?


Thanks,
Elijah

[1] https://github.com/newren/git-filter-repo
    or
    https://github.com/newren/git-filter-repo/blob/master/Documentation/git-filter-repo.txt
[2] https://git.github.io/rev_news/2019/08/21/edition-54/#an-introduction-to-git-filter-repo--written-by-elijah-newren
[3] https://public-inbox.org/git/CABPp-BFC--s+D0ijRkFCRxP5Lxfi+__YF4EdxkpO5z+GoNW7Gg@mail.gmail.com/

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: Proposing git-filter-repo for inclusion in git.git
  2019-08-22 18:26 RFC: Proposing git-filter-repo for inclusion in git.git Elijah Newren
@ 2019-08-22 20:23 ` Junio C Hamano
  2019-08-22 21:12   ` Elijah Newren
  2019-08-26 19:56   ` Jeff King
  0 siblings, 2 replies; 73+ messages in thread
From: Junio C Hamano @ 2019-08-22 20:23 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Lars Schneider, Jonathan Nieder

Elijah Newren <newren@gmail.com> writes:

> Questions, comments, or concerns with this proposal?  Alternative
> proposals?  If inclusion is acceptable, are there any other tasks that
> need to be completed first?

I do not want a discussion to begin with a Devil's Advocate
response, but anyway...

Are we planning to go to all batteries included approach?  I have a
feeling that there are other tools (hello, "git imerge") that
equally deserve attention by Git users; are we in the business of
absorbing them all?  How big a project will our tree become, and how
much more activity would have to be handled by the readership of the
Git mailing list?

I'd rather see us shed non-core tools we already have (e.g. git-svn,
cvs import/export) out of git.git and have them as independent
projects.  But that may be just me.

The benefits I see in the "leave as much as possible out" approach
are (in no particular order):

 - Distributed development and integration.  The release schedule of
   Git-core does not have to be constrained by the readiness of
   non-core part.

 - Choice of development tools, language, etc.  The core part will
   stay in C, but peripheral tools do not have to be constrained by
   our choice.  Those who run their own project can choose what they
   want to use and how they want to structure their development
   process.

The primary downside I would want to avoid from absorbing non-core
stuff is that such peripheral tools (git-cvsimport, I am looking at
you) can go stale when their development stalls, and nobody would be
bold enough to suggest removing them.

* Re: RFC: Proposing git-filter-repo for inclusion in git.git
  2019-08-22 20:23 ` Junio C Hamano
@ 2019-08-22 21:12   ` Elijah Newren
  2019-08-22 21:34     ` Junio C Hamano
                       ` (2 more replies)
  2019-08-26 19:56   ` Jeff King
  1 sibling, 3 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-22 21:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Lars Schneider, Jonathan Nieder

On Thu, Aug 22, 2019 at 1:24 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Questions, comments, or concerns with this proposal?  Alternative
> > proposals?  If inclusion is acceptable, are there any other tasks that
> > need to be completed first?
>
> I do not want a discussion to begin with a Devil's Advocate
> response, but anyway...
>
> Are we planning to go to all batteries included approach?  I have a
> feeling that there are other tools (hello, "git imerge") that
> equally deserve attention by Git users; are we in the business of
> absorbing them all?  How big a project will our tree become, and how
> much more activity would have to be handled by the readership of the
> Git mailing list?
>
> I'd rather see us shed non-core tools we already have (e.g. git-svn,
> cvs import/export) out of git.git and have them as independent
> projects.  But that may be just me.

Ooh, if you're going to open this door, then a proposal I assumed
would be shot down but which I'd be just about as happy with is:

  * Remove git-filter-branch from git.git.  Mention in the release
notes where people can go to get it.[1]

filter-branch is not merely a slow or difficult-to-use tool, it's one
that *fosters* mistakes by making it hard to get things right in
several different ways.  Granted, people exercise extra caution using
filter-branch because they know they need to, but there are so many
gotchas that they're likely to accidentally mess something up.  Those
mess-ups are not always discovered immediately, and by then it's
nearly cast into stone (rewriting being something you want to do very
rarely).

For as long as git-filter-branch is part of git.git and other tools
are outside, people will take that as a strong endorsement from us for
filter-branch and use it regardless of how much other education
exists...and that causes problems.


Thoughts?

Elijah


[1] We'd still have to decide where to put it.  If no one else wants
to do it, I could include it in git-filter-repo with the promise that
it's there for backward compatibility for those that still need the
tool, even if I recommend folks use filter-repo instead.

* Re: RFC: Proposing git-filter-repo for inclusion in git.git
  2019-08-22 21:12   ` Elijah Newren
@ 2019-08-22 21:34     ` Junio C Hamano
  2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
  2019-08-23  3:00     ` RFC: Proposing git-filter-repo for inclusion in git.git Eric Wong
  2019-08-23 12:02     ` Derrick Stolee
  2 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2019-08-22 21:34 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Lars Schneider, Jonathan Nieder

Elijah Newren <newren@gmail.com> writes:

> Ooh, if you're going to open this door, then a proposal I assumed
> would be shot down but which I'd be just about as happy with is:
>
>   * Remove git-filter-branch from git.git.  Mention in the release
> notes where people can go to get it.[1]
> ...

Yup, I think, especially given that filter-repo now exists and is well
known by Git users, this is a good move in the mid to longer term.

> For as long as git-filter-branch is part of git.git and other tools
> are outside, people will take that as a strong endorsement from us for
> filter-branch and use it regardless of how much other education
> exists...and that causes problems.

Exactly.

* Re: RFC: Proposing git-filter-repo for inclusion in git.git
  2019-08-22 21:12   ` Elijah Newren
  2019-08-22 21:34     ` Junio C Hamano
@ 2019-08-23  3:00     ` Eric Wong
  2019-08-23 18:06       ` Elijah Newren
  2019-08-23 12:02     ` Derrick Stolee
  2 siblings, 1 reply; 73+ messages in thread
From: Eric Wong @ 2019-08-23  3:00 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Lars Schneider, Jonathan Nieder

Elijah Newren <newren@gmail.com> wrote:
>   * Remove git-filter-branch from git.git.  Mention in the release
> notes where people can go to get it.[1]
> 
> filter-branch is not merely a slow or difficult-to-use tool, it's one
> that *fosters* mistakes by making it hard to get things right in
> several different ways.  Granted, people exercise extra caution using
> filter-branch because they know they need to, but there are so many
> gotchas that they're likely to accidentally mess something up.  Those
> mess-ups are not always discovered immediately, and by then it's
> nearly cast into stone (rewriting being something you want to do very
> rarely).

Is it possible to turn git-filter-branch into a fast, compatible,
and (maybe) safe wrapper for git-filter-repo?  That would "fix"
filter-branch and (if done carefully) not break existing uses.

It could also spew warnings to recommend safer switches.

Stability is a major reason I use git, the Linux kernel,
and why I distrust+avoid desktop/GUI software.  Removing
"unsafe" features, even with good intentions, inevitably leads
to frustrated users.

* Re: RFC: Proposing git-filter-repo for inclusion in git.git
  2019-08-22 21:12   ` Elijah Newren
  2019-08-22 21:34     ` Junio C Hamano
  2019-08-23  3:00     ` RFC: Proposing git-filter-repo for inclusion in git.git Eric Wong
@ 2019-08-23 12:02     ` Derrick Stolee
  2 siblings, 0 replies; 73+ messages in thread
From: Derrick Stolee @ 2019-08-23 12:02 UTC (permalink / raw)
  To: Elijah Newren, Junio C Hamano
  Cc: Git Mailing List, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Lars Schneider, Jonathan Nieder

On 8/22/2019 5:12 PM, Elijah Newren wrote:
> On Thu, Aug 22, 2019 at 1:24 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Elijah Newren <newren@gmail.com> writes:
>>
>>> Questions, comments, or concerns with this proposal?  Alternative
>>> proposals?  If inclusion is acceptable, are there any other tasks that
>>> need to be completed first?
>>
>> I do not want a discussion to begin with a Devil's Advocate
>> response, but anyway...
>>
>> Are we planning to go to all batteries included approach?  I have a
>> feeling that there are other tools (hello, "git imerge") that
>> equally deserve attention by Git users; are we in the business of
>> absorbing them all?  How big a project will our tree become, and how
>> much more activity would have to be handled by the readership of the
>> Git mailing list?
>>
>> I'd rather see us shed non-core tools we already have (e.g. git-svn,
>> cvs import/export) out of git.git and have them as independent
>> projects.  But that may be just me.

Yes please! Let's make the repo smaller.

> Ooh, if you're going to open this door, then a proposal I assumed
> would be shot down but which I'd be just about as happy with is:
> 
>   * Remove git-filter-branch from git.git.  Mention in the release
> notes where people can go to get it.[1]
> 
[snip]
>
> [1] We'd still have to decide where to put it.  If no one else wants
> to do it, I could include it in git-filter-repo with the promise that
> it's there for backward compatibility for those that still need the
> tool, even if I recommend folks use filter-repo instead.

May I recommend an idea, which may be silly?

We could strip these "extra" tools out of git.git and place them in
their own repos. The hope would be that they could build on their own
and have their own test suites.

Then, Git distributors could pick and choose the components they
bundle with Git. Dscho would know more about this sort of thing as
he distributes MinGit, which strips these things out already.

The biggest question is: how do we make sure that as git.git moves
forward that we don't break the ecosystem? Maybe we create a new,
larger repo that contains all of these subrepos? This would give
the community more experience dogfooding our own repo-splitting tools.

Personally, I like the idea of 'git subtree' over something like
'git submodule'. Using 'git subtree' may mean that tools like
'git-svn' that may be hard to split into a completely independent
repo could live primarily in the meta repo with a source dependence
on the included git.git subtree.

This "meta-git.git" repo could then be more flexible in adding
new tools like git-filter-repo or even git-lfs and friends. Again,
distributors could select a subset to include, but we would have
one place to run CI builds and make sure the tools are not
obviously breaking as git.git updates.

Thanks,
-Stolee

* Re: RFC: Proposing git-filter-repo for inclusion in git.git
  2019-08-23  3:00     ` RFC: Proposing git-filter-repo for inclusion in git.git Eric Wong
@ 2019-08-23 18:06       ` Elijah Newren
  2019-08-23 18:29         ` Elijah Newren
  2019-08-28 11:09         ` Johannes Schindelin
  0 siblings, 2 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-23 18:06 UTC (permalink / raw)
  To: Eric Wong
  Cc: Junio C Hamano, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

Hi Eric!

On Thu, Aug 22, 2019 at 8:01 PM Eric Wong <e@80x24.org> wrote:
>
> Elijah Newren <newren@gmail.com> wrote:
> >   * Remove git-filter-branch from git.git.  Mention in the release
> > notes where people can go to get it.[1]
> >
> > filter-branch is not merely a slow or difficult-to-use tool, it's one
> > that *fosters* mistakes by making it hard to get things right in
> > several different ways.  Granted, people exercise extra caution using
> > filter-branch because they know they need to, but there are so many
> > gotchas that they're likely to accidentally mess something up.  Those
> > mess-ups are not always discovered immediately, and by then it's
> > nearly cast into stone (rewriting being something you want to do very
> > rarely).
>
> Is it possible to turn git-filter-branch into a fast, compatible,
> and (maybe) safe wrapper for git-filter-repo?  That would "fix"
> filter-branch and (if done carefully) not break existing uses.

Ooh, what an interesting question.  I can probably ramble on a LOT
longer than you expected about this...

== Short answer ==

It is certainly possible to reimplement git-filter-branch on top of
git-filter-repo, though slightly differently than how you appear to be
suggesting here.  In doing so, you can provide the same user interface
and have it be perfectly compatible.  In fact, I've already created
such a thing -- except that I took a few small liberties with
compatibility (and documented each), primarily to improve the speed --
and I can use it instead of git-filter-branch to pass the git.git
testsuite.  You can see it here:
https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely

HOWEVER, it is NOT possible at all to make such a thing be fast or
safe.  Not even close.  The performance and safety problems are not
accidents of the implementation, but are baked into the design from
top to bottom and cannot be fixed without breaking backward
compatibility in lots and lots of different ways.  For the really
curious, I'll provide a possibly non-comprehensive list of why you
can't fix performance or safety if you require any compatibility at
all (and maybe throw in one or two things that could be
backward-compatibly fixed in filter-branch, such as the commit
encoding disaster):


== Long answer ==

Performance:

  * In editing files, git-filter-branch by design checks out each and
every commit as it existed in the original repo.  If your repo has
10^5 files and 10^5 commits, but each commit only modifies 5 files,
then git-filter-branch will make you do 10^10 modifications, despite
only having (at most) 5*10^5 unique blobs.
  * If you try to cheat and make filter-branch only work on
files modified in a commit, then two things happen: (1) you run into
problems with deletions whenever the user is simply trying to rename
files (because attempting to delete files that don't exist looks like
a no-op; it takes some chicanery and work to remap deletes across file
renames when the renames happen via arbitrary user-provided shell),
and (2) even if you succeed at the map-deletes-for-renames chicanery
(as I believe I did in my reimplementation), you still technically
violate backward compatibility because users are allowed to filter
files in ways that depend upon topology of commits instead of
filtering solely based on file contents or names (though I have never
seen any user ever do this).
  * Even if you don't need to edit files but only want to e.g. rename
or remove some and thus can avoid checking out each file (i.e. you can
use --index-filter), you still are passing shell snippets for your
filters.  This means that for every commit, you have to have a
prepared git repo where users can run git commands.  That's a lot of
setup.  It also means you have to fork at least one process to run the
user-provided shell snippet, and odds are that the user's shell
snippet invokes lots of commands in some long pipeline, so you will
have lots and lots of forks.  For every. single. commit.  That's a
massive amount of overhead to rename a few files.
  * filter-branch is written in shell, which is kind of slow.
Naturally, it makes sense to want to rewrite that in some other
language.  However, filter-branch documentation states that several
additional shell functions are provided for users to call, e.g. 'map',
'skip_commit', 'git_commit_non_empty_tree'.  If filter-branch itself
isn't a shell script, then in order to make those shell functions
available to the users' shell snippets you have to prepend the shell
definitions of these functions to every one of the users' shell
snippets and thus make these special shell functions be parsed with
each and every commit.
  * filter-branch provides a --setup option which is a shell snippet
that can be sourced to make shell functions and variables available to
all other filters.  If filter-branch is a shell script, it can simply
eval this shell snippet once at the beginning.  If you try to fix
performance by making filter-branch not be a shell script, then you
have to prepend the setup shell snippet to all other filters and parse
it with every single commit.
  * git-filter-branch writes lots of files to $workdir/../map/ to keep
a mapping of commits, which it uses for the map() command it provides.
Other files like $tempdir/backup-refs, $tempdir/raw-refs,
$tempdir/heads, $tempdir/tree-state are all created internally too --
and users could have accessed any of these.  Users even had a pointer
to follow in the form of Documentation that the 'map' command existed,
which naturally uses the $workdir/../map/* files.  So, even if you
don't have to edit files, for strict backward compatibility you need
to still write a bunch of files to disk somewhere and keep them
updated for every commit.  You can claim it was an implementation
detail that users should not have depended upon, but the truth is
they've had a decade where they could do so.  So, if you want full
compatibility, it has to be there.  Besides, the regression tests
depend on at least one of these details, specifying an --index-filter
that reaches down and grabs backup-refs from $tempdir, and thus
provides resourceful users who do google searches an example that
there are files there for them to read and grab and use.  (And if you
want to pass the existing regression tests, you have to at least put
the backup-refs file there even if it's irrelevant to your
implementation otherwise.)
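
To put rough numbers on the first bullet above, here is a minimal
sketch of the arithmetic.  The figures are the hypothetical ones from
that bullet (10^5 files, 10^5 commits, 5 files modified per commit);
the function names are made up for illustration and are not part of
filter-branch or filter-repo.

```python
def full_checkout_writes(files_in_tree: int, commits: int) -> int:
    """File writes needed if every commit's whole tree is checked out."""
    return files_in_tree * commits


def unique_blob_upper_bound(files_modified_per_commit: int, commits: int) -> int:
    """Upper bound on distinct blobs introduced across all commits."""
    return files_modified_per_commit * commits


# Hypothetical repository shape taken from the bullet above.
files_in_tree = 10**5
commits = 10**5
files_modified_per_commit = 5

writes = full_checkout_writes(files_in_tree, commits)
blobs = unique_blob_upper_bound(files_modified_per_commit, commits)

print(f"full-checkout file writes: {writes:.0e}")
print(f"unique blobs (upper bound): {blobs:.0e}")
print(f"redundant work factor: {writes // blobs}x")
```

Under these assumptions, checking out every commit touches about
20,000 times more files than there are unique blobs to filter.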

Safety:

filter-branch is riddled with gotchas resulting in various ways to
easily corrupt repos or end up with a mess worse than what you started
with:

  * Someone can have a set of "working and tested filters" which they
document or provide to a coworker, who then runs them on a different
OS where the same commands are not working/tested (even the
git-filter-branch manpage is guilty here).  BSD vs. GNU userland
differences can really bite.  If you're lucky, you get ugly error
messages spewed.  But just as likely, the commands either don't do the
filtering requested, or silently corrupt the repo by making some unwanted change.
The unwanted change may only affect a few commits, so it's not
necessarily obvious either.  (The fact that problems won't necessarily
be obvious means they are likely to go unnoticed until the rewritten
history is in use for quite a while, at which point it's really hard
to justify another flag-day for another rewrite.)
  * filenames with spaces (which are rare) are often mishandled by
shell snippets since they cause problems for shell pipelines.  Yes, I
know find -print0, xargs -0, ls-files -z, etc.  Not everyone does.
And even if they do, they may assume it's not relevant because someone
else renamed any such files in their repo back before the person doing
the filtering joined the project (or maybe they are just being lazy
and not thinking about everything that could go wrong).
  * non-ascii filenames (which are rare) can be silently removed
despite being in a desired directory (people who want to select paths to
keep often use pipelines like 'git ls-files | grep -v ^WANTED_DIR/ |
xargs git rm'.  ls-files will only quote filenames if needed so folks
may not notice that one of the files didn't match the regex, again
until it's much too late.  Yes, someone who knows about core.quotePath
can avoid this (unless they have other special characters like \t, \n,
or "), and people who use ls-files -z can avoid this, but that doesn't
mean they will).
  * Similarly, when moving files around, one can find that filenames
with non-ascii or special characters end up in a different directory,
one that includes a double quote character.  (This is technically the
same issue as above with quoting, but perhaps an interesting different
way that it can and has manifested as a problem.)
  * It's far too easy to accidentally mix up old and new history.
It's still possible with any tool, but filter-branch almost invites
it.  If we're lucky, the only downside is users getting frustrated
that they don't know how to shrink their repo and remove the old
stuff.  If we're unlucky, they merge old and new history and end up
with multiple "copies" of each commit, some of which have unwanted or
sensitive files and others which don't.  This comes about in multiple
different ways: the default to only doing a partial history rewrite
('--all' is not the default and over 80% of the examples in the
manpage don't use it), the fact that there's no automatic post-run
cleanup, the fact that --tag-name-filter (when used to rename tags)
doesn't remove the old tags but just adds new ones with the new name
(the manpage documents this so it's presumably not a "bug" even though
it feels like it), and the fact that little educational information is
provided to inform users of the ramifications of a rewrite and how to
avoid mixing old and new history (e.g. not only do other users need to
understand that they need to rebase their changes for all their
branches on top of new history (or delete and reclone), but they also
need to manually delete all their tags before refetching, any
references that were on any shared servers (e.g. the central repo
folks push to) that weren't part of the rewrite need to be deleted,
and if the shared server has any locked-down refs such as
refs/changes/, refs/pull/, or refs/merge-requests/ then people need to
exercise extra caution; since none of these are fool-proof, someone
should probably also add some server-side hooks to prevent folks from
accidentally re-pushing old history, or make use of special facilities
(such as gerrit's ban-commit command) to prevent it.)
  * annotated tags can be accidentally converted to lightweight tags.
The first way this happens is folks do a rewrite, realize they messed
up, restore from the backups in refs/original/, and then redo their
filter-branch command.  (The backup in refs/original/ is not a real
backup; it dereferences tags first.)  Another way this happens is
despite passing --tags or --all on the command line, filter-branch
dereferences the tag for them.  The documentation does not make it all
that clear that in order to retain annotated tags as annotated, you
must use --tag-name-filter (and must not have restored from
refs/original/ in a previously botched rewrite).
  * Any commit messages that specify an encoding will become corrupted
by the rewrite; filter-branch ignores the encoding, takes the original
bytes, and feeds it to commit-tree without telling it the proper
encoding.  (This happens whether or not --msg-filter is used, though I
suspect --msg-filter provides additional ways to really mess things
up).
  * commit messages (even if they are all UTF-8) by default become
corrupted due to not being updated -- any references to other commit
hashes in commit messages will now refer to no-longer-extant commits.
  * no facilities for helping users find what unwanted crud they
should delete means they are much more likely to have incomplete or
partial cleanups that sometimes result in confusion and people wasting
time trying to understand  (e.g. folks tend to just look for big files
to delete instead of big directories or extensions, and once they do
so, then sometime later folks using the new repository who are going
through history will notice a build artifact directory that has some
files but not others, or a cache of dependencies (node_modules or
similar) which couldn't have ever been functional since it's missing
some files)
  * if --prune-empty isn't specified, then the filtering process can
create hordes of confusing empty commits
  * if --prune-empty is specified, then intentionally placed empty
commits from before the filtering operation are also pruned instead of
just pruning commits that became empty due to filtering rules.
  * if --prune-empty is specified, sometimes empty commits are missed
and left around anyway (probably just a bug, but...).
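
The non-ascii filename bullet above is easy to underestimate, so here
is a small simulation of it.  It does not call git: the quoting
function is only an approximation of how ls-files prints paths when
core.quotePath is enabled (the default), and the function names and
sample paths are hypothetical.

```python
import re


def quote_path_like_ls_files(path: str) -> str:
    """Approximate ls-files output quoting with core.quotePath enabled.

    This is an illustrative approximation, not git's actual code: paths
    containing control characters, double quotes, backslashes, or bytes
    outside printable ASCII are wrapped in double quotes and escaped.
    """
    simple_escapes = {'"': '\\"', "\\": "\\\\", "\t": "\\t", "\n": "\\n"}
    out = []
    needs_quotes = False
    for byte in path.encode("utf-8"):
        ch = chr(byte)
        if ch in simple_escapes:
            out.append(simple_escapes[ch])
            needs_quotes = True
        elif byte < 0x20 or byte >= 0x7F:
            out.append("\\%03o" % byte)  # octal escape for the raw byte
            needs_quotes = True
        else:
            out.append(ch)
    text = "".join(out)
    return '"' + text + '"' if needs_quotes else text


def paths_selected_for_removal(paths, wanted_dir):
    """Mimic `ls-files | grep -v ^WANTED_DIR/`: lines NOT matching are removed."""
    keep = re.compile("^" + re.escape(wanted_dir) + "/")
    listing = [quote_path_like_ls_files(p) for p in paths]
    return [line for line in listing if not keep.search(line)]


paths = ["WANTED_DIR/readme.txt", "WANTED_DIR/caf\u00e9.txt", "other/build.log"]
doomed = paths_selected_for_removal(paths, "WANTED_DIR")
for line in doomed:
    print("would be passed to 'git rm':", line)
```

The accented file lives in WANTED_DIR, but its listing line starts
with a double quote, so the anchored pattern does not match and the
file lands in the removal list along with the file that really was
unwanted.  NUL-terminated output (ls-files -z) sidesteps the quoting
entirely.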

Also, performance and safety combine:

  * Coming up with the correct shell snippet to do the filtering you
want is sometimes difficult unless you're just doing a trivial
modification such as deleting a couple files.  People have often come
to me for help, so I should be practiced and an expert, but even for
fairly simple cases I have still sometimes taken over 10 minutes and
several iterations to get the right commands -- and that's assuming
they are working on a tiny repository.  Unfortunately, people often
learn if the snippet is right or wrong by trying it out, but the
rightness or wrongness can vary depending on special circumstances
(spaces in filenames, non-ascii filenames, funny author names or
emails, invalid timezones, presence of grafts or replace objects,
etc.), meaning they may have to wait a long time, hit an error, then
restart.  The performance of filter-branch is so bad that this cycle
is painful, reducing the time available to carefully re-check (to say
nothing about what it does to the patience of the person doing the
rewrite even if they do technically have more time available).  This
problem is extra compounded because errors from broken filters may not
be shown for a long time and/or get lost in a sea of output.  Even
worse, broken filters often just result in silent incorrect rewrites.
  * To top it all off, even when users finally find working commands,
they naturally want to share them.  But they may be unaware that their
repo didn't have some special cases that someone else's does.  So,
when someone else with a different repository runs the same commands,
they get hit by the problems above.  Or, the user just runs commands
that really were vetted for special cases, but they run it on a
different OS where it doesn't work, as noted above.

== End of long answer ==


Summary of above: Anything compatible with git-filter-branch will be
slower than molasses and extraordinarily unsafe.



> It could also spew warnings to recommend safer switches.

Ooh, I can take a crack at that right now: "For safety, don't use
--tree-filter, --index-filter, --commit-filter, --tag-name-filter,
--prune-empty, or (obviously) --force and _always_ use '--all' (and
nothing else) for <rev-list options>.  For performance, don't use
--tree-filter, --index-filter, --commit-filter, --tag-name-filter,
--setup, --env-filter, or --msg-filter.  Also, don't depend on the
refs/original/ stuff since that's unsafe (tag dereferencing), making
--original useless.  If you follow all these suggestions, -d is
useless too.  Oh, and I forgot to include --parent-filter among the
bad-for-performance cases because you should have been using
git-replace(1) instead of it for some time now.  So that leaves us
with our subset that could theoretically be made safe and performant:
--subdirectory-filter."

Turns out, filter-repo does support this exact flag, so if you're
willing to restrict yourself to this subset, then filter-repo IS a
drop-in replacement.  :-)

And if you want not just recommendations of flags to avoid, but flags
you can use, then, again, see filter-repo for flags you can use.


> Stability is a major reason I use git, the Linux kernel,
> and why I distrust+avoid desktop/GUI software.  Removing
> "unsafe" features, even with good intentions, inevitably leads
> to frustrated users.

I did not and would not suggest deleting git-filter-branch.  I
suggested removing it from git.git and putting it elsewhere AND
telling people where that elsewhere is.  That elsewhere might be
git-filter-repo, it could be a different repo, or it could even be my
alternative faster (but still way too slow) re-implementation of
filter-branch.

Hope that helps,
Elijah

* Re: RFC: Proposing git-filter-repo for inclusion in git.git
  2019-08-23 18:06       ` Elijah Newren
@ 2019-08-23 18:29         ` Elijah Newren
  2019-08-28 11:09         ` Johannes Schindelin
  1 sibling, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-23 18:29 UTC (permalink / raw)
  To: Eric Wong
  Cc: Junio C Hamano, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On Fri, Aug 23, 2019 at 11:06 AM Elijah Newren <newren@gmail.com> wrote:
<snip>
> Safety:
<snip>

Ooh, and another one I remembered just after hitting 'send':
  * If the user provides a --tag-name-filter that maps multiple tags
to the same name, no warning or error is provided; filter-branch
simply overwrites each tag in some undocumented pre-defined order
(lexicographic) resulting in only one tag at the end.  A regression
test will fail if you attempt to error out and warn the user, so if
you are trying to make a backward compatible reimplementation you have
to add extra code to detect collisions and make sure that only the
lexicographically last one is rewritten.  (fast-import will naturally
error out if told to write the same tag more than once, so you have to
avoid triggering it.)
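
As a rough illustration of the collision handling described above,
here is a minimal sketch.  It is not the code from filter-branch or
from my reimplementation; the function names, the sample tags, and the
renaming rule are all hypothetical, and it assumes that plain string
sorting stands in for the lexicographic order mentioned above.

```python
import re
from collections import defaultdict


def resolve_tag_renames(tags, rename):
    """Group old tag names by the new name the filter maps them to.

    Returns (winners, collisions):
      winners    maps each new name to the one old tag to rewrite, which
                 is assumed to be the lexicographically last candidate;
      collisions maps each contested new name to every old tag mapped to it.
    """
    by_new_name = defaultdict(list)
    for old in sorted(tags):
        by_new_name[rename(old)].append(old)
    winners = {new: olds[-1] for new, olds in by_new_name.items()}
    collisions = {new: olds for new, olds in by_new_name.items() if len(olds) > 1}
    return winners, collisions


def strip_rc_suffix(tag):
    """Hypothetical tag-name filter: drop a trailing release-candidate suffix."""
    return re.sub(r"-rc\d+$", "", tag)


tags = ["v1.0-rc2", "v2.0", "v1.0-rc1"]
winners, collisions = resolve_tag_renames(tags, strip_rc_suffix)
for new, olds in collisions.items():
    print(f"warning: {olds} all map to {new!r}; keeping {winners[new]!r}")
```

Emitting only the winners keeps each new tag name from being written
more than once, while the collisions mapping is what a safer tool
could use to warn or error out instead of silently dropping tags.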

<snip>
> Summary of above: Anything compatible with git-filter-branch will be
> slower than molasses and extraordinarily unsafe.
<snip>

* Re: RFC: Proposing git-filter-repo for inclusion in git.git
  2019-08-22 20:23 ` Junio C Hamano
  2019-08-22 21:12   ` Elijah Newren
@ 2019-08-26 19:56   ` Jeff King
  1 sibling, 0 replies; 73+ messages in thread
From: Jeff King @ 2019-08-26 19:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On Thu, Aug 22, 2019 at 01:23:59PM -0700, Junio C Hamano wrote:

> I do not want a discussion to begin with a Devil's Advocate
> response, but anyway...
> 
> Are we planning to go to all batteries included approach?  I have a
> feeling that there are other tools (hello, "git imerge") that
> equally deserve attention by Git users; are we in the business of
> absorbing them all?  How big a project will our tree become, and how
> much more activity would have to be handled by the readership of the
> Git mailing list?
> 
> I'd rather see us shed non-core tools we already have (e.g. git-svn,
> cvs import/export) out of git.git and have them as independent
> projects.  But that may be just me.

I like the general line of thinking here, but let me Devil's Advocate
your Devil's Advocate:

  - having separate repos and release schedules means that the
    dependency changes need to be coordinated. E.g., if a feature in
    git-filter-repo needs a new feature in git-core, then the feature
    needs to land in git-core first, then filter-repo needs to decide
    how to handle older versions. Whereas in the same repo, they can
    generally be assumed to move forward atomically.

  - some of the non-core stuff helps test coverage for the core parts of
    the system. E.g., what bugs might we find in fast-import that are
    only triggered by the filter-repo test suite? Similarly, the
    scripted tools in git.git often serve as canaries for
    backwards-incompatible changes to the plumbing.

    Something like the meta-git.git that Stolee proposed would help with
    that. But then we may end up dealing with other people's messes,
    which is one of the things we'd try to avoid with such a split.

-Peff


* [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-22 21:34     ` Junio C Hamano
@ 2019-08-26 23:52       ` Elijah Newren
  2019-08-26 23:52         ` [RFC PATCH 1/5] t6006: simplify and optimize empty message test Elijah Newren
                           ` (7 more replies)
  0 siblings, 8 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-26 23:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

Following up on the suggestion to make git.git smaller and shed non-core
tools, here's an RFC series to do so with git-filter-branch.  This
series first removes dependencies on git-filter-branch (of which there
were very few), and then deletes git-filter-branch itself in the final
commit.

I'm more than happy to consider alternate places for the filter-branch
history (I had considered just merging it in with git-filter-repo), but
for now I just made it available here:
        https://github.com/newren/git-filter-branch

The rewrite above contains the history of the files deleted in Patch 5,
plus a one-time copy of relevant build files (Makefiles, test-lib.sh,
etc. -- I didn't want the whole history of these), and then touchups to
streamline the build files and make them all work in this standalone
repo.


Some high-level notes on the patches:

  * Patches 1 & 2: good cleanups & performance wins regardless of
    whether the rest of the series is taken

  * Patch 3: an attempt to improve i18n situation for external scripts,
    but discovered to not be necessary/useful for git-filter-branch
    specifically

  * Patch 4:
    * If we are good with deleting git-filter-branch now and just noting
      it in the release notes, then patch 4 could be simplified; there's
      no need to update git-filter-branch.txt in that case.
    * If, however, we want to do some external messaging for an
      additional release cycle or two before moving git-filter-branch
      out of git.git, this patch will help us until then to at least
      avoid recommending a tool which will likely mangle users' data in
      unexpected ways.  But it'd be really helpful if folks could review
      and opine on the BFG stuff if so.

  * Patch 5: actually deletes git-filter-branch, its tests, and
    documentation.


Elijah Newren (5):
  t6006: simplify and optimize empty message test
  t3427: accelerate this test by using fast-export and fast-import
  git-sh-i18n: work with external scripts
  Recommend git-filter-repo instead of git-filter-branch in
    documentation
  Remove git-filter-branch, it is now external to git.git

 .gitignore                          |   1 -
 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 481 --------------------
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   2 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |   4 +-
 Documentation/githooks.txt          |   7 +-
 Makefile                            |   1 -
 command-list.txt                    |   1 -
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                | 662 ----------------------------
 git-sh-i18n.sh                      |   7 +-
 t/perf/p7000-filter-branch.sh       |  24 -
 t/t3427-rebase-subtree.sh           |  32 +-
 t/t6006-rev-list-format.sh          |   5 +-
 t/t7003-filter-branch.sh            | 505 ---------------------
 t/t7009-filter-branch-null-sha1.sh  |  55 ---
 t/t9902-completion.sh               |  12 +-
 19 files changed, 63 insertions(+), 1773 deletions(-)
 delete mode 100644 Documentation/git-filter-branch.txt
 delete mode 100755 git-filter-branch.sh
 delete mode 100755 t/perf/p7000-filter-branch.sh
 delete mode 100755 t/t7003-filter-branch.sh
 delete mode 100755 t/t7009-filter-branch-null-sha1.sh

-- 
2.23.0.5.g775ebaa2a0



* [RFC PATCH 1/5] t6006: simplify and optimize empty message test
  2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
@ 2019-08-26 23:52         ` Elijah Newren
  2019-08-27  1:23           ` Derrick Stolee
  2019-08-26 23:52         ` [RFC PATCH 2/5] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
                           ` (6 subsequent siblings)
  7 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-26 23:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

Test t6006.71 ("oneline with empty message") was creating two commits
with simple commit messages, and then running filter-branch to rewrite
the commit messages to be empty.  This test was written this way because
the --allow-empty-message option to git commit did not exist at the
time.  Simplify this test and avoid the need to invoke filter-branch by
just using --allow-empty-message when creating the commit.

Despite only being one piece of the 71st test and there being 73 tests
overall, this small change to just this one test speeds up the overall
execution time of t6006 (as measured by the best of 3 runs of `time
./t6006-rev-list-format.sh`) by about 11% on Linux and by 13% on
Mac.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6006-rev-list-format.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index da113d975b..d30e41c9f7 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -501,9 +501,8 @@ test_expect_success 'reflog identity' '
 '
 
 test_expect_success 'oneline with empty message' '
-	git commit -m "dummy" --allow-empty &&
-	git commit -m "dummy" --allow-empty &&
-	git filter-branch --msg-filter "sed -e s/dummy//" HEAD^^.. &&
+	git commit --allow-empty --allow-empty-message &&
+	git commit --allow-empty --allow-empty-message &&
 	git rev-list --oneline HEAD >test.txt &&
 	test_line_count = 5 test.txt &&
 	git rev-list --oneline --graph HEAD >testg.txt &&
-- 
2.23.0.5.g775ebaa2a0



* [RFC PATCH 2/5] t3427: accelerate this test by using fast-export and fast-import
  2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
  2019-08-26 23:52         ` [RFC PATCH 1/5] t6006: simplify and optimize empty message test Elijah Newren
@ 2019-08-26 23:52         ` Elijah Newren
  2019-08-27  1:25           ` Derrick Stolee
  2019-08-26 23:52         ` [RFC PATCH 3/5] git-sh-i18n: work with external scripts Elijah Newren
                           ` (5 subsequent siblings)
  7 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-26 23:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

fast-export and fast-import can easily handle the simple rewrite that
was being done by filter-branch, and should be significantly faster on
systems with a slow fork.  Timings from before and after on two laptops
that I have access to (measured via `time ./t3427-rebase-subtree.sh`,
i.e. including everything in this test -- not just the filter-branch or
fast-export/fast-import pair):

   Linux:  4.305s -> 3.684s (~17% speedup)
   Mac:   10.128s -> 7.038s (~30% speedup)
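
To illustrate what the sed expression in the new pipeline does, here is
a small sketch applied to a single made-up filemodify line of the kind
that fast-export --no-data emits.  The helper name, the object ID, and
the path are placeholders chosen for the example:

```shell
# Hypothetical wrapper around the sed expression used in the patch:
# strip the leading "files_subtree/" from a path that follows a
# 40-hex object ID and a space.  Other lines pass through unchanged.
strip_subtree_prefix () {
	sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%"
}

# Placeholder object ID (40 hex digits), not a real blob.
oid=0123456789abcdef0123456789abcdef01234567

echo "M 100644 $oid files_subtree/master4" | strip_subtree_prefix
```

Anchoring the match on the object ID keeps the substitution from
touching commit messages or other parts of the stream that merely
happen to contain the directory name.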

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3427-rebase-subtree.sh | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/t/t3427-rebase-subtree.sh b/t/t3427-rebase-subtree.sh
index d8640522a0..d05fcce5dc 100755
--- a/t/t3427-rebase-subtree.sh
+++ b/t/t3427-rebase-subtree.sh
@@ -42,7 +42,9 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --preserve-merges --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-preserve-merges-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	git fast-export --no-data HEAD -- files_subtree/ \
+		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
+		| git fast-import --force --quiet &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master4"
@@ -53,7 +55,9 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --preserve-merges --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-preserve-merges-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	git fast-export --no-data HEAD -- files_subtree/ \
+		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
+		| git fast-import --force --quiet &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "files_subtree/master5"
@@ -64,7 +68,9 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	git fast-export --no-data HEAD -- files_subtree/ \
+		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
+		| git fast-import --force --quiet &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
@@ -75,7 +81,9 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	git fast-export --no-data HEAD -- files_subtree/ \
+		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
+		| git fast-import --force --quiet &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
@@ -86,7 +94,9 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto empty commit' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-empty master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	git fast-export --no-data HEAD -- files_subtree/ \
+		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
+		| git fast-import --force --quiet &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "Empty commit"
@@ -96,7 +106,9 @@ test_expect_failure REBASE_P \
 test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-onto-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	git fast-export --no-data HEAD -- files_subtree/ \
+		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
+		| git fast-import --force --quiet &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
@@ -106,7 +118,9 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
 test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-onto-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	git fast-export --no-data HEAD -- files_subtree/ \
+		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
+		| git fast-import --force --quiet &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
@@ -115,7 +129,9 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
 test_expect_failure 'Rebase -Xsubtree --onto empty commit' '
 	reset_rebase &&
 	git checkout -b rebase-onto-empty master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	git fast-export --no-data HEAD -- files_subtree/ \
+		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
+		| git fast-import --force --quiet &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "Empty commit"
-- 
2.23.0.5.g775ebaa2a0



* [RFC PATCH 3/5] git-sh-i18n: work with external scripts
  2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
  2019-08-26 23:52         ` [RFC PATCH 1/5] t6006: simplify and optimize empty message test Elijah Newren
  2019-08-26 23:52         ` [RFC PATCH 2/5] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
@ 2019-08-26 23:52         ` Elijah Newren
  2019-08-27  1:28           ` Derrick Stolee
  2019-08-26 23:52         ` [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation Elijah Newren
                           ` (4 subsequent siblings)
  7 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-26 23:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

Scripts external to git could source $(git --exec-path)/git-sh-setup (as
we document in Documentation/git-sh-setup.txt).  This will in turn
source git-sh-i18n, which will set up some handy internationalization
infrastructure.  However, git-sh-i18n hardcodes the TEXTDOMAIN, meaning
that anyone using this infrastructure will only get translations that
are shipped with git.  Allow the external scripts to specify their own
translation domain but otherwise use our infrastructure for accessing
translations.

My original plan was to have git-filter-branch be the first testcase
using this feature, with a goal of minimizing the number of changes that
needed to be made to it when I moved it out of git.git.  However, I
realized after creating this patch that no strings in git-filter-branch
are translated.  Still, the generalization could be useful if we move
other tools from git.git to an external location.
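
As a rough sketch of the intended behavior, the function below mirrors
the selection logic this patch adds to git-sh-i18n.sh.  The function
name select_textdomain and the domain name my-tool are made up for
illustration; a real external script would instead set
TEXTDOMAIN_OVERRIDE before sourcing $(git --exec-path)/git-sh-setup:

```shell
# Hypothetical stand-in for the logic in git-sh-i18n.sh: print the
# translation domain that would be exported as TEXTDOMAIN.
select_textdomain () {
	if test -z "$TEXTDOMAIN_OVERRIDE"
	then
		echo git
	else
		echo "$TEXTDOMAIN_OVERRIDE"
	fi
}

# With no override, git's own domain is used.
( unset TEXTDOMAIN_OVERRIDE; select_textdomain )

# With an override, the external script's domain wins.
( TEXTDOMAIN_OVERRIDE=my-tool; select_textdomain )
```

Existing callers that never set the variable keep getting the git
domain, so the change should be invisible to them.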

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 git-sh-i18n.sh | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/git-sh-i18n.sh b/git-sh-i18n.sh
index 8eef60b43f..3d04d5d515 100644
--- a/git-sh-i18n.sh
+++ b/git-sh-i18n.sh
@@ -5,7 +5,12 @@
 #
 
 # Export the TEXTDOMAIN* data that we need for Git
-TEXTDOMAIN=git
+if test -z "$TEXTDOMAIN_OVERRIDE"
+then
+	TEXTDOMAIN=git
+else
+	TEXTDOMAIN="$TEXTDOMAIN_OVERRIDE"
+fi
 export TEXTDOMAIN
 if test -z "$GIT_TEXTDOMAINDIR"
 then
-- 
2.23.0.5.g775ebaa2a0



* [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation
  2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
                           ` (2 preceding siblings ...)
  2019-08-26 23:52         ` [RFC PATCH 3/5] git-sh-i18n: work with external scripts Elijah Newren
@ 2019-08-26 23:52         ` Elijah Newren
  2019-08-27  1:32           ` Derrick Stolee
  2019-08-26 23:52         ` [RFC PATCH 5/5] Remove git-filter-branch, it is now external to git.git Elijah Newren
                           ` (3 subsequent siblings)
  7 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-26 23:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

filter-branch suffers from a huge number of pitfalls that can result in
incorrectly rewritten history, and many of the problems can easily go
undetected until the new repository is in use.  This can result in
problems ranging from an even messier history than what led folks to
filter-branch in the first place, to data loss or corruption.  These
issues cannot be backward compatibly fixed, so add a warning to the
filter-branch manpage about this and recommend that another tool (such
as filter-repo) be used instead.

Also, update other manpages that referenced filter-branch.  Several of
these needed updates even if we could continue recommending
filter-branch, either due to implying that something was unique to
filter-branch when it applied more generally to all history rewriting
tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
something about filter-branch was used as an example despite other more
commonly known examples now existing.  Reword these sections to fix
these issues and to avoid recommending filter-branch.

Finally, remove the section explaining BFG Repo Cleaner as an
alternative to filter-branch.  I feel somewhat bad about this,
especially since I feel like I learned so much from BFG that I put to
good use in filter-repo (which is much more than I can say for
filter-branch), but keeping that section presented a few problems:
  * In order to recommend that people quit using filter-branch, we need
    to provide them a recommendation for something else to use that
    can handle all the same types of rewrites.  To my knowledge,
    filter-repo is the only such tool.  So it needs to be mentioned.
  * I don't want to give conflicting recommendations to users
  * If we recommend two tools, we shouldn't expect users to learn both
    and pick which one to use; we should explain which problems one
    can solve that the other can't or when one is much faster than
    the other.
  * BFG and filter-repo have similar performance
  * All filtering types that BFG can do, filter-repo can also do.  In
    fact, filter-repo comes with a reimplementation of BFG named
    bfg-ish which provides the same user-interface as BFG but with
    several bugfixes and new features that are hard to implement in
    BFG due to its technical underpinnings.
While I could still mention both tools, it seems like I would need to
provide some kind of comparison, and I would ultimately just say that
filter-repo can do everything BFG can, so it seems better to remove
that section altogether.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-export.txt   |  6 ++---
 Documentation/git-filter-branch.txt | 42 ++++++++---------------------
 Documentation/git-gc.txt            | 17 ++++++------
 Documentation/git-rebase.txt        |  2 +-
 Documentation/git-replace.txt       | 10 +++----
 Documentation/git-svn.txt           |  4 +--
 Documentation/githooks.txt          |  7 ++---
 contrib/svn-fe/svn-fe.txt           |  4 +--
 8 files changed, 36 insertions(+), 56 deletions(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index cc940eb9ad..784e934009 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped
 into 'git fast-import'.
 
 You can use it as a human-readable bundle replacement (see
-linkgit:git-bundle[1]), or as a kind of an interactive
-'git filter-branch'.
-
+linkgit:git-bundle[1]), or as a format that can be edited before being
+fed to 'git fast-import' in order to do history rewrites (an ability
+relied on by tools like 'git filter-repo').
 
 OPTIONS
 -------
diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 6b53dd7e06..8c586eed55 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -16,6 +16,17 @@ SYNOPSIS
 	[--original <namespace>] [-d <directory>] [-f | --force]
 	[--state-branch <branch>] [--] [<rev-list options>...]
 
+WARNING
+-------
+'git filter-branch' has a litany of gotchas that can and will cause
+history to be rewritten incorrectly (in addition to abysmal
+performance).  These issues cannot be backward compatibly fixed and as
+such, its use is not recommended.  Please use an alternative history
+filtering tool such as 'git filter-repo'.  If you still need to use
+'git filter-branch', please carefully read the "Safety" section of
+https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/
+and avoid as many of the pitfalls listed there as reasonably possible.
+
 DESCRIPTION
 -----------
 Lets you rewrite Git revision history by rewriting the branches mentioned
@@ -445,37 +456,6 @@ warned.
   (or if your git-gc is not new enough to support arguments to
   `--prune`, use `git repack -ad; git prune` instead).
 
-NOTES
------
-
-git-filter-branch allows you to make complex shell-scripted rewrites
-of your Git history, but you probably don't need this flexibility if
-you're simply _removing unwanted data_ like large files or passwords.
-For those operations you may want to consider
-http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
-a JVM-based alternative to git-filter-branch, typically at least
-10-50x faster for those use-cases, and with quite different
-characteristics:
-
-* Any particular version of a file is cleaned exactly _once_. The BFG,
-  unlike git-filter-branch, does not give you the opportunity to
-  handle a file differently based on where or when it was committed
-  within your history. This constraint gives the core performance
-  benefit of The BFG, and is well-suited to the task of cleansing bad
-  data - you don't care _where_ the bad data is, you just want it
-  _gone_.
-
-* By default The BFG takes full advantage of multi-core machines,
-  cleansing commit file-trees in parallel. git-filter-branch cleans
-  commits sequentially (i.e. in a single-threaded manner), though it
-  _is_ possible to write filters that include their own parallelism,
-  in the scripts executed against each commit.
-
-* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
-  are much more restrictive than git-filter branch, and dedicated just
-  to the tasks of removing unwanted data- e.g:
-  `--strip-blobs-bigger-than 1M`.
-
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 247f765604..0c114ad1ca 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -115,15 +115,14 @@ NOTES
 -----
 
 'git gc' tries very hard not to delete objects that are referenced
-anywhere in your repository. In
-particular, it will keep not only objects referenced by your current set
-of branches and tags, but also objects referenced by the index,
-remote-tracking branches, refs saved by 'git filter-branch' in
-refs/original/, reflogs (which may reference commits in branches
-that were later amended or rewound), and anything else in the refs/* namespace.
-If you are expecting some objects to be deleted and they aren't, check
-all of those locations and decide whether it makes sense in your case to
-remove those references.
+anywhere in your repository. In particular, it will keep not only
+objects referenced by your current set of branches and tags, but also
+objects referenced by the index, remote-tracking branches, notes saved
+by 'git notes' under refs/notes/, reflogs (which may reference commits
+in branches that were later amended or rewound), and anything else in
+the refs/* namespace.  If you are expecting some objects to be deleted
+and they aren't, check all of those locations and decide whether it
+makes sense in your case to remove those references.
 
 On the other hand, when 'git gc' runs concurrently with another process,
 there is a risk of it deleting an object that the other process is using
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 6156609cf7..2f201d85d4 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -832,7 +832,7 @@ Hard case: The changes are not the same.::
 	This happens if the 'subsystem' rebase had conflicts, or used
 	`--interactive` to omit, edit, squash, or fixup commits; or
 	if the upstream used one of `commit --amend`, `reset`, or
-	`filter-branch`.
+	a full history rewriting command like `filter-repo`.
 
 
 The easy case
diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
index 246dc9943c..35595a2cd3 100644
--- a/Documentation/git-replace.txt
+++ b/Documentation/git-replace.txt
@@ -123,10 +123,10 @@ The following format are available:
 CREATING REPLACEMENT OBJECTS
 ----------------------------
 
-linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and
-linkgit:git-rebase[1], among other git commands, can be used to create
-replacement objects from existing objects. The `--edit` option can
-also be used with 'git replace' to create a replacement object by
+linkgit:git-hash-object[1], linkgit:git-rebase[1], and
+linkgit:git-filter-repo[1], among other git commands, can be used to
+create replacement objects from existing objects. The `--edit` option
+can also be used with 'git replace' to create a replacement object by
 editing an existing object.
 
 If you want to replace many blobs, trees or commits that are part of a
@@ -148,8 +148,8 @@ pending objects.
 SEE ALSO
 --------
 linkgit:git-hash-object[1]
-linkgit:git-filter-branch[1]
 linkgit:git-rebase[1]
+linkgit:git-filter-repo[1]
 linkgit:git-tag[1]
 linkgit:git-branch[1]
 linkgit:git-commit[1]
diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 30711625fd..f2762dd5d4 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -769,9 +769,9 @@ option for (hopefully) obvious reasons.
 +
 This option is NOT recommended as it makes it difficult to track down
 old references to SVN revision numbers in existing documentation, bug
-reports and archives.  If you plan to eventually migrate from SVN to Git
+reports, and archives.  If you plan to eventually migrate from SVN to Git
 and are certain about dropping SVN history, consider
-linkgit:git-filter-branch[1] instead.  filter-branch also allows
+linkgit:git-filter-repo[1] instead.  filter-repo also allows
 reformatting of metadata for ease-of-reading and rewriting authorship
 info for non-"svn.authorsFile" users.
 
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 82cd573776..997548f5ed 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -425,9 +425,10 @@ post-rewrite
 
 This hook is invoked by commands that rewrite commits
 (linkgit:git-commit[1] when called with `--amend` and
-linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call
-it!).  Its first argument denotes the command it was invoked by:
-currently one of `amend` or `rebase`.  Further command-dependent
+linkgit:git-rebase[1]; however, full-history (re)writing tools like
+linkgit:git-fast-import[1] or linkgit:git-filter-repo[1] typically do
+not call it!).  Its first argument denotes the command it was invoked
+by: currently one of `amend` or `rebase`.  Further command-dependent
 arguments may be passed in the future.
 
 The hook receives a list of the rewritten commits on stdin, in the
diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index a3425f4770..19333fc8df 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -56,7 +56,7 @@ line.  This line has the form `git-svn-id: URL@REVNO UUID`.
 
 The resulting repository will generally require further processing
 to put each project in its own repository and to separate the history
-of each branch.  The 'git filter-branch --subdirectory-filter' command
+of each branch.  The 'git filter-repo --subdirectory-filter' command
 may be useful for this purpose.
 
 BUGS
@@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected.
 
 SEE ALSO
 --------
-git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
+git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
 https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
-- 
2.23.0.5.g775ebaa2a0



* [RFC PATCH 5/5] Remove git-filter-branch, it is now external to git.git
  2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
                           ` (3 preceding siblings ...)
  2019-08-26 23:52         ` [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation Elijah Newren
@ 2019-08-26 23:52         ` Elijah Newren
  2019-08-27  1:39         ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Derrick Stolee
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-26 23:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 .gitignore                          |   1 -
 Documentation/git-filter-branch.txt | 461 -------------------
 Makefile                            |   1 -
 command-list.txt                    |   1 -
 git-filter-branch.sh                | 662 ----------------------------
 t/perf/p7000-filter-branch.sh       |  24 -
 t/t7003-filter-branch.sh            | 505 ---------------------
 t/t7009-filter-branch-null-sha1.sh  |  55 ---
 t/t9902-completion.sh               |  12 +-
 9 files changed, 6 insertions(+), 1716 deletions(-)
 delete mode 100644 Documentation/git-filter-branch.txt
 delete mode 100755 git-filter-branch.sh
 delete mode 100755 t/perf/p7000-filter-branch.sh
 delete mode 100755 t/t7003-filter-branch.sh
 delete mode 100755 t/t7009-filter-branch-null-sha1.sh

diff --git a/.gitignore b/.gitignore
index 521d8f4fb4..97f5d8afea 100644
--- a/.gitignore
+++ b/.gitignore
@@ -63,7 +63,6 @@
 /git-fast-import
 /git-fetch
 /git-fetch-pack
-/git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
 /git-format-patch
diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
deleted file mode 100644
index 8c586eed55..0000000000
--- a/Documentation/git-filter-branch.txt
+++ /dev/null
@@ -1,461 +0,0 @@
-git-filter-branch(1)
-====================
-
-NAME
-----
-git-filter-branch - Rewrite branches
-
-SYNOPSIS
---------
-[verse]
-'git filter-branch' [--setup <command>] [--subdirectory-filter <directory>]
-	[--env-filter <command>] [--tree-filter <command>]
-	[--index-filter <command>] [--parent-filter <command>]
-	[--msg-filter <command>] [--commit-filter <command>]
-	[--tag-name-filter <command>] [--prune-empty]
-	[--original <namespace>] [-d <directory>] [-f | --force]
-	[--state-branch <branch>] [--] [<rev-list options>...]
-
-WARNING
--------
-'git filter-branch' has a litany of gotchas that can and will cause
-history to be rewritten incorrectly (in addition to abysmal
-performance).  These issues cannot be backward compatibly fixed and as
-such, its use is not recommended.  Please use an alternative history
-filtering tool such as 'git filter-repo'.  If you still need to use
-'git filter-branch', please carefully read the "Safety" section of
-https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/
-and avoid as many of the pitfalls listed there as reasonably possible.
-
-DESCRIPTION
------------
-Lets you rewrite Git revision history by rewriting the branches mentioned
-in the <rev-list options>, applying custom filters on each revision.
-Those filters can modify each tree (e.g. removing a file or running
-a perl rewrite on all files) or information about each commit.
-Otherwise, all information (including original commit times or merge
-information) will be preserved.
-
-The command will only rewrite the _positive_ refs mentioned in the
-command line (e.g. if you pass 'a..b', only 'b' will be rewritten).
-If you specify no filters, the commits will be recommitted without any
-changes, which would normally have no effect.  Nevertheless, this may be
-useful in the future for compensating for some Git bugs or such,
-therefore such a usage is permitted.
-
-*NOTE*: This command honors `.git/info/grafts` file and refs in
-the `refs/replace/` namespace.
-If you have any grafts or replacement refs defined, running this command
-will make them permanent.
-
-*WARNING*! The rewritten history will have different object names for all
-the objects and will not converge with the original branch.  You will not
-be able to easily push and distribute the rewritten branch on top of the
-original branch.  Please do not use this command if you do not know the
-full implications, and avoid using it anyway, if a simple single commit
-would suffice to fix your problem.  (See the "RECOVERING FROM UPSTREAM
-REBASE" section in linkgit:git-rebase[1] for further information about
-rewriting published history.)
-
-Always verify that the rewritten version is correct: The original refs,
-if different from the rewritten ones, will be stored in the namespace
-'refs/original/'.
-
-Note that since this operation is very I/O expensive, it might
-be a good idea to redirect the temporary directory off-disk with the
-`-d` option, e.g. on tmpfs.  Reportedly the speedup is very noticeable.
-
-
-Filters
-~~~~~~~
-
-The filters are applied in the order as listed below.  The <command>
-argument is always evaluated in the shell context using the 'eval' command
-(with the notable exception of the commit filter, for technical reasons).
-Prior to that, the `$GIT_COMMIT` environment variable will be set to contain
-the id of the commit being rewritten.  Also, GIT_AUTHOR_NAME,
-GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE, GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL,
-and GIT_COMMITTER_DATE are taken from the current commit and exported to
-the environment, in order to affect the author and committer identities of
-the replacement commit created by linkgit:git-commit-tree[1] after the
-filters have run.
-
-If any evaluation of <command> returns a non-zero exit status, the whole
-operation will be aborted.
-
-A 'map' function is available that takes an "original sha1 id" argument
-and outputs a "rewritten sha1 id" if the commit has been already
-rewritten, and "original sha1 id" otherwise; the 'map' function can
-return several ids on separate lines if your commit filter emitted
-multiple commits.
-
-
-OPTIONS
--------
-
---setup <command>::
-	This is not a real filter executed for each commit but a one
-	time setup just before the loop. Therefore no commit-specific
-	variables are defined yet.  Functions or variables defined here
-	can be used or modified in the following filter steps except
-	the commit filter, for technical reasons.
-
---subdirectory-filter <directory>::
-	Only look at the history which touches the given subdirectory.
-	The result will contain that directory (and only that) as its
-	project root. Implies <<Remap_to_ancestor>>.
-
---env-filter <command>::
-	This filter may be used if you only need to modify the environment
-	in which the commit will be performed.  Specifically, you might
-	want to rewrite the author/committer name/email/time environment
-	variables (see linkgit:git-commit-tree[1] for details).
-
---tree-filter <command>::
-	This is the filter for rewriting the tree and its contents.
-	The argument is evaluated in shell with the working
-	directory set to the root of the checked out tree.  The new tree
-	is then used as-is (new files are auto-added, disappeared files
-	are auto-removed - neither .gitignore files nor any other ignore
-	rules *HAVE ANY EFFECT*!).
-
---index-filter <command>::
-	This is the filter for rewriting the index.  It is similar to the
-	tree filter but does not check out the tree, which makes it much
-	faster.  Frequently used with `git rm --cached
-	--ignore-unmatch ...`, see EXAMPLES below.  For hairy
-	cases, see linkgit:git-update-index[1].
-
---parent-filter <command>::
-	This is the filter for rewriting the commit's parent list.
-	It will receive the parent string on stdin and shall output
-	the new parent string on stdout.  The parent string is in
-	the format described in linkgit:git-commit-tree[1]: empty for
-	the initial commit, "-p parent" for a normal commit and
-	"-p parent1 -p parent2 -p parent3 ..." for a merge commit.
-
---msg-filter <command>::
-	This is the filter for rewriting the commit messages.
-	The argument is evaluated in the shell with the original
-	commit message on standard input; its standard output is
-	used as the new commit message.
-
---commit-filter <command>::
-	This is the filter for performing the commit.
-	If this filter is specified, it will be called instead of the
-	'git commit-tree' command, with arguments of the form
-	"<TREE_ID> [(-p <PARENT_COMMIT_ID>)...]" and the log message on
-	stdin.  The commit id is expected on stdout.
-+
-As a special extension, the commit filter may emit multiple
-commit ids; in that case, the rewritten children of the original commit will
-have all of them as parents.
-+
-You can use the 'map' convenience function in this filter, and other
-convenience functions, too.  For example, calling 'skip_commit "$@"'
-will leave out the current commit (but not its changes! If you want
-that, use 'git rebase' instead).
-+
-You can also use the `git_commit_non_empty_tree "$@"` instead of
-`git commit-tree "$@"` if you don't wish to keep commits with a single parent
-and that makes no change to the tree.
-
---tag-name-filter <command>::
-	This is the filter for rewriting tag names. When passed,
-	it will be called for every tag ref that points to a rewritten
-	object (or to a tag object which points to a rewritten object).
-	The original tag name is passed via standard input, and the new
-	tag name is expected on standard output.
-+
-The original tags are not deleted, but can be overwritten;
-use "--tag-name-filter cat" to simply update the tags.  In this
-case, be very careful and make sure you have the old tags
-backed up in case the conversion has run afoul.
-+
-Nearly proper rewriting of tag objects is supported. If the tag has
-a message attached, a new tag object will be created with the same message,
-author, and timestamp. If the tag has a signature attached, the
-signature will be stripped. It is by definition impossible to preserve
-signatures. The reason this is "nearly" proper, is because ideally if
-the tag did not change (points to the same object, has the same name, etc.)
-it should retain any signature. That is not the case, signatures will always
-be removed, buyer beware. There is also no support for changing the
-author or timestamp (or the tag message for that matter). Tags which point
-to other tags will be rewritten to point to the underlying commit.
-
---prune-empty::
-	Some filters will generate empty commits that leave the tree untouched.
-	This option instructs git-filter-branch to remove such commits if they
-	have exactly one or zero non-pruned parents; merge commits will
-	therefore remain intact.  This option cannot be used together with
-	`--commit-filter`, though the same effect can be achieved by using the
-	provided `git_commit_non_empty_tree` function in a commit filter.
-
---original <namespace>::
-	Use this option to set the namespace where the original commits
-	will be stored. The default value is 'refs/original'.
-
--d <directory>::
-	Use this option to set the path to the temporary directory used for
-	rewriting.  When applying a tree filter, the command needs to
-	temporarily check out the tree to some directory, which may consume
-	considerable space in case of large projects.  By default it
-	does this in the `.git-rewrite/` directory but you can override
-	that choice by this parameter.
-
--f::
---force::
-	'git filter-branch' refuses to start with an existing temporary
-	directory or when there are already refs starting with
-	'refs/original/', unless forced.
-
---state-branch <branch>::
-	This option will cause the mapping from old to new objects to
-	be loaded from named branch upon startup and saved as a new
-	commit to that branch upon exit, enabling incremental of large
-	trees. If '<branch>' does not exist it will be created.
-
-<rev-list options>...::
-	Arguments for 'git rev-list'.  All positive refs included by
-	these options are rewritten.  You may also specify options
-	such as `--all`, but you must use `--` to separate them from
-	the 'git filter-branch' options. Implies <<Remap_to_ancestor>>.
-
-
-[[Remap_to_ancestor]]
-Remap to ancestor
-~~~~~~~~~~~~~~~~~
-
-By using linkgit:git-rev-list[1] arguments, e.g., path limiters, you can limit the
-set of revisions which get rewritten. However, positive refs on the command
-line are distinguished: we don't let them be excluded by such limiters. For
-this purpose, they are instead rewritten to point at the nearest ancestor that
-was not excluded.
-
-
-EXIT STATUS
------------
-
-On success, the exit status is `0`.  If the filter can't find any commits to
-rewrite, the exit status is `2`.  On any other error, the exit status may be
-any other non-zero value.
-
-
-EXAMPLES
---------
-
-Suppose you want to remove a file (containing confidential information
-or copyright violation) from all commits:
-
--------------------------------------------------------
-git filter-branch --tree-filter 'rm filename' HEAD
--------------------------------------------------------
-
-However, if the file is absent from the tree of some commit,
-a simple `rm filename` will fail for that tree and commit.
-Thus you may instead want to use `rm -f filename` as the script.
-
-Using `--index-filter` with 'git rm' yields a significantly faster
-version.  Like with using `rm filename`, `git rm --cached filename`
-will fail if the file is absent from the tree of a commit.  If you
-want to "completely forget" a file, it does not matter when it entered
-history, so we also add `--ignore-unmatch`:
-
---------------------------------------------------------------------------
-git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
---------------------------------------------------------------------------
-
-Now, you will get the rewritten history saved in HEAD.
-
-To rewrite the repository to look as if `foodir/` had been its project
-root, and discard all other history:
-
--------------------------------------------------------
-git filter-branch --subdirectory-filter foodir -- --all
--------------------------------------------------------
-
-Thus you can, e.g., turn a library subdirectory into a repository of
-its own.  Note the `--` that separates 'filter-branch' options from
-revision options, and the `--all` to rewrite all branches and tags.
-
-To set a commit (which typically is at the tip of another
-history) to be the parent of the current initial commit, in
-order to paste the other history behind the current history:
-
--------------------------------------------------------------------
-git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD
--------------------------------------------------------------------
-
-(if the parent string is empty - which happens when we are dealing with
-the initial commit - add graftcommit as a parent).  Note that this assumes
-history with a single root (that is, no merge without common ancestors
-happened).  If this is not the case, use:
-
---------------------------------------------------------------------------
-git filter-branch --parent-filter \
-	'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD
---------------------------------------------------------------------------
-
-or even simpler:
-
------------------------------------------------
-git replace --graft $commit-id $graft-id
-git filter-branch $graft-id..HEAD
------------------------------------------------
-
-To remove commits authored by "Darl McBribe" from the history:
-
-------------------------------------------------------------------------------
-git filter-branch --commit-filter '
-	if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
-	then
-		skip_commit "$@";
-	else
-		git commit-tree "$@";
-	fi' HEAD
-------------------------------------------------------------------------------
-
-The function 'skip_commit' is defined as follows:
-
---------------------------
-skip_commit()
-{
-	shift;
-	while [ -n "$1" ];
-	do
-		shift;
-		map "$1";
-		shift;
-	done;
-}
---------------------------
-
-The shift magic first throws away the tree id and then the -p
-parameters.  Note that this handles merges properly! In case Darl
-committed a merge between P1 and P2, it will be propagated properly
-and all children of the merge will become merge commits with P1,P2
-as their parents instead of the merge commit.
-
-*NOTE* the changes introduced by the commits, and which are not reverted
-by subsequent commits, will still be in the rewritten branch. If you want
-to throw out _changes_ together with the commits, you should use the
-interactive mode of 'git rebase'.
-
-You can rewrite the commit log messages using `--msg-filter`.  For
-example, 'git svn-id' strings in a repository created by 'git svn' can
-be removed this way:
-
--------------------------------------------------------
-git filter-branch --msg-filter '
-	sed -e "/^git-svn-id:/d"
-'
--------------------------------------------------------
-
-If you need to add 'Acked-by' lines to, say, the last 10 commits (none
-of which is a merge), use this command:
-
---------------------------------------------------------
-git filter-branch --msg-filter '
-	cat &&
-	echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
-' HEAD~10..HEAD
---------------------------------------------------------
-
-The `--env-filter` option can be used to modify committer and/or author
-identity.  For example, if you found out that your commits have the wrong
-identity due to a misconfigured user.email, you can make a correction,
-before publishing the project, like this:
-
---------------------------------------------------------
-git filter-branch --env-filter '
-	if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
-	then
-		GIT_AUTHOR_EMAIL=john@example.com
-	fi
-	if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
-	then
-		GIT_COMMITTER_EMAIL=john@example.com
-	fi
-' -- --all
---------------------------------------------------------
-
-To restrict rewriting to only part of the history, specify a revision
-range in addition to the new branch name.  The new branch name will
-point to the top-most revision that a 'git rev-list' of this range
-will print.
-
-Consider this history:
-
-------------------
-     D--E--F--G--H
-    /     /
-A--B-----C
-------------------
-
-To rewrite only commits D,E,F,G,H, but leave A, B and C alone, use:
-
---------------------------------
-git filter-branch ... C..H
---------------------------------
-
-To rewrite commits E,F,G,H, use one of these:
-
-----------------------------------------
-git filter-branch ... C..H --not D
-git filter-branch ... D..H --not C
-----------------------------------------
-
-To move the whole tree into a subdirectory, or remove it from there:
-
----------------------------------------------------------------
-git filter-branch --index-filter \
-	'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
-		GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
-			git update-index --index-info &&
-	 mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
----------------------------------------------------------------
-
-
-
-CHECKLIST FOR SHRINKING A REPOSITORY
-------------------------------------
-
-git-filter-branch can be used to get rid of a subset of files,
-usually with some combination of `--index-filter` and
-`--subdirectory-filter`.  People expect the resulting repository to
-be smaller than the original, but you need a few more steps to
-actually make it smaller, because Git tries hard not to lose your
-objects until you tell it to.  First make sure that:
-
-* You really removed all variants of a filename, if a blob was moved
-  over its lifetime.  `git log --name-only --follow --all -- filename`
-  can help you find renames.
-
-* You really filtered all refs: use `--tag-name-filter cat -- --all`
-  when calling git-filter-branch.
-
-Then there are two ways to get a smaller repository.  A safer way is
-to clone, that keeps your original intact.
-
-* Clone it with `git clone file:///path/to/repo`.  The clone
-  will not have the removed objects.  See linkgit:git-clone[1].  (Note
-  that cloning with a plain path just hardlinks everything!)
-
-If you really don't want to clone it, for whatever reasons, check the
-following points instead (in this order).  This is a very destructive
-approach, so *make a backup* or go back to cloning it.  You have been
-warned.
-
-* Remove the original refs backed up by git-filter-branch: say `git
-  for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git
-  update-ref -d`.
-
-* Expire all reflogs with `git reflog expire --expire=now --all`.
-
-* Garbage collect all unreferenced objects with `git gc --prune=now`
-  (or if your git-gc is not new enough to support arguments to
-  `--prune`, use `git repack -ad; git prune` instead).
-
-GIT
----
-Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index f9255344ae..20850def5d 100644
--- a/Makefile
+++ b/Makefile
@@ -607,7 +607,6 @@ unexport CDPATH
 
 SCRIPT_SH += git-bisect.sh
 SCRIPT_SH += git-difftool--helper.sh
-SCRIPT_SH += git-filter-branch.sh
 SCRIPT_SH += git-merge-octopus.sh
 SCRIPT_SH += git-merge-one-file.sh
 SCRIPT_SH += git-merge-resolve.sh
diff --git a/command-list.txt b/command-list.txt
index a9ac72bef4..1ba65d9516 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -90,7 +90,6 @@ git-fast-export                         ancillarymanipulators
 git-fast-import                         ancillarymanipulators
 git-fetch                               mainporcelain           remote
 git-fetch-pack                          synchingrepositories
-git-filter-branch                       ancillarymanipulators
 git-fmt-merge-msg                       purehelpers
 git-for-each-ref                        plumbinginterrogators
 git-format-patch                        mainporcelain
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
deleted file mode 100755
index 5c5afa2b98..0000000000
--- a/git-filter-branch.sh
+++ /dev/null
@@ -1,662 +0,0 @@
-#!/bin/sh
-#
-# Rewrite revision history
-# Copyright (c) Petr Baudis, 2006
-# Minimal changes to "port" it to core-git (c) Johannes Schindelin, 2007
-#
-# Lets you rewrite the revision history of the current branch, creating
-# a new branch. You can specify a number of filters to modify the commits,
-# files and trees.
-
-# The following functions will also be available in the commit filter:
-
-functions=$(cat << \EOF
-EMPTY_TREE=$(git hash-object -t tree /dev/null)
-
-warn () {
-	echo "$*" >&2
-}
-
-map()
-{
-	# if it was not rewritten, take the original
-	if test -r "$workdir/../map/$1"
-	then
-		cat "$workdir/../map/$1"
-	else
-		echo "$1"
-	fi
-}
-
-# if you run 'skip_commit "$@"' in a commit filter, it will print
-# the (mapped) parents, effectively skipping the commit.
-
-skip_commit()
-{
-	shift;
-	while [ -n "$1" ];
-	do
-		shift;
-		map "$1";
-		shift;
-	done;
-}
-
-# if you run 'git_commit_non_empty_tree "$@"' in a commit filter,
-# it will skip commits that leave the tree untouched, commit the other.
-git_commit_non_empty_tree()
-{
-	if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then
-		map "$3"
-	elif test $# = 1 && test "$1" = $EMPTY_TREE; then
-		:
-	else
-		git commit-tree "$@"
-	fi
-}
-# override die(): this version puts in an extra line break, so that
-# the progress is still visible
-
-die()
-{
-	echo >&2
-	echo "$*" >&2
-	exit 1
-}
-EOF
-)
-
-eval "$functions"
-
-finish_ident() {
-	# Ensure non-empty id name.
-	echo "case \"\$GIT_$1_NAME\" in \"\") GIT_$1_NAME=\"\${GIT_$1_EMAIL%%@*}\" && export GIT_$1_NAME;; esac"
-	# And make sure everything is exported.
-	echo "export GIT_$1_NAME"
-	echo "export GIT_$1_EMAIL"
-	echo "export GIT_$1_DATE"
-}
-
-set_ident () {
-	parse_ident_from_commit author AUTHOR committer COMMITTER
-	finish_ident AUTHOR
-	finish_ident COMMITTER
-}
-
-USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
-	[--tree-filter <command>] [--index-filter <command>]
-	[--parent-filter <command>] [--msg-filter <command>]
-	[--commit-filter <command>] [--tag-name-filter <command>]
-	[--original <namespace>]
-	[-d <directory>] [-f | --force] [--state-branch <branch>]
-	[--] [<rev-list options>...]"
-
-OPTIONS_SPEC=
-. git-sh-setup
-
-if [ "$(is_bare_repository)" = false ]; then
-	require_clean_work_tree 'rewrite branches'
-fi
-
-tempdir=.git-rewrite
-filter_setup=
-filter_env=
-filter_tree=
-filter_index=
-filter_parent=
-filter_msg=cat
-filter_commit=
-filter_tag_name=
-filter_subdir=
-state_branch=
-orig_namespace=refs/original/
-force=
-prune_empty=
-remap_to_ancestor=
-while :
-do
-	case "$1" in
-	--)
-		shift
-		break
-		;;
-	--force|-f)
-		shift
-		force=t
-		continue
-		;;
-	--remap-to-ancestor)
-		# deprecated ($remap_to_ancestor is set now automatically)
-		shift
-		remap_to_ancestor=t
-		continue
-		;;
-	--prune-empty)
-		shift
-		prune_empty=t
-		continue
-		;;
-	-*)
-		;;
-	*)
-		break;
-	esac
-
-	# all switches take one argument
-	ARG="$1"
-	case "$#" in 1) usage ;; esac
-	shift
-	OPTARG="$1"
-	shift
-
-	case "$ARG" in
-	-d)
-		tempdir="$OPTARG"
-		;;
-	--setup)
-		filter_setup="$OPTARG"
-		;;
-	--subdirectory-filter)
-		filter_subdir="$OPTARG"
-		remap_to_ancestor=t
-		;;
-	--env-filter)
-		filter_env="$OPTARG"
-		;;
-	--tree-filter)
-		filter_tree="$OPTARG"
-		;;
-	--index-filter)
-		filter_index="$OPTARG"
-		;;
-	--parent-filter)
-		filter_parent="$OPTARG"
-		;;
-	--msg-filter)
-		filter_msg="$OPTARG"
-		;;
-	--commit-filter)
-		filter_commit="$functions; $OPTARG"
-		;;
-	--tag-name-filter)
-		filter_tag_name="$OPTARG"
-		;;
-	--original)
-		orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
-		;;
-	--state-branch)
-		state_branch="$OPTARG"
-		;;
-	*)
-		usage
-		;;
-	esac
-done
-
-case "$prune_empty,$filter_commit" in
-,)
-	filter_commit='git commit-tree "$@"';;
-t,)
-	filter_commit="$functions;"' git_commit_non_empty_tree "$@"';;
-,*)
-	;;
-*)
-	die "Cannot set --prune-empty and --commit-filter at the same time"
-esac
-
-case "$force" in
-t)
-	rm -rf "$tempdir"
-;;
-'')
-	test -d "$tempdir" &&
-		die "$tempdir already exists, please remove it"
-esac
-orig_dir=$(pwd)
-mkdir -p "$tempdir/t" &&
-tempdir="$(cd "$tempdir"; pwd)" &&
-cd "$tempdir/t" &&
-workdir="$(pwd)" ||
-die ""
-
-# Remove tempdir on exit
-trap 'cd "$orig_dir"; rm -rf "$tempdir"' 0
-
-ORIG_GIT_DIR="$GIT_DIR"
-ORIG_GIT_WORK_TREE="$GIT_WORK_TREE"
-ORIG_GIT_INDEX_FILE="$GIT_INDEX_FILE"
-ORIG_GIT_AUTHOR_NAME="$GIT_AUTHOR_NAME"
-ORIG_GIT_AUTHOR_EMAIL="$GIT_AUTHOR_EMAIL"
-ORIG_GIT_AUTHOR_DATE="$GIT_AUTHOR_DATE"
-ORIG_GIT_COMMITTER_NAME="$GIT_COMMITTER_NAME"
-ORIG_GIT_COMMITTER_EMAIL="$GIT_COMMITTER_EMAIL"
-ORIG_GIT_COMMITTER_DATE="$GIT_COMMITTER_DATE"
-
-GIT_WORK_TREE=.
-export GIT_DIR GIT_WORK_TREE
-
-# Make sure refs/original is empty
-git for-each-ref > "$tempdir"/backup-refs || exit
-while read sha1 type name
-do
-	case "$force,$name" in
-	,$orig_namespace*)
-		die "Cannot create a new backup.
-A previous backup already exists in $orig_namespace
-Force overwriting the backup with -f"
-	;;
-	t,$orig_namespace*)
-		git update-ref -d "$name" $sha1
-	;;
-	esac
-done < "$tempdir"/backup-refs
-
-# The refs should be updated if their heads were rewritten
-git rev-parse --no-flags --revs-only --symbolic-full-name \
-	--default HEAD "$@" > "$tempdir"/raw-refs || exit
-while read ref
-do
-	case "$ref" in ^?*) continue ;; esac
-
-	if git rev-parse --verify "$ref"^0 >/dev/null 2>&1
-	then
-		echo "$ref"
-	else
-		warn "WARNING: not rewriting '$ref' (not a committish)"
-	fi
-done >"$tempdir"/heads <"$tempdir"/raw-refs
-
-test -s "$tempdir"/heads ||
-	die "You must specify a ref to rewrite."
-
-GIT_INDEX_FILE="$(pwd)/../index"
-export GIT_INDEX_FILE
-
-# map old->new commit ids for rewriting parents
-mkdir ../map || die "Could not create map/ directory"
-
-if test -n "$state_branch"
-then
-	state_commit=$(git rev-parse --no-flags --revs-only "$state_branch")
-	if test -n "$state_commit"
-	then
-		echo "Populating map from $state_branch ($state_commit)" 1>&2
-		perl -e'open(MAP, "-|", "git show $ARGV[0]:filter.map") or die;
-			while (<MAP>) {
-				m/(.*):(.*)/ or die;
-				open F, ">../map/$1" or die;
-				print F "$2" or die;
-				close(F) or die;
-			}
-			close(MAP) or die;' "$state_commit" \
-				|| die "Unable to load state from $state_branch:filter.map"
-	else
-		echo "Branch $state_branch does not exist. Will create" 1>&2
-	fi
-fi
-
-# we need "--" only if there are no path arguments in $@
-nonrevs=$(git rev-parse --no-revs "$@") || exit
-if test -z "$nonrevs"
-then
-	dashdash=--
-else
-	dashdash=
-	remap_to_ancestor=t
-fi
-
-git rev-parse --revs-only "$@" >../parse
-
-case "$filter_subdir" in
-"")
-	eval set -- "$(git rev-parse --sq --no-revs "$@")"
-	;;
-*)
-	eval set -- "$(git rev-parse --sq --no-revs "$@" $dashdash \
-		"$filter_subdir")"
-	;;
-esac
-
-git rev-list --reverse --topo-order --default HEAD \
-	--parents --simplify-merges --stdin "$@" <../parse >../revs ||
-	die "Could not get the commits"
-commits=$(wc -l <../revs | tr -d " ")
-
-test $commits -eq 0 && die_with_status 2 "Found nothing to rewrite"
-
-# Rewrite the commits
-report_progress ()
-{
-	if test -n "$progress" &&
-		test $git_filter_branch__commit_count -gt $next_sample_at
-	then
-		count=$git_filter_branch__commit_count
-
-		now=$(date +%s)
-		elapsed=$(($now - $start_timestamp))
-		remaining=$(( ($commits - $count) * $elapsed / $count ))
-		if test $elapsed -gt 0
-		then
-			next_sample_at=$(( ($elapsed + 1) * $count / $elapsed ))
-		else
-			next_sample_at=$(($next_sample_at + 1))
-		fi
-		progress=" ($elapsed seconds passed, remaining $remaining predicted)"
-	fi
-	printf "\rRewrite $commit ($count/$commits)$progress    "
-}
-
-git_filter_branch__commit_count=0
-
-progress= start_timestamp=
-if date '+%s' 2>/dev/null | grep -q '^[0-9][0-9]*$'
-then
-	next_sample_at=0
-	progress="dummy to ensure this is not empty"
-	start_timestamp=$(date '+%s')
-fi
-
-if test -n "$filter_index" ||
-   test -n "$filter_tree" ||
-   test -n "$filter_subdir"
-then
-	need_index=t
-else
-	need_index=
-fi
-
-eval "$filter_setup" < /dev/null ||
-	die "filter setup failed: $filter_setup"
-
-while read commit parents; do
-	git_filter_branch__commit_count=$(($git_filter_branch__commit_count+1))
-
-	report_progress
-	test -f "$workdir"/../map/$commit && continue
-
-	case "$filter_subdir" in
-	"")
-		if test -n "$need_index"
-		then
-			GIT_ALLOW_NULL_SHA1=1 git read-tree -i -m $commit
-		fi
-		;;
-	*)
-		# The commit may not have the subdirectory at all
-		err=$(GIT_ALLOW_NULL_SHA1=1 \
-		      git read-tree -i -m $commit:"$filter_subdir" 2>&1) || {
-			if ! git rev-parse -q --verify $commit:"$filter_subdir"
-			then
-				rm -f "$GIT_INDEX_FILE"
-			else
-				echo >&2 "$err"
-				false
-			fi
-		}
-	esac || die "Could not initialize the index"
-
-	GIT_COMMIT=$commit
-	export GIT_COMMIT
-	git cat-file commit "$commit" >../commit ||
-		die "Cannot read commit $commit"
-
-	eval "$(set_ident <../commit)" ||
-		die "setting author/committer failed for commit $commit"
-	eval "$filter_env" < /dev/null ||
-		die "env filter failed: $filter_env"
-
-	if [ "$filter_tree" ]; then
-		git checkout-index -f -u -a ||
-			die "Could not checkout the index"
-		# files that $commit removed are now still in the working tree;
-		# remove them, else they would be added again
-		git clean -d -q -f -x
-		eval "$filter_tree" < /dev/null ||
-			die "tree filter failed: $filter_tree"
-
-		(
-			git diff-index -r --name-only --ignore-submodules $commit -- &&
-			git ls-files --others
-		) > "$tempdir"/tree-state || exit
-		git update-index --add --replace --remove --stdin \
-			< "$tempdir"/tree-state || exit
-	fi
-
-	eval "$filter_index" < /dev/null ||
-		die "index filter failed: $filter_index"
-
-	parentstr=
-	for parent in $parents; do
-		for reparent in $(map "$parent"); do
-			case "$parentstr " in
-			*" -p $reparent "*)
-				;;
-			*)
-				parentstr="$parentstr -p $reparent"
-				;;
-			esac
-		done
-	done
-	if [ "$filter_parent" ]; then
-		parentstr="$(echo "$parentstr" | eval "$filter_parent")" ||
-				die "parent filter failed: $filter_parent"
-	fi
-
-	{
-		while IFS='' read -r header_line && test -n "$header_line"
-		do
-			# skip header lines...
-			:;
-		done
-		# and output the actual commit message
-		cat
-	} <../commit |
-		eval "$filter_msg" > ../message ||
-			die "msg filter failed: $filter_msg"
-
-	if test -n "$need_index"
-	then
-		tree=$(git write-tree)
-	else
-		tree=$(git rev-parse "$commit^{tree}")
-	fi
-	workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
-		"$tree" $parentstr < ../message > ../map/$commit ||
-			die "could not write rewritten commit"
-done <../revs
-
-# If we are filtering for paths, as in the case of a subdirectory
-# filter, it is possible that a specified head is not in the set of
-# rewritten commits, because it was pruned by the revision walker.
-# Ancestor remapping fixes this by mapping these heads to the unique
-# nearest ancestor that survived the pruning.
-
-if test "$remap_to_ancestor" = t
-then
-	while read ref
-	do
-		sha1=$(git rev-parse "$ref"^0)
-		test -f "$workdir"/../map/$sha1 && continue
-		ancestor=$(git rev-list --simplify-merges -1 "$ref" "$@")
-		test "$ancestor" && echo $(map $ancestor) >> "$workdir"/../map/$sha1
-	done < "$tempdir"/heads
-fi
-
-# Finally update the refs
-
-_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
-_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"
-echo
-while read ref
-do
-	# avoid rewriting a ref twice
-	test -f "$orig_namespace$ref" && continue
-
-	sha1=$(git rev-parse "$ref"^0)
-	rewritten=$(map $sha1)
-
-	test $sha1 = "$rewritten" &&
-		warn "WARNING: Ref '$ref' is unchanged" &&
-		continue
-
-	case "$rewritten" in
-	'')
-		echo "Ref '$ref' was deleted"
-		git update-ref -m "filter-branch: delete" -d "$ref" $sha1 ||
-			die "Could not delete $ref"
-	;;
-	$_x40)
-		echo "Ref '$ref' was rewritten"
-		if ! git update-ref -m "filter-branch: rewrite" \
-					"$ref" $rewritten $sha1 2>/dev/null; then
-			if test $(git cat-file -t "$ref") = tag; then
-				if test -z "$filter_tag_name"; then
-					warn "WARNING: You said to rewrite tagged commits, but not the corresponding tag."
-					warn "WARNING: Perhaps use '--tag-name-filter cat' to rewrite the tag."
-				fi
-			else
-				die "Could not rewrite $ref"
-			fi
-		fi
-	;;
-	*)
-		# NEEDSWORK: possibly add -Werror, making this an error
-		warn "WARNING: '$ref' was rewritten into multiple commits:"
-		warn "$rewritten"
-		warn "WARNING: Ref '$ref' points to the first one now."
-		rewritten=$(echo "$rewritten" | head -n 1)
-		git update-ref -m "filter-branch: rewrite to first" \
-				"$ref" $rewritten $sha1 ||
-			die "Could not rewrite $ref"
-	;;
-	esac
-	git update-ref -m "filter-branch: backup" "$orig_namespace$ref" $sha1 ||
-		 exit
-done < "$tempdir"/heads
-
-# TODO: This should possibly go, with the semantics that all positive given
-#       refs are updated, and their original heads stored in refs/original/
-# Filter tags
-
-if [ "$filter_tag_name" ]; then
-	git for-each-ref --format='%(objectname) %(objecttype) %(refname)' refs/tags |
-	while read sha1 type ref; do
-		ref="${ref#refs/tags/}"
-		# XXX: Rewrite tagged trees as well?
-		if [ "$type" != "commit" -a "$type" != "tag" ]; then
-			continue;
-		fi
-
-		if [ "$type" = "tag" ]; then
-			# Dereference to a commit
-			sha1t="$sha1"
-			sha1="$(git rev-parse -q "$sha1"^{commit})" || continue
-		fi
-
-		[ -f "../map/$sha1" ] || continue
-		new_sha1="$(cat "../map/$sha1")"
-		GIT_COMMIT="$sha1"
-		export GIT_COMMIT
-		new_ref="$(echo "$ref" | eval "$filter_tag_name")" ||
-			die "tag name filter failed: $filter_tag_name"
-
-		echo "$ref -> $new_ref ($sha1 -> $new_sha1)"
-
-		if [ "$type" = "tag" ]; then
-			new_sha1=$( ( printf 'object %s\ntype commit\ntag %s\n' \
-						"$new_sha1" "$new_ref"
-				git cat-file tag "$ref" |
-				sed -n \
-				    -e '1,/^$/{
-					  /^object /d
-					  /^type /d
-					  /^tag /d
-					}' \
-				    -e '/^-----BEGIN PGP SIGNATURE-----/q' \
-				    -e 'p' ) |
-				git hash-object -t tag -w --stdin) ||
-				die "Could not create new tag object for $ref"
-			if git cat-file tag "$ref" | \
-			   sane_grep '^-----BEGIN PGP SIGNATURE-----' >/dev/null 2>&1
-			then
-				warn "gpg signature stripped from tag object $sha1t"
-			fi
-		fi
-
-		git update-ref "refs/tags/$new_ref" "$new_sha1" ||
-			die "Could not write tag $new_ref"
-	done
-fi
-
-unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE
-unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
-unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE
-test -z "$ORIG_GIT_DIR" || {
-	GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR
-}
-test -z "$ORIG_GIT_WORK_TREE" || {
-	GIT_WORK_TREE="$ORIG_GIT_WORK_TREE" &&
-	export GIT_WORK_TREE
-}
-test -z "$ORIG_GIT_INDEX_FILE" || {
-	GIT_INDEX_FILE="$ORIG_GIT_INDEX_FILE" &&
-	export GIT_INDEX_FILE
-}
-test -z "$ORIG_GIT_AUTHOR_NAME" || {
-	GIT_AUTHOR_NAME="$ORIG_GIT_AUTHOR_NAME" &&
-	export GIT_AUTHOR_NAME
-}
-test -z "$ORIG_GIT_AUTHOR_EMAIL" || {
-	GIT_AUTHOR_EMAIL="$ORIG_GIT_AUTHOR_EMAIL" &&
-	export GIT_AUTHOR_EMAIL
-}
-test -z "$ORIG_GIT_AUTHOR_DATE" || {
-	GIT_AUTHOR_DATE="$ORIG_GIT_AUTHOR_DATE" &&
-	export GIT_AUTHOR_DATE
-}
-test -z "$ORIG_GIT_COMMITTER_NAME" || {
-	GIT_COMMITTER_NAME="$ORIG_GIT_COMMITTER_NAME" &&
-	export GIT_COMMITTER_NAME
-}
-test -z "$ORIG_GIT_COMMITTER_EMAIL" || {
-	GIT_COMMITTER_EMAIL="$ORIG_GIT_COMMITTER_EMAIL" &&
-	export GIT_COMMITTER_EMAIL
-}
-test -z "$ORIG_GIT_COMMITTER_DATE" || {
-	GIT_COMMITTER_DATE="$ORIG_GIT_COMMITTER_DATE" &&
-	export GIT_COMMITTER_DATE
-}
-
-if test -n "$state_branch"
-then
-	echo "Saving rewrite state to $state_branch" 1>&2
-	state_blob=$(
-		perl -e'opendir D, "../map" or die;
-			open H, "|-", "git hash-object -w --stdin" or die;
-			foreach (sort readdir(D)) {
-				next if m/^\.\.?$/;
-				open F, "<../map/$_" or die;
-				chomp($f = <F>);
-				print H "$_:$f\n" or die;
-			}
-			close(H) or die;' || die "Unable to save state")
-	state_tree=$(printf '100644 blob %s\tfilter.map\n' "$state_blob" | git mktree)
-	if test -n "$state_commit"
-	then
-		state_commit=$(echo "Sync" | git commit-tree "$state_tree" -p "$state_commit")
-	else
-		state_commit=$(echo "Sync" | git commit-tree "$state_tree" )
-	fi
-	git update-ref "$state_branch" "$state_commit"
-fi
-
-cd "$orig_dir"
-rm -rf "$tempdir"
-
-trap - 0
-
-if [ "$(is_bare_repository)" = false ]; then
-	git read-tree -u -m HEAD || exit
-fi
-
-exit 0
diff --git a/t/perf/p7000-filter-branch.sh b/t/perf/p7000-filter-branch.sh
deleted file mode 100755
index b029586ccb..0000000000
--- a/t/perf/p7000-filter-branch.sh
+++ /dev/null
@@ -1,24 +0,0 @@
-#!/bin/sh
-
-test_description='performance of filter-branch'
-. ./perf-lib.sh
-
-test_perf_default_repo
-test_checkout_worktree
-
-test_expect_success 'mark bases for tests' '
-	git tag -f tip &&
-	git tag -f base HEAD~100
-'
-
-test_perf 'noop filter' '
-	git checkout --detach tip &&
-	git filter-branch -f base..HEAD
-'
-
-test_perf 'noop prune-empty' '
-	git checkout --detach tip &&
-	git filter-branch -f --prune-empty base..HEAD
-'
-
-test_done
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
deleted file mode 100755
index e23de7d0b5..0000000000
--- a/t/t7003-filter-branch.sh
+++ /dev/null
@@ -1,505 +0,0 @@
-#!/bin/sh
-
-test_description='git filter-branch'
-. ./test-lib.sh
-. "$TEST_DIRECTORY/lib-gpg.sh"
-
-test_expect_success 'setup' '
-	test_commit A &&
-	GIT_COMMITTER_DATE="@0 +0000" GIT_AUTHOR_DATE="@0 +0000" &&
-	test_commit --notick B &&
-	git checkout -b branch B &&
-	test_commit D &&
-	mkdir dir &&
-	test_commit dir/D &&
-	test_commit E &&
-	git checkout master &&
-	test_commit C &&
-	git checkout branch &&
-	git merge C &&
-	git tag F &&
-	test_commit G &&
-	test_commit H
-'
-# * (HEAD, branch) H
-# * G
-# *   Merge commit 'C' into branch
-# |\
-# | * (master) C
-# * | E
-# * | dir/D
-# * | D
-# |/
-# * B
-# * A
-
-
-H=$(git rev-parse H)
-
-test_expect_success 'rewrite identically' '
-	git filter-branch branch
-'
-test_expect_success 'result is really identical' '
-	test $H = $(git rev-parse HEAD)
-'
-
-test_expect_success 'rewrite bare repository identically' '
-	(git config core.bare true && cd .git &&
-	 git filter-branch branch > filter-output 2>&1 &&
-	! fgrep fatal filter-output)
-'
-git config core.bare false
-test_expect_success 'result is really identical' '
-	test $H = $(git rev-parse HEAD)
-'
-
-TRASHDIR=$(pwd)
-test_expect_success 'correct GIT_DIR while using -d' '
-	mkdir drepo &&
-	( cd drepo &&
-	git init &&
-	test_commit drepo &&
-	git filter-branch -d "$TRASHDIR/dfoo" \
-		--index-filter "cp \"$TRASHDIR\"/dfoo/backup-refs \"$TRASHDIR\"" \
-	) &&
-	grep drepo "$TRASHDIR/backup-refs"
-'
-
-test_expect_success 'tree-filter works with -d' '
-	git init drepo-tree &&
-	(
-		cd drepo-tree &&
-		test_commit one &&
-		git filter-branch -d "$TRASHDIR/dfoo" \
-			--tree-filter "echo changed >one.t" &&
-		echo changed >expect &&
-		git cat-file blob HEAD:one.t >actual &&
-		test_cmp expect actual &&
-		test_cmp one.t actual
-	)
-'
-
-test_expect_success 'Fail if commit filter fails' '
-	test_must_fail git filter-branch -f --commit-filter "exit 1" HEAD
-'
-
-test_expect_success 'rewrite, renaming a specific file' '
-	git filter-branch -f --tree-filter "mv D.t doh || :" HEAD
-'
-
-test_expect_success 'test that the file was renamed' '
-	test D = "$(git show HEAD:doh --)" &&
-	! test -f D.t &&
-	test -f doh &&
-	test D = "$(cat doh)"
-'
-
-test_expect_success 'rewrite, renaming a specific directory' '
-	git filter-branch -f --tree-filter "mv dir diroh || :" HEAD
-'
-
-test_expect_success 'test that the directory was renamed' '
-	test dir/D = "$(git show HEAD:diroh/D.t --)" &&
-	! test -d dir &&
-	test -d diroh &&
-	! test -d diroh/dir &&
-	test -f diroh/D.t &&
-	test dir/D = "$(cat diroh/D.t)"
-'
-
-V=$(git rev-parse HEAD)
-
-test_expect_success 'populate --state-branch' '
-	git filter-branch --state-branch state -f --tree-filter "touch file || :" HEAD
-'
-
-W=$(git rev-parse HEAD)
-
-test_expect_success 'using --state-branch to skip already rewritten commits' '
-	test_when_finished git reset --hard $V &&
-	git reset --hard $V &&
-	git filter-branch --state-branch state -f --tree-filter "touch file || :" HEAD &&
-	test_cmp_rev $W HEAD
-'
-
-git tag oldD HEAD~4
-test_expect_success 'rewrite one branch, keeping a side branch' '
-	git branch modD oldD &&
-	git filter-branch -f --tree-filter "mv B.t boh || :" D..modD
-'
-
-test_expect_success 'common ancestor is still common (unchanged)' '
-	test "$(git merge-base modD D)" = "$(git rev-parse B)"
-'
-
-test_expect_success 'filter subdirectory only' '
-	mkdir subdir &&
-	touch subdir/new &&
-	git add subdir/new &&
-	test_tick &&
-	git commit -m "subdir" &&
-	echo H > A.t &&
-	test_tick &&
-	git commit -m "not subdir" A.t &&
-	echo A > subdir/new &&
-	test_tick &&
-	git commit -m "again subdir" subdir/new &&
-	git rm A.t &&
-	test_tick &&
-	git commit -m "again not subdir" &&
-	git branch sub &&
-	git branch sub-earlier HEAD~2 &&
-	git filter-branch -f --subdirectory-filter subdir \
-		refs/heads/sub refs/heads/sub-earlier
-'
-
-test_expect_success 'subdirectory filter result looks okay' '
-	test 2 = $(git rev-list sub | wc -l) &&
-	git show sub:new &&
-	test_must_fail git show sub:subdir &&
-	git show sub-earlier:new &&
-	test_must_fail git show sub-earlier:subdir
-'
-
-test_expect_success 'more setup' '
-	git checkout master &&
-	mkdir subdir &&
-	echo A > subdir/new &&
-	git add subdir/new &&
-	test_tick &&
-	git commit -m "subdir on master" subdir/new &&
-	git rm A.t &&
-	test_tick &&
-	git commit -m "again subdir on master" &&
-	git merge branch
-'
-
-test_expect_success 'use index-filter to move into a subdirectory' '
-	git branch directorymoved &&
-	git filter-branch -f --index-filter \
-		 "git ls-files -s | sed \"s-	-&newsubdir/-\" |
-	          GIT_INDEX_FILE=\$GIT_INDEX_FILE.new \
-			git update-index --index-info &&
-		  mv \"\$GIT_INDEX_FILE.new\" \"\$GIT_INDEX_FILE\"" directorymoved &&
-	git diff --exit-code HEAD directorymoved:newsubdir
-'
-
-test_expect_success 'stops when msg filter fails' '
-	old=$(git rev-parse HEAD) &&
-	test_must_fail git filter-branch -f --msg-filter false HEAD &&
-	test $old = $(git rev-parse HEAD) &&
-	rm -rf .git-rewrite
-'
-
-test_expect_success 'author information is preserved' '
-	: > i &&
-	git add i &&
-	test_tick &&
-	GIT_AUTHOR_NAME="B V Uips" git commit -m bvuips &&
-	git branch preserved-author &&
-	(sane_unset GIT_AUTHOR_NAME &&
-	 git filter-branch -f --msg-filter "cat; \
-			test \$GIT_COMMIT != $(git rev-parse master) || \
-			echo Hallo" \
-		preserved-author) &&
-	git rev-list --author="B V Uips" preserved-author >actual &&
-	test_line_count = 1 actual
-'
-
-test_expect_success "remove a certain author's commits" '
-	echo i > i &&
-	test_tick &&
-	git commit -m i i &&
-	git branch removed-author &&
-	git filter-branch -f --commit-filter "\
-		if [ \"\$GIT_AUTHOR_NAME\" = \"B V Uips\" ];\
-		then\
-			skip_commit \"\$@\";
-		else\
-			git commit-tree \"\$@\";\
-		fi" removed-author &&
-	cnt1=$(git rev-list master | wc -l) &&
-	cnt2=$(git rev-list removed-author | wc -l) &&
-	test $cnt1 -eq $(($cnt2 + 1)) &&
-	git rev-list --author="B V Uips" removed-author >actual &&
-	test_line_count = 0 actual
-'
-
-test_expect_success 'barf on invalid name' '
-	test_must_fail git filter-branch -f master xy-problem &&
-	test_must_fail git filter-branch -f HEAD^
-'
-
-test_expect_success '"map" works in commit filter' '
-	git filter-branch -f --commit-filter "\
-		parent=\$(git rev-parse \$GIT_COMMIT^) &&
-		mapped=\$(map \$parent) &&
-		actual=\$(echo \"\$@\" | sed \"s/^.*-p //\") &&
-		test \$mapped = \$actual &&
-		git commit-tree \"\$@\";" master~2..master &&
-	git rev-parse --verify master
-'
-
-test_expect_success 'Name needing quotes' '
-
-	git checkout -b rerere A &&
-	mkdir foo &&
-	name="れれれ" &&
-	>foo/$name &&
-	git add foo &&
-	git commit -m "Adding a file" &&
-	git filter-branch --tree-filter "rm -fr foo" &&
-	test_must_fail git ls-files --error-unmatch "foo/$name" &&
-	test $(git rev-parse --verify rerere) != $(git rev-parse --verify A)
-
-'
-
-test_expect_success 'Subdirectory filter with disappearing trees' '
-	git reset --hard &&
-	git checkout master &&
-
-	mkdir foo &&
-	touch foo/bar &&
-	git add foo &&
-	test_tick &&
-	git commit -m "Adding foo" &&
-
-	git rm -r foo &&
-	test_tick &&
-	git commit -m "Removing foo" &&
-
-	mkdir foo &&
-	touch foo/bar &&
-	git add foo &&
-	test_tick &&
-	git commit -m "Re-adding foo" &&
-
-	git filter-branch -f --subdirectory-filter foo &&
-	git rev-list master >actual &&
-	test_line_count = 3 actual
-'
-
-test_expect_success 'Tag name filtering retains tag message' '
-	git tag -m atag T &&
-	git cat-file tag T > expect &&
-	git filter-branch -f --tag-name-filter cat &&
-	git cat-file tag T > actual &&
-	test_cmp expect actual
-'
-
-faux_gpg_tag='object XXXXXX
-type commit
-tag S
-tagger T A Gger <tagger@example.com> 1206026339 -0500
-
-This is a faux gpg signed tag.
------BEGIN PGP SIGNATURE-----
-Version: FauxGPG v0.0.0 (FAUX/Linux)
-
-gdsfoewhxu/6l06f1kxyxhKdZkrcbaiOMtkJUA9ITAc1mlamh0ooasxkH1XwMbYQ
-acmwXaWET20H0GeAGP+7vow=
-=agpO
------END PGP SIGNATURE-----
-'
-test_expect_success 'Tag name filtering strips gpg signature' '
-	sha1=$(git rev-parse HEAD) &&
-	sha1t=$(echo "$faux_gpg_tag" | sed -e s/XXXXXX/$sha1/ | git mktag) &&
-	git update-ref "refs/tags/S" "$sha1t" &&
-	echo "$faux_gpg_tag" | sed -e s/XXXXXX/$sha1/ | head -n 6 > expect &&
-	git filter-branch -f --tag-name-filter cat &&
-	git cat-file tag S > actual &&
-	test_cmp expect actual
-'
-
-test_expect_success GPG 'Filtering retains message of gpg signed commit' '
-	mkdir gpg &&
-	touch gpg/foo &&
-	git add gpg &&
-	test_tick &&
-	git commit -S -m "Adding gpg" &&
-
-	git log -1 --format="%s" > expect &&
-	git filter-branch -f --msg-filter "cat" &&
-	git log -1 --format="%s" > actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'Tag name filtering allows slashes in tag names' '
-	git tag -m tag-with-slash X/1 &&
-	git cat-file tag X/1 | sed -e s,X/1,X/2, > expect &&
-	git filter-branch -f --tag-name-filter "echo X/2" &&
-	git cat-file tag X/2 > actual &&
-	test_cmp expect actual
-'
-test_expect_success 'setup --prune-empty comparisons' '
-	git checkout --orphan master-no-a &&
-	git rm -rf . &&
-	unset test_tick &&
-	test_tick &&
-	GIT_COMMITTER_DATE="@0 +0000" GIT_AUTHOR_DATE="@0 +0000" &&
-	test_commit --notick B B.t B Bx &&
-	git checkout -b branch-no-a Bx &&
-	test_commit D D.t D Dx &&
-	mkdir dir &&
-	test_commit dir/D dir/D.t dir/D dir/Dx &&
-	test_commit E E.t E Ex &&
-	git checkout master-no-a &&
-	test_commit C C.t C Cx &&
-	git checkout branch-no-a &&
-	git merge Cx -m "Merge tag '\''C'\'' into branch" &&
-	git tag Fx &&
-	test_commit G G.t G Gx &&
-	test_commit H H.t H Hx &&
-	git checkout branch
-'
-
-test_expect_success 'Prune empty commits' '
-	git rev-list HEAD > expect &&
-	test_commit to_remove &&
-	git filter-branch -f --index-filter "git update-index --remove to_remove.t" --prune-empty HEAD &&
-	git rev-list HEAD > actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'prune empty collapsed merges' '
-	test_config merge.ff false &&
-	git rev-list HEAD >expect &&
-	test_commit to_remove_2 &&
-	git reset --hard HEAD^ &&
-	test_merge non-ff to_remove_2 &&
-	git filter-branch -f --index-filter "git update-index --remove to_remove_2.t" --prune-empty HEAD &&
-	git rev-list HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'prune empty works even without index/tree filters' '
-	git rev-list HEAD >expect &&
-	git commit --allow-empty -m empty &&
-	git filter-branch -f --prune-empty HEAD &&
-	git rev-list HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success '--prune-empty is able to prune root commit' '
-	git rev-list branch-no-a >expect &&
-	git branch testing H &&
-	git filter-branch -f --prune-empty --index-filter "git update-index --remove A.t" testing &&
-	git rev-list testing >actual &&
-	git branch -D testing &&
-	test_cmp expect actual
-'
-
-test_expect_success '--prune-empty is able to prune entire branch' '
-	git branch prune-entire B &&
-	git filter-branch -f --prune-empty --index-filter "git update-index --remove A.t B.t" prune-entire &&
-	test_path_is_missing .git/refs/heads/prune-entire &&
-	test_must_fail git reflog exists refs/heads/prune-entire
-'
-
-test_expect_success '--remap-to-ancestor with filename filters' '
-	git checkout master &&
-	git reset --hard A &&
-	test_commit add-foo foo 1 &&
-	git branch moved-foo &&
-	test_commit add-bar bar a &&
-	git branch invariant &&
-	orig_invariant=$(git rev-parse invariant) &&
-	git branch moved-bar &&
-	test_commit change-foo foo 2 &&
-	git filter-branch -f --remap-to-ancestor \
-		moved-foo moved-bar A..master \
-		-- -- foo &&
-	test $(git rev-parse moved-foo) = $(git rev-parse moved-bar) &&
-	test $(git rev-parse moved-foo) = $(git rev-parse master^) &&
-	test $orig_invariant = $(git rev-parse invariant)
-'
-
-test_expect_success 'automatic remapping to ancestor with filename filters' '
-	git checkout master &&
-	git reset --hard A &&
-	test_commit add-foo2 foo 1 &&
-	git branch moved-foo2 &&
-	test_commit add-bar2 bar a &&
-	git branch invariant2 &&
-	orig_invariant=$(git rev-parse invariant2) &&
-	git branch moved-bar2 &&
-	test_commit change-foo2 foo 2 &&
-	git filter-branch -f \
-		moved-foo2 moved-bar2 A..master \
-		-- -- foo &&
-	test $(git rev-parse moved-foo2) = $(git rev-parse moved-bar2) &&
-	test $(git rev-parse moved-foo2) = $(git rev-parse master^) &&
-	test $orig_invariant = $(git rev-parse invariant2)
-'
-
-test_expect_success 'setup submodule' '
-	rm -fr ?* .git &&
-	git init &&
-	test_commit file &&
-	mkdir submod &&
-	submodurl="$PWD/submod" &&
-	( cd submod &&
-	  git init &&
-	  test_commit file-in-submod ) &&
-	git submodule add "$submodurl" &&
-	git commit -m "added submodule" &&
-	test_commit add-file &&
-	( cd submod && test_commit add-in-submodule ) &&
-	git add submod &&
-	git commit -m "changed submodule" &&
-	git branch original HEAD
-'
-
-orig_head=$(git show-ref --hash --head HEAD)
-
-test_expect_success 'rewrite submodule with another content' '
-	git filter-branch --tree-filter "test -d submod && {
-					 rm -rf submod &&
-					 git rm -rf --quiet submod &&
-					 mkdir submod &&
-					 : > submod/file
-					 } || :" HEAD &&
-	test $orig_head != $(git show-ref --hash --head HEAD)
-'
-
-test_expect_success 'replace submodule revision' '
-	git reset --hard original &&
-	git filter-branch -f --tree-filter \
-	    "if git ls-files --error-unmatch -- submod > /dev/null 2>&1
-	     then git update-index --cacheinfo 160000 0123456789012345678901234567890123456789 submod
-	     fi" HEAD &&
-	test $orig_head != $(git show-ref --hash --head HEAD)
-'
-
-test_expect_success 'filter commit message without trailing newline' '
-	git reset --hard original &&
-	commit=$(printf "no newline" | git commit-tree HEAD^{tree}) &&
-	git update-ref refs/heads/no-newline $commit &&
-	git filter-branch -f refs/heads/no-newline &&
-	echo $commit >expect &&
-	git rev-parse refs/heads/no-newline >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'tree-filter deals with object name vs pathname ambiguity' '
-	test_when_finished "git reset --hard original" &&
-	ambiguous=$(git rev-list -1 HEAD) &&
-	git filter-branch --tree-filter "mv file.t $ambiguous" HEAD^.. &&
-	git show HEAD:$ambiguous
-'
-
-test_expect_success 'rewrite repository including refs that point at non-commit object' '
-	test_when_finished "git reset --hard original" &&
-	tree=$(git rev-parse HEAD^{tree}) &&
-	test_when_finished "git replace -d $tree" &&
-	echo A >new &&
-	git add new &&
-	new_tree=$(git write-tree) &&
-	git replace $tree $new_tree &&
-	git tag -a -m "tag to a tree" treetag $new_tree &&
-	git reset --hard HEAD &&
-	git filter-branch -f -- --all >filter-output 2>&1 &&
-	! fgrep fatal filter-output
-'
-
-test_done
diff --git a/t/t7009-filter-branch-null-sha1.sh b/t/t7009-filter-branch-null-sha1.sh
deleted file mode 100755
index 9ba9f24ad2..0000000000
--- a/t/t7009-filter-branch-null-sha1.sh
+++ /dev/null
@@ -1,55 +0,0 @@
-#!/bin/sh
-
-test_description='filter-branch removal of trees with null sha1'
-. ./test-lib.sh
-
-test_expect_success 'setup: base commits' '
-	test_commit one &&
-	test_commit two &&
-	test_commit three
-'
-
-test_expect_success 'setup: a commit with a bogus null sha1 in the tree' '
-	{
-		git ls-tree HEAD &&
-		printf "160000 commit $ZERO_OID\\tbroken\\n"
-	} >broken-tree &&
-	echo "add broken entry" >msg &&
-
-	tree=$(git mktree <broken-tree) &&
-	test_tick &&
-	commit=$(git commit-tree $tree -p HEAD <msg) &&
-	git update-ref HEAD "$commit"
-'
-
-# we have to make one more commit on top removing the broken
-# entry, since otherwise our index does not match HEAD (and filter-branch will
-# complain). We could make the index match HEAD, but doing so would involve
-# writing a null sha1 into the index.
-test_expect_success 'setup: bring HEAD and index in sync' '
-	test_tick &&
-	git commit -a -m "back to normal"
-'
-
-test_expect_success 'noop filter-branch complains' '
-	test_must_fail git filter-branch \
-		--force --prune-empty \
-		--index-filter "true"
-'
-
-test_expect_success 'filter commands are still checked' '
-	test_must_fail git filter-branch \
-		--force --prune-empty \
-		--index-filter "git rm --cached --ignore-unmatch three.t"
-'
-
-test_expect_success 'removing the broken entry works' '
-	echo three >expect &&
-	git filter-branch \
-		--force --prune-empty \
-		--index-filter "git rm --cached --ignore-unmatch broken" &&
-	git log -1 --format=%s >actual &&
-	test_cmp expect actual
-'
-
-test_done
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 75512c3403..4e7f669c76 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -28,10 +28,10 @@ complete ()
 #
 # (2) A test makes sure that common subcommands are included in the
 #     completion for "git <TAB>", and a plumbing is excluded.  "add",
-#     "filter-branch" and "ls-files" are listed for this.
+#     "rebase" and "ls-files" are listed for this.
 
-GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr filter-branch ls-files'
-GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout filter-branch'
+GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr rebase ls-files'
+GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout rebase'
 
 . "$GIT_BUILD_DIR/contrib/completion/git-completion.bash"
 
@@ -1392,12 +1392,12 @@ test_expect_success 'basic' '
 	# built-in
 	grep -q "^add \$" out &&
 	# script
-	grep -q "^filter-branch \$" out &&
+	grep -q "^rebase \$" out &&
 	# plumbing
 	! grep -q "^ls-files \$" out &&
 
-	run_completion "git f" &&
-	! grep -q -v "^f" out
+	run_completion "git r" &&
+	! grep -q -v "^r" out
 '
 
 test_expect_success 'double dash "git" itself' '
-- 
2.23.0.5.g775ebaa2a0



* Re: [RFC PATCH 1/5] t6006: simplify and optimize empty message test
  2019-08-26 23:52         ` [RFC PATCH 1/5] t6006: simplify and optimize empty message test Elijah Newren
@ 2019-08-27  1:23           ` Derrick Stolee
  0 siblings, 0 replies; 73+ messages in thread
From: Derrick Stolee @ 2019-08-27  1:23 UTC (permalink / raw)
  To: Elijah Newren, git
  Cc: Junio C Hamano, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On 8/26/2019 7:52 PM, Elijah Newren wrote:
> Test t6006.71 ("oneline with empty message") was creating two commits
> with simple commit messages, and then running filter-branch to rewrite
> the commit messages to be empty.  This test was written this way because
> the --allow-empty-message option to git commit did not exist at the
> time.  Simplify this test and avoid the need to invoke filter-branch by
> just using --allow-empty-message when creating the commit.
> 
> Despite only being one piece of the 71st test and there being 73 tests
> overall, this small change to just this one test speeds up the overall
> execution time of t6006 (as measured by the best of 3 runs of `time
> ./t6006-rev-list-format.sh`) by about 11% on Linux and by 13% on
> Mac.

Wow! A good cleanup to include, regardless of other concerns.

> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  t/t6006-rev-list-format.sh | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
> index da113d975b..d30e41c9f7 100755
> --- a/t/t6006-rev-list-format.sh
> +++ b/t/t6006-rev-list-format.sh
> @@ -501,9 +501,8 @@ test_expect_success 'reflog identity' '
>  '
>  
>  test_expect_success 'oneline with empty message' '
> -	git commit -m "dummy" --allow-empty &&
> -	git commit -m "dummy" --allow-empty &&
> -	git filter-branch --msg-filter "sed -e s/dummy//" HEAD^^.. &&
> +	git commit --allow-empty --allow-empty-message &&
> +	git commit --allow-empty --allow-empty-message &&
>  	git rev-list --oneline HEAD >test.txt &&
>  	test_line_count = 5 test.txt &&
>  	git rev-list --oneline --graph HEAD >testg.txt &&

LGTM.

-Stolee


* Re: [RFC PATCH 2/5] t3427: accelerate this test by using fast-export and fast-import
  2019-08-26 23:52         ` [RFC PATCH 2/5] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
@ 2019-08-27  1:25           ` Derrick Stolee
  0 siblings, 0 replies; 73+ messages in thread
From: Derrick Stolee @ 2019-08-27  1:25 UTC (permalink / raw)
  To: Elijah Newren, git
  Cc: Junio C Hamano, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On 8/26/2019 7:52 PM, Elijah Newren wrote:
> fast-export and fast-import can easily handle the simple rewrite that
> was being done by filter-branch, and should be significantly faster on
> systems with a slow fork.  Timings from before and after on two laptops
> that I have access to (measured via `time ./t3427-rebase-subtree.sh`,
> i.e. including everything in this test -- not just the filter-branch or
> fast-export/fast-import pair):
> 
>    Linux:  4.305s -> 3.684s (~17% speedup)
>    Mac:   10.128s -> 7.038s (~30% speedup)

Again, impressive speedup!

> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  t/t3427-rebase-subtree.sh | 32 ++++++++++++++++++++++++--------
>  1 file changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/t/t3427-rebase-subtree.sh b/t/t3427-rebase-subtree.sh
> index d8640522a0..d05fcce5dc 100755
> --- a/t/t3427-rebase-subtree.sh
> +++ b/t/t3427-rebase-subtree.sh
> @@ -42,7 +42,9 @@ test_expect_failure REBASE_P \
>  	'Rebase -Xsubtree --preserve-merges --onto commit 4' '
>  	reset_rebase &&
>  	git checkout -b rebase-preserve-merges-4 master &&
> -	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
> +	git fast-export --no-data HEAD -- files_subtree/ \
> +		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
> +		| git fast-import --force --quiet &&
>  	git commit -m "Empty commit" --allow-empty &&

There's a lot of repetition in these changes. Is this a good time
to introduce a helper function? This trio of commands is difficult
to read, so I'd rather spell it out just once.
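
For illustration, a minimal sketch of what such a helper could look like.
The name extract_subtree and its single argument are assumptions made up
for this example and do not appear in the patch; the pipeline itself is
the one repeated in each hunk below, with the directory name
parameterized.

```shell
# Hypothetical helper (name and argument are assumptions, not from the patch).
# Rewrites the history reachable from HEAD so that only the contents of
# directory "$1" remain, moved up to the top level of the tree.
extract_subtree () {
	# --no-data emits the original blob IDs instead of blob contents, so
	# each file modification line reads "M <mode> <40-hex-id> <path>".
	git fast-export --no-data HEAD -- "$1/" |
	# Strip the leading "$1/" from the path that follows a blob ID.
	sed -e "s%\([0-9a-f]\{40\} \)$1/%\1%" |
	# --force lets fast-import move the branch to the rewritten,
	# unrelated history.
	git fast-import --force --quiet
}
```

Each test would then call `extract_subtree files_subtree &&` in place of
the three-line pipeline. Like the original pipeline, this sketch only
rewrites modification lines that carry a 40-hex object ID; deletions and
paths that need quoting are not handled.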

>  	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
>  	verbose test "$(commit_message HEAD~)" = "files_subtree/master4"
> @@ -53,7 +55,9 @@ test_expect_failure REBASE_P \
>  	'Rebase -Xsubtree --preserve-merges --onto commit 5' '
>  	reset_rebase &&
>  	git checkout -b rebase-preserve-merges-5 master &&
> -	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
> +	git fast-export --no-data HEAD -- files_subtree/ \
> +		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
> +		| git fast-import --force --quiet &&
>  	git commit -m "Empty commit" --allow-empty &&
>  	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
>  	verbose test "$(commit_message HEAD)" = "files_subtree/master5"
> @@ -64,7 +68,9 @@ test_expect_failure REBASE_P \
>  	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 4' '
>  	reset_rebase &&
>  	git checkout -b rebase-keep-empty-4 master &&
> -	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
> +	git fast-export --no-data HEAD -- files_subtree/ \
> +		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
> +		| git fast-import --force --quiet &&
>  	git commit -m "Empty commit" --allow-empty &&
>  	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
>  	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
> @@ -75,7 +81,9 @@ test_expect_failure REBASE_P \
>  	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 5' '
>  	reset_rebase &&
>  	git checkout -b rebase-keep-empty-5 master &&
> -	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
> +	git fast-export --no-data HEAD -- files_subtree/ \
> +		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
> +		| git fast-import --force --quiet &&
>  	git commit -m "Empty commit" --allow-empty &&
>  	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
>  	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
> @@ -86,7 +94,9 @@ test_expect_failure REBASE_P \
>  	'Rebase -Xsubtree --keep-empty --preserve-merges --onto empty commit' '
>  	reset_rebase &&
>  	git checkout -b rebase-keep-empty-empty master &&
> -	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
> +	git fast-export --no-data HEAD -- files_subtree/ \
> +		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
> +		| git fast-import --force --quiet &&
>  	git commit -m "Empty commit" --allow-empty &&
>  	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
>  	verbose test "$(commit_message HEAD)" = "Empty commit"
> @@ -96,7 +106,9 @@ test_expect_failure REBASE_P \
>  test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
>  	reset_rebase &&
>  	git checkout -b rebase-onto-4 master &&
> -	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
> +	git fast-export --no-data HEAD -- files_subtree/ \
> +		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
> +		| git fast-import --force --quiet &&
>  	git commit -m "Empty commit" --allow-empty &&
>  	git rebase -Xsubtree=files_subtree --onto files-master master &&
>  	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
> @@ -106,7 +118,9 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
>  test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
>  	reset_rebase &&
>  	git checkout -b rebase-onto-5 master &&
> -	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
> +	git fast-export --no-data HEAD -- files_subtree/ \
> +		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
> +		| git fast-import --force --quiet &&
>  	git commit -m "Empty commit" --allow-empty &&
>  	git rebase -Xsubtree=files_subtree --onto files-master master &&
>  	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
> @@ -115,7 +129,9 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
>  test_expect_failure 'Rebase -Xsubtree --onto empty commit' '
>  	reset_rebase &&
>  	git checkout -b rebase-onto-empty master &&
> -	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
> +	git fast-export --no-data HEAD -- files_subtree/ \
> +		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
> +		| git fast-import --force --quiet &&
>  	git commit -m "Empty commit" --allow-empty &&
>  	git rebase -Xsubtree=files_subtree --onto files-master master &&
>  	verbose test "$(commit_message HEAD)" = "Empty commit"
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 3/5] git-sh-i18n: work with external scripts
  2019-08-26 23:52         ` [RFC PATCH 3/5] git-sh-i18n: work with external scripts Elijah Newren
@ 2019-08-27  1:28           ` Derrick Stolee
  0 siblings, 0 replies; 73+ messages in thread
From: Derrick Stolee @ 2019-08-27  1:28 UTC (permalink / raw)
  To: Elijah Newren, git
  Cc: Junio C Hamano, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On 8/26/2019 7:52 PM, Elijah Newren wrote:
> Scripts external to git could source $(git --exec-path)/git-sh-setup (as
> we document in Documentation/git-sh-setup.txt).  This will in turn
> source git-sh-i18n, which will set up some handy internationalization
> infrastructure.  However, git-sh-i18n hardcodes the TEXTDOMAIN, meaning
> that anyone using this infrastructure will only get translations that
> are shipped with git.  Allow the external scripts to specify their own
> translation domain but otherwise use our infrastructure for accessing
> translations.
> 
> My original plan was to have git-filter-branch be the first testcase
> using this feature, with a goal of minimizing the number of changes that
> needed to be made to it when I moved it out of git.git.  However, I
> realized after creating this patch that no strings in git-filter-branch
> are translated.  However, the generalization could be useful if we move
> other tools from git.git to an external location.

Hm. Maybe extract it from this series for the YAGNI principle?

A noble goal, but in my opinion I'd prefer to see this be exercised
by something immediately.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation
  2019-08-26 23:52         ` [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation Elijah Newren
@ 2019-08-27  1:32           ` Derrick Stolee
  2019-08-27  6:23             ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Derrick Stolee @ 2019-08-27  1:32 UTC (permalink / raw)
  To: Elijah Newren, git
  Cc: Junio C Hamano, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On 8/26/2019 7:52 PM, Elijah Newren wrote:
> filter-branch suffers from a huge number of pitfalls that can result in
> incorrectly rewritten history, and many of the problems can easily go
> undetected until the new repository is in use.  This can result in
> problems ranging from an even messier history than what led folks to
> filter-branch in the first place, to data loss or corruption.  These
> issues cannot be backward compatibly fixed, so add a warning to the
> filter-branch manpage about this and recommend that another tool (such
> as filter-repo) be used instead.
> 
> Also, update other manpages that referenced filter-branch.  Several of
> these needed updates even if we could continue recommending
> filter-branch, either due to implying that something was unique to
> filter-branch when it applied more generally to all history rewriting
> tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
> something about filter-branch was used as an example despite other more
> commonly known examples now existing.  Reword these sections to fix
> these issues and to avoid recommending filter-branch.
> 
> Finally, remove the section explaining BFG Repo Cleaner as an
> alternative to filter-branch.  I feel somewhat bad about this,
> especially since I feel like I learned so much from BFG that I put to
> good use in filter-repo (which is much more than I can say for
> filter-branch), but keeping that section presented a few problems:
>   * In order to recommend that people quit using filter-branch, we need
>     to provide them a recommendation for something else to use that
>     can handle all the same types of rewrites.  To my knowledge,
>     filter-repo is the only such tool.  So it needs to be mentioned.
>   * I don't want to give conflicting recommendations to users
>   * If we recommend two tools, we shouldn't expect users to learn both
>     and pick which one to use; we should explain which problems one
>     can solve that the other can't or when one is much faster than
>     the other.
>   * BFG and filter-repo have similar performance
>   * All filtering types that BFG can do, filter-repo can also do.  In
>     fact, filter-repo comes with a reimplementation of BFG named
>     bfg-ish which provides the same user-interface as BFG but with
>     several bugfixes and new features that are hard to implement in
>     BFG due to its technical underpinnings.
> While I could still mention both tools, it seems like I would need to
> provide some kind of comparison and I would ultimately just say that
> filter-repo can do everything BFG can, so ultimately it seems that it
> is just better to remove that section altogether.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  Documentation/git-fast-export.txt   |  6 ++---
>  Documentation/git-filter-branch.txt | 42 ++++++++---------------------
>  Documentation/git-gc.txt            | 17 ++++++------
>  Documentation/git-rebase.txt        |  2 +-
>  Documentation/git-replace.txt       | 10 +++----
>  Documentation/git-svn.txt           |  4 +--
>  Documentation/githooks.txt          |  7 ++---
>  contrib/svn-fe/svn-fe.txt           |  4 +--
>  8 files changed, 36 insertions(+), 56 deletions(-)
> 
> diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
> index cc940eb9ad..784e934009 100644
> --- a/Documentation/git-fast-export.txt
> +++ b/Documentation/git-fast-export.txt
> @@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped
>  into 'git fast-import'.
>  
>  You can use it as a human-readable bundle replacement (see
> -linkgit:git-bundle[1]), or as a kind of an interactive
> -'git filter-branch'.
> -
> +linkgit:git-bundle[1]), or as a format that can be edited before being
> +fed to 'git fast-import' in order to do history rewrites (an ability
> +relied on by tools like 'git filter-repo').
>  
>  OPTIONS
>  -------
> diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
> index 6b53dd7e06..8c586eed55 100644
> --- a/Documentation/git-filter-branch.txt
> +++ b/Documentation/git-filter-branch.txt
> @@ -16,6 +16,17 @@ SYNOPSIS
>  	[--original <namespace>] [-d <directory>] [-f | --force]
>  	[--state-branch <branch>] [--] [<rev-list options>...]
>  
> +WARNING
> +-------
> +'git filter-branch' has a litany of gotchas that can and will cause
> +history to be rewritten incorrectly (in addition to abysmal
> +performance).  These issues cannot be backward compatibly fixed and as
> +such, its use is not recommended.  Please use an alternative history
> +filtering tool such as 'git filter-repo'.  If you still need to use
> +'git filter-branch', please carefully read the "Safety" section of
> +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/

Is it possible to present this URL as a hyperlink with a succinct
description? Maybe 'carefully read [the "Safety" section of this
message on the Git mailing list](url).' (I'm using Markdown notation
here as I don't know the equivalent for our docs.)

> +and avoid as many of the pitfalls listed there as reasonably possible.
> +
>  DESCRIPTION
>  -----------
>  Lets you rewrite Git revision history by rewriting the branches mentioned
> @@ -445,37 +456,6 @@ warned.
>    (or if your git-gc is not new enough to support arguments to
>    `--prune`, use `git repack -ad; git prune` instead).
>  
> -NOTES
> ------
> -
> -git-filter-branch allows you to make complex shell-scripted rewrites
> -of your Git history, but you probably don't need this flexibility if
> -you're simply _removing unwanted data_ like large files or passwords.
> -For those operations you may want to consider
> -http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
> -a JVM-based alternative to git-filter-branch, typically at least
> -10-50x faster for those use-cases, and with quite different
> -characteristics:
> -
> -* Any particular version of a file is cleaned exactly _once_. The BFG,
> -  unlike git-filter-branch, does not give you the opportunity to
> -  handle a file differently based on where or when it was committed
> -  within your history. This constraint gives the core performance
> -  benefit of The BFG, and is well-suited to the task of cleansing bad
> -  data - you don't care _where_ the bad data is, you just want it
> -  _gone_.
> -
> -* By default The BFG takes full advantage of multi-core machines,
> -  cleansing commit file-trees in parallel. git-filter-branch cleans
> -  commits sequentially (i.e. in a single-threaded manner), though it
> -  _is_ possible to write filters that include their own parallelism,
> -  in the scripts executed against each commit.
> -
> -* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
> -  are much more restrictive than git-filter branch, and dedicated just
> -  to the tasks of removing unwanted data- e.g:
> -  `--strip-blobs-bigger-than 1M`.
> -
>  GIT
>  ---
>  Part of the linkgit:git[1] suite
> diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> index 247f765604..0c114ad1ca 100644
> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -115,15 +115,14 @@ NOTES
>  -----
>  
>  'git gc' tries very hard not to delete objects that are referenced
> -anywhere in your repository. In
> -particular, it will keep not only objects referenced by your current set
> -of branches and tags, but also objects referenced by the index,
> -remote-tracking branches, refs saved by 'git filter-branch' in
> -refs/original/, reflogs (which may reference commits in branches
> -that were later amended or rewound), and anything else in the refs/* namespace.
> -If you are expecting some objects to be deleted and they aren't, check
> -all of those locations and decide whether it makes sense in your case to
> -remove those references.
> +anywhere in your repository. In particular, it will keep not only
> +objects referenced by your current set of branches and tags, but also
> +objects referenced by the index, remote-tracking branches, notes saved
> +by 'git notes' under refs/notes/, reflogs (which may reference commits
> +in branches that were later amended or rewound), and anything else in
> +the refs/* namespace.  If you are expecting some objects to be deleted
> +and they aren't, check all of those locations and decide whether it
> +makes sense in your case to remove those references.
>  
>  On the other hand, when 'git gc' runs concurrently with another process,
>  there is a risk of it deleting an object that the other process is using
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 6156609cf7..2f201d85d4 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -832,7 +832,7 @@ Hard case: The changes are not the same.::
>  	This happens if the 'subsystem' rebase had conflicts, or used
>  	`--interactive` to omit, edit, squash, or fixup commits; or
>  	if the upstream used one of `commit --amend`, `reset`, or
> -	`filter-branch`.
> +	a full history rewriting command like `filter-repo`.
>  
>  
>  The easy case
> diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
> index 246dc9943c..35595a2cd3 100644
> --- a/Documentation/git-replace.txt
> +++ b/Documentation/git-replace.txt
> @@ -123,10 +123,10 @@ The following format are available:
>  CREATING REPLACEMENT OBJECTS
>  ----------------------------
>  
> -linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and
> -linkgit:git-rebase[1], among other git commands, can be used to create
> -replacement objects from existing objects. The `--edit` option can
> -also be used with 'git replace' to create a replacement object by
> +linkgit:git-hash-object[1], linkgit:git-rebase[1], and
> +linkgit:git-filter-repo[1], among other git commands, can be used to
> +create replacement objects from existing objects. The `--edit` option
> +can also be used with 'git replace' to create a replacement object by
>  editing an existing object.
>  
>  If you want to replace many blobs, trees or commits that are part of a
> @@ -148,8 +148,8 @@ pending objects.
>  SEE ALSO
>  --------
>  linkgit:git-hash-object[1]
> -linkgit:git-filter-branch[1]
>  linkgit:git-rebase[1]
> +linkgit:git-filter-repo[1]
>  linkgit:git-tag[1]
>  linkgit:git-branch[1]
>  linkgit:git-commit[1]
> diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
> index 30711625fd..f2762dd5d4 100644
> --- a/Documentation/git-svn.txt
> +++ b/Documentation/git-svn.txt
> @@ -769,9 +769,9 @@ option for (hopefully) obvious reasons.
>  +
>  This option is NOT recommended as it makes it difficult to track down
>  old references to SVN revision numbers in existing documentation, bug
> -reports and archives.  If you plan to eventually migrate from SVN to Git
> +reports, and archives.  If you plan to eventually migrate from SVN to Git
>  and are certain about dropping SVN history, consider
> -linkgit:git-filter-branch[1] instead.  filter-branch also allows
> +linkgit:git-filter-repo[1] instead.  filter-repo also allows
>  reformatting of metadata for ease-of-reading and rewriting authorship
>  info for non-"svn.authorsFile" users.
>  
> diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
> index 82cd573776..997548f5ed 100644
> --- a/Documentation/githooks.txt
> +++ b/Documentation/githooks.txt
> @@ -425,9 +425,10 @@ post-rewrite
>  
>  This hook is invoked by commands that rewrite commits
>  (linkgit:git-commit[1] when called with `--amend` and
> -linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call
> -it!).  Its first argument denotes the command it was invoked by:
> -currently one of `amend` or `rebase`.  Further command-dependent
> +linkgit:git-rebase[1]; however, full-history (re)writing tools like
> +linkgit:git-fast-import[1] or linkgit:git-filter-repo[1] typically do
> +not call it!).  Its first argument denotes the command it was invoked
> +by: currently one of `amend` or `rebase`.  Further command-dependent
>  arguments may be passed in the future.
>  
>  The hook receives a list of the rewritten commits on stdin, in the
> diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
> index a3425f4770..19333fc8df 100644
> --- a/contrib/svn-fe/svn-fe.txt
> +++ b/contrib/svn-fe/svn-fe.txt
> @@ -56,7 +56,7 @@ line.  This line has the form `git-svn-id: URL@REVNO UUID`.
>  
>  The resulting repository will generally require further processing
>  to put each project in its own repository and to separate the history
> -of each branch.  The 'git filter-branch --subdirectory-filter' command
> +of each branch.  The 'git filter-repo --subdirectory-filter' command
>  may be useful for this purpose.
>  
>  BUGS
> @@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected.
>  
>  SEE ALSO
>  --------
> -git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
> +git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
>  https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
                           ` (4 preceding siblings ...)
  2019-08-26 23:52         ` [RFC PATCH 5/5] Remove git-filter-branch, it is now external to git.git Elijah Newren
@ 2019-08-27  1:39         ` Derrick Stolee
  2019-08-27  6:17           ` Elijah Newren
  2019-08-27  7:03         ` Eric Wong
  2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  7 siblings, 1 reply; 73+ messages in thread
From: Derrick Stolee @ 2019-08-27  1:39 UTC (permalink / raw)
  To: Elijah Newren, git
  Cc: Junio C Hamano, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On 8/26/2019 7:52 PM, Elijah Newren wrote:
> Following up on the suggestion to make git.git smaller and shed non-core
> tools, here's an RFC series to do so with git-filter-branch.  This
> series first removes dependencies on git-filter-branch (of which there
> were very few), and then deletes git-filter-branch itself in the final
> commit.
> 
> I'm more than happy to consider alternate places for the filter-branch
> history (I had considered just merging it in with git-filter-repo), but
> for now I just made it available here:
>         https://github.com/newren/git-filter-branch
> 
> The rewrite above contains the history of the files deleted in Patch 5,
> plus a one-time copy of relevant build files (Makefiles, test-lib.sh,
> etc. -- I didn't want the whole history of these), and then touchups to
> streamline the build files and make them all work in this standalone
> repo.
> 
> 
> Some high-level notes on the patches:
> 
>   * Patches 1&2: are good cleanups & performance wins regardless of
>     whether the rest of the series is taken

I agree! These are great. I just had a nit about extracting a helper
instead of copy-pasting the same three lines in multiple tests.

>   * Patch 3: an attempt to improve i18n situation for external scripts,
>     but discovered to not be necessary/useful for git-filter-branch
>     specifically

I'm not sure this is super-important now, but could be saved for a
later date, when it is important.

>   * Patch 4:
>     * If we are good with deleting git-filter-branch now and just noting
>       it in the release notes, then patch 4 could be simplified; there's
>       no need to update git-filter-branch.txt in that case.
>     * If, however, we want to do some external messaging for an
>       additional release cycle or two before moving git-filter-branch
>       out of git.git, this patch will help us until then to at least
>       avoid recommending a tool which will likely mangle user's data in
>       unexpected ways.  But it'd be really helpful if folks could review
>       and opine on the BFG stuff if so.

I think this is a good step, and should be taken even if we never
plan to take Patch 5.

>   * Patch 5: actually deletes git-filter-branch, its tests, and
>     documentation.

This is the one where others need to chime in with opinions. I
think this one can only be taken if we have a concrete plan about
how to support the tool _somehow_, even if it is "go download the
script from this place; it may have broken since we last tested it."

Yes, we want to strongly recommend that people use newer, better
tools. That's not always something users can accept. Having the
tool live somewhere that is accessible can appease some users for
a while, and it can decay and die a slow death there.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-27  1:39         ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Derrick Stolee
@ 2019-08-27  6:17           ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-27  6:17 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Git Mailing List, Junio C Hamano, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On Mon, Aug 26, 2019 at 6:39 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/26/2019 7:52 PM, Elijah Newren wrote:
> > Following up on the suggestion to make git.git smaller and shed non-core
> > tools, here's an RFC series to do so with git-filter-branch.  This
> > series first removes dependencies on git-filter-branch (of which there
> > were very few), and then deletes git-filter-branch itself in the final
> > commit.
> >
> > I'm more than happy to consider alternate places for the filter-branch
> > history (I had considered just merging it in with git-filter-repo), but
> > for now I just made it available here:
> >         https://github.com/newren/git-filter-branch
> >
> > The rewrite above contains the history of the files deleted in Patch 5,
> > plus a one-time copy of relevant build files (Makefiles, test-lib.sh,
> > etc. -- I didn't want the whole history of these), and then touchups to
> > streamline the build files and make them all work in this standalone
> > repo.
> >
> >
> > Some high-level notes on the patches:
> >
> >   * Patches 1&2: are good cleanups & performance wins regardless of
> >     whether the rest of the series is taken
>
> I agree! These are great. I just had a nit about extracting a helper
> instead of copy-pasting the same three lines in multiple tests.
>
> >   * Patch 3: an attempt to improve i18n situation for external scripts,
> >     but discovered to not be necessary/useful for git-filter-branch
> >     specifically
>
> I'm not sure this is super-important now, but could be saved for a
> later date, when it is important.
>
> >   * Patch 4:
> >     * If we are good with deleting git-filter-branch now and just noting
> >       it in the release notes, then patch 4 could be simplified; there's
> >       no need to update git-filter-branch.txt in that case.
> >     * If, however, we want to do some external messaging for an
> >       additional release cycle or two before moving git-filter-branch
> >       out of git.git, this patch will help us until then to at least
> >       avoid recommending a tool which will likely mangle user's data in
> >       unexpected ways.  But it'd be really helpful if folks could review
> >       and opine on the BFG stuff if so.
>
> I think this is a good step, and should be taken even if we never
> plan to take Patch 5.
>
> >   * Patch 5: actually deletes git-filter-branch, its tests, and
> >     documentation.
>
> This is the one where others need to chime in with opinions. I
> think this one can only be taken if we have a concrete plan about
> how to support the tool _somehow_, even if it is "go download the
> script from this place; it may have broken since we last tested it."
>
> Yes, we want to strongly recommend that people use newer, better
> tools. That's not always something users can accept. Having the
> tool live somewhere that is accessible can appease some users for
> a while, and it can decay and die a slow death there.

Perhaps I should add some more words about the separate repo I
created; even though it wasn't one of the five patches in this series
it actually represented the lion's share of the work before I submitted
this.  Anyway, it has a Makefile which supports the normal 'test',
'doc', 'clean', 'install' (and variants), and 'dist' (and variants).
The 'test' target will run all the filter-branch tests taken from the
git.git testsuite (i.e. t7003 and t7009) without requiring a version
of git built inside that separate repo, 'doc' will build both html and
manpages, etc.  It doesn't look at any config.mak* files and has
stripped out lots of stuff from the main repo, but it's relatively
minimal and self-contained beyond an assumption that a normal copy of
git has been installed somehow already.  Given how infrequently
filter-branch has needed fixes in the past, and the fact that it is
pretty good about relying on plumbing rather than porcelain, I suspect
it might actually be a pretty light maintenance load to keep it
running for a good long time.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation
  2019-08-27  1:32           ` Derrick Stolee
@ 2019-08-27  6:23             ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-27  6:23 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Git Mailing List, Junio C Hamano, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On Mon, Aug 26, 2019 at 6:33 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/26/2019 7:52 PM, Elijah Newren wrote:

> > +WARNING
> > +-------
> > +'git filter-branch' has a litany of gotchas that can and will cause
> > +history to be rewritten incorrectly (in addition to abysmal
> > +performance).  These issues cannot be backward compatibly fixed and as
> > +such, its use is not recommended.  Please use an alternative history
> > +filtering tool such as 'git filter-repo'.  If you still need to use
> > +'git filter-branch', please carefully read the "Safety" section of
> > +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/
>
> Is it possible to present this URL as a hyperlink with a succinct
> description? Maybe 'carefully read [the "Safety" section of this
> message on the Git mailing list](url).' (I'm using Markdown notation
> here as I don't know the equivalent for our docs.)

Looks like the syntax is
  URL[description]
e.g.
  https://public-inbox.org/git/CABPp-BEDOH-etc-etc-etc@mail.gmail.com[the "Safety" section of yadda yadda]

I'll fix that up.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
                           ` (5 preceding siblings ...)
  2019-08-27  1:39         ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Derrick Stolee
@ 2019-08-27  7:03         ` Eric Wong
  2019-08-27  8:43           ` Sergey Organov
  2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  7 siblings, 1 reply; 73+ messages in thread
From: Eric Wong @ 2019-08-27  7:03 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Junio C Hamano, Derrick Stolee, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

Elijah Newren <newren@gmail.com> wrote:
> Some high-level notes on the patches:
> 
>   * Patches 1&2: are good cleanups & performance wins regardless of
>     whether the rest of the series is taken

Agreed.  Though weren't we moving away from pipes in tests
because failures could go unnoticed?  (I haven't been paying too
much attention, though)
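
A pipe-free variant of the three repeated lines could look roughly
like the sketch below.  The helper names and the temporary file names
are made up for illustration and do not appear in the patch; only the
fast-export, sed, and fast-import invocations are taken from it.  Each
step is chained with &&, so a failure in fast-export or sed is no
longer hidden behind the exit status of the last command in a
pipeline.

```sh
# Hypothetical helpers; names and temp files are assumptions.

# Strip the "files_subtree/" prefix from paths that follow a 40-hex
# object id in a fast-export stream read from stdin.
strip_subtree_prefix () {
	sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%"
}

# Rewrite HEAD so that files_subtree/ becomes the top-level directory,
# using intermediate files instead of pipes so that every command's
# exit status takes part in the && chain.
filter_to_files_subtree () {
	git fast-export --no-data HEAD -- files_subtree/ >export.raw &&
	strip_subtree_prefix <export.raw >export.stripped &&
	git fast-import --force --quiet <export.stripped
}
```

Each test body would then call filter_to_files_subtree in place of the
three-line pipeline.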

>   * Patch 5: actually deletes git-filter-branch, its tests, and
>     documentation.

Given how long we've had git-filter-branch, I suggest we keep it
around but add a warning at runtime notifying users of its
impending removal.  Such a warning should remain for >=10 years or
whatever a distro support cycle is, nowadays.

And there should probably be a way to disable the warning via
git-config if it's too annoying.
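
Since git-filter-branch is a shell script, such a warning might be
sketched as follows.  This is only an illustration: the config key
filter-branch.squelchWarning and the warning text are hypothetical,
not anything that exists in git today.

```sh
# Sketch only: the config key and the wording below are hypothetical.
# $1 is the value of the squelch setting; "true" silences the warning.
maybe_warn_deprecated () {
	if test "$1" != "true"
	then
		printf '%s\n' \
			"warning: git-filter-branch may be removed in a future release;" \
			"warning: consider an alternative such as git-filter-repo." >&2
	fi
}

# Inside git-filter-branch.sh this might be called as:
#   maybe_warn_deprecated "$(git config --bool filter-branch.squelchWarning)"
```

Passing the setting in as an argument keeps the function independent
of how the value is looked up.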

AFAIK, filter-branch is not causing support headaches for any
git developers today.  With so many commands in git, it's
unlikely newbies will ever get around to discover it :)
So I don't think we should be in any rush to remove it.

But I agree that filter-branch isn't useful and certainly
shouldn't be encouraged/promoted.

Yet there are probably still users who ARE happy with it, who
will never hit the edge cases and problems it poses; and will
never read release notes.  And said users are probably getting
git from a slow-moving distro, so it'd be a disservice to them
if they lost a tool they depend on without any warning.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-27  7:03         ` Eric Wong
@ 2019-08-27  8:43           ` Sergey Organov
  2019-08-27 22:18             ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Sergey Organov @ 2019-08-27  8:43 UTC (permalink / raw)
  To: Eric Wong
  Cc: Elijah Newren, git, Junio C Hamano, Derrick Stolee, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

Eric Wong <e@80x24.org> writes:


[...]

> AFAIK, filter-branch is not causing support headaches for any
> git developers today.  With so many commands in git, it's
> unlikely newbies will ever get around to discover it :)
> So I don't think we should be in any rush to remove it.

Nah, discovering it is simple. Just Google for "git change author". That
eventually leads to a script that uses "git filter-branch --env-filter"
to get the job done, and I'm afraid it is spread all over the world.

See, e.g.:

https://help.github.com/en/articles/changing-author-info

> But I agree that filter-branch isn't useful and certainly
> shouldn't be encouraged/promoted.

Well, is there a more suitable way to change the author for a (large) set of
commits then?

> Yet there are probably still users who ARE happy with it, who
> will never hit the edge cases and problems it poses; and will
> never read release notes.  And said users are probably getting
> git from a slow-moving distro, so it'd be a disservice to them
> if they lost a tool they depend on without any warning.

Personally, I'm far from happy with it, but I have no clue how to
substitute it in the job above. Anybody?

-- Sergey

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-27  8:43           ` Sergey Organov
@ 2019-08-27 22:18             ` Elijah Newren
  2019-08-28  8:52               ` Sergey Organov
  0 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-27 22:18 UTC (permalink / raw)
  To: Sergey Organov
  Cc: Eric Wong, Git Mailing List, Junio C Hamano, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Lars Schneider, Jonathan Nieder

On Tue, Aug 27, 2019 at 1:43 AM Sergey Organov <sorganov@gmail.com> wrote:
>
> Eric Wong <e@80x24.org> writes:
>
>
> [...]
>
> > AFAIK, filter-branch is not causing support headaches for any
> > git developers today.  With so many commands in git, it's
> > unlikely newbies will ever get around to discover it :)
> > So I don't think we should be in any rush to remove it.
>
> Nah, discovering it is simple. Just Google for "git change author". That
> eventually leads to a script that uses "git filter-branch --env-filter"
> to get the job done, and I'm afraid it is spread all over the world.
>
> See, e.g.:
>
> https://help.github.com/en/articles/changing-author-info

Side note: Is the goal to "fix names and email addresses in this
repository"?  If so, this guide fails: it doesn't update tagger names
or email addresses.  Indeed, filter-branch doesn't provide a way to do
that.  (Not to mention other problems like not updating references to
commit hashes in commit messages when it is busy rewriting everything.)

> > But I agree that filter-branch isn't useful and certainly
> > shouldn't be encouraged/promoted.
>
> Well, is there a more suitable way to change the author for a (large) set of
> commits then?

I would say yes, use git filter-repo (note that this thread started
with me proposing filter-repo for inclusion in git.git -- and getting
suggestions that we should remove stuff instead of adding more stuff).
I'm biased, but I think it's much better at this particular job as
well:


You can create a mailmap file and pass it to the --mailmap option to
git-filter-repo.

Or, if you prefer (perhaps you don't like git's mailmap format as used
by shortlog and now log, or perhaps you really want to be able to do
regex replacement or something), you can use the --name-callback or
--email-callback to work on those fields more directly.

Or, if you prefer (e.g. you want to handle author vs. committer vs.
tagger differently), you can use the --commit-callback and
--tag-callback filters.
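
A rough sketch of what those three routes could look like.  The names,
addresses, and the my-mailmap filename are made up for illustration, and
the commands are wrapped in (hypothetical) shell functions so that
nothing gets rewritten until one of them is called from inside a fresh
clone:

```shell
# Hypothetical mailmap entry, in the format git shortlog understands:
#   Proper Name <proper@email> <email-as-recorded-in-commits>
cat >my-mailmap <<'EOF'
Jane Doe <jane@example.com> <jdoe@old.example.com>
EOF

# 1. Apply the mailmap to author, committer, and tagger fields.
fix_with_mailmap () {
	git filter-repo --mailmap my-mailmap
}

# 2. Callbacks take the body of a Python function; `email` is a bytestring.
fix_with_email_callback () {
	git filter-repo --email-callback \
		'return email.replace(b"@old.example.com", b"@example.com")'
}

# 3. Touch only the author fields, leaving committer and tagger alone.
fix_author_only () {
	git filter-repo --commit-callback '
    if commit.author_email == b"jdoe@old.example.com":
        commit.author_name = b"Jane Doe"
        commit.author_email = b"jane@example.com"
    '
}
```

The mailmap route is probably the simplest when the fix is a fixed list
of identities; the callbacks are there for pattern-based or
field-specific changes.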


As an added bonus, filter-repo will also perform the rewrite far
faster than filter-branch (and rewrite commit hashes in commit
messages as alluded to above).

> > Yet there's probably still users which ARE happy with it, that
> > will never hit the edge cases and problems it poses; and will
> > never read release notes.  And said users are probably getting
> > git from a slow-moving distro, so it'd be a disservice to them
> > if they lost a tool they depend on without any warning.
>
> Personally, I'm far from happy with it, but I have no clue how to
> substitute it in the job above. Anybody?

The start of this thread where I proposed git filter-repo for
inclusion in git[1] had links to documentation and comparisons to
other tools and such.  You may find those links helpful; if not, let
me know what needs to be fixed in the documentation.

Elijah

[1] https://public-inbox.org/git/CABPp-BEr8LVM+yWTbi76hAq7Moe1hyp2xqxXfgVV4_teh_9skA@mail.gmail.com/

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it
  2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
                           ` (6 preceding siblings ...)
  2019-08-27  7:03         ` Eric Wong
@ 2019-08-28  0:22         ` Elijah Newren
  2019-08-28  0:22           ` [PATCH v2 1/4] t6006: simplify and optimize empty message test Elijah Newren
                             ` (5 more replies)
  7 siblings, 6 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-28  0:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

Here's a series that shifts the focus slightly to warning about
git-filter-branch usage and avoiding it ourselves.  I have retained
patch 4 but left it marked as RFC for further discussion.  Folks
generally seem to agree that the first three patches are good to
include now -- assuming my small fixes correctly address their
requests and suggestions.

Changes since v1 (full range-diff below):
  * I might have had a little fun with a thesaurus (just trying to give
    reviewers something small to smile about...)
  * addressed feedback from Eric and Stolee, as detailed below
  * [Patch 2] factored out some common code
  * [Patch 3] fixed links in asciidoc documentation to make them more
    readable in both manpages and html format
  * [Patch 3] added a warning blurb to git-filter-branch itself

In particular, it'd be helpful if people could take a look at the changes
to git-filter-branch.sh in patch 3 and comment on whether an environment
variable is fine or if we should make it a config setting or something.
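
For reference, the check in question boils down to something like the
sketch below.  It wraps the check in a function (warn_unless_squelched
is a made-up name, not something in the patch) and abbreviates the
message; the actual hunk runs inline and sleeps for five seconds after
printing.  The second variable is presumably there to keep the test
suite quiet, which is an assumption rather than something the patch
states:

```shell
# Sketch of the check added to git-filter-branch.sh in patch 3.
# warn_unless_squelched is a hypothetical wrapper; the patch runs the
# check inline and follows the message with "sleep 5".
warn_unless_squelched () {
	# ${VAR-} keeps this working under "set -u"; the patch uses plain $VAR.
	if [ -z "${FILTER_BRANCH_SQUELCH_WARNING-}" ] &&
	   [ -z "${GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS-}" ]
	then
		echo "WARNING: (abbreviated) please use git filter-repo instead."
	fi
}

# With neither variable set, the warning is printed.
default_output=$(
	unset FILTER_BRANCH_SQUELCH_WARNING GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS
	warn_unless_squelched
)

# With the environment variable set, the function prints nothing.
squelched_output=$(
	FILTER_BRANCH_SQUELCH_WARNING=1
	warn_unless_squelched
)
```

On the command line, a user would get the second behavior by prefixing
the filter-branch invocation with FILTER_BRANCH_SQUELCH_WARNING=1.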

Elijah Newren (4):
  t6006: simplify and optimize empty message test
  t3427: accelerate this test by using fast-export and fast-import
  Recommend git-filter-repo instead of git-filter-branch
  [RFC] Remove git-filter-branch, it is now external to git.git

 .gitignore                          |   1 -
 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 481 --------------------
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   2 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |   4 +-
 Documentation/githooks.txt          |   7 +-
 Makefile                            |   1 -
 command-list.txt                    |   1 -
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                | 662 ----------------------------
 t/perf/p7000-filter-branch.sh       |  24 -
 t/t3427-rebase-subtree.sh           |  22 +-
 t/t6006-rev-list-format.sh          |   5 +-
 t/t7003-filter-branch.sh            | 505 ---------------------
 t/t7009-filter-branch-null-sha1.sh  |  55 ---
 t/t9902-completion.sh               |  12 +-
 18 files changed, 47 insertions(+), 1772 deletions(-)
 delete mode 100644 Documentation/git-filter-branch.txt
 delete mode 100755 git-filter-branch.sh
 delete mode 100755 t/perf/p7000-filter-branch.sh
 delete mode 100755 t/t7003-filter-branch.sh
 delete mode 100755 t/t7009-filter-branch-null-sha1.sh

Range-diff:
1:  7ddbeea2ca = 1:  7ddbeea2ca t6006: simplify and optimize empty message test
2:  0172ca771e < -:  ---------- t3427: accelerate this test by using fast-export and fast-import
3:  b814cc7b65 < -:  ---------- git-sh-i18n: work with external scripts
-:  ---------- > 2:  f18bd7a609 t3427: accelerate this test by using fast-export and fast-import
4:  dcec36d113 ! 3:  7008c16984 Recommend git-filter-repo instead of git-filter-branch in documentation
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    Recommend git-filter-repo instead of git-filter-branch in documentation
    +    Recommend git-filter-repo instead of git-filter-branch
     
    -    filter-branch suffers from a huge number of pitfalls that can result in
    -    incorrectly rewritten history, and many of the problems can easily go
    -    undetected until the new repository is in use.  This can result in
    -    problems ranging from an even messier history than what led folks to
    -    filter-branch in the first place, to data loss or corruption.  These
    -    issues cannot be backward compatibly fixed, so add a warning to the
    -    filter-branch manpage about this and recommand that another tool (such
    -    as filter-repo) be used instead.
    +    filter-branch suffers from a deluge of disguised dangers that disfigure
    +    history rewrites (i.e. deviate from the deliberate changes).  Many of
    +    these problems are unobtrusive and can easily go undiscovered until the
    +    new repository is in use.  This can result in problems ranging from an
    +    even messier history than what led folks to filter-branch in the first
    +    place, to data loss or corruption.  These issues cannot be backward
    +    compatibly fixed, so add a warning to both filter-branch and its manpage
    +    recommending that another tool (such as filter-repo) be used instead.
     
         Also, update other manpages that referenced filter-branch.  Several of
         these needed updates even if we could continue recommending
    @@ Documentation/git-filter-branch.txt: SYNOPSIS
      
     +WARNING
     +-------
    -+'git filter-branch' has a litany of gotchas that can and will cause
    -+history to be rewritten incorrectly (in addition to abysmal
    -+performance).  These issues cannot be backward compatibly fixed and as
    -+such, its use is not recommended.  Please use an alternative history
    -+filtering tool such as 'git filter-repo'.  If you still need to use
    -+'git filter-branch', please carefully read the "Safety" section of
    -+https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/
    -+and avoid as many of the pitfalls listed there as reasonably possible.
    ++'git filter-branch' has a plethora of pitfalls that can produce non-obvious
    ++manglings of the intended history rewrite (and can leave you with little
    ++time to investigate such problems since it has such abysmal performance).
    ++These safety and performance issues cannot be backward compatibly fixed and
    ++as such, its use is not recommended.  Please use an alternative history
    ++filtering tool such as https://github.com/newren/git-filter-repo/[git
    ++filter-repo].  If you still need to use 'git filter-branch', please
    ++carefully read the "Safety" section of the message on the Git mailing list
    ++https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
    ++the land mines of filter-branch] and vigilantly avoid as many of the
    ++hazards listed there as reasonably possible.
     +
      DESCRIPTION
      -----------
    @@ contrib/svn-fe/svn-fe.txt: The exit status does not reflect whether an error was
     -git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
     +git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
      https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
    +
    + ## git-filter-branch.sh (mode change 100755 => 100644) ##
    +@@ git-filter-branch.sh: set_ident () {
    + 	finish_ident COMMITTER
    + }
    + 
    ++if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
    ++     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then
    ++	cat <<EOF
    ++WARNING: git-filter-branch has a glut of gotchas generating mangled history
    ++         rewrites.  Please use an alternative filtering tool such as 'git
    ++         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
    ++         See the filter-branch manual page for more details; to squelch
    ++         this warning and pause, set FILTER_BRANCH_SQUELCH_WARNING=1.
    ++
    ++EOF
    ++	sleep 5
    ++fi
    ++
    + USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
    + 	[--tree-filter <command>] [--index-filter <command>]
    + 	[--parent-filter <command>] [--msg-filter <command>]
5:  9dec8e06ee ! 4:  ff3e04e558 Remove git-filter-branch, it is now external to git.git
    @@ Metadata
      ## Commit message ##
         Remove git-filter-branch, it is now external to git.git
     
    +    git-filter-branch still exists, still has the same regression tests,
    +    etc., but it is now being tracked in a separate repo that users will
    +    need to download separately.
    +
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
      ## .gitignore ##
    @@ Documentation/git-filter-branch.txt (deleted)
     -
     -WARNING
     --------
    --'git filter-branch' has a litany of gotchas that can and will cause
    --history to be rewritten incorrectly (in addition to abysmal
    --performance).  These issues cannot be backward compatibly fixed and as
    --such, its use is not recommended.  Please use an alternative history
    --filtering tool such as 'git filter-repo'.  If you still need to use
    --'git filter-branch', please carefully read the "Safety" section of
    --https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/
    --and avoid as many of the pitfalls listed there as reasonably possible.
    +-'git filter-branch' has a plethora of pitfalls that can produce non-obvious
    +-manglings of the intended history rewrite (and can leave you with little
    +-time to investigate such problems since it has such abysmal performance).
    +-These safety and performance issues cannot be backward compatibly fixed and
    +-as such, its use is not recommended.  Please use an alternative history
    +-filtering tool such as https://github.com/newren/git-filter-repo/[git
    +-filter-repo].  If you still need to use 'git filter-branch', please
    +-carefully read the "Safety" section of the message on the Git mailing list
    +-https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
    +-the land mines of filter-branch] and vigilantly avoid as many of the
    +-hazards listed there as reasonably possible.
     -
     -DESCRIPTION
     ------------
    @@ git-filter-branch.sh (deleted)
     -	finish_ident COMMITTER
     -}
     -
    +-if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
    +-     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then
    +-	cat <<EOF
    +-WARNING: git-filter-branch has a glut of gotchas generating mangled history
    +-         rewrites.  Please use an alternative filtering tool such as 'git
    +-         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
    +-         See the filter-branch manual page for more details; to squelch
    +-         this warning and pause, set FILTER_BRANCH_SQUELCH_WARNING=1.
    +-
    +-EOF
    +-	sleep 5
    +-fi
    +-
     -USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
     -	[--tree-filter <command>] [--index-filter <command>]
     -	[--parent-filter <command>] [--msg-filter <command>]
-- 
2.23.0.3.gcc10030edf.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 1/4] t6006: simplify and optimize empty message test
  2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
@ 2019-08-28  0:22           ` Elijah Newren
  2019-08-28  0:22           ` [PATCH v2 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-28  0:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

Test t6006.71 ("oneline with empty message") was creating two commits
with simple commit messages, and then running filter-branch to rewrite
the commit messages to be empty.  This test was written this way because
the --allow-empty-message option to git commit did not exist at the
time.  Simplify this test and avoid the need to invoke filter-branch by
just using --allow-empty-message when creating the commit.

Despite only being one piece of the 71st test and there being 73 tests
overall, this small change to just this one test speeds up the overall
execution time of t6006 (as measured by the best of 3 runs of `time
./t6006-rev-list-format.sh`) by about 11% on Linux and by 13% on
Mac.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6006-rev-list-format.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index da113d975b..d30e41c9f7 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -501,9 +501,8 @@ test_expect_success 'reflog identity' '
 '
 
 test_expect_success 'oneline with empty message' '
-	git commit -m "dummy" --allow-empty &&
-	git commit -m "dummy" --allow-empty &&
-	git filter-branch --msg-filter "sed -e s/dummy//" HEAD^^.. &&
+	git commit --allow-empty --allow-empty-message &&
+	git commit --allow-empty --allow-empty-message &&
 	git rev-list --oneline HEAD >test.txt &&
 	test_line_count = 5 test.txt &&
 	git rev-list --oneline --graph HEAD >testg.txt &&
-- 
2.23.0.3.gcc10030edf.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 2/4] t3427: accelerate this test by using fast-export and fast-import
  2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-08-28  0:22           ` [PATCH v2 1/4] t6006: simplify and optimize empty message test Elijah Newren
@ 2019-08-28  0:22           ` Elijah Newren
  2019-08-28  6:00             ` Eric Sunshine
  2019-08-28  0:22           ` [PATCH v2 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
                             ` (3 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-28  0:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

fast-export and fast-import can easily handle the simple rewrite that
was being done by filter-branch, and should be significantly faster on
systems with a slow fork.  Timings from before and after on two laptops
that I have access to (measured via `time ./t3427-rebase-subtree.sh`,
i.e. including everything in this test -- not just the filter-branch or
fast-export/fast-import pair):

   Linux:  4.305s -> 3.684s (~17% speedup)
   Mac:   10.128s -> 7.038s (~30% speedup)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3427-rebase-subtree.sh | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)
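
To illustrate what the sed expression in extract_files_subtree does to
the fast-export stream, here is a small sketch run against a single
made-up filemodify line (the blob id and filename are placeholders, not
taken from the test repository):

```shell
# One hypothetical 'M' (filemodify) line in the form fast-export --no-data
# emits: mode, 40-hex blob id, then the path.
sample='M 100644 0123456789abcdef0123456789abcdef01234567 files_subtree/master4'

# Same sed expression as extract_files_subtree: keep the blob id and the
# space after it, drop the "files_subtree/" prefix from the path.
rewritten=$(printf '%s\n' "$sample" |
	sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%")

printf '%s\n' "$rewritten"
```

Lines that carry no blob id (deletions, for example) do not match the
pattern and pass through unchanged.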

diff --git a/t/t3427-rebase-subtree.sh b/t/t3427-rebase-subtree.sh
index d8640522a0..943ae92226 100755
--- a/t/t3427-rebase-subtree.sh
+++ b/t/t3427-rebase-subtree.sh
@@ -11,6 +11,12 @@ commit_message() {
 	git log --pretty=format:%s -1 "$1"
 }
 
+extract_files_subtree() {
+	git fast-export --no-data HEAD -- files_subtree/ \
+		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
+		| git fast-import --force --quiet
+}
+
 test_expect_success 'setup' '
 	test_commit README &&
 	mkdir files &&
@@ -42,7 +48,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --preserve-merges --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-preserve-merges-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master4"
@@ -53,7 +59,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --preserve-merges --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-preserve-merges-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "files_subtree/master5"
@@ -64,7 +70,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
@@ -75,7 +81,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
@@ -86,7 +92,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto empty commit' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-empty master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "Empty commit"
@@ -96,7 +102,7 @@ test_expect_failure REBASE_P \
 test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-onto-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
@@ -106,7 +112,7 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
 test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-onto-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
@@ -115,7 +121,7 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
 test_expect_failure 'Rebase -Xsubtree --onto empty commit' '
 	reset_rebase &&
 	git checkout -b rebase-onto-empty master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "Empty commit"
-- 
2.23.0.3.gcc10030edf.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2 3/4] Recommend git-filter-repo instead of git-filter-branch
  2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-08-28  0:22           ` [PATCH v2 1/4] t6006: simplify and optimize empty message test Elijah Newren
  2019-08-28  0:22           ` [PATCH v2 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
@ 2019-08-28  0:22           ` Elijah Newren
  2019-08-28  6:17             ` Eric Sunshine
  2019-08-28  0:22           ` [RFC PATCH v2 4/4] Remove git-filter-branch, it is now external to git.git Elijah Newren
                             ` (2 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-28  0:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

filter-branch suffers from a deluge of disguised dangers that disfigure
history rewrites (i.e. deviate from the deliberate changes).  Many of
these problems are unobtrusive and can easily go undiscovered until the
new repository is in use.  This can result in problems ranging from an
even messier history than what led folks to filter-branch in the first
place, to data loss or corruption.  These issues cannot be backward
compatibly fixed, so add a warning to both filter-branch and its manpage
recommending that another tool (such as filter-repo) be used instead.

Also, update other manpages that referenced filter-branch.  Several of
these needed updates even if we could continue recommending
filter-branch, either due to implying that something was unique to
filter-branch when it applied more generally to all history rewriting
tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
something about filter-branch was used as an example despite other more
commonly known examples now existing.  Reword these sections to fix
these issues and to avoid recommending filter-branch.

Finally, remove the section explaining BFG Repo Cleaner as an
alternative to filter-branch.  I feel somewhat bad about this,
especially since I feel like I learned so much from BFG that I put to
good use in filter-repo (which is much more than I can say for
filter-branch), but keeping that section presented a few problems:
  * In order to recommend that people quit using filter-branch, we need
    to provide them a recommendation for something else to use that
    can handle all the same types of rewrites.  To my knowledge,
    filter-repo is the only such tool.  So it needs to be mentioned.
  * I don't want to give conflicting recommendations to users
  * If we recommend two tools, we shouldn't expect users to learn both
    and pick which one to use; we should explain which problems one
    can solve that the other can't or when one is much faster than
    the other.
  * BFG and filter-repo have similar performance
  * All filtering types that BFG can do, filter-repo can also do.  In
    fact, filter-repo comes with a reimplementation of BFG named
    bfg-ish which provides the same user-interface as BFG but with
    several bugfixes and new features that are hard to implement in
    BFG due to its technical underpinnings.
While I could still mention both tools, I would need to provide some
kind of comparison, and that comparison would just say that
filter-repo can do everything BFG can, so it seems better to remove
that section altogether.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-export.txt   |  6 ++--
 Documentation/git-filter-branch.txt | 45 +++++++++--------------------
 Documentation/git-gc.txt            | 17 +++++------
 Documentation/git-rebase.txt        |  2 +-
 Documentation/git-replace.txt       | 10 +++----
 Documentation/git-svn.txt           |  4 +--
 Documentation/githooks.txt          |  7 +++--
 contrib/svn-fe/svn-fe.txt           |  4 +--
 git-filter-branch.sh                | 13 +++++++++
 9 files changed, 52 insertions(+), 56 deletions(-)
 mode change 100755 => 100644 git-filter-branch.sh

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index cc940eb9ad..784e934009 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped
 into 'git fast-import'.
 
 You can use it as a human-readable bundle replacement (see
-linkgit:git-bundle[1]), or as a kind of an interactive
-'git filter-branch'.
-
+linkgit:git-bundle[1]), or as a format that can be edited before being
+fed to 'git fast-import' in order to do history rewrites (an ability
+relied on by tools like 'git filter-repo').
 
 OPTIONS
 -------
diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 6b53dd7e06..e4047d472e 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -16,6 +16,20 @@ SYNOPSIS
 	[--original <namespace>] [-d <directory>] [-f | --force]
 	[--state-branch <branch>] [--] [<rev-list options>...]
 
+WARNING
+-------
+'git filter-branch' has a plethora of pitfalls that can produce non-obvious
+manglings of the intended history rewrite (and can leave you with little
+time to investigate such problems since it has such abysmal performance).
+These safety and performance issues cannot be backward compatibly fixed and
+as such, its use is not recommended.  Please use an alternative history
+filtering tool such as https://github.com/newren/git-filter-repo/[git
+filter-repo].  If you still need to use 'git filter-branch', please
+carefully read the "Safety" section of the message on the Git mailing list
+https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
+the land mines of filter-branch] and vigilantly avoid as many of the
+hazards listed there as reasonably possible.
+
 DESCRIPTION
 -----------
 Lets you rewrite Git revision history by rewriting the branches mentioned
@@ -445,37 +459,6 @@ warned.
   (or if your git-gc is not new enough to support arguments to
   `--prune`, use `git repack -ad; git prune` instead).
 
-NOTES
------
-
-git-filter-branch allows you to make complex shell-scripted rewrites
-of your Git history, but you probably don't need this flexibility if
-you're simply _removing unwanted data_ like large files or passwords.
-For those operations you may want to consider
-http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
-a JVM-based alternative to git-filter-branch, typically at least
-10-50x faster for those use-cases, and with quite different
-characteristics:
-
-* Any particular version of a file is cleaned exactly _once_. The BFG,
-  unlike git-filter-branch, does not give you the opportunity to
-  handle a file differently based on where or when it was committed
-  within your history. This constraint gives the core performance
-  benefit of The BFG, and is well-suited to the task of cleansing bad
-  data - you don't care _where_ the bad data is, you just want it
-  _gone_.
-
-* By default The BFG takes full advantage of multi-core machines,
-  cleansing commit file-trees in parallel. git-filter-branch cleans
-  commits sequentially (i.e. in a single-threaded manner), though it
-  _is_ possible to write filters that include their own parallelism,
-  in the scripts executed against each commit.
-
-* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
-  are much more restrictive than git-filter branch, and dedicated just
-  to the tasks of removing unwanted data- e.g:
-  `--strip-blobs-bigger-than 1M`.
-
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 247f765604..0c114ad1ca 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -115,15 +115,14 @@ NOTES
 -----
 
 'git gc' tries very hard not to delete objects that are referenced
-anywhere in your repository. In
-particular, it will keep not only objects referenced by your current set
-of branches and tags, but also objects referenced by the index,
-remote-tracking branches, refs saved by 'git filter-branch' in
-refs/original/, reflogs (which may reference commits in branches
-that were later amended or rewound), and anything else in the refs/* namespace.
-If you are expecting some objects to be deleted and they aren't, check
-all of those locations and decide whether it makes sense in your case to
-remove those references.
+anywhere in your repository. In particular, it will keep not only
+objects referenced by your current set of branches and tags, but also
+objects referenced by the index, remote-tracking branches, notes saved
+by 'git notes' under refs/notes/, reflogs (which may reference commits
+in branches that were later amended or rewound), and anything else in
+the refs/* namespace.  If you are expecting some objects to be deleted
+and they aren't, check all of those locations and decide whether it
+makes sense in your case to remove those references.
 
 On the other hand, when 'git gc' runs concurrently with another process,
 there is a risk of it deleting an object that the other process is using
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 6156609cf7..2f201d85d4 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -832,7 +832,7 @@ Hard case: The changes are not the same.::
 	This happens if the 'subsystem' rebase had conflicts, or used
 	`--interactive` to omit, edit, squash, or fixup commits; or
 	if the upstream used one of `commit --amend`, `reset`, or
-	`filter-branch`.
+	a full history rewriting command like `filter-repo`.
 
 
 The easy case
diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
index 246dc9943c..35595a2cd3 100644
--- a/Documentation/git-replace.txt
+++ b/Documentation/git-replace.txt
@@ -123,10 +123,10 @@ The following format are available:
 CREATING REPLACEMENT OBJECTS
 ----------------------------
 
-linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and
-linkgit:git-rebase[1], among other git commands, can be used to create
-replacement objects from existing objects. The `--edit` option can
-also be used with 'git replace' to create a replacement object by
+linkgit:git-hash-object[1], linkgit:git-rebase[1], and
+linkgit:git-filter-repo[1], among other git commands, can be used to
+create replacement objects from existing objects. The `--edit` option
+can also be used with 'git replace' to create a replacement object by
 editing an existing object.
 
 If you want to replace many blobs, trees or commits that are part of a
@@ -148,8 +148,8 @@ pending objects.
 SEE ALSO
 --------
 linkgit:git-hash-object[1]
-linkgit:git-filter-branch[1]
 linkgit:git-rebase[1]
+linkgit:git-filter-repo[1]
 linkgit:git-tag[1]
 linkgit:git-branch[1]
 linkgit:git-commit[1]
diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 30711625fd..f2762dd5d4 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -769,9 +769,9 @@ option for (hopefully) obvious reasons.
 +
 This option is NOT recommended as it makes it difficult to track down
 old references to SVN revision numbers in existing documentation, bug
-reports and archives.  If you plan to eventually migrate from SVN to Git
+reports, and archives.  If you plan to eventually migrate from SVN to Git
 and are certain about dropping SVN history, consider
-linkgit:git-filter-branch[1] instead.  filter-branch also allows
+linkgit:git-filter-repo[1] instead.  filter-repo also allows
 reformatting of metadata for ease-of-reading and rewriting authorship
 info for non-"svn.authorsFile" users.
 
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 82cd573776..997548f5ed 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -425,9 +425,10 @@ post-rewrite
 
 This hook is invoked by commands that rewrite commits
 (linkgit:git-commit[1] when called with `--amend` and
-linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call
-it!).  Its first argument denotes the command it was invoked by:
-currently one of `amend` or `rebase`.  Further command-dependent
+linkgit:git-rebase[1]; however, full-history (re)writing tools like
+linkgit:git-fast-import[1] or linkgit:git-filter-repo[1] typically do
+not call it!).  Its first argument denotes the command it was invoked
+by: currently one of `amend` or `rebase`.  Further command-dependent
 arguments may be passed in the future.
 
 The hook receives a list of the rewritten commits on stdin, in the
diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index a3425f4770..19333fc8df 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -56,7 +56,7 @@ line.  This line has the form `git-svn-id: URL@REVNO UUID`.
 
 The resulting repository will generally require further processing
 to put each project in its own repository and to separate the history
-of each branch.  The 'git filter-branch --subdirectory-filter' command
+of each branch.  The 'git filter-repo --subdirectory-filter' command
 may be useful for this purpose.
 
 BUGS
@@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected.
 
 SEE ALSO
 --------
-git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
+git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
 https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
old mode 100755
new mode 100644
index 5c5afa2b98..7b1865c1d5
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -83,6 +83,19 @@ set_ident () {
 	finish_ident COMMITTER
 }
 
+if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
+     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then
+	cat <<EOF
+WARNING: git-filter-branch has a glut of gotchas generating mangled history
+         rewrites.  Please use an alternative filtering tool such as 'git
+         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
+         See the filter-branch manual page for more details; to squelch
+         this warning and pause, set FILTER_BRANCH_SQUELCH_WARNING=1.
+
+EOF
+	sleep 5
+fi
+
 USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
 	[--tree-filter <command>] [--index-filter <command>]
 	[--parent-filter <command>] [--msg-filter <command>]
-- 
2.23.0.3.gcc10030edf.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [RFC PATCH v2 4/4] Remove git-filter-branch, it is now external to git.git
  2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
                             ` (2 preceding siblings ...)
  2019-08-28  0:22           ` [PATCH v2 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
@ 2019-08-28  0:22           ` Elijah Newren
  2019-08-29  0:06           ` [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-09-03 18:55           ` [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  5 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-28  0:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Elijah Newren

git-filter-branch still exists and still has the same regression tests,
but it is now tracked in a separate repository that users will need to
download on their own.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 .gitignore                          |   1 -
 Documentation/git-filter-branch.txt | 464 -------------------
 Makefile                            |   1 -
 command-list.txt                    |   1 -
 git-filter-branch.sh                | 675 ----------------------------
 t/perf/p7000-filter-branch.sh       |  24 -
 t/t7003-filter-branch.sh            | 505 ---------------------
 t/t7009-filter-branch-null-sha1.sh  |  55 ---
 t/t9902-completion.sh               |  12 +-
 9 files changed, 6 insertions(+), 1732 deletions(-)
 delete mode 100644 Documentation/git-filter-branch.txt
 delete mode 100644 git-filter-branch.sh
 delete mode 100755 t/perf/p7000-filter-branch.sh
 delete mode 100755 t/t7003-filter-branch.sh
 delete mode 100755 t/t7009-filter-branch-null-sha1.sh

diff --git a/.gitignore b/.gitignore
index 521d8f4fb4..97f5d8afea 100644
--- a/.gitignore
+++ b/.gitignore
@@ -63,7 +63,6 @@
 /git-fast-import
 /git-fetch
 /git-fetch-pack
-/git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
 /git-format-patch
diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
deleted file mode 100644
index e4047d472e..0000000000
--- a/Documentation/git-filter-branch.txt
+++ /dev/null
@@ -1,464 +0,0 @@
-git-filter-branch(1)
-====================
-
-NAME
-----
-git-filter-branch - Rewrite branches
-
-SYNOPSIS
---------
-[verse]
-'git filter-branch' [--setup <command>] [--subdirectory-filter <directory>]
-	[--env-filter <command>] [--tree-filter <command>]
-	[--index-filter <command>] [--parent-filter <command>]
-	[--msg-filter <command>] [--commit-filter <command>]
-	[--tag-name-filter <command>] [--prune-empty]
-	[--original <namespace>] [-d <directory>] [-f | --force]
-	[--state-branch <branch>] [--] [<rev-list options>...]
-
-WARNING
--------
-'git filter-branch' has a plethora of pitfalls that can produce non-obvious
-manglings of the intended history rewrite (and can leave you with little
-time to investigate such problems since it has such abysmal performance).
-These safety and performance issues cannot be backward compatibly fixed and
-as such, its use is not recommended.  Please use an alternative history
-filtering tool such as https://github.com/newren/git-filter-repo/[git
-filter-repo].  If you still need to use 'git filter-branch', please
-carefully read the "Safety" section of the message on the Git mailing list
-https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
-the land mines of filter-branch] and vigilantly avoid as many of the
-hazards listed there as reasonably possible.
-
-DESCRIPTION
------------
-Lets you rewrite Git revision history by rewriting the branches mentioned
-in the <rev-list options>, applying custom filters on each revision.
-Those filters can modify each tree (e.g. removing a file or running
-a perl rewrite on all files) or information about each commit.
-Otherwise, all information (including original commit times or merge
-information) will be preserved.
-
-The command will only rewrite the _positive_ refs mentioned in the
-command line (e.g. if you pass 'a..b', only 'b' will be rewritten).
-If you specify no filters, the commits will be recommitted without any
-changes, which would normally have no effect.  Nevertheless, this may be
-useful in the future for compensating for some Git bugs or such,
-therefore such a usage is permitted.
-
-*NOTE*: This command honors `.git/info/grafts` file and refs in
-the `refs/replace/` namespace.
-If you have any grafts or replacement refs defined, running this command
-will make them permanent.
-
-*WARNING*! The rewritten history will have different object names for all
-the objects and will not converge with the original branch.  You will not
-be able to easily push and distribute the rewritten branch on top of the
-original branch.  Please do not use this command if you do not know the
-full implications, and avoid using it anyway, if a simple single commit
-would suffice to fix your problem.  (See the "RECOVERING FROM UPSTREAM
-REBASE" section in linkgit:git-rebase[1] for further information about
-rewriting published history.)
-
-Always verify that the rewritten version is correct: The original refs,
-if different from the rewritten ones, will be stored in the namespace
-'refs/original/'.
-
-Note that since this operation is very I/O expensive, it might
-be a good idea to redirect the temporary directory off-disk with the
-`-d` option, e.g. on tmpfs.  Reportedly the speedup is very noticeable.
-
-
-Filters
-~~~~~~~
-
-The filters are applied in the order as listed below.  The <command>
-argument is always evaluated in the shell context using the 'eval' command
-(with the notable exception of the commit filter, for technical reasons).
-Prior to that, the `$GIT_COMMIT` environment variable will be set to contain
-the id of the commit being rewritten.  Also, GIT_AUTHOR_NAME,
-GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE, GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL,
-and GIT_COMMITTER_DATE are taken from the current commit and exported to
-the environment, in order to affect the author and committer identities of
-the replacement commit created by linkgit:git-commit-tree[1] after the
-filters have run.
-
-If any evaluation of <command> returns a non-zero exit status, the whole
-operation will be aborted.
-
-A 'map' function is available that takes an "original sha1 id" argument
-and outputs a "rewritten sha1 id" if the commit has been already
-rewritten, and "original sha1 id" otherwise; the 'map' function can
-return several ids on separate lines if your commit filter emitted
-multiple commits.
-
-
-OPTIONS
--------
-
---setup <command>::
-	This is not a real filter executed for each commit but a one
-	time setup just before the loop. Therefore no commit-specific
-	variables are defined yet.  Functions or variables defined here
-	can be used or modified in the following filter steps except
-	the commit filter, for technical reasons.
-
---subdirectory-filter <directory>::
-	Only look at the history which touches the given subdirectory.
-	The result will contain that directory (and only that) as its
-	project root. Implies <<Remap_to_ancestor>>.
-
---env-filter <command>::
-	This filter may be used if you only need to modify the environment
-	in which the commit will be performed.  Specifically, you might
-	want to rewrite the author/committer name/email/time environment
-	variables (see linkgit:git-commit-tree[1] for details).
-
---tree-filter <command>::
-	This is the filter for rewriting the tree and its contents.
-	The argument is evaluated in shell with the working
-	directory set to the root of the checked out tree.  The new tree
-	is then used as-is (new files are auto-added, disappeared files
-	are auto-removed - neither .gitignore files nor any other ignore
-	rules *HAVE ANY EFFECT*!).
-
---index-filter <command>::
-	This is the filter for rewriting the index.  It is similar to the
-	tree filter but does not check out the tree, which makes it much
-	faster.  Frequently used with `git rm --cached
-	--ignore-unmatch ...`, see EXAMPLES below.  For hairy
-	cases, see linkgit:git-update-index[1].
-
---parent-filter <command>::
-	This is the filter for rewriting the commit's parent list.
-	It will receive the parent string on stdin and shall output
-	the new parent string on stdout.  The parent string is in
-	the format described in linkgit:git-commit-tree[1]: empty for
-	the initial commit, "-p parent" for a normal commit and
-	"-p parent1 -p parent2 -p parent3 ..." for a merge commit.
-
---msg-filter <command>::
-	This is the filter for rewriting the commit messages.
-	The argument is evaluated in the shell with the original
-	commit message on standard input; its standard output is
-	used as the new commit message.
-
---commit-filter <command>::
-	This is the filter for performing the commit.
-	If this filter is specified, it will be called instead of the
-	'git commit-tree' command, with arguments of the form
-	"<TREE_ID> [(-p <PARENT_COMMIT_ID>)...]" and the log message on
-	stdin.  The commit id is expected on stdout.
-+
-As a special extension, the commit filter may emit multiple
-commit ids; in that case, the rewritten children of the original commit will
-have all of them as parents.
-+
-You can use the 'map' convenience function in this filter, and other
-convenience functions, too.  For example, calling 'skip_commit "$@"'
-will leave out the current commit (but not its changes! If you want
-that, use 'git rebase' instead).
-+
-You can also use the `git_commit_non_empty_tree "$@"` instead of
-`git commit-tree "$@"` if you don't wish to keep commits with a single parent
-and that makes no change to the tree.
-
---tag-name-filter <command>::
-	This is the filter for rewriting tag names. When passed,
-	it will be called for every tag ref that points to a rewritten
-	object (or to a tag object which points to a rewritten object).
-	The original tag name is passed via standard input, and the new
-	tag name is expected on standard output.
-+
-The original tags are not deleted, but can be overwritten;
-use "--tag-name-filter cat" to simply update the tags.  In this
-case, be very careful and make sure you have the old tags
-backed up in case the conversion has run afoul.
-+
-Nearly proper rewriting of tag objects is supported. If the tag has
-a message attached, a new tag object will be created with the same message,
-author, and timestamp. If the tag has a signature attached, the
-signature will be stripped. It is by definition impossible to preserve
-signatures. The reason this is "nearly" proper, is because ideally if
-the tag did not change (points to the same object, has the same name, etc.)
-it should retain any signature. That is not the case, signatures will always
-be removed, buyer beware. There is also no support for changing the
-author or timestamp (or the tag message for that matter). Tags which point
-to other tags will be rewritten to point to the underlying commit.
-
---prune-empty::
-	Some filters will generate empty commits that leave the tree untouched.
-	This option instructs git-filter-branch to remove such commits if they
-	have exactly one or zero non-pruned parents; merge commits will
-	therefore remain intact.  This option cannot be used together with
-	`--commit-filter`, though the same effect can be achieved by using the
-	provided `git_commit_non_empty_tree` function in a commit filter.
-
---original <namespace>::
-	Use this option to set the namespace where the original commits
-	will be stored. The default value is 'refs/original'.
-
--d <directory>::
-	Use this option to set the path to the temporary directory used for
-	rewriting.  When applying a tree filter, the command needs to
-	temporarily check out the tree to some directory, which may consume
-	considerable space in case of large projects.  By default it
-	does this in the `.git-rewrite/` directory but you can override
-	that choice by this parameter.
-
--f::
---force::
-	'git filter-branch' refuses to start with an existing temporary
-	directory or when there are already refs starting with
-	'refs/original/', unless forced.
-
---state-branch <branch>::
-	This option will cause the mapping from old to new objects to
-	be loaded from named branch upon startup and saved as a new
-	commit to that branch upon exit, enabling incremental of large
-	trees. If '<branch>' does not exist it will be created.
-
-<rev-list options>...::
-	Arguments for 'git rev-list'.  All positive refs included by
-	these options are rewritten.  You may also specify options
-	such as `--all`, but you must use `--` to separate them from
-	the 'git filter-branch' options. Implies <<Remap_to_ancestor>>.
-
-
-[[Remap_to_ancestor]]
-Remap to ancestor
-~~~~~~~~~~~~~~~~~
-
-By using linkgit:git-rev-list[1] arguments, e.g., path limiters, you can limit the
-set of revisions which get rewritten. However, positive refs on the command
-line are distinguished: we don't let them be excluded by such limiters. For
-this purpose, they are instead rewritten to point at the nearest ancestor that
-was not excluded.
-
-
-EXIT STATUS
------------
-
-On success, the exit status is `0`.  If the filter can't find any commits to
-rewrite, the exit status is `2`.  On any other error, the exit status may be
-any other non-zero value.
-
-
-EXAMPLES
---------
-
-Suppose you want to remove a file (containing confidential information
-or copyright violation) from all commits:
-
--------------------------------------------------------
-git filter-branch --tree-filter 'rm filename' HEAD
--------------------------------------------------------
-
-However, if the file is absent from the tree of some commit,
-a simple `rm filename` will fail for that tree and commit.
-Thus you may instead want to use `rm -f filename` as the script.
-
-Using `--index-filter` with 'git rm' yields a significantly faster
-version.  Like with using `rm filename`, `git rm --cached filename`
-will fail if the file is absent from the tree of a commit.  If you
-want to "completely forget" a file, it does not matter when it entered
-history, so we also add `--ignore-unmatch`:
-
---------------------------------------------------------------------------
-git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
---------------------------------------------------------------------------
-
-Now, you will get the rewritten history saved in HEAD.
-
-To rewrite the repository to look as if `foodir/` had been its project
-root, and discard all other history:
-
--------------------------------------------------------
-git filter-branch --subdirectory-filter foodir -- --all
--------------------------------------------------------
-
-Thus you can, e.g., turn a library subdirectory into a repository of
-its own.  Note the `--` that separates 'filter-branch' options from
-revision options, and the `--all` to rewrite all branches and tags.
-
-To set a commit (which typically is at the tip of another
-history) to be the parent of the current initial commit, in
-order to paste the other history behind the current history:
-
--------------------------------------------------------------------
-git filter-branch --parent-filter 'sed "s/^\$/-p <graft-id>/"' HEAD
--------------------------------------------------------------------
-
-(if the parent string is empty - which happens when we are dealing with
-the initial commit - add graftcommit as a parent).  Note that this assumes
-history with a single root (that is, no merge without common ancestors
-happened).  If this is not the case, use:
-
---------------------------------------------------------------------------
-git filter-branch --parent-filter \
-	'test $GIT_COMMIT = <commit-id> && echo "-p <graft-id>" || cat' HEAD
---------------------------------------------------------------------------
-
-or even simpler:
-
------------------------------------------------
-git replace --graft $commit-id $graft-id
-git filter-branch $graft-id..HEAD
------------------------------------------------
-
-To remove commits authored by "Darl McBribe" from the history:
-
-------------------------------------------------------------------------------
-git filter-branch --commit-filter '
-	if [ "$GIT_AUTHOR_NAME" = "Darl McBribe" ];
-	then
-		skip_commit "$@";
-	else
-		git commit-tree "$@";
-	fi' HEAD
-------------------------------------------------------------------------------
-
-The function 'skip_commit' is defined as follows:
-
---------------------------
-skip_commit()
-{
-	shift;
-	while [ -n "$1" ];
-	do
-		shift;
-		map "$1";
-		shift;
-	done;
-}
---------------------------
-
-The shift magic first throws away the tree id and then the -p
-parameters.  Note that this handles merges properly! In case Darl
-committed a merge between P1 and P2, it will be propagated properly
-and all children of the merge will become merge commits with P1,P2
-as their parents instead of the merge commit.
-
-*NOTE* the changes introduced by the commits, and which are not reverted
-by subsequent commits, will still be in the rewritten branch. If you want
-to throw out _changes_ together with the commits, you should use the
-interactive mode of 'git rebase'.
-
-You can rewrite the commit log messages using `--msg-filter`.  For
-example, 'git svn-id' strings in a repository created by 'git svn' can
-be removed this way:
-
--------------------------------------------------------
-git filter-branch --msg-filter '
-	sed -e "/^git-svn-id:/d"
-'
--------------------------------------------------------
-
-If you need to add 'Acked-by' lines to, say, the last 10 commits (none
-of which is a merge), use this command:
-
---------------------------------------------------------
-git filter-branch --msg-filter '
-	cat &&
-	echo "Acked-by: Bugs Bunny <bunny@bugzilla.org>"
-' HEAD~10..HEAD
---------------------------------------------------------
-
-The `--env-filter` option can be used to modify committer and/or author
-identity.  For example, if you found out that your commits have the wrong
-identity due to a misconfigured user.email, you can make a correction,
-before publishing the project, like this:
-
---------------------------------------------------------
-git filter-branch --env-filter '
-	if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
-	then
-		GIT_AUTHOR_EMAIL=john@example.com
-	fi
-	if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
-	then
-		GIT_COMMITTER_EMAIL=john@example.com
-	fi
-' -- --all
---------------------------------------------------------
-
-To restrict rewriting to only part of the history, specify a revision
-range in addition to the new branch name.  The new branch name will
-point to the top-most revision that a 'git rev-list' of this range
-will print.
-
-Consider this history:
-
-------------------
-     D--E--F--G--H
-    /     /
-A--B-----C
-------------------
-
-To rewrite only commits D,E,F,G,H, but leave A, B and C alone, use:
-
---------------------------------
-git filter-branch ... C..H
---------------------------------
-
-To rewrite commits E,F,G,H, use one of these:
-
-----------------------------------------
-git filter-branch ... C..H --not D
-git filter-branch ... D..H --not C
-----------------------------------------
-
-To move the whole tree into a subdirectory, or remove it from there:
-
----------------------------------------------------------------
-git filter-branch --index-filter \
-	'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
-		GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
-			git update-index --index-info &&
-	 mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD
----------------------------------------------------------------
-
-
-
-CHECKLIST FOR SHRINKING A REPOSITORY
-------------------------------------
-
-git-filter-branch can be used to get rid of a subset of files,
-usually with some combination of `--index-filter` and
-`--subdirectory-filter`.  People expect the resulting repository to
-be smaller than the original, but you need a few more steps to
-actually make it smaller, because Git tries hard not to lose your
-objects until you tell it to.  First make sure that:
-
-* You really removed all variants of a filename, if a blob was moved
-  over its lifetime.  `git log --name-only --follow --all -- filename`
-  can help you find renames.
-
-* You really filtered all refs: use `--tag-name-filter cat -- --all`
-  when calling git-filter-branch.
-
-Then there are two ways to get a smaller repository.  A safer way is
-to clone, that keeps your original intact.
-
-* Clone it with `git clone file:///path/to/repo`.  The clone
-  will not have the removed objects.  See linkgit:git-clone[1].  (Note
-  that cloning with a plain path just hardlinks everything!)
-
-If you really don't want to clone it, for whatever reasons, check the
-following points instead (in this order).  This is a very destructive
-approach, so *make a backup* or go back to cloning it.  You have been
-warned.
-
-* Remove the original refs backed up by git-filter-branch: say `git
-  for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git
-  update-ref -d`.
-
-* Expire all reflogs with `git reflog expire --expire=now --all`.
-
-* Garbage collect all unreferenced objects with `git gc --prune=now`
-  (or if your git-gc is not new enough to support arguments to
-  `--prune`, use `git repack -ad; git prune` instead).
-
-GIT
----
-Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index f9255344ae..20850def5d 100644
--- a/Makefile
+++ b/Makefile
@@ -607,7 +607,6 @@ unexport CDPATH
 
 SCRIPT_SH += git-bisect.sh
 SCRIPT_SH += git-difftool--helper.sh
-SCRIPT_SH += git-filter-branch.sh
 SCRIPT_SH += git-merge-octopus.sh
 SCRIPT_SH += git-merge-one-file.sh
 SCRIPT_SH += git-merge-resolve.sh
diff --git a/command-list.txt b/command-list.txt
index a9ac72bef4..1ba65d9516 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -90,7 +90,6 @@ git-fast-export                         ancillarymanipulators
 git-fast-import                         ancillarymanipulators
 git-fetch                               mainporcelain           remote
 git-fetch-pack                          synchingrepositories
-git-filter-branch                       ancillarymanipulators
 git-fmt-merge-msg                       purehelpers
 git-for-each-ref                        plumbinginterrogators
 git-format-patch                        mainporcelain
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
deleted file mode 100644
index 7b1865c1d5..0000000000
--- a/git-filter-branch.sh
+++ /dev/null
@@ -1,675 +0,0 @@
-#!/bin/sh
-#
-# Rewrite revision history
-# Copyright (c) Petr Baudis, 2006
-# Minimal changes to "port" it to core-git (c) Johannes Schindelin, 2007
-#
-# Lets you rewrite the revision history of the current branch, creating
-# a new branch. You can specify a number of filters to modify the commits,
-# files and trees.
-
-# The following functions will also be available in the commit filter:
-
-functions=$(cat << \EOF
-EMPTY_TREE=$(git hash-object -t tree /dev/null)
-
-warn () {
-	echo "$*" >&2
-}
-
-map()
-{
-	# if it was not rewritten, take the original
-	if test -r "$workdir/../map/$1"
-	then
-		cat "$workdir/../map/$1"
-	else
-		echo "$1"
-	fi
-}
-
-# if you run 'skip_commit "$@"' in a commit filter, it will print
-# the (mapped) parents, effectively skipping the commit.
-
-skip_commit()
-{
-	shift;
-	while [ -n "$1" ];
-	do
-		shift;
-		map "$1";
-		shift;
-	done;
-}
-
-# if you run 'git_commit_non_empty_tree "$@"' in a commit filter,
-# it will skip commits that leave the tree untouched, commit the other.
-git_commit_non_empty_tree()
-{
-	if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then
-		map "$3"
-	elif test $# = 1 && test "$1" = $EMPTY_TREE; then
-		:
-	else
-		git commit-tree "$@"
-	fi
-}
-# override die(): this version puts in an extra line break, so that
-# the progress is still visible
-
-die()
-{
-	echo >&2
-	echo "$*" >&2
-	exit 1
-}
-EOF
-)
-
-eval "$functions"
-
-finish_ident() {
-	# Ensure non-empty id name.
-	echo "case \"\$GIT_$1_NAME\" in \"\") GIT_$1_NAME=\"\${GIT_$1_EMAIL%%@*}\" && export GIT_$1_NAME;; esac"
-	# And make sure everything is exported.
-	echo "export GIT_$1_NAME"
-	echo "export GIT_$1_EMAIL"
-	echo "export GIT_$1_DATE"
-}
-
-set_ident () {
-	parse_ident_from_commit author AUTHOR committer COMMITTER
-	finish_ident AUTHOR
-	finish_ident COMMITTER
-}
-
-if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
-     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then
-	cat <<EOF
-WARNING: git-filter-branch has a glut of gotchas generating mangled history
-         rewrites.  Please use an alternative filtering tool such as 'git
-         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
-         See the filter-branch manual page for more details; to squelch
-         this warning and pause, set FILTER_BRANCH_SQUELCH_WARNING=1.
-
-EOF
-	sleep 5
-fi
-
-USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
-	[--tree-filter <command>] [--index-filter <command>]
-	[--parent-filter <command>] [--msg-filter <command>]
-	[--commit-filter <command>] [--tag-name-filter <command>]
-	[--original <namespace>]
-	[-d <directory>] [-f | --force] [--state-branch <branch>]
-	[--] [<rev-list options>...]"
-
-OPTIONS_SPEC=
-. git-sh-setup
-
-if [ "$(is_bare_repository)" = false ]; then
-	require_clean_work_tree 'rewrite branches'
-fi
-
-tempdir=.git-rewrite
-filter_setup=
-filter_env=
-filter_tree=
-filter_index=
-filter_parent=
-filter_msg=cat
-filter_commit=
-filter_tag_name=
-filter_subdir=
-state_branch=
-orig_namespace=refs/original/
-force=
-prune_empty=
-remap_to_ancestor=
-while :
-do
-	case "$1" in
-	--)
-		shift
-		break
-		;;
-	--force|-f)
-		shift
-		force=t
-		continue
-		;;
-	--remap-to-ancestor)
-		# deprecated ($remap_to_ancestor is set now automatically)
-		shift
-		remap_to_ancestor=t
-		continue
-		;;
-	--prune-empty)
-		shift
-		prune_empty=t
-		continue
-		;;
-	-*)
-		;;
-	*)
-		break;
-	esac
-
-	# all switches take one argument
-	ARG="$1"
-	case "$#" in 1) usage ;; esac
-	shift
-	OPTARG="$1"
-	shift
-
-	case "$ARG" in
-	-d)
-		tempdir="$OPTARG"
-		;;
-	--setup)
-		filter_setup="$OPTARG"
-		;;
-	--subdirectory-filter)
-		filter_subdir="$OPTARG"
-		remap_to_ancestor=t
-		;;
-	--env-filter)
-		filter_env="$OPTARG"
-		;;
-	--tree-filter)
-		filter_tree="$OPTARG"
-		;;
-	--index-filter)
-		filter_index="$OPTARG"
-		;;
-	--parent-filter)
-		filter_parent="$OPTARG"
-		;;
-	--msg-filter)
-		filter_msg="$OPTARG"
-		;;
-	--commit-filter)
-		filter_commit="$functions; $OPTARG"
-		;;
-	--tag-name-filter)
-		filter_tag_name="$OPTARG"
-		;;
-	--original)
-		orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
-		;;
-	--state-branch)
-		state_branch="$OPTARG"
-		;;
-	*)
-		usage
-		;;
-	esac
-done
-
-case "$prune_empty,$filter_commit" in
-,)
-	filter_commit='git commit-tree "$@"';;
-t,)
-	filter_commit="$functions;"' git_commit_non_empty_tree "$@"';;
-,*)
-	;;
-*)
-	die "Cannot set --prune-empty and --commit-filter at the same time"
-esac
-
-case "$force" in
-t)
-	rm -rf "$tempdir"
-;;
-'')
-	test -d "$tempdir" &&
-		die "$tempdir already exists, please remove it"
-esac
-orig_dir=$(pwd)
-mkdir -p "$tempdir/t" &&
-tempdir="$(cd "$tempdir"; pwd)" &&
-cd "$tempdir/t" &&
-workdir="$(pwd)" ||
-die ""
-
-# Remove tempdir on exit
-trap 'cd "$orig_dir"; rm -rf "$tempdir"' 0
-
-ORIG_GIT_DIR="$GIT_DIR"
-ORIG_GIT_WORK_TREE="$GIT_WORK_TREE"
-ORIG_GIT_INDEX_FILE="$GIT_INDEX_FILE"
-ORIG_GIT_AUTHOR_NAME="$GIT_AUTHOR_NAME"
-ORIG_GIT_AUTHOR_EMAIL="$GIT_AUTHOR_EMAIL"
-ORIG_GIT_AUTHOR_DATE="$GIT_AUTHOR_DATE"
-ORIG_GIT_COMMITTER_NAME="$GIT_COMMITTER_NAME"
-ORIG_GIT_COMMITTER_EMAIL="$GIT_COMMITTER_EMAIL"
-ORIG_GIT_COMMITTER_DATE="$GIT_COMMITTER_DATE"
-
-GIT_WORK_TREE=.
-export GIT_DIR GIT_WORK_TREE
-
-# Make sure refs/original is empty
-git for-each-ref > "$tempdir"/backup-refs || exit
-while read sha1 type name
-do
-	case "$force,$name" in
-	,$orig_namespace*)
-		die "Cannot create a new backup.
-A previous backup already exists in $orig_namespace
-Force overwriting the backup with -f"
-	;;
-	t,$orig_namespace*)
-		git update-ref -d "$name" $sha1
-	;;
-	esac
-done < "$tempdir"/backup-refs
-
-# The refs should be updated if their heads were rewritten
-git rev-parse --no-flags --revs-only --symbolic-full-name \
-	--default HEAD "$@" > "$tempdir"/raw-refs || exit
-while read ref
-do
-	case "$ref" in ^?*) continue ;; esac
-
-	if git rev-parse --verify "$ref"^0 >/dev/null 2>&1
-	then
-		echo "$ref"
-	else
-		warn "WARNING: not rewriting '$ref' (not a committish)"
-	fi
-done >"$tempdir"/heads <"$tempdir"/raw-refs
-
-test -s "$tempdir"/heads ||
-	die "You must specify a ref to rewrite."
-
-GIT_INDEX_FILE="$(pwd)/../index"
-export GIT_INDEX_FILE
-
-# map old->new commit ids for rewriting parents
-mkdir ../map || die "Could not create map/ directory"
-
-if test -n "$state_branch"
-then
-	state_commit=$(git rev-parse --no-flags --revs-only "$state_branch")
-	if test -n "$state_commit"
-	then
-		echo "Populating map from $state_branch ($state_commit)" 1>&2
-		perl -e'open(MAP, "-|", "git show $ARGV[0]:filter.map") or die;
-			while (<MAP>) {
-				m/(.*):(.*)/ or die;
-				open F, ">../map/$1" or die;
-				print F "$2" or die;
-				close(F) or die;
-			}
-			close(MAP) or die;' "$state_commit" \
-				|| die "Unable to load state from $state_branch:filter.map"
-	else
-		echo "Branch $state_branch does not exist. Will create" 1>&2
-	fi
-fi
-
-# we need "--" only if there are no path arguments in $@
-nonrevs=$(git rev-parse --no-revs "$@") || exit
-if test -z "$nonrevs"
-then
-	dashdash=--
-else
-	dashdash=
-	remap_to_ancestor=t
-fi
-
-git rev-parse --revs-only "$@" >../parse
-
-case "$filter_subdir" in
-"")
-	eval set -- "$(git rev-parse --sq --no-revs "$@")"
-	;;
-*)
-	eval set -- "$(git rev-parse --sq --no-revs "$@" $dashdash \
-		"$filter_subdir")"
-	;;
-esac
-
-git rev-list --reverse --topo-order --default HEAD \
-	--parents --simplify-merges --stdin "$@" <../parse >../revs ||
-	die "Could not get the commits"
-commits=$(wc -l <../revs | tr -d " ")
-
-test $commits -eq 0 && die_with_status 2 "Found nothing to rewrite"
-
-# Rewrite the commits
-report_progress ()
-{
-	if test -n "$progress" &&
-		test $git_filter_branch__commit_count -gt $next_sample_at
-	then
-		count=$git_filter_branch__commit_count
-
-		now=$(date +%s)
-		elapsed=$(($now - $start_timestamp))
-		remaining=$(( ($commits - $count) * $elapsed / $count ))
-		if test $elapsed -gt 0
-		then
-			next_sample_at=$(( ($elapsed + 1) * $count / $elapsed ))
-		else
-			next_sample_at=$(($next_sample_at + 1))
-		fi
-		progress=" ($elapsed seconds passed, remaining $remaining predicted)"
-	fi
-	printf "\rRewrite $commit ($count/$commits)$progress    "
-}
-
-git_filter_branch__commit_count=0
-
-progress= start_timestamp=
-if date '+%s' 2>/dev/null | grep -q '^[0-9][0-9]*$'
-then
-	next_sample_at=0
-	progress="dummy to ensure this is not empty"
-	start_timestamp=$(date '+%s')
-fi
-
-if test -n "$filter_index" ||
-   test -n "$filter_tree" ||
-   test -n "$filter_subdir"
-then
-	need_index=t
-else
-	need_index=
-fi
-
-eval "$filter_setup" < /dev/null ||
-	die "filter setup failed: $filter_setup"
-
-while read commit parents; do
-	git_filter_branch__commit_count=$(($git_filter_branch__commit_count+1))
-
-	report_progress
-	test -f "$workdir"/../map/$commit && continue
-
-	case "$filter_subdir" in
-	"")
-		if test -n "$need_index"
-		then
-			GIT_ALLOW_NULL_SHA1=1 git read-tree -i -m $commit
-		fi
-		;;
-	*)
-		# The commit may not have the subdirectory at all
-		err=$(GIT_ALLOW_NULL_SHA1=1 \
-		      git read-tree -i -m $commit:"$filter_subdir" 2>&1) || {
-			if ! git rev-parse -q --verify $commit:"$filter_subdir"
-			then
-				rm -f "$GIT_INDEX_FILE"
-			else
-				echo >&2 "$err"
-				false
-			fi
-		}
-	esac || die "Could not initialize the index"
-
-	GIT_COMMIT=$commit
-	export GIT_COMMIT
-	git cat-file commit "$commit" >../commit ||
-		die "Cannot read commit $commit"
-
-	eval "$(set_ident <../commit)" ||
-		die "setting author/committer failed for commit $commit"
-	eval "$filter_env" < /dev/null ||
-		die "env filter failed: $filter_env"
-
-	if [ "$filter_tree" ]; then
-		git checkout-index -f -u -a ||
-			die "Could not checkout the index"
-		# files that $commit removed are now still in the working tree;
-		# remove them, else they would be added again
-		git clean -d -q -f -x
-		eval "$filter_tree" < /dev/null ||
-			die "tree filter failed: $filter_tree"
-
-		(
-			git diff-index -r --name-only --ignore-submodules $commit -- &&
-			git ls-files --others
-		) > "$tempdir"/tree-state || exit
-		git update-index --add --replace --remove --stdin \
-			< "$tempdir"/tree-state || exit
-	fi
-
-	eval "$filter_index" < /dev/null ||
-		die "index filter failed: $filter_index"
-
-	parentstr=
-	for parent in $parents; do
-		for reparent in $(map "$parent"); do
-			case "$parentstr " in
-			*" -p $reparent "*)
-				;;
-			*)
-				parentstr="$parentstr -p $reparent"
-				;;
-			esac
-		done
-	done
-	if [ "$filter_parent" ]; then
-		parentstr="$(echo "$parentstr" | eval "$filter_parent")" ||
-				die "parent filter failed: $filter_parent"
-	fi
-
-	{
-		while IFS='' read -r header_line && test -n "$header_line"
-		do
-			# skip header lines...
-			:;
-		done
-		# and output the actual commit message
-		cat
-	} <../commit |
-		eval "$filter_msg" > ../message ||
-			die "msg filter failed: $filter_msg"
-
-	if test -n "$need_index"
-	then
-		tree=$(git write-tree)
-	else
-		tree=$(git rev-parse "$commit^{tree}")
-	fi
-	workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
-		"$tree" $parentstr < ../message > ../map/$commit ||
-			die "could not write rewritten commit"
-done <../revs
-
-# If we are filtering for paths, as in the case of a subdirectory
-# filter, it is possible that a specified head is not in the set of
-# rewritten commits, because it was pruned by the revision walker.
-# Ancestor remapping fixes this by mapping these heads to the unique
-# nearest ancestor that survived the pruning.
-
-if test "$remap_to_ancestor" = t
-then
-	while read ref
-	do
-		sha1=$(git rev-parse "$ref"^0)
-		test -f "$workdir"/../map/$sha1 && continue
-		ancestor=$(git rev-list --simplify-merges -1 "$ref" "$@")
-		test "$ancestor" && echo $(map $ancestor) >> "$workdir"/../map/$sha1
-	done < "$tempdir"/heads
-fi
-
-# Finally update the refs
-
-_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
-_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"
-echo
-while read ref
-do
-	# avoid rewriting a ref twice
-	test -f "$orig_namespace$ref" && continue
-
-	sha1=$(git rev-parse "$ref"^0)
-	rewritten=$(map $sha1)
-
-	test $sha1 = "$rewritten" &&
-		warn "WARNING: Ref '$ref' is unchanged" &&
-		continue
-
-	case "$rewritten" in
-	'')
-		echo "Ref '$ref' was deleted"
-		git update-ref -m "filter-branch: delete" -d "$ref" $sha1 ||
-			die "Could not delete $ref"
-	;;
-	$_x40)
-		echo "Ref '$ref' was rewritten"
-		if ! git update-ref -m "filter-branch: rewrite" \
-					"$ref" $rewritten $sha1 2>/dev/null; then
-			if test $(git cat-file -t "$ref") = tag; then
-				if test -z "$filter_tag_name"; then
-					warn "WARNING: You said to rewrite tagged commits, but not the corresponding tag."
-					warn "WARNING: Perhaps use '--tag-name-filter cat' to rewrite the tag."
-				fi
-			else
-				die "Could not rewrite $ref"
-			fi
-		fi
-	;;
-	*)
-		# NEEDSWORK: possibly add -Werror, making this an error
-		warn "WARNING: '$ref' was rewritten into multiple commits:"
-		warn "$rewritten"
-		warn "WARNING: Ref '$ref' points to the first one now."
-		rewritten=$(echo "$rewritten" | head -n 1)
-		git update-ref -m "filter-branch: rewrite to first" \
-				"$ref" $rewritten $sha1 ||
-			die "Could not rewrite $ref"
-	;;
-	esac
-	git update-ref -m "filter-branch: backup" "$orig_namespace$ref" $sha1 ||
-		 exit
-done < "$tempdir"/heads
-
-# TODO: This should possibly go, with the semantics that all positive given
-#       refs are updated, and their original heads stored in refs/original/
-# Filter tags
-
-if [ "$filter_tag_name" ]; then
-	git for-each-ref --format='%(objectname) %(objecttype) %(refname)' refs/tags |
-	while read sha1 type ref; do
-		ref="${ref#refs/tags/}"
-		# XXX: Rewrite tagged trees as well?
-		if [ "$type" != "commit" -a "$type" != "tag" ]; then
-			continue;
-		fi
-
-		if [ "$type" = "tag" ]; then
-			# Dereference to a commit
-			sha1t="$sha1"
-			sha1="$(git rev-parse -q "$sha1"^{commit})" || continue
-		fi
-
-		[ -f "../map/$sha1" ] || continue
-		new_sha1="$(cat "../map/$sha1")"
-		GIT_COMMIT="$sha1"
-		export GIT_COMMIT
-		new_ref="$(echo "$ref" | eval "$filter_tag_name")" ||
-			die "tag name filter failed: $filter_tag_name"
-
-		echo "$ref -> $new_ref ($sha1 -> $new_sha1)"
-
-		if [ "$type" = "tag" ]; then
-			new_sha1=$( ( printf 'object %s\ntype commit\ntag %s\n' \
-						"$new_sha1" "$new_ref"
-				git cat-file tag "$ref" |
-				sed -n \
-				    -e '1,/^$/{
-					  /^object /d
-					  /^type /d
-					  /^tag /d
-					}' \
-				    -e '/^-----BEGIN PGP SIGNATURE-----/q' \
-				    -e 'p' ) |
-				git hash-object -t tag -w --stdin) ||
-				die "Could not create new tag object for $ref"
-			if git cat-file tag "$ref" | \
-			   sane_grep '^-----BEGIN PGP SIGNATURE-----' >/dev/null 2>&1
-			then
-				warn "gpg signature stripped from tag object $sha1t"
-			fi
-		fi
-
-		git update-ref "refs/tags/$new_ref" "$new_sha1" ||
-			die "Could not write tag $new_ref"
-	done
-fi
-
-unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE
-unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
-unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE
-test -z "$ORIG_GIT_DIR" || {
-	GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR
-}
-test -z "$ORIG_GIT_WORK_TREE" || {
-	GIT_WORK_TREE="$ORIG_GIT_WORK_TREE" &&
-	export GIT_WORK_TREE
-}
-test -z "$ORIG_GIT_INDEX_FILE" || {
-	GIT_INDEX_FILE="$ORIG_GIT_INDEX_FILE" &&
-	export GIT_INDEX_FILE
-}
-test -z "$ORIG_GIT_AUTHOR_NAME" || {
-	GIT_AUTHOR_NAME="$ORIG_GIT_AUTHOR_NAME" &&
-	export GIT_AUTHOR_NAME
-}
-test -z "$ORIG_GIT_AUTHOR_EMAIL" || {
-	GIT_AUTHOR_EMAIL="$ORIG_GIT_AUTHOR_EMAIL" &&
-	export GIT_AUTHOR_EMAIL
-}
-test -z "$ORIG_GIT_AUTHOR_DATE" || {
-	GIT_AUTHOR_DATE="$ORIG_GIT_AUTHOR_DATE" &&
-	export GIT_AUTHOR_DATE
-}
-test -z "$ORIG_GIT_COMMITTER_NAME" || {
-	GIT_COMMITTER_NAME="$ORIG_GIT_COMMITTER_NAME" &&
-	export GIT_COMMITTER_NAME
-}
-test -z "$ORIG_GIT_COMMITTER_EMAIL" || {
-	GIT_COMMITTER_EMAIL="$ORIG_GIT_COMMITTER_EMAIL" &&
-	export GIT_COMMITTER_EMAIL
-}
-test -z "$ORIG_GIT_COMMITTER_DATE" || {
-	GIT_COMMITTER_DATE="$ORIG_GIT_COMMITTER_DATE" &&
-	export GIT_COMMITTER_DATE
-}
-
-if test -n "$state_branch"
-then
-	echo "Saving rewrite state to $state_branch" 1>&2
-	state_blob=$(
-		perl -e'opendir D, "../map" or die;
-			open H, "|-", "git hash-object -w --stdin" or die;
-			foreach (sort readdir(D)) {
-				next if m/^\.\.?$/;
-				open F, "<../map/$_" or die;
-				chomp($f = <F>);
-				print H "$_:$f\n" or die;
-			}
-			close(H) or die;' || die "Unable to save state")
-	state_tree=$(printf '100644 blob %s\tfilter.map\n' "$state_blob" | git mktree)
-	if test -n "$state_commit"
-	then
-		state_commit=$(echo "Sync" | git commit-tree "$state_tree" -p "$state_commit")
-	else
-		state_commit=$(echo "Sync" | git commit-tree "$state_tree" )
-	fi
-	git update-ref "$state_branch" "$state_commit"
-fi
-
-cd "$orig_dir"
-rm -rf "$tempdir"
-
-trap - 0
-
-if [ "$(is_bare_repository)" = false ]; then
-	git read-tree -u -m HEAD || exit
-fi
-
-exit 0
diff --git a/t/perf/p7000-filter-branch.sh b/t/perf/p7000-filter-branch.sh
deleted file mode 100755
index b029586ccb..0000000000
--- a/t/perf/p7000-filter-branch.sh
+++ /dev/null
@@ -1,24 +0,0 @@
-#!/bin/sh
-
-test_description='performance of filter-branch'
-. ./perf-lib.sh
-
-test_perf_default_repo
-test_checkout_worktree
-
-test_expect_success 'mark bases for tests' '
-	git tag -f tip &&
-	git tag -f base HEAD~100
-'
-
-test_perf 'noop filter' '
-	git checkout --detach tip &&
-	git filter-branch -f base..HEAD
-'
-
-test_perf 'noop prune-empty' '
-	git checkout --detach tip &&
-	git filter-branch -f --prune-empty base..HEAD
-'
-
-test_done
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
deleted file mode 100755
index e23de7d0b5..0000000000
--- a/t/t7003-filter-branch.sh
+++ /dev/null
@@ -1,505 +0,0 @@
-#!/bin/sh
-
-test_description='git filter-branch'
-. ./test-lib.sh
-. "$TEST_DIRECTORY/lib-gpg.sh"
-
-test_expect_success 'setup' '
-	test_commit A &&
-	GIT_COMMITTER_DATE="@0 +0000" GIT_AUTHOR_DATE="@0 +0000" &&
-	test_commit --notick B &&
-	git checkout -b branch B &&
-	test_commit D &&
-	mkdir dir &&
-	test_commit dir/D &&
-	test_commit E &&
-	git checkout master &&
-	test_commit C &&
-	git checkout branch &&
-	git merge C &&
-	git tag F &&
-	test_commit G &&
-	test_commit H
-'
-# * (HEAD, branch) H
-# * G
-# *   Merge commit 'C' into branch
-# |\
-# | * (master) C
-# * | E
-# * | dir/D
-# * | D
-# |/
-# * B
-# * A
-
-
-H=$(git rev-parse H)
-
-test_expect_success 'rewrite identically' '
-	git filter-branch branch
-'
-test_expect_success 'result is really identical' '
-	test $H = $(git rev-parse HEAD)
-'
-
-test_expect_success 'rewrite bare repository identically' '
-	(git config core.bare true && cd .git &&
-	 git filter-branch branch > filter-output 2>&1 &&
-	! fgrep fatal filter-output)
-'
-git config core.bare false
-test_expect_success 'result is really identical' '
-	test $H = $(git rev-parse HEAD)
-'
-
-TRASHDIR=$(pwd)
-test_expect_success 'correct GIT_DIR while using -d' '
-	mkdir drepo &&
-	( cd drepo &&
-	git init &&
-	test_commit drepo &&
-	git filter-branch -d "$TRASHDIR/dfoo" \
-		--index-filter "cp \"$TRASHDIR\"/dfoo/backup-refs \"$TRASHDIR\"" \
-	) &&
-	grep drepo "$TRASHDIR/backup-refs"
-'
-
-test_expect_success 'tree-filter works with -d' '
-	git init drepo-tree &&
-	(
-		cd drepo-tree &&
-		test_commit one &&
-		git filter-branch -d "$TRASHDIR/dfoo" \
-			--tree-filter "echo changed >one.t" &&
-		echo changed >expect &&
-		git cat-file blob HEAD:one.t >actual &&
-		test_cmp expect actual &&
-		test_cmp one.t actual
-	)
-'
-
-test_expect_success 'Fail if commit filter fails' '
-	test_must_fail git filter-branch -f --commit-filter "exit 1" HEAD
-'
-
-test_expect_success 'rewrite, renaming a specific file' '
-	git filter-branch -f --tree-filter "mv D.t doh || :" HEAD
-'
-
-test_expect_success 'test that the file was renamed' '
-	test D = "$(git show HEAD:doh --)" &&
-	! test -f D.t &&
-	test -f doh &&
-	test D = "$(cat doh)"
-'
-
-test_expect_success 'rewrite, renaming a specific directory' '
-	git filter-branch -f --tree-filter "mv dir diroh || :" HEAD
-'
-
-test_expect_success 'test that the directory was renamed' '
-	test dir/D = "$(git show HEAD:diroh/D.t --)" &&
-	! test -d dir &&
-	test -d diroh &&
-	! test -d diroh/dir &&
-	test -f diroh/D.t &&
-	test dir/D = "$(cat diroh/D.t)"
-'
-
-V=$(git rev-parse HEAD)
-
-test_expect_success 'populate --state-branch' '
-	git filter-branch --state-branch state -f --tree-filter "touch file || :" HEAD
-'
-
-W=$(git rev-parse HEAD)
-
-test_expect_success 'using --state-branch to skip already rewritten commits' '
-	test_when_finished git reset --hard $V &&
-	git reset --hard $V &&
-	git filter-branch --state-branch state -f --tree-filter "touch file || :" HEAD &&
-	test_cmp_rev $W HEAD
-'
-
-git tag oldD HEAD~4
-test_expect_success 'rewrite one branch, keeping a side branch' '
-	git branch modD oldD &&
-	git filter-branch -f --tree-filter "mv B.t boh || :" D..modD
-'
-
-test_expect_success 'common ancestor is still common (unchanged)' '
-	test "$(git merge-base modD D)" = "$(git rev-parse B)"
-'
-
-test_expect_success 'filter subdirectory only' '
-	mkdir subdir &&
-	touch subdir/new &&
-	git add subdir/new &&
-	test_tick &&
-	git commit -m "subdir" &&
-	echo H > A.t &&
-	test_tick &&
-	git commit -m "not subdir" A.t &&
-	echo A > subdir/new &&
-	test_tick &&
-	git commit -m "again subdir" subdir/new &&
-	git rm A.t &&
-	test_tick &&
-	git commit -m "again not subdir" &&
-	git branch sub &&
-	git branch sub-earlier HEAD~2 &&
-	git filter-branch -f --subdirectory-filter subdir \
-		refs/heads/sub refs/heads/sub-earlier
-'
-
-test_expect_success 'subdirectory filter result looks okay' '
-	test 2 = $(git rev-list sub | wc -l) &&
-	git show sub:new &&
-	test_must_fail git show sub:subdir &&
-	git show sub-earlier:new &&
-	test_must_fail git show sub-earlier:subdir
-'
-
-test_expect_success 'more setup' '
-	git checkout master &&
-	mkdir subdir &&
-	echo A > subdir/new &&
-	git add subdir/new &&
-	test_tick &&
-	git commit -m "subdir on master" subdir/new &&
-	git rm A.t &&
-	test_tick &&
-	git commit -m "again subdir on master" &&
-	git merge branch
-'
-
-test_expect_success 'use index-filter to move into a subdirectory' '
-	git branch directorymoved &&
-	git filter-branch -f --index-filter \
-		 "git ls-files -s | sed \"s-	-&newsubdir/-\" |
-	          GIT_INDEX_FILE=\$GIT_INDEX_FILE.new \
-			git update-index --index-info &&
-		  mv \"\$GIT_INDEX_FILE.new\" \"\$GIT_INDEX_FILE\"" directorymoved &&
-	git diff --exit-code HEAD directorymoved:newsubdir
-'
-
-test_expect_success 'stops when msg filter fails' '
-	old=$(git rev-parse HEAD) &&
-	test_must_fail git filter-branch -f --msg-filter false HEAD &&
-	test $old = $(git rev-parse HEAD) &&
-	rm -rf .git-rewrite
-'
-
-test_expect_success 'author information is preserved' '
-	: > i &&
-	git add i &&
-	test_tick &&
-	GIT_AUTHOR_NAME="B V Uips" git commit -m bvuips &&
-	git branch preserved-author &&
-	(sane_unset GIT_AUTHOR_NAME &&
-	 git filter-branch -f --msg-filter "cat; \
-			test \$GIT_COMMIT != $(git rev-parse master) || \
-			echo Hallo" \
-		preserved-author) &&
-	git rev-list --author="B V Uips" preserved-author >actual &&
-	test_line_count = 1 actual
-'
-
-test_expect_success "remove a certain author's commits" '
-	echo i > i &&
-	test_tick &&
-	git commit -m i i &&
-	git branch removed-author &&
-	git filter-branch -f --commit-filter "\
-		if [ \"\$GIT_AUTHOR_NAME\" = \"B V Uips\" ];\
-		then\
-			skip_commit \"\$@\";
-		else\
-			git commit-tree \"\$@\";\
-		fi" removed-author &&
-	cnt1=$(git rev-list master | wc -l) &&
-	cnt2=$(git rev-list removed-author | wc -l) &&
-	test $cnt1 -eq $(($cnt2 + 1)) &&
-	git rev-list --author="B V Uips" removed-author >actual &&
-	test_line_count = 0 actual
-'
-
-test_expect_success 'barf on invalid name' '
-	test_must_fail git filter-branch -f master xy-problem &&
-	test_must_fail git filter-branch -f HEAD^
-'
-
-test_expect_success '"map" works in commit filter' '
-	git filter-branch -f --commit-filter "\
-		parent=\$(git rev-parse \$GIT_COMMIT^) &&
-		mapped=\$(map \$parent) &&
-		actual=\$(echo \"\$@\" | sed \"s/^.*-p //\") &&
-		test \$mapped = \$actual &&
-		git commit-tree \"\$@\";" master~2..master &&
-	git rev-parse --verify master
-'
-
-test_expect_success 'Name needing quotes' '
-
-	git checkout -b rerere A &&
-	mkdir foo &&
-	name="れれれ" &&
-	>foo/$name &&
-	git add foo &&
-	git commit -m "Adding a file" &&
-	git filter-branch --tree-filter "rm -fr foo" &&
-	test_must_fail git ls-files --error-unmatch "foo/$name" &&
-	test $(git rev-parse --verify rerere) != $(git rev-parse --verify A)
-
-'
-
-test_expect_success 'Subdirectory filter with disappearing trees' '
-	git reset --hard &&
-	git checkout master &&
-
-	mkdir foo &&
-	touch foo/bar &&
-	git add foo &&
-	test_tick &&
-	git commit -m "Adding foo" &&
-
-	git rm -r foo &&
-	test_tick &&
-	git commit -m "Removing foo" &&
-
-	mkdir foo &&
-	touch foo/bar &&
-	git add foo &&
-	test_tick &&
-	git commit -m "Re-adding foo" &&
-
-	git filter-branch -f --subdirectory-filter foo &&
-	git rev-list master >actual &&
-	test_line_count = 3 actual
-'
-
-test_expect_success 'Tag name filtering retains tag message' '
-	git tag -m atag T &&
-	git cat-file tag T > expect &&
-	git filter-branch -f --tag-name-filter cat &&
-	git cat-file tag T > actual &&
-	test_cmp expect actual
-'
-
-faux_gpg_tag='object XXXXXX
-type commit
-tag S
-tagger T A Gger <tagger@example.com> 1206026339 -0500
-
-This is a faux gpg signed tag.
------BEGIN PGP SIGNATURE-----
-Version: FauxGPG v0.0.0 (FAUX/Linux)
-
-gdsfoewhxu/6l06f1kxyxhKdZkrcbaiOMtkJUA9ITAc1mlamh0ooasxkH1XwMbYQ
-acmwXaWET20H0GeAGP+7vow=
-=agpO
------END PGP SIGNATURE-----
-'
-test_expect_success 'Tag name filtering strips gpg signature' '
-	sha1=$(git rev-parse HEAD) &&
-	sha1t=$(echo "$faux_gpg_tag" | sed -e s/XXXXXX/$sha1/ | git mktag) &&
-	git update-ref "refs/tags/S" "$sha1t" &&
-	echo "$faux_gpg_tag" | sed -e s/XXXXXX/$sha1/ | head -n 6 > expect &&
-	git filter-branch -f --tag-name-filter cat &&
-	git cat-file tag S > actual &&
-	test_cmp expect actual
-'
-
-test_expect_success GPG 'Filtering retains message of gpg signed commit' '
-	mkdir gpg &&
-	touch gpg/foo &&
-	git add gpg &&
-	test_tick &&
-	git commit -S -m "Adding gpg" &&
-
-	git log -1 --format="%s" > expect &&
-	git filter-branch -f --msg-filter "cat" &&
-	git log -1 --format="%s" > actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'Tag name filtering allows slashes in tag names' '
-	git tag -m tag-with-slash X/1 &&
-	git cat-file tag X/1 | sed -e s,X/1,X/2, > expect &&
-	git filter-branch -f --tag-name-filter "echo X/2" &&
-	git cat-file tag X/2 > actual &&
-	test_cmp expect actual
-'
-test_expect_success 'setup --prune-empty comparisons' '
-	git checkout --orphan master-no-a &&
-	git rm -rf . &&
-	unset test_tick &&
-	test_tick &&
-	GIT_COMMITTER_DATE="@0 +0000" GIT_AUTHOR_DATE="@0 +0000" &&
-	test_commit --notick B B.t B Bx &&
-	git checkout -b branch-no-a Bx &&
-	test_commit D D.t D Dx &&
-	mkdir dir &&
-	test_commit dir/D dir/D.t dir/D dir/Dx &&
-	test_commit E E.t E Ex &&
-	git checkout master-no-a &&
-	test_commit C C.t C Cx &&
-	git checkout branch-no-a &&
-	git merge Cx -m "Merge tag '\''C'\'' into branch" &&
-	git tag Fx &&
-	test_commit G G.t G Gx &&
-	test_commit H H.t H Hx &&
-	git checkout branch
-'
-
-test_expect_success 'Prune empty commits' '
-	git rev-list HEAD > expect &&
-	test_commit to_remove &&
-	git filter-branch -f --index-filter "git update-index --remove to_remove.t" --prune-empty HEAD &&
-	git rev-list HEAD > actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'prune empty collapsed merges' '
-	test_config merge.ff false &&
-	git rev-list HEAD >expect &&
-	test_commit to_remove_2 &&
-	git reset --hard HEAD^ &&
-	test_merge non-ff to_remove_2 &&
-	git filter-branch -f --index-filter "git update-index --remove to_remove_2.t" --prune-empty HEAD &&
-	git rev-list HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'prune empty works even without index/tree filters' '
-	git rev-list HEAD >expect &&
-	git commit --allow-empty -m empty &&
-	git filter-branch -f --prune-empty HEAD &&
-	git rev-list HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success '--prune-empty is able to prune root commit' '
-	git rev-list branch-no-a >expect &&
-	git branch testing H &&
-	git filter-branch -f --prune-empty --index-filter "git update-index --remove A.t" testing &&
-	git rev-list testing >actual &&
-	git branch -D testing &&
-	test_cmp expect actual
-'
-
-test_expect_success '--prune-empty is able to prune entire branch' '
-	git branch prune-entire B &&
-	git filter-branch -f --prune-empty --index-filter "git update-index --remove A.t B.t" prune-entire &&
-	test_path_is_missing .git/refs/heads/prune-entire &&
-	test_must_fail git reflog exists refs/heads/prune-entire
-'
-
-test_expect_success '--remap-to-ancestor with filename filters' '
-	git checkout master &&
-	git reset --hard A &&
-	test_commit add-foo foo 1 &&
-	git branch moved-foo &&
-	test_commit add-bar bar a &&
-	git branch invariant &&
-	orig_invariant=$(git rev-parse invariant) &&
-	git branch moved-bar &&
-	test_commit change-foo foo 2 &&
-	git filter-branch -f --remap-to-ancestor \
-		moved-foo moved-bar A..master \
-		-- -- foo &&
-	test $(git rev-parse moved-foo) = $(git rev-parse moved-bar) &&
-	test $(git rev-parse moved-foo) = $(git rev-parse master^) &&
-	test $orig_invariant = $(git rev-parse invariant)
-'
-
-test_expect_success 'automatic remapping to ancestor with filename filters' '
-	git checkout master &&
-	git reset --hard A &&
-	test_commit add-foo2 foo 1 &&
-	git branch moved-foo2 &&
-	test_commit add-bar2 bar a &&
-	git branch invariant2 &&
-	orig_invariant=$(git rev-parse invariant2) &&
-	git branch moved-bar2 &&
-	test_commit change-foo2 foo 2 &&
-	git filter-branch -f \
-		moved-foo2 moved-bar2 A..master \
-		-- -- foo &&
-	test $(git rev-parse moved-foo2) = $(git rev-parse moved-bar2) &&
-	test $(git rev-parse moved-foo2) = $(git rev-parse master^) &&
-	test $orig_invariant = $(git rev-parse invariant2)
-'
-
-test_expect_success 'setup submodule' '
-	rm -fr ?* .git &&
-	git init &&
-	test_commit file &&
-	mkdir submod &&
-	submodurl="$PWD/submod" &&
-	( cd submod &&
-	  git init &&
-	  test_commit file-in-submod ) &&
-	git submodule add "$submodurl" &&
-	git commit -m "added submodule" &&
-	test_commit add-file &&
-	( cd submod && test_commit add-in-submodule ) &&
-	git add submod &&
-	git commit -m "changed submodule" &&
-	git branch original HEAD
-'
-
-orig_head=$(git show-ref --hash --head HEAD)
-
-test_expect_success 'rewrite submodule with another content' '
-	git filter-branch --tree-filter "test -d submod && {
-					 rm -rf submod &&
-					 git rm -rf --quiet submod &&
-					 mkdir submod &&
-					 : > submod/file
-					 } || :" HEAD &&
-	test $orig_head != $(git show-ref --hash --head HEAD)
-'
-
-test_expect_success 'replace submodule revision' '
-	git reset --hard original &&
-	git filter-branch -f --tree-filter \
-	    "if git ls-files --error-unmatch -- submod > /dev/null 2>&1
-	     then git update-index --cacheinfo 160000 0123456789012345678901234567890123456789 submod
-	     fi" HEAD &&
-	test $orig_head != $(git show-ref --hash --head HEAD)
-'
-
-test_expect_success 'filter commit message without trailing newline' '
-	git reset --hard original &&
-	commit=$(printf "no newline" | git commit-tree HEAD^{tree}) &&
-	git update-ref refs/heads/no-newline $commit &&
-	git filter-branch -f refs/heads/no-newline &&
-	echo $commit >expect &&
-	git rev-parse refs/heads/no-newline >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'tree-filter deals with object name vs pathname ambiguity' '
-	test_when_finished "git reset --hard original" &&
-	ambiguous=$(git rev-list -1 HEAD) &&
-	git filter-branch --tree-filter "mv file.t $ambiguous" HEAD^.. &&
-	git show HEAD:$ambiguous
-'
-
-test_expect_success 'rewrite repository including refs that point at non-commit object' '
-	test_when_finished "git reset --hard original" &&
-	tree=$(git rev-parse HEAD^{tree}) &&
-	test_when_finished "git replace -d $tree" &&
-	echo A >new &&
-	git add new &&
-	new_tree=$(git write-tree) &&
-	git replace $tree $new_tree &&
-	git tag -a -m "tag to a tree" treetag $new_tree &&
-	git reset --hard HEAD &&
-	git filter-branch -f -- --all >filter-output 2>&1 &&
-	! fgrep fatal filter-output
-'
-
-test_done
diff --git a/t/t7009-filter-branch-null-sha1.sh b/t/t7009-filter-branch-null-sha1.sh
deleted file mode 100755
index 9ba9f24ad2..0000000000
--- a/t/t7009-filter-branch-null-sha1.sh
+++ /dev/null
@@ -1,55 +0,0 @@
-#!/bin/sh
-
-test_description='filter-branch removal of trees with null sha1'
-. ./test-lib.sh
-
-test_expect_success 'setup: base commits' '
-	test_commit one &&
-	test_commit two &&
-	test_commit three
-'
-
-test_expect_success 'setup: a commit with a bogus null sha1 in the tree' '
-	{
-		git ls-tree HEAD &&
-		printf "160000 commit $ZERO_OID\\tbroken\\n"
-	} >broken-tree &&
-	echo "add broken entry" >msg &&
-
-	tree=$(git mktree <broken-tree) &&
-	test_tick &&
-	commit=$(git commit-tree $tree -p HEAD <msg) &&
-	git update-ref HEAD "$commit"
-'
-
-# we have to make one more commit on top removing the broken
-# entry, since otherwise our index does not match HEAD (and filter-branch will
-# complain). We could make the index match HEAD, but doing so would involve
-# writing a null sha1 into the index.
-test_expect_success 'setup: bring HEAD and index in sync' '
-	test_tick &&
-	git commit -a -m "back to normal"
-'
-
-test_expect_success 'noop filter-branch complains' '
-	test_must_fail git filter-branch \
-		--force --prune-empty \
-		--index-filter "true"
-'
-
-test_expect_success 'filter commands are still checked' '
-	test_must_fail git filter-branch \
-		--force --prune-empty \
-		--index-filter "git rm --cached --ignore-unmatch three.t"
-'
-
-test_expect_success 'removing the broken entry works' '
-	echo three >expect &&
-	git filter-branch \
-		--force --prune-empty \
-		--index-filter "git rm --cached --ignore-unmatch broken" &&
-	git log -1 --format=%s >actual &&
-	test_cmp expect actual
-'
-
-test_done
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 75512c3403..4e7f669c76 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -28,10 +28,10 @@ complete ()
 #
 # (2) A test makes sure that common subcommands are included in the
 #     completion for "git <TAB>", and a plumbing is excluded.  "add",
-#     "filter-branch" and "ls-files" are listed for this.
+#     "rebase" and "ls-files" are listed for this.
 
-GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr filter-branch ls-files'
-GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout filter-branch'
+GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr rebase ls-files'
+GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout rebase'
 
 . "$GIT_BUILD_DIR/contrib/completion/git-completion.bash"
 
@@ -1392,12 +1392,12 @@ test_expect_success 'basic' '
 	# built-in
 	grep -q "^add \$" out &&
 	# script
-	grep -q "^filter-branch \$" out &&
+	grep -q "^rebase \$" out &&
 	# plumbing
 	! grep -q "^ls-files \$" out &&
 
-	run_completion "git f" &&
-	! grep -q -v "^f" out
+	run_completion "git r" &&
+	! grep -q -v "^r" out
 '
 
 test_expect_success 'double dash "git" itself' '
-- 
2.23.0.3.gcc10030edf.dirty



* Re: [PATCH v2 2/4] t3427: accelerate this test by using fast-export and fast-import
  2019-08-28  0:22           ` [PATCH v2 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
@ 2019-08-28  6:00             ` Eric Sunshine
  0 siblings, 0 replies; 73+ messages in thread
From: Eric Sunshine @ 2019-08-28  6:00 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git List, Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On Tue, Aug 27, 2019 at 8:22 PM Elijah Newren <newren@gmail.com> wrote:
> fast-export and fast-import can easily handle the simple rewrite that
> was being done by filter-branch, and should be significantly faster on
> systems with a slow fork.  Timings from before and after on two laptops
> that I have access to (measured via `time ./t3427-rebase-subtree.sh`,
> i.e. including everything in this test -- not just the filter-branch or
> fast-export/fast-import pair):
> [...]
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/t/t3427-rebase-subtree.sh b/t/t3427-rebase-subtree.sh
> @@ -11,6 +11,12 @@ commit_message() {
> +extract_files_subtree() {

Style nit: add space before opening '('

(However, commit_message() function just above this doesn't follow
that style, so...)

> +       git fast-export --no-data HEAD -- files_subtree/ \
> +               | sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
> +               | git fast-import --force --quiet

This would be a bit less noisy if you ended each line with the pipe
operator, allowing you to drop the backslashes:

    git fast-export --no-data HEAD -- files_subtree/ |
        sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" |
        git fast-import --force --quiet

> +}

Not sure any of this is worth a re-roll.
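
For anyone following along, here is a minimal sketch of what that sed
expression does to a single filemodify line of the kind that
`git fast-export --no-data` emits. The helper name strip_subtree_prefix
and the object ID are made up for illustration; the function is written
with the spacing and trailing-pipe style suggested above.

```shell
# Hypothetical helper (not part of the patch): drop the leading
# "files_subtree/" from any path that directly follows a 40-hex object ID.
strip_subtree_prefix () {
	sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%"
}

# A sample filemodify line; the object ID is invented for this example.
printf 'M 100644 %s files_subtree/master4\n' \
	0123456789abcdef0123456789abcdef01234567 |
	strip_subtree_prefix
```

Note that the pattern only matches when a 40-hex object ID and a space
come immediately before the path, so lines without one pass through
unchanged.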


* Re: [PATCH v2 3/4] Recommend git-filter-repo instead of git-filter-branch
  2019-08-28  0:22           ` [PATCH v2 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
@ 2019-08-28  6:17             ` Eric Sunshine
  2019-08-28 21:48               ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Eric Sunshine @ 2019-08-28  6:17 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git List, Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On Tue, Aug 27, 2019 at 8:22 PM Elijah Newren <newren@gmail.com> wrote:
> filter-branch suffers from a deluge of disguised dangers that disfigure
> history rewrites (i.e. deviate from the deliberate changes). [...]
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
> @@ -16,6 +16,20 @@ SYNOPSIS
> +WARNING
> +-------
> +'git filter-branch' has a plethora of pitfalls that can produce non-obvious
> +manglings of the intended history rewrite (and can leave you with little
> +time to investigate such problems since it has such abysmal performance).
> +These safety and performance issues cannot be backward compatibly fixed and
> +as such, its use is not recommended.  Please use an alternative history
> +filtering tool such as https://github.com/newren/git-filter-repo/[git
> +filter-repo].  If you still need to use 'git filter-branch', please
> +carefully read the "Safety" section of the message on the Git mailing list
> +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
> +the land mines of filter-branch] and vigilantly avoid as many of the
> +hazards listed there as reasonably possible.

Is there a good reason to not simply copy the "Safety" section from
that email directly into this document so that readers don't have to
go chasing down the link (especially those who are reading
documentation offline)?

> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> @@ -832,7 +832,7 @@ Hard case: The changes are not the same.::
>         This happens if the 'subsystem' rebase had conflicts, or used
>         `--interactive` to omit, edit, squash, or fixup commits; or
>         if the upstream used one of `commit --amend`, `reset`, or
> -       `filter-branch`.
> +       a full history rewriting command like `filter-repo`.

Do we want a clickable link to `filter-repo` here?

> diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
> @@ -123,10 +123,10 @@ The following format are available:
> +linkgit:git-hash-object[1], linkgit:git-rebase[1], and
> +linkgit:git-filter-repo[1], among other git commands, can be used to
> [...]
> @@ -148,8 +148,8 @@ pending objects.
>  linkgit:git-hash-object[1]
>  linkgit:git-rebase[1]
> +linkgit:git-filter-repo[1]

Are these 'linkgit:' references to `filter-repo` going to be
meaningful if that tool is not incorporated into the Git project
proper? Perhaps use a generic clickable link instead.

Same comment applies to other 'linkgit:' invocations in the remainder
of the patch.

> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> old mode 100755
> new mode 100644

Why lose the executable bit?

> @@ -83,6 +83,19 @@ set_ident () {
> +if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
> +     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then

If this script didn't already have a mix of styles, I'd say something
about modern style being:

    if test -z "$FILTER_BRANCH_SQUELCH_WARNING" &&
        test -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS"
    then
        ...
    fi

> +       cat <<EOF
> +WARNING: git-filter-branch has a glut of gotchas generating mangled history
> +         rewrites.  Please use an alternative filtering tool such as 'git
> +         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
> +         See the filter-branch manual page for more details; to squelch
> +         this warning and pause, set FILTER_BRANCH_SQUELCH_WARNING=1.

The "and pause" threw me. There's more than a bit of ambiguity
surrounding it. Perhaps drop it?

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-27 22:18             ` Elijah Newren
@ 2019-08-28  8:52               ` Sergey Organov
  2019-08-28 17:16                 ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Sergey Organov @ 2019-08-28  8:52 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Eric Wong, Git Mailing List, Junio C Hamano, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Lars Schneider, Jonathan Nieder

Elijah Newren <newren@gmail.com> writes:

> On Tue, Aug 27, 2019 at 1:43 AM Sergey Organov <sorganov@gmail.com> wrote:
>>
>> Eric Wong <e@80x24.org> writes:
>>
>>
>> [...]
>>
>> > AFAIK, filter-branch is not causing support headaches for any
>> > git developers today.  With so many commands in git, it's
>> > unlikely newbies will ever get around to discover it :)
>> > So I don't think we should be in any rush to remove it.
>>
>> Nah, discovering it is simple. Just Google for "git change author". That
>> eventually leads to a script that uses "git filter-branch --env-filter"
>> to get the job done, and I'm afraid it is spread all over the world.
>>
>> See, e.g.:
>>
>> https://help.github.com/en/articles/changing-author-info
>
> Side note: Is the goal to "fix names and email addresses in this
> repository"?  If so, this guide fails: it doesn't update tagger names
> or email addresses.  Indeed, filter-branch doesn't provide a way to do
> that.  (Not to mention other problems like not updating references to
> commit hashes in commit messages when it is busy rewriting everything.)

No. Maybe the original goal was like that, but I, personally, use a
modified version of this to change my "Author" credentials from
"internal" to "public" in branches that I'm going to send upstream, so
the actual aim is to change e-mail of particular Author from a@b to c@d
in all the commits in a (feature) branch.

>
>> > But I agree that filter-branch isn't useful and certainly
>> > shouldn't be encouraged/promoted.
>>
>> Well, is there more suitable way to change author for a (large) set of
>> commits then?
>
> I would say yes, use git filter-repo (note that this thread started
> with me proposing filter-repo for inclusion in git.git -- and getting
> suggestions that we should remove stuff instead of adding more stuff).
> I'm biased, but I think it's much better at this particular job as
> well:

Well, I don't want to change the entire repo, and I don't immediately
see how to do it with git filter-repo. Is it at all possible?

> You can create a mailmap file and pass it to the --mailmap option to
> git-filter-repo.
>
> Or, if you prefer (perhaps you don't like git's mailmap format as used
> by shortlog and now log, or perhaps you really want to be able to do
> regex replacement or something), you can use the --name-callback or
> --email-callback to work on those fields more directly.
>
> Or, if you prefer (e.g. you want to handle author vs. committer vs.
> tagger differently), you can use the --commit-callback and
> --tag-callback filters.
>
> As an added bonus, filter-repo will also perform the rewrite far
> faster than filter-branch (and rewrite commit hashes in commit
> messages as alluded to above).

These things are nice to have indeed, but it always changes the entire
repo, right? If so, it's not a suitable substitute for git-filter-branch
for the particular job at hand.

Actually, I'd rather expect some support for this in "git rebase", being
git history editing/reshaping tool, but it looks like it only has it in
the form that is very difficult to automate.

>
>> > Yet there's probably still users which ARE happy with it, that
>> > will never hit the edge cases and problems it poses; and will
>> > never read release notes.  And said users are probably getting
>> > git from a slow-moving distro, so it'd be a disservice to them
>> > if they lost a tool they depend on without any warning.
>>
>> Personally, I'm far from happy with it, but I have no clue how to
>> substitute it in the job above. Anybody?
>
> The start of this thread where I proposed git filter-repo for
> inclusion in git[1] had links to documentation and comparisons to
> other tools and such.  You may find those links helpful; if not, let
> me know what needs to be fixed in the documentation.

Thank you for the references, I find it a very nice tool to have!

Pity it's not an entire substitute for git filter-branch.

-- Sergey

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: Proposing git-filter-repo for inclusion in git.git
  2019-08-23 18:06       ` Elijah Newren
  2019-08-23 18:29         ` Elijah Newren
@ 2019-08-28 11:09         ` Johannes Schindelin
  2019-08-28 15:06           ` Junio C Hamano
  1 sibling, 1 reply; 73+ messages in thread
From: Johannes Schindelin @ 2019-08-28 11:09 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Eric Wong, Junio C Hamano, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Lars Schneider,
	Jonathan Nieder

Hi Elijah,

On Fri, 23 Aug 2019, Elijah Newren wrote:

> On Thu, Aug 22, 2019 at 8:01 PM Eric Wong <e@80x24.org> wrote:
> >
> > Elijah Newren <newren@gmail.com> wrote:
> > >   * Remove git-filter-branch from git.git.  Mention in the release
> > > notes where people can go to get it.[1]
> > >
> > > filter-branch is not merely a slow or difficult-to-use tool, it's one
> > > that *fosters* mistakes by making it hard to get things right in
> > > several different ways.  Granted, people exercise extra caution using
> > > filter-branch because they know they need to, but there are so many
> > > gotchas that they're likely to accidentally mess something up.  Those
> > > mess-ups are not always discovered immediately, and by then it's
> > > nearly cast into stone (rewriting being something you want to do very
> > > rarely).
> >
> > Is it possible to turn git-filter-branch into a fast, compatible,
> > and (maybe) safe wrapper for git-filter-repo?  That would "fix"
> > filter-branch and (if done carefully) not break existing uses.
>
> Ooh, what an interesting question.  I can probably ramble on a LOT
> longer than you expected about this...
>
> [...]

FWIW if anybody cares about my opinion: I would be totally fine with
integrating git-filter-repo into git.git, have it there for a major
version or two, then patch `git filter-branch` to spew out a deprecation
warning, and then remove that latter command a major version (or two)
later.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: RFC: Proposing git-filter-repo for inclusion in git.git
  2019-08-28 11:09         ` Johannes Schindelin
@ 2019-08-28 15:06           ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2019-08-28 15:06 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Elijah Newren, Eric Wong, Git Mailing List,
	Ævar Arnfjörð Bjarmason, Lars Schneider,
	Jonathan Nieder

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> FWIW if anybody cares about my opinion: I would be totally fine with
> integrating git-filter-repo into git.git, have it there for a major
> version or two, then patch `git filter-branch` to spew out a deprecation
> warning, and then remove that latter command a major version (or two)
> later.

Yup, that's just the usual deprecate then delete sequence.  The
compatibility wrapper brought up in the discussion earlier would be
a big plus ;-)

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-28  8:52               ` Sergey Organov
@ 2019-08-28 17:16                 ` Elijah Newren
  2019-08-28 19:03                   ` Sergey Organov
  2019-08-30 20:40                   ` Johannes Schindelin
  0 siblings, 2 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-28 17:16 UTC (permalink / raw)
  To: Sergey Organov
  Cc: Eric Wong, Git Mailing List, Junio C Hamano, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Lars Schneider, Jonathan Nieder

Hi Sergey,

On Wed, Aug 28, 2019 at 1:52 AM Sergey Organov <sorganov@gmail.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Tue, Aug 27, 2019 at 1:43 AM Sergey Organov <sorganov@gmail.com> wrote:
> >>
> >> Eric Wong <e@80x24.org> writes:
> >>
> >>
> >> [...]
> >>
> >> > AFAIK, filter-branch is not causing support headaches for any
> >> > git developers today.  With so many commands in git, it's
> >> > unlikely newbies will ever get around to discover it :)
> >> > So I don't think we should be in any rush to remove it.
> >>
> >> Nah, discovering it is simple. Just Google for "git change author". That
> >> eventually leads to a script that uses "git filter-branch --env-filter"
> >> to get the job done, and I'm afraid it is spread all over the world.
> >>
> >> See, e.g.:
> >>
> >> https://help.github.com/en/articles/changing-author-info
> >
> > Side note: Is the goal to "fix names and email addresses in this
> > repository"?  If so, this guide fails: it doesn't update tagger names
> > or email addresses.  Indeed, filter-branch doesn't provide a way to do
> > that.  (Not to mention other problems like not updating references to
> > commit hashes in commit messages when it is busy rewriting everything.)
>
> No. Maybe the original goal was like that, but I, personally, use a
> modified version of this to change my "Author" credentials from
> "internal" to "public" in branches that I'm going to send upstream, so
> the actual aim is to change e-mail of particular Author from a@b to c@d
> in all the commits in a (feature) branch.

There's an interesting usecase I hadn't heard of or thought of before.
Quick question to see if I'm understanding correctly: "all commits in
a branch" or "all commits *unique* to a branch"?

(Perhaps the only commits with the author you want to change are among
the commits that are unique to that branch and so the distinction
doesn't matter, but it wasn't clear from the description.)

> >> > But I agree that filter-branch isn't useful and certainly
> >> > shouldn't be encouraged/promoted.
> >>
> >> Well, is there more suitable way to change author for a (large) set of
> >> commits then?
> >
> > I would say yes, use git filter-repo (note that this thread started
> > with me proposing filter-repo for inclusion in git.git -- and getting
> > suggestions that we should remove stuff instead of adding more stuff).
> > I'm biased, but I think it's much better at this particular job as
> > well:
>
> Well, I don't want to change the entire repo, and I don't immediately
> see how to do it with git filter-repo. Is it at all possible?

Yes, it is possible.  filter-repo has a hidden --refs argument
defaulting to --all; you could instead set it to e.g.
origin/master..master.

--refs is the only hidden option in filter-repo.  I know it may look
funny that I spent a bunch of effort to create the
--reference-excluded-parents option to fast-export explicitly so that
it would be possible to do partial history rewrites like this, and
then to hide and avoid documenting this option (though I did hint that
it existed in the documentation if you search for "Partial-repo
filtering"), but there were a few reasons for this:

  * mixing old and new history for most rewrites that
filter-branch/filter-repo/bfg/etc are used for can really mess things
up and make it hard to recover from.  I don't like trying to clean up
repos with accidental duplicate copies of most commits in the repo,
and I suspect others like it even less.  So, anything that makes it
easier to make such mistakes needs to have a really good rationale in
order for me to expose it.
  * The only usecases I knew of for partial repo filtering prior to
your email were (1) side-stepping insanely slow execution time of poor
filtering tools like filter-branch, and (2) performing operations
better suited to git-rebase anyway (e.g. the --signoff option to
rebase did not exist once upon a time and so folks could have used
filter-branch to fake it, but using rebase is the better way to make
this change).  And, even after your email, I'm not sure that has
changed though, as noted below.
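
A minimal sketch of such a partial rewrite, combining the --refs range
above with the --mailmap option mentioned earlier in the thread. The two
function names and the example addresses are hypothetical, and the
sketch assumes filter-repo is installed and is being run in a clone you
can afford to rewrite (filter-repo may refuse to run otherwise).

```shell
# Print one mailmap line in the "Proper Name <proper@email> <commit@email>"
# form: commits recorded with the third argument's address get the name
# and address given by the first two arguments.
mailmap_entry () {
	printf '%s <%s> <%s>\n' "$1" "$2" "$3"
}

# Hypothetical wrapper: apply a mailmap file ($1) to only the commits in
# the given revision range ($2) instead of the default of everything.
filter_range_with_mailmap () {
	git filter-repo --mailmap "$1" --refs "$2"
}
```

Usage would look something like:

    mailmap_entry "Public Name" c@d.example a@b.example >../my-mailmap
    filter_range_with_mailmap ../my-mailmap origin/master..master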

> > You can create a mailmap file and pass it to the --mailmap option to
> > git-filter-repo.
> >
> > Or, if you prefer (perhaps you don't like git's mailmap format as used
> > by shortlog and now log, or perhaps you really want to be able to do
> > regex replacement or something), you can use the --name-callback or
> > --email-callback to work on those fields more directly.
> >
> > Or, if you prefer (e.g. you want to handle author vs. committer vs.
> > tagger differently), you can use the --commit-callback and
> > --tag-callback filters.
> >
> > As an added bonus, filter-repo will also perform the rewrite far
> > faster than filter-branch (and rewrite commit hashes in commit
> > messages as alluded to above).
>
> These things are nice to have indeed, but it always changes the entire
> repo, right? If so, it's not a suitable substitute for git-filter-branch
> for the particular job at hand.

It *defaults* to changing the entire repo.  You may also be interested
to note that two of the demos in contrib/filter-repo-demos[1]
explicitly make use of partial history filtering -- one of the two
being the filter-branch reimplementation (I created a script that
reimplemented filter-branch on top of filter-repo and made it accept
the exact same flags as filter-branch.  That script passes all the
filter-branch regression tests from git.git and it's much faster than
filter-branch, though it's still glacially slow compared to
filter-repo, and has all the safety problems that filter-branch does).

[1] https://github.com/newren/git-filter-repo/tree/master/contrib/filter-repo-demos

> Actually, I'd rather expect some support for this in "git rebase", being
> git history editing/reshaping tool, but it looks like it only has it in
> the form that is very difficult to automate.

I agree that git rebase would be the better choice here; I typically
feel it's the better choice for rewrites of recent history.  I think
it provides just what you need:

  git rebase --exec="git commit --amend --reset-author -C HEAD" $UPSTREAM

(Assuming, of course, that you've either set the right environment
variables or set user.name and user.email to the new values you want
so that commit's --reset-author flag can reset to the *new* author.)

> >> > Yet there's probably still users which ARE happy with it, that
> >> > will never hit the edge cases and problems it poses; and will
> >> > never read release notes.  And said users are probably getting
> >> > git from a slow-moving distro, so it'd be a disservice to them
> >> > if they lost a tool they depend on without any warning.
> >>
> >> Personally, I'm far from happy with it, but I have no clue how to
> >> substitute it in the job above. Anybody?
> >
> > The start of this thread where I proposed git filter-repo for
> > inclusion in git[1] had links to documentation and comparisons to
> > other tools and such.  You may find those links helpful; if not, let
> > me know what needs to be fixed in the documentation.
>
> Thank you for the references, I find it a very nice tool to have!
>
> Pity it's not an entire substitute for git filter-branch.

Au contraire, I believe it is.  :-)

Thanks for the interesting usecase.  It sounds like we both think this
one happens to be better solved by rebase, and the command snippet I
provided above should show you how to use rebase to solve it.  However, if
you come up with any others where partial repo filtering makes sense,
I'm always willing to reconsider my decision to make the --refs
argument hidden; it may just mean adding more warnings, but it might
also involve changing other defaults (e.g. the automatic
repacking/pruning).  I'd need concrete usecases to know for sure how
I'd want to handle it.

Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-28 17:16                 ` Elijah Newren
@ 2019-08-28 19:03                   ` Sergey Organov
  2019-08-30 20:40                   ` Johannes Schindelin
  1 sibling, 0 replies; 73+ messages in thread
From: Sergey Organov @ 2019-08-28 19:03 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Eric Wong, Git Mailing List, Junio C Hamano, Derrick Stolee,
	Jeff King, Ævar Arnfjörð Bjarmason,
	Johannes Schindelin, Lars Schneider, Jonathan Nieder

Hi Elijah,

Elijah Newren <newren@gmail.com> writes:

> Hi Sergey,
>
> On Wed, Aug 28, 2019 at 1:52 AM Sergey Organov <sorganov@gmail.com> wrote:
>>
>> Elijah Newren <newren@gmail.com> writes:
>>
>> > On Tue, Aug 27, 2019 at 1:43 AM Sergey Organov <sorganov@gmail.com> wrote:
>> >>
>> >> Eric Wong <e@80x24.org> writes:
>> >>
>> >>
>> >> [...]

[...]

>> >
>> > Side note: Is the goal to "fix names and email addresses in this
>> > repository"?  If so, this guide fails: it doesn't update tagger names
>> > or email addresses.  Indeed, filter-branch doesn't provide a way to do
>> > that.  (Not to mention other problems like not updating references to
>> > commit hashes in commit messages when it is busy rewriting everything.)
>>
>> No. Maybe the original goal was like that, but I, personally, use a
>> modified version of this to change my "Author" credentials from
>> "internal" to "public" in branches that I'm going to send upstream, so
>> the actual aim is to change e-mail of particular Author from a@b to c@d
>> in all the commits in a (feature) branch.
>
> There's an interesting usecase I hadn't heard of or thought of before.
> Quick question to see if I'm understanding correctly: "all commits in
> a branch" or "all commits *unique* to a branch"?
>
> (Perhaps the only commits with the author you want to change are among
> the commits that are unique to that branch and so the distinction
> doesn't matter, but it wasn't clear from the description.)

Yes, this is exactly the case for me, as I'm changing entirely linear
topic branch that is going to become patch series to send out. No
complications.

>
>> >> > But I agree that filter-branch isn't useful and certainly
>> >> > shouldn't be encouraged/promoted.
>> >>
>> >> Well, is there more suitable way to change author for a (large) set of
>> >> commits then?
>> >
>> > I would say yes, use git filter-repo (note that this thread started
>> > with me proposing filter-repo for inclusion in git.git -- and getting
>> > suggestions that we should remove stuff instead of adding more stuff).
>> > I'm biased, but I think it's much better at this particular job as
>> > well:
>>
>> Well, I don't want to change the entire repo, and I don't immediately
>> see how to do it with git filter-repo. Is it at all possible?
>
> Yes, it is possible.  filter-repo has a hidden --refs argument
> defaulting to --all; you could instead set it to e.g.
> origin/master..master.

Cool!

>
> --refs is the only hidden option in filter-repo.  I know it may look
> funny that I spent a bunch of effort to create the
> --reference-excluded-parents option to fast-export explicitly so that
> it would be possible to do partial history rewrites like this, and
> then to hide and avoid documenting this option (though I did hint that
> it existed in the documentation if you search for "Partial-repo
> filtering"), but there were a few reasons for this:
>
>   * mixing old and new history for most rewrites that
> filter-branch/filter-repo/bfg/etc are used for can really mess things
> up and make it hard to recover from.  I don't like trying to clean up
> repos with accidental duplicate copies of most commits in the repo,
> and I suspect others like it even less.  So, anything that makes it
> easier to make such mistakes needs to have a really good rationale in
> order for me to expose it.
>   * The only usecases I knew of for partial repo filtering prior to
> your email were (1) side-stepping insanely slow execution time of poor
> filtering tools like filter-branch, and (2) performing operations
> better suited to git-rebase anyway (e.g. the --signoff option to
> rebase did not exist once upon a time and so folks could have used
> filter-branch to fake it, but using rebase is the better way to make
> this change).  And, even after your email, I'm not sure that has
> changed though, as noted below.

Yeah, I share your worries.

[...]

>> Actually, I'd rather expect some support for this in "git rebase", being
>> git history editing/reshaping tool, but it looks like it only has it in
>> the form that is very difficult to automate.
>
> I agree that git rebase would be the better choice here; I typically
> feel it's the better choice for rewrites of recent history.  I think
> it provides just what you need:
>
>   git rebase --exec="git commit --amend --reset-author -C HEAD" $UPSTREAM
>
> (Assuming, of course, that you've either set the right environment
> variables or set user.name and user.email to the new values you want
> so that commit's --reset-author flag can reset to the *new* author.)

This should do the trick for me most of the time, thanks a lot for the clue!

However, the script that I'm using doesn't change _all_ the authors; it
only changes those that match the one particular author specified in
the script. I haven't actually needed this feature yet, but I can well
imagine that I will have commits by other author(s) in the branch, and
I won't want to attribute their work to myself.

Hmm... That said, using the generic "--exec" to "git rebase" I could
probably come up with a script that will check the Author of the latest
commit and will choose to either rewrite it or not. Nothing terribly
complex.
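
A minimal sketch of that idea, under a few assumptions: the function
names and addresses are hypothetical, user.name and user.email are
already set to the new identity, and the check is squeezed into a single
line because `git rebase --exec` runs each command through the shell
after every replayed commit. Note that --reset-author also renews the
author timestamp. The sketch uses --no-edit where the snippet above
used -C HEAD; both keep the existing commit message.

```shell
# Print a one-line command for `git rebase --exec`: it amends HEAD to
# the currently configured identity only when HEAD's author e-mail
# equals the address given as $1; other commits are left untouched.
author_fix_command () {
	printf '%s' "test \"\$(git log -1 --format=%ae)\" != \"$1\" || git commit --amend --reset-author --no-edit --quiet"
}

# Replay the commits in $2..HEAD, rewriting only those authored by $1.
rewrite_matching_authors () {
	git rebase --exec "$(author_fix_command "$1")" "$2"
}
```

It would be invoked along the lines of:

    rewrite_matching_authors a@b.example "$UPSTREAM"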

>
>> >> > Yet there's probably still users which ARE happy with it, that
>> >> > will never hit the edge cases and problems it poses; and will
>> >> > never read release notes.  And said users are probably getting
>> >> > git from a slow-moving distro, so it'd be a disservice to them
>> >> > if they lost a tool they depend on without any warning.
>> >>
>> >> Personally, I'm far from happy with it, but I have no clue how to
>> >> substitute it in the job above. Anybody?
>> >
>> > The start of this thread where I proposed git filter-repo for
>> > inclusion in git[1] had links to documentation and comparisons to
>> > other tools and such.  You may find those links helpful; if not, let
>> > me know what needs to be fixed in the documentation.
>>
>> Thank you for the references, I find it a very nice tool to have!
>>
>> Pity it's not an entire substitute for git filter-branch.
>
> Au contraire, I believe it is.  :-)

I take your word for it :-)

>
> Thanks for the interesting usecase.  It sounds like we both think this
> one happens to be better solved by rebase, and the command snippet I
> provided above should show you how to use rebase to solve it.  However, if
> you come up with any others where partial repo filtering makes sense,
> I'm always willing to reconsider my decision to make the --refs
> argument hidden; it may just mean adding more warnings, but it might
> also involve changing other defaults (e.g. the automatic
> repacking/pruning).  I'd need concrete usecases to know for sure how
> I'd want to handle it.

OK, thanks a lot! Doesn't seem to be necessary for now due to the rebase
trick you've suggested.

>
> Hope that helps,

Sure it does!

-- Sergey

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2 3/4] Recommend git-filter-repo instead of git-filter-branch
  2019-08-28  6:17             ` Eric Sunshine
@ 2019-08-28 21:48               ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-28 21:48 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On Tue, Aug 27, 2019 at 11:17 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Tue, Aug 27, 2019 at 8:22 PM Elijah Newren <newren@gmail.com> wrote:
> > filter-branch suffers from a deluge of disguised dangers that disfigure
> > history rewrites (i.e. deviate from the deliberate changes). [...]
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
> > @@ -16,6 +16,20 @@ SYNOPSIS
> > +WARNING
> > +-------
> > +'git filter-branch' has a plethora of pitfalls that can produce non-obvious
> > +manglings of the intended history rewrite (and can leave you with little
> > +time to investigate such problems since it has such abysmal performance).
> > +These safety and performance issues cannot be backward compatibly fixed and
> > +as such, its use is not recommended.  Please use an alternative history
> > +filtering tool such as https://github.com/newren/git-filter-repo/[git
> > +filter-repo].  If you still need to use 'git filter-branch', please
> > +carefully read the "Safety" section of the message on the Git mailing list
> > +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
> > +the land mines of filter-branch] and vigilantly avoid as many of the
> > +hazards listed there as reasonably possible.
>
> Is there a good reason to not simply copy the "Safety" section from
> that email directly into this document so that readers don't have to
> go chasing down the link (especially those who are reading
> documentation offline)?

Makes sense, I can include it.  However, saying e.g. "the
git-filter-branch manpage is missing..." or "the git-filter-branch
manpage actually documents <crazy buggy behavior> as expected" feels
really weird to include on the git-filter-branch manpage.  I'll try to
touch it up.

> > diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> > @@ -832,7 +832,7 @@ Hard case: The changes are not the same.::
> >         This happens if the 'subsystem' rebase had conflicts, or used
> >         `--interactive` to omit, edit, squash, or fixup commits; or
> >         if the upstream used one of `commit --amend`, `reset`, or
> > -       `filter-branch`.
> > +       a full history rewriting command like `filter-repo`.
>
> Do we want a clickable link to `filter-repo` here?

I guess it can't hurt.

> > diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
> > @@ -123,10 +123,10 @@ The following format are available:
> > +linkgit:git-hash-object[1], linkgit:git-rebase[1], and
> > +linkgit:git-filter-repo[1], among other git commands, can be used to
> > [...]
> > @@ -148,8 +148,8 @@ pending objects.
> >  linkgit:git-hash-object[1]
> >  linkgit:git-rebase[1]
> > +linkgit:git-filter-repo[1]
>
> Are these 'linkgit:' references to `filter-repo` going to be
> meaningful if that tool is not incorporated into the Git project
> proper? Perhaps use a generic clickable link instead.
>
> Same comment applies to other 'linkgit:' invocations in the remainder
> of the patch.

I'm fixing them up.

> > diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> > old mode 100755
> > new mode 100644
>
> Why lose the executable bit?

Whoops.  Did some rebasing and fixups, then continued editing my
buffer of the file after one of the rebases, realized the file was
deleted (because of the final patch in the series), moved the file out
of the way and rebased again and copied the file back into place, and
forgot to check the filemode.

> > @@ -83,6 +83,19 @@ set_ident () {
> > +if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
> > +     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then
>
> If this script didn't already have a mix of styles, I'd say something
> about modern style being:
>
>     if test -z "$FILTER_BRANCH_SQUELCH_WARNING" &&
>         test -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS"
>     then
>         ...
>     fi
>
> > +       cat <<EOF
> > +WARNING: git-filter-branch has a glut of gotchas generating mangled history
> > +         rewrites.  Please use an alternative filtering tool such as 'git
> > +         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
> > +         See the filter-branch manual page for more details; to squelch
> > +         this warning and pause, set FILTER_BRANCH_SQUELCH_WARNING=1.
>
> The "and pause" threw me. There's more than a bit of ambiguity
> surrounding it. Perhaps drop it?

Sure, will do.
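
For reference, the guard in the style suggested above could be sketched
roughly as follows.  This is only a minimal sketch: the function name
maybe_warn_filter_branch is hypothetical, the warning text is
abbreviated, and the delay that follows the warning in the actual patch
is left out.

```shell
# Hypothetical helper; only the two environment variable names come
# from the patch under discussion.
maybe_warn_filter_branch () {
	if test -z "$FILTER_BRANCH_SQUELCH_WARNING" &&
		test -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS"
	then
		# Abbreviated stand-in for the real multi-line warning.
		echo "WARNING: git-filter-branch has a glut of gotchas."
	fi
}
```

With both variables empty the function prints the warning; setting
either one to a non-empty value keeps it silent.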

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it
  2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
                             ` (3 preceding siblings ...)
  2019-08-28  0:22           ` [RFC PATCH v2 4/4] Remove git-filter-branch, it is now external to git.git Elijah Newren
@ 2019-08-29  0:06           ` Elijah Newren
  2019-08-29  0:06             ` [PATCH v3 1/4] t6006: simplify and optimize empty message test Elijah Newren
                               ` (4 more replies)
  2019-09-03 18:55           ` [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  5 siblings, 5 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-29  0:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

Here's a series that warns about git-filter-branch usage and avoids it
ourselves.

Changes since v2 (full range-diff below):
  * [Patch 2] testcase syntax cleanups
  * [Patch 3] fixed "linkgit:" references to filter-repo to be url
    links (or footnotes)
  * [Patch 3] fixed the mode on filter-branch.sh (oops) and dropped
    the ambiguous "and pause".  Linkified "filter-repo" in a place
    where there was no link.
  * [Patch 3] As suggested by Eric (Sunshine), just make the manpage
    directly include the safety and performance sections of the
    referenced email (the performance section was referenced by the
    safety section).  Being included directly in the manpage should
    help with folks reading the documentation offline.  Anyway, the
    text is really long, so it took a while to format it nicely,
    recheck for typos, reword based on the fact that it'll be in the
    manpage (because it's weird to have the manpage refer to itself),
    etc.
  * [Patch 4] Dropped almost all the original patch 4; only including
    the bits about t9902-completion.sh.  Removed the RFC label, since
    that one piece should be good for including now.

Elijah Newren (4):
  t6006: simplify and optimize empty message test
  t3427: accelerate this test by using fast-export and fast-import
  Recommend git-filter-repo instead of git-filter-branch
  t9902: use a non-deprecated command for testing

 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 302 +++++++++++++++++++++++++---
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   3 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |  10 +-
 Documentation/githooks.txt          |  10 +-
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                |  13 ++
 t/t3427-rebase-subtree.sh           |  24 ++-
 t/t6006-rev-list-format.sh          |   5 +-
 t/t9902-completion.sh               |  12 +-
 12 files changed, 339 insertions(+), 77 deletions(-)

Range-diff:
1:  7ddbeea2ca = 1:  7ddbeea2ca t6006: simplify and optimize empty message test
2:  f18bd7a609 ! 2:  e1e63189c1 t3427: accelerate this test by using fast-export and fast-import
    @@ Commit message
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
      ## t/t3427-rebase-subtree.sh ##
    -@@ t/t3427-rebase-subtree.sh: commit_message() {
    +@@ t/t3427-rebase-subtree.sh: This test runs git rebase and tests the subtree strategy.
    + . ./test-lib.sh
    + . "$TEST_DIRECTORY"/lib-rebase.sh
    + 
    +-commit_message() {
    ++commit_message () {
      	git log --pretty=format:%s -1 "$1"
      }
      
    -+extract_files_subtree() {
    -+	git fast-export --no-data HEAD -- files_subtree/ \
    -+		| sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" \
    -+		| git fast-import --force --quiet
    ++extract_files_subtree () {
    ++	git fast-export --no-data HEAD -- files_subtree/ |
    ++		sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" |
    ++		git fast-import --force --quiet
     +}
     +
      test_expect_success 'setup' '
3:  7008c16984 ! 3:  59c7446927 Recommend git-filter-repo instead of git-filter-branch
    @@ Documentation/git-filter-branch.txt: SYNOPSIS
     +as such, its use is not recommended.  Please use an alternative history
     +filtering tool such as https://github.com/newren/git-filter-repo/[git
     +filter-repo].  If you still need to use 'git filter-branch', please
    -+carefully read the "Safety" section of the message on the Git mailing list
    ++carefully read <<SAFETY>> (and <<PERFORMANCE>>) to learn about the land
    ++mines of filter-branch, and then vigilantly avoid as many of the hazards
    ++listed there as reasonably possible.
    ++
     +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
    -+the land mines of filter-branch] and vigilantly avoid as many of the
    -+hazards listed there as reasonably possible.
    ++the land mines of filter-branch]
     +
      DESCRIPTION
      -----------
    @@ Documentation/git-filter-branch.txt: warned.
     -  are much more restrictive than git-filter branch, and dedicated just
     -  to the tasks of removing unwanted data- e.g:
     -  `--strip-blobs-bigger-than 1M`.
    --
    ++[[PERFORMANCE]]
    ++PERFORMANCE
    ++-----------
    ++
    ++The performance of filter-branch is glacially slow; its design makes it
    ++impossible for a backward-compatible implementation to ever be fast:
    ++
    ++* In editing files, git-filter-branch by design checks out each and
    ++every commit as it existed in the original repo.  If your repo has 10\^5
    ++files and 10\^5 commits, but each commit only modifies 5 files, then
    ++git-filter-branch will make you do 10\^10 modifications, despite only
    ++having (at most) 5*10^5 unique blobs.
    ++
    ++* If you try and cheat and try to make filter-branch only work on
    ++files modified in a commit, then two things happen
    ++
    ++  . you run into problems with deletions whenever the user is simply
    ++    trying to rename files (because attempting to delete files that
    ++    don't exist looks like a no-op; it takes some chicanery to remap
    ++    deletes across file renames when the renames happen via arbitrary
    ++    user-provided shell)
    ++
    ++  . even if you succeed at the map-deletes-for-renames chicanery, you
    ++    still technically violate backward compatibility because users are
    ++    allowed to filter files in ways that depend upon topology of commits
    ++    instead of filtering solely based on file contents or names (though
    ++    I have never seen any user ever do this).
    ++
    ++* Even if you don't need to edit files but only want to e.g. rename or
    ++remove some and thus can avoid checking out each file (i.e. you can use
    ++--index-filter), you still are passing shell snippets for your filters.
    ++This means that for every commit, you have to have a prepared git repo
    ++where users can run git commands.  That's a lot of setup.  It also means
    ++you have to fork at least one process to run the user-provided shell
    ++snippet, and odds are that the user's shell snippet invokes lots of
    ++commands in some long pipeline, so you will have lots and lots of forks.
    ++For every. single. commit.  That's a massive amount of overhead to
    ++rename a few files.
    ++
    ++* filter-branch is written in shell, which is kind of slow.  Naturally,
    ++it makes sense to want to rewrite that in some other language.  However,
    ++filter-branch documentation states that several additional shell
    ++functions are provided for users to call, e.g. 'map', 'skip_commit',
    ++'git_commit_non_empty_tree'.  If filter-branch itself isn't a shell
    ++script, then in order to make those shell functions available to the
    ++users' shell snippets you have to prepend the shell definitions of these
    ++functions to every one of the users' shell snippets and thus make these
    ++special shell functions be parsed with each and every commit.
    ++
    ++* filter-branch provides a --setup option which is a shell snippet that
    ++can be sourced to make shell functions and variables available to all
    ++other filters.  If filter-branch is a shell script, it can simply eval
    ++this shell snippet once at the beginning.  If you try to fix performance
    ++by making filter-branch not be a shell script, then you have to prepend
    ++the setup shell snippet to all other filters and parse it with every
    ++single commit.
    ++
    ++* filter-branch writes lots of files to $workdir/../map/ to keep a
    ++mapping of commits, which it uses for pruning commits and remapping to
    ++ancestors and the map() command more generally.  Other files like
    ++$tempdir/backup-refs, $tempdir/raw-refs, $tempdir/heads,
    ++$tempdir/tree-state are all created internally too.  It is possible
    ++(though strongly discouraged) that users could have accessed any of
    ++these directly.  Users even had a pointer to follow in the form of
    ++Documentation that the 'map' command existed, which naturally uses the
    ++$workdir/../map/* files.  So, even if you don't have to edit files, for
    ++strict backward compatibility you need to still write a bunch of files
    ++to disk somewhere and keep them updated for every commit.  You can claim
    ++it was an implementation detail that users should not have depended
    ++upon, but the truth is they've had a decade where they could do so.  So, if
    ++you want full compatibility, it has to be there.  Besides, the
    ++regression tests depend on at least one of these details, specifying an
    ++--index-filter that reaches down and grabs backup-refs from $tempdir,
    ++and thus provides resourceful users who do google searches an example
    ++that there are files there for them to read and grab and use.  (And if
    ++you want to pass the existing regression tests, you have to at least put
    ++the backup-refs file there even if it's irrelevant to your
    ++implementation otherwise.)
    ++
    ++All of that said, performance of filter-branch could be improved by
    ++reimplementing it in a non-shell language and taking a couple small
    ++liberties with backward compatibility (such as having it only run
    ++filters on files changed within each commit).  filter-repo provides a
    ++demo script named
    ++https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely]
    ++which does exactly that and which passes all the git-filter-branch
    ++regression tests.  It's much faster than git-filter-branch, though it
    ++suffers from all the same safety issues as git-filter-branch, and is
    ++still glacially slow compared to
    ++https://github.com/newren/git-filter-repo/[git filter-repo].
    ++
    ++[[SAFETY]]
    ++SAFETY
    ++------
    ++
    ++filter-branch is riddled with gotchas resulting in various ways to
    ++easily corrupt repos or end up with a mess worse than what you started
    ++with:
    ++
    ++* Someone can have a set of "working and tested filters" which they
    ++document or provide to a coworker, who then runs them on a different OS
    ++where the same commands are not working/tested (some examples in the
    ++git-filter-branch manpage are also affected by this).  BSD vs. GNU
    ++userland differences can really bite.  If you're lucky, you get ugly
    ++error messages spewed.  But just as likely, the commands either don't do
    ++the filtering requested, or silently corrupt by making some unwanted
    ++change.  The unwanted change may only affect a few commits, so it's not
    ++necessarily obvious either.  (The fact that problems won't necessarily
    ++be obvious means they are likely to go unnoticed until the rewritten
    ++history is in use for quite a while, at which point it's really hard to
    ++justify another flag-day for another rewrite.)
    ++
    ++* Filenames with spaces (which are rare) are often mishandled by shell
    ++snippets since they cause problems for shell pipelines.  Not everyone is
    ++familiar with find -print0, xargs -0, ls-files -z, etc.  Even people who
    ++are familiar with these may assume such needs are not relevant because
    ++someone else renamed any such files in their repo back before the person
    ++doing the filtering joined the project.  And, often, even those familiar
    ++with handling arguments with spaces may not do so just because they
    ++aren't in the mindset of thinking about everything that could possibly
    ++go wrong.
    ++
    ++* Non-ascii filenames (which are rare) can be silently removed despite
    ++being in a desired directory.  People selecting paths to keep often
    ++use pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.
    ++ls-files will only quote filenames if needed so folks may not notice
    ++that one of the files didn't match the regex, again until it's much too
    ++late.  Yes, someone who knows about core.quotePath can avoid this
    ++(unless they have other special characters like \t, \n, or "), and
    ++people who use ls-files -z with something other than grep can avoid
    ++this, but that doesn't mean they will.
    ++
    ++* Similarly, when moving files around, one can find that filenames with
    ++non-ascii or special characters end up in a different directory, one
    ++that includes a double quote character.  (This is technically the same
    ++issue as above with quoting, but perhaps an interesting different way
    ++that it can and has manifested as a problem.)
    ++
    ++* It's far too easy to accidentally mix up old and new history.  It's
    ++still possible with any tool, but filter-branch almost invites it.  If
    ++we're lucky, the only downside is users getting frustrated that they
    ++don't know how to shrink their repo and remove the old stuff.  If we're
    ++unlucky, they merge old and new history and end up with multiple
    ++"copies" of each commit, some of which have unwanted or sensitive files
    ++and others which don't.  This comes about in multiple different ways:
    ++
    ++  ** the default to only doing a partial history rewrite ('--all' is not
    ++     the default and over 80% of the examples in the manpage don't use
    ++     it)
    ++
    ++  ** the fact that there's no automatic post-run cleanup
    ++
    ++  ** the fact that --tag-name-filter (when used to rename tags) doesn't
    ++     remove the old tags but just adds new ones with the new name (this
    ++     manpage has documented this for a long time so it's presumably not
    ++     a "bug" even though it feels like it)
    ++
    ++  ** the fact that little educational information is provided to inform
    ++     users of the ramifications of a rewrite and how to avoid mixing old
    ++     and new history.  For example, this man page discusses how users
    ++     need to understand that they need to rebase their changes for all
    ++     their branches on top of new history (or delete and reclone), but
    ++     that's only one of multiple concerns to consider.  See the
    ++     "DISCUSSION" section of the git filter-repo manual page for more
    ++     details.
    ++
    ++* Annotated tags can be accidentally converted to lightweight tags, due
    ++to either of two issues:
    ++
    ++  . Someone can do a history rewrite, realize they messed up, restore
    ++    from the backups in refs/original/, and then redo their
    ++    filter-branch command.  (The backup in refs/original/ is not a real
    ++    backup; it dereferences tags first.)
    ++
    ++  . Running filter-branch with either --tags or --all in your <rev-list
    ++    options>.  In order to retain annotated tags as annotated, you must
    ++    use --tag-name-filter (and must not have restored from
    ++    refs/original/ in a previously botched rewrite).
    ++
    ++* Any commit messages that specify an encoding will become corrupted
    ++by the rewrite; filter-branch ignores the encoding, takes the original
    ++bytes, and feeds it to commit-tree without telling it the proper
    ++encoding.  (This happens whether or not --msg-filter is used, though I
    ++suspect --msg-filter provides additional ways to really mess things
    ++up).
    ++
    ++* Commit messages (even if they are all UTF-8) by default become
    ++corrupted due to not being updated -- any references to other commit
    ++hashes in commit messages will now refer to no-longer-extant commits.
    ++
    ++* There are no facilities for helping users find what unwanted crud they
    ++should delete, which means they are much more likely to have incomplete
    ++or partial cleanups that sometimes result in confusion and people
    ++wasting time trying to understand.  (For example, folks tend to just
    ++look for big files to delete instead of big directories or extensions,
    ++and once they do so, then sometime later folks using the new repository
    ++who are going through history will notice a build artifact directory
    ++that has some files but not others, or a cache of dependencies
    ++(node_modules or similar) which couldn't have ever been functional since
    ++it's missing some files.)
    ++
    ++* If --prune-empty isn't specified, then the filtering process can
    ++create hordes of confusing empty commits.
    ++
    ++* If --prune-empty is specified, then intentionally placed empty
    ++commits from before the filtering operation are also pruned instead of
    ++just pruning commits that became empty due to filtering rules.
    ++
    ++* If --prune-empty is specified, sometimes empty commits are missed
    ++and left around anyway (a somewhat rare bug, but it happens...)
    ++
    ++* A minor issue, but users who have a goal to update all names and
    ++emails in a repository may be led to --env-filter which will only update
    ++authors and committers, missing taggers.
    ++
    ++* If the user provides a --tag-name-filter that maps multiple tags to
    ++the same name, no warning or error is provided; filter-branch simply
    ++overwrites each tag in some undocumented pre-defined order resulting in
    ++only one tag at the end.  If you try to "fix" this bug in filter-branch
    ++and make it error out and warn the user instead, one of the
    ++filter-branch regression tests will fail.  (So, if you are trying to
    ++make a backward compatible reimplementation you have to add extra code
    ++to detect collisions and make sure that only the lexicographically last
    ++one is rewritten to avoid fast-import from seeing both since fast-import
    ++will naturally do the sane thing and error out if told to write the same
    ++tag more than once.)
    ++
    ++Also, the poor performance of filter-branch often leads to safety issues:
    ++
    ++* Coming up with the correct shell snippet to do the filtering you want
    ++is sometimes difficult unless you're just doing a trivial modification
    ++such as deleting a couple files.  People have often come to me for help,
    ++so I should be practiced and an expert, but even for fairly simple cases
    ++I still sometimes take over 10 minutes and several iterations to get
    ++the right commands -- and that's assuming they are working on a tiny
    ++repository.  Unfortunately, people often learn if the snippet is right
    ++or wrong by trying it out, but the rightness or wrongness can vary
    ++depending on special circumstances (spaces in filenames, non-ascii
    ++filenames, funny author names or emails, invalid timezones, presence of
    ++grafts or replace objects, etc.), meaning they may have to wait a long
    ++time, hit an error, then restart.  The performance of filter-branch is
    ++so bad that this cycle is painful, reducing the time available to
    ++carefully re-check (to say nothing about what it does to the patience of
    ++the person doing the rewrite even if they do technically have more time
    ++available).  This problem is extra compounded because errors from broken
    ++filters may not be shown for a long time and/or get lost in a sea of
    ++output.  Even worse, broken filters often just result in silent
    ++incorrect rewrites.
    ++
    ++* To top it all off, even when users finally find working commands, they
    ++naturally want to share them.  But they may be unaware that their repo
    ++didn't have some special cases that someone else's does.  So, when
    ++someone else with a different repository runs the same commands, they
    ++get hit by the problems above.  Or, the user just runs commands that
    ++really were vetted for special cases, but they run it on a different OS
    ++where it doesn't work, as noted above.
    + 
      GIT
      ---
    - Part of the linkgit:git[1] suite
     
      ## Documentation/git-gc.txt ##
     @@ Documentation/git-gc.txt: NOTES
    @@ Documentation/git-rebase.txt: Hard case: The changes are not the same.::
      	`--interactive` to omit, edit, squash, or fixup commits; or
      	if the upstream used one of `commit --amend`, `reset`, or
     -	`filter-branch`.
    -+	a full history rewriting command like `filter-repo`.
    ++	a full history rewriting command like
    ++	https://github.com/newren/git-filter-repo[`filter-repo`].
      
      
      The easy case
    @@ Documentation/git-replace.txt: The following format are available:
     -replacement objects from existing objects. The `--edit` option can
     -also be used with 'git replace' to create a replacement object by
     +linkgit:git-hash-object[1], linkgit:git-rebase[1], and
    -+linkgit:git-filter-repo[1], among other git commands, can be used to
    ++https://github.com/newren/git-filter-repo[git-filter-repo], among other git commands, can be used to
     +create replacement objects from existing objects. The `--edit` option
     +can also be used with 'git replace' to create a replacement object by
      editing an existing object.
    @@ Documentation/git-replace.txt: pending objects.
      linkgit:git-hash-object[1]
     -linkgit:git-filter-branch[1]
      linkgit:git-rebase[1]
    -+linkgit:git-filter-repo[1]
      linkgit:git-tag[1]
      linkgit:git-branch[1]
      linkgit:git-commit[1]
    + linkgit:git-var[1]
    + linkgit:git[1]
    ++https://github.com/newren/git-filter-repo[git-filter-repo]
    + 
    + GIT
    + ---
     
      ## Documentation/git-svn.txt ##
     @@ Documentation/git-svn.txt: option for (hopefully) obvious reasons.
    @@ Documentation/git-svn.txt: option for (hopefully) obvious reasons.
      This option is NOT recommended as it makes it difficult to track down
      old references to SVN revision numbers in existing documentation, bug
     -reports and archives.  If you plan to eventually migrate from SVN to Git
    -+reports, and archives.  If you plan to eventually migrate from SVN to Git
    - and are certain about dropping SVN history, consider
    +-and are certain about dropping SVN history, consider
     -linkgit:git-filter-branch[1] instead.  filter-branch also allows
    -+linkgit:git-filter-repo[1] instead.  filter-repo also allows
    - reformatting of metadata for ease-of-reading and rewriting authorship
    - info for non-"svn.authorsFile" users.
    +-reformatting of metadata for ease-of-reading and rewriting authorship
    +-info for non-"svn.authorsFile" users.
    ++reports, and archives.  If you plan to eventually migrate from SVN to
    ++Git and are certain about dropping SVN history, consider
    ++https://github.com/newren/git-filter-repo[git-filter-repo] instead.
    ++filter-repo also allows reformatting of metadata for ease-of-reading
    ++and rewriting authorship info for non-"svn.authorsFile" users.
      
    + svn.useSvmProps::
    + svn-remote.<name>.useSvmProps::
     
      ## Documentation/githooks.txt ##
     @@ Documentation/githooks.txt: post-rewrite
    @@ Documentation/githooks.txt: post-rewrite
     -linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call
     -it!).  Its first argument denotes the command it was invoked by:
     -currently one of `amend` or `rebase`.  Further command-dependent
    +-arguments may be passed in the future.
     +linkgit:git-rebase[1]; however, full-history (re)writing tools like
    -+linkgit:git-fast-import[1] or linkgit:git-filter-repo[1] typically do
    -+not call it!).  Its first argument denotes the command it was invoked
    -+by: currently one of `amend` or `rebase`.  Further command-dependent
    - arguments may be passed in the future.
    ++linkgit:git-fast-import[1] or
    ++https://github.com/newren/git-filter-repo[git-filter-repo] typically
    ++do not call it!).  Its first argument denotes the command it was
    ++invoked by: currently one of `amend` or `rebase`.  Further
    ++command-dependent arguments may be passed in the future.
      
      The hook receives a list of the rewritten commits on stdin, in the
    + format
     
      ## contrib/svn-fe/svn-fe.txt ##
     @@ contrib/svn-fe/svn-fe.txt: line.  This line has the form `git-svn-id: URL@REVNO UUID`.
    @@ contrib/svn-fe/svn-fe.txt: The exit status does not reflect whether an error was
     +git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
      https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
     
    - ## git-filter-branch.sh (mode change 100755 => 100644) ##
    + ## git-filter-branch.sh ##
     @@ git-filter-branch.sh: set_ident () {
      	finish_ident COMMITTER
      }
    @@ git-filter-branch.sh: set_ident () {
     +         rewrites.  Please use an alternative filtering tool such as 'git
     +         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
     +         See the filter-branch manual page for more details; to squelch
    -+         this warning and pause, set FILTER_BRANCH_SQUELCH_WARNING=1.
    ++         this warning, set FILTER_BRANCH_SQUELCH_WARNING=1.
     +
     +EOF
     +	sleep 5
4:  ff3e04e558 < -:  ---------- Remove git-filter-branch, it is now external to git.git
-:  ---------- > 4:  1dbca82408 t9902: use a non-deprecated command for testing
-- 
2.23.0.3.g59c7446927.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 1/4] t6006: simplify and optimize empty message test
  2019-08-29  0:06           ` [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
@ 2019-08-29  0:06             ` Elijah Newren
  2019-08-29  0:06             ` [PATCH v3 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
                               ` (3 subsequent siblings)
  4 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-29  0:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

Test t6006.71 ("oneline with empty message") was creating two commits
with simple commit messages, and then running filter-branch to rewrite
the commit messages to be empty.  This test was written this way because
the --allow-empty-message option to git commit did not exist at the
time.  Simplify this test and avoid the need to invoke filter-branch by
just using --allow-empty-message when creating the commit.

Despite only being one piece of the 71st test and there being 73 tests
overall, this small change to just this one test speeds up the overall
execution time of t6006 (as measured by the best of 3 runs of `time
./t6006-rev-list-format.sh`) by about 11% on Linux and by 13% on
Mac.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6006-rev-list-format.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
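
A minimal sketch of the approach described above, runnable outside the
test suite.  The function name, the scratch directory, and the demo
identity are all hypothetical; the sketch also passes -m "" so that no
editor is started when it is run by hand, which the test itself does
not need to do.

```shell
# Hypothetical demo: create a commit whose message is empty without
# rewriting history afterwards, then print its subject in brackets.
demo_empty_message () {
	dir=$(mktemp -d) &&
	(
		cd "$dir" &&
		git init -q . &&
		# The identity is supplied inline so the sketch does not
		# depend on the user's configuration.
		git -c user.name=Demo -c user.email=demo@example.com \
			commit -q --allow-empty --allow-empty-message -m "" &&
		git log -1 --format="[%s]"
	) &&
	rm -rf "$dir"
}
```

Calling demo_empty_message prints the subject of the new commit between
brackets, which makes an empty subject visible.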

diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index da113d975b..d30e41c9f7 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -501,9 +501,8 @@ test_expect_success 'reflog identity' '
 '
 
 test_expect_success 'oneline with empty message' '
-	git commit -m "dummy" --allow-empty &&
-	git commit -m "dummy" --allow-empty &&
-	git filter-branch --msg-filter "sed -e s/dummy//" HEAD^^.. &&
+	git commit --allow-empty --allow-empty-message &&
+	git commit --allow-empty --allow-empty-message &&
 	git rev-list --oneline HEAD >test.txt &&
 	test_line_count = 5 test.txt &&
 	git rev-list --oneline --graph HEAD >testg.txt &&
-- 
2.23.0.3.g59c7446927.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 2/4] t3427: accelerate this test by using fast-export and fast-import
  2019-08-29  0:06           ` [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-08-29  0:06             ` [PATCH v3 1/4] t6006: simplify and optimize empty message test Elijah Newren
@ 2019-08-29  0:06             ` Elijah Newren
  2019-08-29  0:06             ` [PATCH v3 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
                               ` (2 subsequent siblings)
  4 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-29  0:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

fast-export and fast-import can easily handle the simple rewrite that
was being done by filter-branch, and should be significantly faster on
systems with a slow fork.  Timings from before and after on two laptops
that I have access to (measured via `time ./t3427-rebase-subtree.sh`,
i.e. including everything in this test -- not just the filter-branch or
fast-export/fast-import pair):

   Linux:  4.305s -> 3.684s (~17% speedup)
   Mac:   10.128s -> 7.038s (~30% speedup)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3427-rebase-subtree.sh | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)
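
To illustrate what the sed step in the pipeline does, here is a small
sketch that applies the same expression to a single hand-written line
shaped like a filemodify line from `git fast-export --no-data`.  The
function name and the object id are made up for the example.

```shell
# Hypothetical wrapper around the sed expression used in the patch: it
# drops the leading "files_subtree/" from a path that directly follows
# a 40-hex-digit object id and a space.
strip_subtree_prefix () {
	sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%"
}

# Example input line (the object id is invented):
#   M 100644 0123456789abcdef0123456789abcdef01234567 files_subtree/master4
# Usage:
#   printf '%s\n' "$line" | strip_subtree_prefix
```

Lines that do not contain an object id followed by the prefix, such as
commit messages or author lines, pass through unchanged.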

diff --git a/t/t3427-rebase-subtree.sh b/t/t3427-rebase-subtree.sh
index d8640522a0..c1f6102921 100755
--- a/t/t3427-rebase-subtree.sh
+++ b/t/t3427-rebase-subtree.sh
@@ -7,10 +7,16 @@ This test runs git rebase and tests the subtree strategy.
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-rebase.sh
 
-commit_message() {
+commit_message () {
 	git log --pretty=format:%s -1 "$1"
 }
 
+extract_files_subtree () {
+	git fast-export --no-data HEAD -- files_subtree/ |
+		sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" |
+		git fast-import --force --quiet
+}
+
 test_expect_success 'setup' '
 	test_commit README &&
 	mkdir files &&
@@ -42,7 +48,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --preserve-merges --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-preserve-merges-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master4"
@@ -53,7 +59,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --preserve-merges --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-preserve-merges-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "files_subtree/master5"
@@ -64,7 +70,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
@@ -75,7 +81,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
@@ -86,7 +92,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto empty commit' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-empty master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "Empty commit"
@@ -96,7 +102,7 @@ test_expect_failure REBASE_P \
 test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-onto-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
@@ -106,7 +112,7 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
 test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-onto-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
@@ -115,7 +121,7 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
 test_expect_failure 'Rebase -Xsubtree --onto empty commit' '
 	reset_rebase &&
 	git checkout -b rebase-onto-empty master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "Empty commit"
-- 
2.23.0.3.g59c7446927.dirty


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v3 3/4] Recommend git-filter-repo instead of git-filter-branch
  2019-08-29  0:06           ` [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-08-29  0:06             ` [PATCH v3 1/4] t6006: simplify and optimize empty message test Elijah Newren
  2019-08-29  0:06             ` [PATCH v3 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
@ 2019-08-29  0:06             ` Elijah Newren
  2019-08-29 18:10               ` Eric Sunshine
  2019-08-29  0:06             ` [PATCH v3 4/4] t9902: use a non-deprecated command for testing Elijah Newren
  2019-08-30  5:57             ` [PATCH v4 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  4 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-29  0:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

filter-branch suffers from a deluge of disguised dangers that disfigure
history rewrites (i.e. deviate from the deliberate changes).  Many of
these problems are unobtrusive and can easily go undiscovered until the
new repository is in use.  This can result in problems ranging from an
even messier history than what led folks to filter-branch in the first
place, to data loss or corruption.  These issues cannot be backward
compatibly fixed, so add a warning to both filter-branch and its manpage
recommending that another tool (such as filter-repo) be used instead.
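
As one concrete example of such an unobtrusive problem (taken from the
SAFETY section added below), ls-files quotes paths containing non-ascii
characters by default, so a quoted path no longer matches a pattern
anchored on the directory name.  The sketch below feeds hand-written
lines imitating ls-files output through the same grep; the file names
are hypothetical and no repository is needed to run it.

```shell
# Imitate `git ls-files` output: the second path contains a non-ascii
# character, so with the default core.quotePath it is printed C-style
# quoted, starting with a double quote.
fake_ls_files () {
	printf '%s\n' \
		'WANTED_DIR/plain.txt' \
		'"WANTED_DIR/caf\303\251.txt"' \
		'other/file.txt'
}

# Paths that the pipeline from the SAFETY section would select for removal.
fake_ls_files | grep -v '^WANTED_DIR/'
```

The quoted path survives the grep -v and would be handed to the removal
step even though the file lives under WANTED_DIR/.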

Also, update other manpages that referenced filter-branch.  Several of
these needed updates even if we could continue recommending
filter-branch, either due to implying that something was unique to
filter-branch when it applied more generally to all history rewriting
tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
something about filter-branch was used as an example despite other more
commonly known examples now existing.  Reword these sections to fix
these issues and to avoid recommending filter-branch.

Finally, remove the section explaining BFG Repo Cleaner as an
alternative to filter-branch.  I feel somewhat bad about this,
especially since I feel like I learned so much from BFG that I put to
good use in filter-repo (which is much more than I can say for
filter-branch), but keeping that section presented a few problems:
  * In order to recommend that people quit using filter-branch, we need
    to provide them a recommendation for something else to use that
    can handle all the same types of rewrites.  To my knowledge,
    filter-repo is the only such tool.  So it needs to be mentioned.
  * I don't want to give conflicting recommendations to users.
  * If we recommend two tools, we shouldn't expect users to learn both
    and pick which one to use; we should explain which problems one
    can solve that the other can't or when one is much faster than
    the other.
  * BFG and filter-repo have similar performance.
  * All filtering types that BFG can do, filter-repo can also do.  In
    fact, filter-repo comes with a reimplementation of BFG named
    bfg-ish which provides the same user-interface as BFG but with
    several bugfixes and new features that are hard to implement in
    BFG due to its technical underpinnings.
While I could still mention both tools, it seems like I would need to
provide some kind of comparison, and I would ultimately just say that
filter-repo can do everything BFG can, so it seems better to remove
that section altogether.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 302 +++++++++++++++++++++++++---
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   3 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |  10 +-
 Documentation/githooks.txt          |  10 +-
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                |  13 ++
 9 files changed, 316 insertions(+), 59 deletions(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index cc940eb9ad..784e934009 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped
 into 'git fast-import'.
 
 You can use it as a human-readable bundle replacement (see
-linkgit:git-bundle[1]), or as a kind of an interactive
-'git filter-branch'.
-
+linkgit:git-bundle[1]), or as a format that can be edited before being
+fed to 'git fast-import' in order to do history rewrites (an ability
+relied on by tools like 'git filter-repo').
 
 OPTIONS
 -------
diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 6b53dd7e06..c3f874b692 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -16,6 +16,22 @@ SYNOPSIS
 	[--original <namespace>] [-d <directory>] [-f | --force]
 	[--state-branch <branch>] [--] [<rev-list options>...]
 
+WARNING
+-------
+'git filter-branch' has a plethora of pitfalls that can produce non-obvious
+manglings of the intended history rewrite (and can leave you with little
+time to investigate such problems since it has such abysmal performance).
+These safety and performance issues cannot be backward compatibly fixed and
+as such, its use is not recommended.  Please use an alternative history
+filtering tool such as https://github.com/newren/git-filter-repo/[git
+filter-repo].  If you still need to use 'git filter-branch', please
+carefully read <<SAFETY>> (and <<PERFORMANCE>>) to learn about the land
+mines of filter-branch, and then vigilantly avoid as many of the hazards
+listed there as reasonably possible.
+
+https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
+the land mines of filter-branch]
+
 DESCRIPTION
 -----------
 Lets you rewrite Git revision history by rewriting the branches mentioned
@@ -445,36 +461,262 @@ warned.
   (or if your git-gc is not new enough to support arguments to
   `--prune`, use `git repack -ad; git prune` instead).
 
-NOTES
------
-
-git-filter-branch allows you to make complex shell-scripted rewrites
-of your Git history, but you probably don't need this flexibility if
-you're simply _removing unwanted data_ like large files or passwords.
-For those operations you may want to consider
-http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
-a JVM-based alternative to git-filter-branch, typically at least
-10-50x faster for those use-cases, and with quite different
-characteristics:
-
-* Any particular version of a file is cleaned exactly _once_. The BFG,
-  unlike git-filter-branch, does not give you the opportunity to
-  handle a file differently based on where or when it was committed
-  within your history. This constraint gives the core performance
-  benefit of The BFG, and is well-suited to the task of cleansing bad
-  data - you don't care _where_ the bad data is, you just want it
-  _gone_.
-
-* By default The BFG takes full advantage of multi-core machines,
-  cleansing commit file-trees in parallel. git-filter-branch cleans
-  commits sequentially (i.e. in a single-threaded manner), though it
-  _is_ possible to write filters that include their own parallelism,
-  in the scripts executed against each commit.
-
-* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
-  are much more restrictive than git-filter branch, and dedicated just
-  to the tasks of removing unwanted data- e.g:
-  `--strip-blobs-bigger-than 1M`.
+[[PERFORMANCE]]
+PERFORMANCE
+-----------
+
+The performance of filter-branch is glacially slow; its design makes it
+impossible for a backward-compatible implementation to ever be fast:
+
+* In editing files, git-filter-branch by design checks out each and
+every commit as it existed in the original repo.  If your repo has 10\^5
+files and 10\^5 commits, but each commit only modifies 5 files, then
+git-filter-branch will make you do 10\^10 modifications, despite only
+having (at most) 5*10^5 unique blobs.
+
+* If you try to cheat and make filter-branch only work on files
+modified in a commit, then two things happen:
+
+  . you run into problems with deletions whenever the user is simply
+    trying to rename files (because attempting to delete files that
+    don't exist looks like a no-op; it takes some chicanery to remap
+    deletes across file renames when the renames happen via arbitrary
+    user-provided shell)
+
+  . even if you succeed at the map-deletes-for-renames chicanery, you
+    still technically violate backward compatibility because users are
+    allowed to filter files in ways that depend upon topology of commits
+    instead of filtering solely based on file contents or names (though
+    I have never seen any user ever do this).
+
+* Even if you don't need to edit files but only want to e.g. rename or
+remove some and thus can avoid checking out each file (i.e. you can use
+--index-filter), you still are passing shell snippets for your filters.
+This means that for every commit, you have to have a prepared git repo
+where users can run git commands.  That's a lot of setup.  It also means
+you have to fork at least one process to run the user-provided shell
+snippet, and odds are that the user's shell snippet invokes lots of
+commands in some long pipeline, so you will have lots and lots of forks.
+For every. single. commit.  That's a massive amount of overhead to
+rename a few files.
+
+* filter-branch is written in shell, which is kind of slow.  Naturally,
+it makes sense to want to rewrite that in some other language.  However,
+filter-branch documentation states that several additional shell
+functions are provided for users to call, e.g. 'map', 'skip_commit',
+'git_commit_non_empty_tree'.  If filter-branch itself isn't a shell
+script, then in order to make those shell functions available to the
+users' shell snippets you have to prepend the shell definitions of these
+functions to every one of the users' shell snippets and thus make these
+special shell functions be parsed with each and every commit.
+
+* filter-branch provides a --setup option which is a shell snippet that
+can be sourced to make shell functions and variables available to all
+other filters.  If filter-branch is a shell script, it can simply eval
+this shell snippet once at the beginning.  If you try to fix performance
+by making filter-branch not be a shell script, then you have to prepend
+the setup shell snippet to all other filters and parse it with every
+single commit.
+
+* filter-branch writes lots of files to $workdir/../map/ to keep a
+mapping of commits, which it uses for pruning commits and remapping to
+ancestors and the map() command more generally.  Other files like
+$tempdir/backup-refs, $tempdir/raw-refs, $tempdir/heads,
+$tempdir/tree-state are all created internally too.  It is possible
+(though strongly discouraged) that users could have accessed any of
+these directly.  Users even had a pointer to follow in the form of
+documentation that the 'map' command existed, which naturally uses the
+$workdir/../map/* files.  So, even if you don't have to edit files, for
+strict backward compatibility you need to still write a bunch of files
+to disk somewhere and keep them updated for every commit.  You can claim
+it was an implementation detail that users should not have depended
+upon, but the truth is they've had a decade where they could do so.  So, if
+you want full compatibility, it has to be there.  Besides, the
+regression tests depend on at least one of these details, specifying an
+--index-filter that reaches down and grabs backup-refs from $tempdir,
+and thus provides resourceful users who do google searches an example
+that there are files there for them to read and grab and use.  (And if
+you want to pass the existing regression tests, you have to at least put
+the backup-refs file there even if it's irrelevant to your
+implementation otherwise.)
+
+All of that said, performance of filter-branch could be improved by
+reimplementing it in a non-shell language and taking a couple small
+liberties with backward compatibility (such as having it only run
+filters on files changed within each commit).  filter-repo provides a
+demo script named
+https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely]
+which does exactly that and which passes all the git-filter-branch
+regression tests.  It's much faster than git-filter-branch, though it
+suffers from all the same safety issues as git-filter-branch, and is
+still glacially slow compared to
+https://github.com/newren/git-filter-repo/[git filter-repo].
+
+[[SAFETY]]
+SAFETY
+------
+
+filter-branch is riddled with gotchas resulting in various ways to
+easily corrupt repos or end up with a mess worse than what you started
+with:
+
+* Someone can have a set of "working and tested filters" which they
+document or provide to a coworker, who then runs them on a different OS
+where the same commands are not working/tested (some examples in the
+git-filter-branch manpage are also affected by this).  BSD vs. GNU
+userland differences can really bite.  If you're lucky, you get ugly
+error messages spewed.  But just as likely, the commands either don't do
+the filtering requested, or silently corrupt by making some unwanted
+change.  The unwanted change may only affect a few commits, so it's not
+necessarily obvious either.  (The fact that problems won't necessarily
+be obvious means they are likely to go unnoticed until the rewritten
+history is in use for quite a while, at which point it's really hard to
+justify another flag-day for another rewrite.)
+
+* Filenames with spaces (which are rare) are often mishandled by shell
+snippets since they cause problems for shell pipelines.  Not everyone is
+familiar with find -print0, xargs -0, ls-files -z, etc.  Even people who
+are familiar with these may assume such needs are not relevant because
+someone else renamed any such files in their repo back before the person
+doing the filtering joined the project.  And, often, even those familiar
+with handling arguments with spaces may not do so just because they
+aren't in the mindset of thinking about everything that could possibly
+go wrong.
+
+* Non-ascii filenames (which are rare) can be silently removed despite
+being in a desired directory.  Keeping only wanted paths is often done
+with pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.
+ls-files will only quote filenames if needed so folks may not notice
+that one of the files didn't match the regex, again until it's much too
+late.  Yes, someone who knows about core.quotePath can avoid this
+(unless they have other special characters like \t, \n, or "), and
+people who use ls-files -z with something other than grep can avoid
+this, but that doesn't mean they will.
+
+* Similarly, when moving files around, one can find that filenames with
+non-ascii or special characters end up in a different directory, one
+that includes a double quote character.  (This is technically the same
+issue as above with quoting, but perhaps an interesting different way
+that it can and has manifested as a problem.)
+
+* It's far too easy to accidentally mix up old and new history.  It's
+still possible with any tool, but filter-branch almost invites it.  If
+we're lucky, the only downside is users getting frustrated that they
+don't know how to shrink their repo and remove the old stuff.  If we're
+unlucky, they merge old and new history and end up with multiple
+"copies" of each commit, some of which have unwanted or sensitive files
+and others which don't.  This comes about in multiple different ways:
+
+  ** the default to only doing a partial history rewrite ('--all' is not
+     the default and over 80% of the examples in the manpage don't use
+     it)
+
+  ** the fact that there's no automatic post-run cleanup
+
+  ** the fact that --tag-name-filter (when used to rename tags) doesn't
+     remove the old tags but just adds new ones with the new name (this
+     manpage has documented this for a long time so it's presumably not
+     a "bug" even though it feels like it)
+
+  ** the fact that little educational information is provided to inform
+     users of the ramifications of a rewrite and how to avoid mixing old
+     and new history.  For example, this man page discusses how users
+     need to understand that they need to rebase their changes for all
+     their branches on top of new history (or delete and reclone), but
+     that's only one of multiple concerns to consider.  See the
+     "DISCUSSION" section of the git filter-repo manual page for more
+     details.
+
+* Annotated tags can be accidentally converted to lightweight tags, due
+to either of two issues:
+
+  . Someone can do a history rewrite, realize they messed up, restore
+    from the backups in refs/original/, and then redo their
+    filter-branch command.  (The backup in refs/original/ is not a real
+    backup; it dereferences tags first.)
+
+  . Running filter-branch with either --tags or --all in your <rev-list
+    options>.  In order to retain annotated tags as annotated, you must
+    use --tag-name-filter (and must not have restored from
+    refs/original/ in a previously botched rewrite).
+
+* Any commit messages that specify an encoding will become corrupted
+by the rewrite; filter-branch ignores the encoding, takes the original
+bytes, and feeds it to commit-tree without telling it the proper
+encoding.  (This happens whether or not --msg-filter is used, though I
+suspect --msg-filter provides additional ways to really mess things
+up).
+
+* Commit messages (even if they are all UTF-8) by default become
+corrupted due to not being updated -- any references to other commit
+hashes in commit messages will now refer to no-longer-extant commits.
+
+* There are no facilities for helping users find what unwanted crud they
+should delete, which means they are much more likely to have incomplete
+or partial cleanups that sometimes result in confusion and people
+wasting time trying to understand.  (For example, folks tend to just
+look for big files to delete instead of big directories or extensions,
+and once they do so, then sometime later folks using the new repository
+who are going through history will notice a build artifact directory
+that has some files but not others, or a cache of dependencies
+(node_modules or similar) which couldn't have ever been functional since
+it's missing some files.)
+
+* If --prune-empty isn't specified, then the filtering process can
+create hordes of confusing empty commits.
+
+* If --prune-empty is specified, then intentionally placed empty
+commits from before the filtering operation are also pruned instead of
+just pruning commits that became empty due to filtering rules.
+
+* If --prune-empty is specified, sometimes empty commits are missed
+and left around anyway (a somewhat rare bug, but it happens...)
+
+* A minor issue, but users who have a goal to update all names and
+emails in a repository may be led to --env-filter which will only update
+authors and committers, missing taggers.
+
+* If the user provides a --tag-name-filter that maps multiple tags to
+the same name, no warning or error is provided; filter-branch simply
+overwrites each tag in some undocumented pre-defined order resulting in
+only one tag at the end.  If you try to "fix" this bug in filter-branch
+and make it error out and warn the user instead, one of the
+filter-branch regression tests will fail.  (So, if you are trying to
+make a backward compatible reimplementation you have to add extra code
+to detect collisions and make sure that only the lexicographically last
+one is rewritten to prevent fast-import from seeing both since fast-import
+will naturally do the sane thing and error out if told to write the same
+tag more than once.)
+
+Also, the poor performance of filter-branch often leads to safety issues:
+
+* Coming up with the correct shell snippet to do the filtering you want
+is sometimes difficult unless you're just doing a trivial modification
+such as deleting a couple files.  People have often come to me for help,
+so I should be practiced and an expert, but even for fairly simple cases
+I still sometimes take over 10 minutes and several iterations to get
+the right commands -- and that's assuming they are working on a tiny
+repository.  Unfortunately, people often learn if the snippet is right
+or wrong by trying it out, but the rightness or wrongness can vary
+depending on special circumstances (spaces in filenames, non-ascii
+filenames, funny author names or emails, invalid timezones, presence of
+grafts or replace objects, etc.), meaning they may have to wait a long
+time, hit an error, then restart.  The performance of filter-branch is
+so bad that this cycle is painful, reducing the time available to
+carefully re-check (to say nothing about what it does to the patience of
+the person doing the rewrite even if they do technically have more time
+available).  This problem is further compounded because errors from broken
+filters may not be shown for a long time and/or get lost in a sea of
+output.  Even worse, broken filters often just result in silent
+incorrect rewrites.
+
+* To top it all off, even when users finally find working commands, they
+naturally want to share them.  But they may be unaware that their repo
+didn't have some special cases that someone else's does.  So, when
+someone else with a different repository runs the same commands, they
+get hit by the problems above.  Or, the user just runs commands that
+really were vetted for special cases, but they run it on a different OS
+where it doesn't work, as noted above.
 
 GIT
 ---
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 247f765604..0c114ad1ca 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -115,15 +115,14 @@ NOTES
 -----
 
 'git gc' tries very hard not to delete objects that are referenced
-anywhere in your repository. In
-particular, it will keep not only objects referenced by your current set
-of branches and tags, but also objects referenced by the index,
-remote-tracking branches, refs saved by 'git filter-branch' in
-refs/original/, reflogs (which may reference commits in branches
-that were later amended or rewound), and anything else in the refs/* namespace.
-If you are expecting some objects to be deleted and they aren't, check
-all of those locations and decide whether it makes sense in your case to
-remove those references.
+anywhere in your repository. In particular, it will keep not only
+objects referenced by your current set of branches and tags, but also
+objects referenced by the index, remote-tracking branches, notes saved
+by 'git notes' under refs/notes/, reflogs (which may reference commits
+in branches that were later amended or rewound), and anything else in
+the refs/* namespace.  If you are expecting some objects to be deleted
+and they aren't, check all of those locations and decide whether it
+makes sense in your case to remove those references.
 
 On the other hand, when 'git gc' runs concurrently with another process,
 there is a risk of it deleting an object that the other process is using
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 6156609cf7..a8cfc0ad82 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -832,7 +832,8 @@ Hard case: The changes are not the same.::
 	This happens if the 'subsystem' rebase had conflicts, or used
 	`--interactive` to omit, edit, squash, or fixup commits; or
 	if the upstream used one of `commit --amend`, `reset`, or
-	`filter-branch`.
+	a full history rewriting command like
+	https://github.com/newren/git-filter-repo[`filter-repo`].
 
 
 The easy case
diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
index 246dc9943c..f271d758c3 100644
--- a/Documentation/git-replace.txt
+++ b/Documentation/git-replace.txt
@@ -123,10 +123,10 @@ The following format are available:
 CREATING REPLACEMENT OBJECTS
 ----------------------------
 
-linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and
-linkgit:git-rebase[1], among other git commands, can be used to create
-replacement objects from existing objects. The `--edit` option can
-also be used with 'git replace' to create a replacement object by
+linkgit:git-hash-object[1], linkgit:git-rebase[1], and
+https://github.com/newren/git-filter-repo[git-filter-repo], among other git commands, can be used to
+create replacement objects from existing objects. The `--edit` option
+can also be used with 'git replace' to create a replacement object by
 editing an existing object.
 
 If you want to replace many blobs, trees or commits that are part of a
@@ -148,13 +148,13 @@ pending objects.
 SEE ALSO
 --------
 linkgit:git-hash-object[1]
-linkgit:git-filter-branch[1]
 linkgit:git-rebase[1]
 linkgit:git-tag[1]
 linkgit:git-branch[1]
 linkgit:git-commit[1]
 linkgit:git-var[1]
 linkgit:git[1]
+https://github.com/newren/git-filter-repo[git-filter-repo]
 
 GIT
 ---
diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 30711625fd..53774f5b64 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -769,11 +769,11 @@ option for (hopefully) obvious reasons.
 +
 This option is NOT recommended as it makes it difficult to track down
 old references to SVN revision numbers in existing documentation, bug
-reports and archives.  If you plan to eventually migrate from SVN to Git
-and are certain about dropping SVN history, consider
-linkgit:git-filter-branch[1] instead.  filter-branch also allows
-reformatting of metadata for ease-of-reading and rewriting authorship
-info for non-"svn.authorsFile" users.
+reports, and archives.  If you plan to eventually migrate from SVN to
+Git and are certain about dropping SVN history, consider
+https://github.com/newren/git-filter-repo[git-filter-repo] instead.
+filter-repo also allows reformatting of metadata for ease-of-reading
+and rewriting authorship info for non-"svn.authorsFile" users.
 
 svn.useSvmProps::
 svn-remote.<name>.useSvmProps::
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 82cd573776..5a789c91df 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -425,10 +425,12 @@ post-rewrite
 
 This hook is invoked by commands that rewrite commits
 (linkgit:git-commit[1] when called with `--amend` and
-linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call
-it!).  Its first argument denotes the command it was invoked by:
-currently one of `amend` or `rebase`.  Further command-dependent
-arguments may be passed in the future.
+linkgit:git-rebase[1]; however, full-history (re)writing tools like
+linkgit:git-fast-import[1] or
+https://github.com/newren/git-filter-repo[git-filter-repo] typically
+do not call it!).  Its first argument denotes the command it was
+invoked by: currently one of `amend` or `rebase`.  Further
+command-dependent arguments may be passed in the future.
 
 The hook receives a list of the rewritten commits on stdin, in the
 format
diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index a3425f4770..19333fc8df 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -56,7 +56,7 @@ line.  This line has the form `git-svn-id: URL@REVNO UUID`.
 
 The resulting repository will generally require further processing
 to put each project in its own repository and to separate the history
-of each branch.  The 'git filter-branch --subdirectory-filter' command
+of each branch.  The 'git filter-repo --subdirectory-filter' command
 may be useful for this purpose.
 
 BUGS
@@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected.
 
 SEE ALSO
 --------
-git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
+git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
 https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5c5afa2b98..f805965d87 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -83,6 +83,19 @@ set_ident () {
 	finish_ident COMMITTER
 }
 
+if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
+     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then
+	cat <<EOF
+WARNING: git-filter-branch has a glut of gotchas generating mangled history
+         rewrites.  Please use an alternative filtering tool such as 'git
+         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
+         See the filter-branch manual page for more details; to squelch
+         this warning, set FILTER_BRANCH_SQUELCH_WARNING=1.
+
+EOF
+	sleep 5
+fi
+
 USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
 	[--tree-filter <command>] [--index-filter <command>]
 	[--parent-filter <command>] [--msg-filter <command>]
-- 
2.23.0.3.g59c7446927.dirty



* [PATCH v3 4/4] t9902: use a non-deprecated command for testing
  2019-08-29  0:06           ` [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
                               ` (2 preceding siblings ...)
  2019-08-29  0:06             ` [PATCH v3 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
@ 2019-08-29  0:06             ` Elijah Newren
  2019-08-30  5:57             ` [PATCH v4 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  4 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-29  0:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

t9902 had a list of three random porcelain commands as a sanity check,
one of which was filter-branch.  Since we are recommending people not
use filter-branch, let's update this test to use rebase instead of
filter-branch.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t9902-completion.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 75512c3403..4e7f669c76 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -28,10 +28,10 @@ complete ()
 #
 # (2) A test makes sure that common subcommands are included in the
 #     completion for "git <TAB>", and a plumbing is excluded.  "add",
-#     "filter-branch" and "ls-files" are listed for this.
+#     "rebase" and "ls-files" are listed for this.
 
-GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr filter-branch ls-files'
-GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout filter-branch'
+GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr rebase ls-files'
+GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout rebase'
 
 . "$GIT_BUILD_DIR/contrib/completion/git-completion.bash"
 
@@ -1392,12 +1392,12 @@ test_expect_success 'basic' '
 	# built-in
 	grep -q "^add \$" out &&
 	# script
-	grep -q "^filter-branch \$" out &&
+	grep -q "^rebase \$" out &&
 	# plumbing
 	! grep -q "^ls-files \$" out &&
 
-	run_completion "git f" &&
-	! grep -q -v "^f" out
+	run_completion "git r" &&
+	! grep -q -v "^r" out
 '
 
 test_expect_success 'double dash "git" itself' '
-- 
2.23.0.3.g59c7446927.dirty



* Re: [PATCH v3 3/4] Recommend git-filter-repo instead of git-filter-branch
  2019-08-29  0:06             ` [PATCH v3 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
@ 2019-08-29 18:10               ` Eric Sunshine
  2019-08-30  0:04                 ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Eric Sunshine @ 2019-08-29 18:10 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git List, Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

On Wed, Aug 28, 2019 at 8:07 PM Elijah Newren <newren@gmail.com> wrote:
> filter-branch suffers from a deluge of disguised dangers that disfigure
> history rewrites (i.e. deviate from the deliberate changes). [...]
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
> @@ -16,6 +16,22 @@ SYNOPSIS
> +WARNING
> +-------
> +'git filter-branch' has a plethora of pitfalls that can produce non-obvious
> +manglings of the intended history rewrite (and can leave you with little
> +time to investigate such problems since it has such abysmal performance).
> +These safety and performance issues cannot be backward compatibly fixed and
> +as such, its use is not recommended.  Please use an alternative history
> +filtering tool such as https://github.com/newren/git-filter-repo/[git
> +filter-repo].  If you still need to use 'git filter-branch', please
> +carefully read <<SAFETY>> (and <<PERFORMANCE>>) to learn about the land
> +mines of filter-branch, and then vigilantly avoid as many of the hazards
> +listed there as reasonably possible.
> +
> +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
> +the land mines of filter-branch]

This stray link looks like leftover gunk from the previous revision.

> +PERFORMANCE
> +-----------
> +
> +The performance of filter-branch is glacially slow; its design makes it

The rest of this document spells it git-filter-branch or 'git
filter-branch', not plain filter-branch.

> +* In editing files, git-filter-branch by design checks out each and
> +every commit as it existed in the original repo.  If your repo has 10\^5
> +files and 10\^5 commits, but each commit only modifies 5 files, then
> +git-filter-branch will make you do 10\^10 modifications, despite only
> +having (at most) 5*10^5 unique blobs.
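
The arithmetic in this bullet can be checked with a few lines of shell.
This is only a sketch: the variable names are illustrative, and the
figures are the hypothetical ones from the paragraph above, not
measurements of a real repository.

```shell
#!/bin/sh
# Hypothetical repository dimensions taken from the bullet point above.
files=100000      # 10^5 files present in each checkout
commits=100000    # 10^5 commits in the history
changed=5         # files actually modified by each commit

# Files visited when every file of every commit is checked out and filtered.
file_visits=$((files * commits))
# Upper bound on the distinct blobs that really need to be rewritten.
unique_blobs=$((changed * commits))

echo "file visits:      $file_visits"
echo "unique blobs:     $unique_blobs"
echo "visits per blob:  $((file_visits / unique_blobs))"
```

The last figure is the factor by which the per-commit checkout approach
repeats work, under the stated assumptions.
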
> +
> +* If you try and cheat and try to make filter-branch only work on
> +files modified in a commit, then two things happen

s/filter-branch/git-&/

> +
> +  . you run into problems with deletions whenever the user is simply
> +    trying to rename files (because attempting to delete files that
> +    don't exist looks like a no-op; it takes some chicanery to remap
> +    deletes across file renames when the renames happen via arbitrary
> +    user-provided shell)
> +
> +  . even if you succeed at the map-deletes-for-renames chicanery, you
> +    still technically violate backward compatibility because users are
> +    allowed to filter files in ways that depend upon topology of commits
> +    instead of filtering solely based on file contents or names (though
> +    I have never seen any user ever do this).

Maybe avoid first-person:

    ... contents or names (though this has not been observed in
    the wild).

> +* filter-branch is written in shell, which is kind of slow.  Naturally,
> +it makes sense to want to rewrite that in some other language.  However,
> +filter-branch documentation states that several additional shell
> +functions are provided for users to call, e.g. 'map', 'skip_commit',
> +'git_commit_non_empty_tree'.  If filter-branch itself isn't a shell
> +script, then in order to make those shell functions available to the
> +users' shell snippets you have to prepend the shell definitions of these
> +functions to every one of the users' shell snippets and thus make these
> +special shell functions be parsed with each and every commit.
> +
> +* filter-branch provides a --setup option which is a shell snippet that
> +can be sourced to make shell functions and variables available to all
> +other filters.  If filter-branch is a shell script, it can simply eval
> +this shell snippet once at the beginning.  If you try to fix performance
> +by making filter-branch not be a shell script, then you have to prepend
> +the setup shell snippet to all other filters and parse it with every
> +single commit.

Even though they made sense in the context of the original email
message, these two bullet points may not belong in the man page since
someone reading the man page is doing so to learn about
git-filter-branch usage, not because he or she is thinking about
re-implementing it. It might make sense, however, to collapse these
points to some general statement about shell being slow and process
startup being costly.

Also, these bullet points and others below need a  s/filter-branch/git-&/.
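
For readers unfamiliar with the shorthand, s/filter-branch/git-&/ is a
sed-style substitution in which "&" stands for the matched text.  A
minimal sketch follows; the function names are made up for
illustration, and the "g" flag is added so that every occurrence on a
line is rewritten.  Applied blindly, the substitution also hits
spellings that already carry the prefix, so the second function undoes
that doubling.

```shell
#!/bin/sh
# naive_prefix: apply the review shorthand literally.
# "&" in the replacement expands to whatever the pattern matched.
naive_prefix () {
	sed 's/filter-branch/git-&/g'
}

# guarded_prefix: same substitution, then repair the occurrences that
# were already spelled "git-filter-branch" or "git filter-branch".
guarded_prefix () {
	sed -e 's/filter-branch/git-&/g' \
	    -e 's/git-git-filter-branch/git-filter-branch/g' \
	    -e 's/git git-filter-branch/git filter-branch/g'
}

echo '* filter-branch writes lots of files' | naive_prefix
echo 'git-filter-branch and filter-branch' | naive_prefix
echo 'git-filter-branch and filter-branch' | guarded_prefix
```
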

> +* filter-branch writes lots of files to $workdir/../map/ to keep a

Should that path have three dots "..." instead of two ".."?

> +mapping of commits, which it uses for pruning commits and remapping to
> +ancestors and the map() command more generally.  Other files like
> +$tempdir/backup-refs, $tempdir/raw-refs, $tempdir/heads,
> +$tempdir/tree-state are all created internally too.  It is possible
> +(though strongly discouraged) that users could have accessed any of
> +these directly.  Users even had a pointer to follow in the form of
> +Documentation that the 'map' command existed, which naturally uses the
> +$workdir/../map/* files.  So, even if you don't have to edit files, for
> +strict backward compatibility you need to still write a bunch of files
> +to disk somewhere and keep them updated for every commit.  You can claim
> +it was an implementation detail that users should not have depended
> +upon, but the truth is they've had a decade where they could do so.  So, if
> +you want full compatibility, it has to be there.  Besides, the
> +regression tests depend on at least one of these details, specifying an
> +--index-filter that reaches down and grabs backup-refs from $tempdir,
> +and thus provides resourceful users who do google searches an example
> +that there are files there for them to read and grab and use.  (And if
> +you want to pass the existing regression tests, you have to at least put
> +the backup-refs file there even if it's irrelevant to your
> +implementation otherwise.)

As with the earlier comment, this bullet point is aimed at someone
thinking about re-implementing the command; it sounds out of place in
the "Performance" section of the man page. However, it does make sense
to mention all the files git-filter-branch creates since that can have
an impact on performance. So, perhaps this section can be collapsed so
it just talks about that.

> +All of that said, performance of filter-branch could be improved by
> +reimplementing it in a non-shell language and taking a couple small
> +liberties with backward compatibility (such as having it only run
> +filters on files changed within each commit).  filter-repo provides a
> +demo script named
> +https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely]
> +which does exactly that and which passes all the git-filter-branch
> +regression tests.  It's much faster than git-filter-branch, though it
> +suffers from all the same safety issues as git-filter-branch, and is
> +still glacially slow compared to
> +https://github.com/newren/git-filter-repo/[git filter-repo].

This paragraph could be collapsed to say merely that, for those with
existing tooling relying upon git-filter-branch, filter-repo's
"filter-lamely" provides a drop-in replacement with somewhat improved
performance and a few caveats.

Taking the above comments into consideration, here is a possible
rewrite of the final three bullet points and the closing paragraph:

    * filter-branch is written in shell, which is kind of slow, and it
      potentially can run many other commands which can slow down its
      operation significantly, especially on platforms for which
      process startup is costly.

    * filter-branch writes lots of files to $workdir/.../map/ to keep
      a mapping of commits, which it uses for pruning commits and
      remapping to ancestors and for the map() command more generally.
      Other files like $tempdir/backup-refs, $tempdir/raw-refs,
      $tempdir/heads, $tempdir/tree-state are created internally too.
      Such file creation can be costly in general, but especially on
      platforms with slow filesystems.

    The tool https://github.com/newren/git-filter-repo/[git
    filter-repo] is an alternative to git-filter-branch which does not
    suffer from these performance problems or the safety problems
    (mentioned below). For those with existing tooling which relies
    upon git-filter-branch, 'git filter-repo' also provides
    https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely],
    a drop-in git-filter-branch replacement (with a few caveats).

> +SAFETY
> +------
> +
> +* Non-ascii filenames (which are rare) can be silently removed despite

Perhaps drop "(which are rare)" to make this sound more formal and
less like an email message.

Comments below are also intended to make the prose sound a bit more formal.

> +being in a desired directory.  The desire to select paths to keep often
> +leads to pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.
> +ls-files will only quote filenames if needed so folks may not notice

s/ls-files/git-&/

> +that one of the files didn't match the regex, again until it's much too
> +late.  Yes, someone who knows about core.quotePath can avoid this
> +(unless they have other special characters like \t, \n, or "), and
> +people who use ls-files -z with something other than grep can avoid
> +this, but that doesn't mean they will.
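
The quoting pitfall described in this bullet can be reproduced without
a repository.  The sketch below feeds hand-written lines, standing in
for 'git ls-files' output, through the same grep; the file names and
function names are invented for illustration.  With the default
core.quotePath setting, a path containing non-ASCII bytes is printed
inside double quotes with octal escapes, so it no longer starts with
the directory name.

```shell
#!/bin/sh
# Simulated `git ls-files` output (illustrative paths only).  The second
# entry is how a non-ASCII name under WANTED_DIR/ is shown when quoted.
simulated_ls_files () {
	printf '%s\n' \
		'WANTED_DIR/plain.txt' \
		'"WANTED_DIR/\303\244.txt"' \
		'OTHER_DIR/junk.txt'
}

# Selection step of the pipeline from the text: every line that does NOT
# start with WANTED_DIR/ is passed on for removal.
paths_selected_for_removal () {
	simulated_ls_files | grep -v '^WANTED_DIR/'
}

paths_selected_for_removal
```

The quoted entry is selected for removal alongside OTHER_DIR/junk.txt,
because its first character is a double quote rather than "W".
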
> +
> +* It's far too easy to accidentally mix up old and new history.  It's
> +still possible with any tool, but filter-branch almost invites it.  If
> +we're lucky, the only downside is users getting frustrated that they

s/we're//

> +don't know how to shrink their repo and remove the old stuff.  If we're

s/we're//

> +unlucky, they merge old and new history and end up with multiple
> +"copies" of each commit, some of which have unwanted or sensitive files
> +and others which don't.  This comes about in multiple different ways:
> +
> +  ** the default to only doing a partial history rewrite ('--all' is not
> +     the default and over 80% of the examples in the manpage don't use
> +     it)

Maybe just shorten this to:

   ('--all' is not the default, and few examples show it)

> +  ** the fact that there's no automatic post-run cleanup
> +
> +  ** the fact that --tag-name-filter (when used to rename tags) doesn't
> +     remove the old tags but just adds new ones with the new name (this
> +     manpage has documented this for a long time so it's presumably not
> +     a "bug" even though it feels like it)

Perhaps drop the final parenthetical comment.

> +  ** the fact that little educational information is provided to inform
> +     users of the ramifications of a rewrite and how to avoid mixing old
> +     and new history.  For example, this man page discusses how users
> +     need to understand that they need to rebase their changes for all
> +     their branches on top of new history (or delete and reclone), but
> +     that's only one of multiple concerns to consider.  See the
> +     "DISCUSSION" section of the git filter-repo manual page for more
> +     details.
> +
> +* Annotated tags can be accidentally converted to lightweight tags, due
> +to either of two issues:
> +
> +  . Someone can do a history rewrite, realize they messed up, restore
> +    from the backups in refs/original/, and then redo their
> +    filter-branch command.  (The backup in refs/original/ is not a real
> +    backup; it dereferences tags first.)
> +
> +  . Running filter-branch with either --tags or --all in your <rev-list
> +    options>.  In order to retain annotated tags as annotated, you must
> +    use --tag-name-filter (and must not have restored from
> +    refs/original/ in a previously botched rewrite).

Should these bullet points use "**" rather than "."?

> +* Any commit messages that specify an encoding will become corrupted
> +by the rewrite; filter-branch ignores the encoding, takes the original
> +bytes, and feeds it to commit-tree without telling it the proper
> +encoding.  (This happens whether or not --msg-filter is used, though I
> +suspect --msg-filter provides additional ways to really mess things
> +up).

Perhaps shorten simply to:

    (This happens whether or not --msg-filter is used.)

> +* If the user provides a --tag-name-filter that maps multiple tags to
> +the same name, no warning or error is provided; filter-branch simply
> +overwrites each tag in some undocumented pre-defined order resulting in
> +only one tag at the end.  If you try to "fix" this bug in filter-branch
> +and make it error out and warn the user instead, one of the
> +filter-branch regression tests will fail.  (So, if you are trying to
> +make a backward compatible reimplementation you have to add extra code
> +to detect collisions and make sure that only the lexicographically last
> +one is rewritten to avoid fast-import from seeing both since fast-import
> +will naturally do the sane thing and error out if told to write the same
> +tag more than once.)

Maybe drop everything from "If you try to 'fix'..." to the end of paragraph.

> +Also, the poor performance of filter-branch often leads to safety issues:
> +
> +* Coming up with the correct shell snippet to do the filtering you want
> +is sometimes difficult unless you're just doing a trivial modification
> +such as deleting a couple files.  People have often come to me for help,
> +so I should be practiced and an expert, but even for fairly simple cases
> +I still sometimes take over 10 minutes and several iterations to get
> +the right commands -- and that's assuming they are working on a tiny
> +repository.  Unfortunately, people often learn if the snippet is right
> +or wrong by trying it out, but the rightness or wrongness can vary
> +depending on special circumstances (spaces in filenames, non-ascii
> +filenames, funny author names or emails, invalid timezones, presence of
> +grafts or replace objects, etc.), meaning they may have to wait a long
> +time, hit an error, then restart.  The performance of filter-branch is
> +so bad that this cycle is painful, reducing the time available to
> +carefully re-check (to say nothing about what it does to the patience of
> +the person doing the rewrite even if they do technically have more time
> +available).  This problem is extra compounded because errors from broken
> +filters may not be shown for a long time and/or get lost in a sea of
> +output.  Even worse, broken filters often just result in silent
> +incorrect rewrites.

Drop the "People have often come to me..." sentence from this paragraph.


* Re: [PATCH v3 3/4] Recommend git-filter-repo instead of git-filter-branch
  2019-08-29 18:10               ` Eric Sunshine
@ 2019-08-30  0:04                 ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-30  0:04 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder

Hi Eric,

Thanks for the careful and thoughtful review.

On Thu, Aug 29, 2019 at 11:11 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Wed, Aug 28, 2019 at 8:07 PM Elijah Newren <newren@gmail.com> wrote:
> > filter-branch suffers from a deluge of disguised dangers that disfigure
> > history rewrites (i.e. deviate from the deliberate changes). [...]
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
> > @@ -16,6 +16,22 @@ SYNOPSIS
> > +WARNING
> > +-------
> > +'git filter-branch' has a plethora of pitfalls that can produce non-obvious
> > +manglings of the intended history rewrite (and can leave you with little
> > +time to investigate such problems since it has such abysmal performance).
> > +These safety and performance issues cannot be backward compatibly fixed and
> > +as such, its use is not recommended.  Please use an alternative history
> > +filtering tool such as https://github.com/newren/git-filter-repo/[git
> > +filter-repo].  If you still need to use 'git filter-branch', please
> > +carefully read <<SAFETY>> (and <<PERFORMANCE>>) to learn about the land
> > +mines of filter-branch, and then vigilantly avoid as many of the hazards
> > +listed there as reasonably possible.
> > +
> > +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
> > +the land mines of filter-branch]
>
> This stray link looks like leftover gunk from the previous revision.

Ugh, indeed.

>
> > +PERFORMANCE
> > +-----------
> > +
> > +The performance of filter-branch is glacially slow; its design makes it
>
> The rest of this document spells it git-filter-branch or 'git
> filter-branch', not plain filter-branch.
>
> > +* In editing files, git-filter-branch by design checks out each and
> > +every commit as it existed in the original repo.  If your repo has 10\^5
> > +files and 10\^5 commits, but each commit only modifies 5 files, then
> > +git-filter-branch will make you do 10\^10 modifications, despite only
> > +having (at most) 5*10^5 unique blobs.
> > +
> > +* If you try and cheat and try to make filter-branch only work on
> > +files modified in a commit, then two things happen
>
> s/filter-branch/git-&/

I can fix these up.

>
> > +
> > +  . you run into problems with deletions whenever the user is simply
> > +    trying to rename files (because attempting to delete files that
> > +    don't exist looks like a no-op; it takes some chicanery to remap
> > +    deletes across file renames when the renames happen via arbitrary
> > +    user-provided shell)
> > +
> > +  . even if you succeed at the map-deletes-for-renames chicanery, you
> > +    still technically violate backward compatibility because users are
> > +    allowed to filter files in ways that depend upon topology of commits
> > +    instead of filtering solely based on file contents or names (though
> > +    I have never seen any user ever do this).
>
> Maybe avoid first-person:
>
>     ... contents or names (though this has not been observed in
>     the wild).

Thanks for providing alternative wording.

> > +* filter-branch is written in shell, which is kind of slow.  Naturally,
> > +it makes sense to want to rewrite that in some other language.  However,
> > +filter-branch documentation states that several additional shell
> > +functions are provided for users to call, e.g. 'map', 'skip_commit',
> > +'git_commit_non_empty_tree'.  If filter-branch itself isn't a shell
> > +script, then in order to make those shell functions available to the
> > +users' shell snippets you have to prepend the shell definitions of these
> > +functions to every one of the users' shell snippets and thus make these
> > +special shell functions be parsed with each and every commit.
> > +
> > +* filter-branch provides a --setup option which is a shell snippet that
> > +can be sourced to make shell functions and variables available to all
> > +other filters.  If filter-branch is a shell script, it can simply eval
> > +this shell snippet once at the beginning.  If you try to fix performance
> > +by making filter-branch not be a shell script, then you have to prepend
> > +the setup shell snippet to all other filters and parse it with every
> > +single commit.
>
> Even though they made sense in the context of the original email
> message, these two bullet points may not belong in the man page since
> someone reading the man page is doing so to learn about
> git-filter-branch usage, not because he or she is thinking about
> re-implementing it. It might make sense, however, to collapse these
> points to some general statement about shell being slow and process
> startup being costly.

Hmm.  I see where you're coming from, but the performance section
isn't really user actionable stuff anyway; it's just a warning.  And I
have repeatedly seen over the years the question brought up on the
list of "Can we make filter-branch fast by making it a builtin?"  (Or
"Can't _you_ make filter-branch fast by rewriting it in C?")

I could try to reword it so that there's some general statement about
shell being slow and process startup being costly, and then add these
two items as sub-bullets to try to stave off that obvious but
misguided question from coming up.  Or maybe I just add a reference to
the original email?

> Also, these bullet points and others below need a  s/filter-branch/git-&/.

Thanks, will fix.

> > +* filter-branch writes lots of files to $workdir/../map/ to keep a
>
> Should that path have three dots "..." instead of two ".."?

No, it's a literal parent directory reference.  Users have access to
$workdir; it's where their commands run.  There is no name for the
parent of that directory, other than by appending '/..' to wherever
they are.  Maybe if I had spelled it as $(pwd)/../map/ it would be
better?

Or maybe I don't need to name the files at all; does it really matter
to the user?
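
A small sketch of what the literal '..' means here.  It builds a
throwaway layout with a "t" directory standing in for $workdir and a
sibling "map" directory; the directory names and the placeholder ids
are assumptions made for illustration, not a statement of what
git-filter-branch guarantees.

```shell
#!/bin/sh
# Illustrative scratch layout: t/ plays the role of $workdir and map/
# sits next to it, reachable only through the parent directory.
tempdir=$(mktemp -d)
mkdir -p "$tempdir/t" "$tempdir/map"
workdir="$tempdir/t"

# Record a fake old-to-new mapping as a file named after the old id.
echo "new-id-placeholder" >"$workdir/../map/old-id-placeholder"

# A filter running inside $workdir has no name for the parent directory
# other than "..", so that is how it reaches the map files.
mapped=$(cd "$workdir" && cat ../map/old-id-placeholder)
echo "$mapped"

rm -rf "$tempdir"
```
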

> > +mapping of commits, which it uses for pruning commits and remapping to
> > +ancestors and the map() command more generally.  Other files like
> > +$tempdir/backup-refs, $tempdir/raw-refs, $tempdir/heads,
> > +$tempdir/tree-state are all created internally too.  It is possible
> > +(though strongly discouraged) that users could have accessed any of
> > +these directly.  Users even had a pointer to follow in the form of
> > +Documentation that the 'map' command existed, which naturally uses the
> > +$workdir/../map/* files.  So, even if you don't have to edit files, for
> > +strict backward compatibility you need to still write a bunch of files
> > +to disk somewhere and keep them updated for every commit.  You can claim
> > +it was an implementation detail that users should not have depended
> > +upon, but the truth is they've had a decade where they could do so.  So, if
> > +you want full compatibility, it has to be there.  Besides, the
> > +regression tests depend on at least one of these details, specifying an
> > +--index-filter that reaches down and grabs backup-refs from $tempdir,
> > +and thus provides resourceful users who do google searches an example
> > +that there are files there for them to read and grab and use.  (And if
> > +you want to pass the existing regression tests, you have to at least put
> > +the backup-refs file there even if it's irrelevant to your
> > +implementation otherwise.)
>
> As with the earlier comment, this bullet point is aimed at someone
> thinking about re-implementing the command; it sounds out of place in
> the "Performance" section of the man page. However, it does make sense
> to mention all the files git-filter-branch creates since that can have
> an impact on performance. So, perhaps this section can be collapsed so
> it just talks about that.

I think there's both a how-performance-affects-user component and a
component addressing the common incorrect question/statement/thought
that filter-branch performance could just be fixed by making it a
builtin.  But splitting this may make sense.  And maybe the portions
addressing making-it-a-builtin-wouldn't-fix-it could be a short
sentence with a link to the original email for more details.

> > +All of that said, performance of filter-branch could be improved by
> > +reimplementing it in a non-shell language and taking a couple small
> > +liberties with backward compatibility (such as having it only run
> > +filters on files changed within each commit).  filter-repo provides a
> > +demo script named
> > +https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely]
> > +which does exactly that and which passes all the git-filter-branch
> > +regression tests.  It's much faster than git-filter-branch, though it
> > +suffers from all the same safety issues as git-filter-branch, and is
> > +still glacially slow compared to
> > +https://github.com/newren/git-filter-repo/[git filter-repo].
>
> This paragraph could be collapsed to say merely that, for those with
> existing tooling relying upon git-filter-branch, filter-repo's
> "filter-lamely" provides a drop-in replacement with somewhat improved
> performance and a few caveats.

Sounds good.

> Taking the above comments into consideration, here is a possible
> rewrite of the final three bullet points and the closing paragraph:

Oh, sweet, thanks for providing this.  I really like the simplicity of
your suggested wording in general; it will be really helpful in
rewording.  I do have a nitpick with each one, though...

>     * filter-branch is written in shell, which is kind of slow, and it
>       potentially can run many other commands which can slow down its
>       operation significantly, especially on platforms for which
>       process startup is costly.

Even if it's not the emphasis you intended, I'm worried this makes it
sound as if filter-branch performance is only bad on Windows or Mac.
Compared to invoking a function (even in a bytecode interpreted
language), creating and running another process is slow on any
platform.

>     * filter-branch writes lots of files to $workdir/.../map/ to keep
>       a mapping of commits, which it uses for pruning commits and
>       remapping to ancestors and for the map() command more generally.
>       Other files like $tempdir/backup-refs, $tempdir/raw-refs,
>       $tempdir/heads, $tempdir/tree-state are created internally too.
>       Such file creation can be costly in general, but especially on
>       platforms with slow filesystems.

Again, it may not have been your intended emphasis, but I think this
may be read as singling out slow filesystems, and make people think
the performance problems from this bullet point only affect some
OSes.  Filesystems are part of the problem.  Disks being slow is part
of the problem.  But it's not all of it.  I guess part of what really
gets me with these is that they represent forced synchronization (e.g.
the kernel has to flush the data upon close() to make sure any other
processes can see the file contents and all further filtering is
blocked waiting for this to finish).  By way of comparison, in
filter-repo I have to both write data to fast-import and read back
information from fast-import (in order to find out the new commit
names, for example).  When I did the straightforward thing of writing
a commit, writing a 'get-mark' directive, and then reading the answer,
it ruined performance.  So I had to be a bit smarter and defer reading
back the resulting sha1.  There's no room for anything similarly
clever in filter-branch; writing these files out is a synchronization
point that is needed before the user's filter can be eval'ed.

>     The tool https://github.com/newren/git-filter-repo/[git
>     filter-repo] is an alternative to git-filter-branch which does not
>     suffer from these performance problems or the safety problems
>     (mentioned below). For those with existing tooling which relies
>     upon git-filter-branch, 'git filter-repo' also provides
>     https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely],
>     a drop-in git-filter-branch replacement (with a few caveats).

This suggests filter-lamely doesn't suffer from performance or safety
problems, which is very misleading.  filter-lamely doesn't improve the
safety story at all and only ameliorates the performance problems
somewhat.

> > +SAFETY
> > +------
> > +
> > +* Non-ascii filenames (which are rare) can be silently removed despite
>
> Perhaps drop "(which are rare)" to make this sound more formal and
> less like an email message.

Makes sense; and I'm guessing I should also drop it from the bullet
point above this one.

I'll stop commenting on the individual comments since there's not much
to say with most of them other than they look like obviously good
suggestions...

> Comments below are also intended to make the prose sound a bit more formal.
>
> > +being in a desired directory.  The desire to select paths to keep often
> > +leads to pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.
> > +ls-files will only quote filenames if needed so folks may not notice
>
> s/ls-files/git-&/
>
> > +that one of the files didn't match the regex, again until it's much too
> > +late.  Yes, someone who knows about core.quotePath can avoid this
> > +(unless they have other special characters like \t, \n, or "), and
> > +people who use ls-files -z with something other than grep can avoid
> > +this, but that doesn't mean they will.
> > +
> > +* It's far too easy to accidentally mix up old and new history.  It's
> > +still possible with any tool, but filter-branch almost invites it.  If
> > +we're lucky, the only downside is users getting frustrated that they
>
> s/we're//
>
> > +don't know how to shrink their repo and remove the old stuff.  If we're
>
> s/we're//
>
> > +unlucky, they merge old and new history and end up with multiple
> > +"copies" of each commit, some of which have unwanted or sensitive files
> > +and others which don't.  This comes about in multiple different ways:
> > +
> > +  ** the default to only doing a partial history rewrite ('--all' is not
> > +     the default and over 80% of the examples in the manpage don't use
> > +     it)
>
> Maybe just shorten this to:
>
>    ('--all' is not the default, and few examples show it)

I know I said I'd not comment unless I disagreed, but I just wanted to
say thanks so much for providing concrete suggestions in so many
places.  It's *very* helpful.

> > +  ** the fact that there's no automatic post-run cleanup
> > +
> > +  ** the fact that --tag-name-filter (when used to rename tags) doesn't
> > +     remove the old tags but just adds new ones with the new name (this
> > +     manpage has documented this for a long time so it's presumably not
> > +     a "bug" even though it feels like it)
>
> Perhaps drop the final parenthetical comment.
>
> > +  ** the fact that little educational information is provided to inform
> > +     users of the ramifications of a rewrite and how to avoid mixing old
> > +     and new history.  For example, this man page discusses how users
> > +     need to understand that they need to rebase their changes for all
> > +     their branches on top of new history (or delete and reclone), but
> > +     that's only one of multiple concerns to consider.  See the
> > +     "DISCUSSION" section of the git filter-repo manual page for more
> > +     details.
> > +
> > +* Annotated tags can be accidentally converted to lightweight tags, due
> > +to either of two issues:
> > +
> > +  . Someone can do a history rewrite, realize they messed up, restore
> > +    from the backups in refs/original/, and then redo their
> > +    filter-branch command.  (The backup in refs/original/ is not a real
> > +    backup; it dereferences tags first.)
> > +
> > +  . Running filter-branch with either --tags or --all in your <rev-list
> > +    options>.  In order to retain annotated tags as annotated, you must
> > +    use --tag-name-filter (and must not have restored from
> > +    refs/original/ in a previously botched rewrite).
>
> Should these bullet points use "**" rather than "."?

I guess it could but the "either of two issues" above it made me think
of numbering them.  I also had a couple sub-bullets in the performance
section that were numbered.  But I guess it is slightly weird coming
so close after another section that used un-numbered subbullets.  I
guess I'll just make them all un-numbered.

> > +* Any commit messages that specify an encoding will become corrupted
> > +by the rewrite; filter-branch ignores the encoding, takes the original
> > +bytes, and feeds it to commit-tree without telling it the proper
> > +encoding.  (This happens whether or not --msg-filter is used, though I
> > +suspect --msg-filter provides additional ways to really mess things
> > +up).
>
> Perhaps shorten simply to:
>
>     (This happens whether or not --msg-filter is used.)
>
> > +* If the user provides a --tag-name-filter that maps multiple tags to
> > +the same name, no warning or error is provided; filter-branch simply
> > +overwrites each tag in some undocumented pre-defined order resulting in
> > +only one tag at the end.  If you try to "fix" this bug in filter-branch
> > +and make it error out and warn the user instead, one of the
> > +filter-branch regression tests will fail.  (So, if you are trying to
> > +make a backward compatible reimplementation you have to add extra code
> > +to detect collisions and make sure that only the lexicographically last
> > +one is rewritten to avoid fast-import from seeing both since fast-import
> > +will naturally do the sane thing and error out if told to write the same
> > +tag more than once.)
>
> Maybe drop everything from "If you try to 'fix'..." to the end of paragraph.

Or just replace that long section you highlight with a parenthetical
comment, "(a git-filter-branch regression test requires this.)"

> > +Also, the poor performance of filter-branch often leads to safety issues:
> > +
> > +* Coming up with the correct shell snippet to do the filtering you want
> > +is sometimes difficult unless you're just doing a trivial modification
> > +such as deleting a couple files.  People have often come to me for help,
> > +so I should be practiced and an expert, but even for fairly simple cases
> > +I still sometimes take over 10 minutes and several iterations to get
> > +the right commands -- and that's assuming they are working on a tiny
> > +repository.  Unfortunately, people often learn if the snippet is right
> > +or wrong by trying it out, but the rightness or wrongness can vary
> > +depending on special circumstances (spaces in filenames, non-ascii
> > +filenames, funny author names or emails, invalid timezones, presence of
> > +grafts or replace objects, etc.), meaning they may have to wait a long
> > +time, hit an error, then restart.  The performance of filter-branch is
> > +so bad that this cycle is painful, reducing the time available to
> > +carefully re-check (to say nothing about what it does to the patience of
> > +the person doing the rewrite even if they do technically have more time
> > +available).  This problem is extra compounded because errors from broken
> > +filters may not be shown for a long time and/or get lost in a sea of
> > +output.  Even worse, broken filters often just result in silent
> > +incorrect rewrites.
>
> Drop the "People have often come to me..." sentence from this paragraph.


Thanks again for the careful reading and many suggestions!


* [PATCH v4 0/4] Warn about git-filter-branch usage and avoid it
  2019-08-29  0:06           ` [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
                               ` (3 preceding siblings ...)
  2019-08-29  0:06             ` [PATCH v3 4/4] t9902: use a non-deprecated command for testing Elijah Newren
@ 2019-08-30  5:57             ` Elijah Newren
  2019-08-30  5:57               ` [PATCH v4 1/4] t6006: simplify and optimize empty message test Elijah Newren
                                 ` (3 more replies)
  4 siblings, 4 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-30  5:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

Here's a series that warns about git-filter-branch usage and avoids
using it ourselves.

Changes since v3
  * Incorporated Eric's detailed feedback on the git-filter-branch
    manpage, some notes:
      * s/filter-branch/git-&/ (and similar for ls-files)
      * Multiple sections removed (and existing sections had a
        number of sentences removed)
      * I ended up not linking to the original html, but just added
        a small "Side Note" in a sub-bullet to address how fixing the
        written-in-shell attribute of git-filter-branch would do less
        than proponents expect.
      * ...and lots of other miscellaneous wording fixes and cleanups
  * The full range-diff is below, but it's kinda hard to read due to
    line wrapping and such.

Elijah Newren (4):
  t6006: simplify and optimize empty message test
  t3427: accelerate this test by using fast-export and fast-import
  Recommend git-filter-repo instead of git-filter-branch
  t9902: use a non-deprecated command for testing

 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 272 +++++++++++++++++++++++++---
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   3 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |  10 +-
 Documentation/githooks.txt          |  10 +-
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                |  13 ++
 t/t3427-rebase-subtree.sh           |  24 ++-
 t/t6006-rev-list-format.sh          |   5 +-
 t/t9902-completion.sh               |  12 +-
 12 files changed, 309 insertions(+), 77 deletions(-)

Range-diff:
1:  7ddbeea2ca = 1:  7ddbeea2ca t6006: simplify and optimize empty message test
2:  e1e63189c1 = 2:  e1e63189c1 t3427: accelerate this test by using fast-export and fast-import
3:  59c7446927 ! 3:  ed6505584f Recommend git-filter-repo instead of git-filter-branch
    @@ Documentation/git-filter-branch.txt: SYNOPSIS
     +carefully read <<SAFETY>> (and <<PERFORMANCE>>) to learn about the land
     +mines of filter-branch, and then vigilantly avoid as many of the hazards
     +listed there as reasonably possible.
    -+
    -+https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/[detailing
    -+the land mines of filter-branch]
     +
      DESCRIPTION
      -----------
    @@ Documentation/git-filter-branch.txt: warned.
     +PERFORMANCE
     +-----------
     +
    -+The performance of filter-branch is glacially slow; its design makes it
    ++The performance of git-filter-branch is glacially slow; its design makes it
     +impossible for a backward-compatible implementation to ever be fast:
     +
     +* In editing files, git-filter-branch by design checks out each and
    @@ Documentation/git-filter-branch.txt: warned.
     +git-filter-branch will make you do 10\^10 modifications, despite only
     +having (at most) 5*10^5 unique blobs.
     +
    -+* If you try and cheat and try to make filter-branch only work on
    ++* If you try and cheat and try to make git-filter-branch only work on
     +files modified in a commit, then two things happen
     +
    -+  . you run into problems with deletions whenever the user is simply
    -+    trying to rename files (because attempting to delete files that
    -+    don't exist looks like a no-op; it takes some chicanery to remap
    -+    deletes across file renames when the renames happen via arbitrary
    -+    user-provided shell)
    ++  ** you run into problems with deletions whenever the user is simply
    ++     trying to rename files (because attempting to delete files that
    ++     don't exist looks like a no-op; it takes some chicanery to remap
    ++     deletes across file renames when the renames happen via arbitrary
    ++     user-provided shell)
     +
    -+  . even if you succeed at the map-deletes-for-renames chicanery, you
    -+    still technically violate backward compatibility because users are
    -+    allowed to filter files in ways that depend upon topology of commits
    -+    instead of filtering solely based on file contents or names (though
    -+    I have never seen any user ever do this).
    ++  ** even if you succeed at the map-deletes-for-renames chicanery, you
    ++     still technically violate backward compatibility because users are
    ++     allowed to filter files in ways that depend upon topology of
    ++     commits instead of filtering solely based on file contents or names
    ++     (though this has not been observed in the wild).
     +
     +* Even if you don't need to edit files but only want to e.g. rename or
     +remove some and thus can avoid checking out each file (i.e. you can use
     +--index-filter), you still are passing shell snippets for your filters.
     +This means that for every commit, you have to have a prepared git repo
    -+where users can run git commands.  That's a lot of setup.  It also means
    -+you have to fork at least one process to run the user-provided shell
    -+snippet, and odds are that the user's shell snippet invokes lots of
    -+commands in some long pipeline, so you will have lots and lots of forks.
    -+For every. single. commit.  That's a massive amount of overhead to
    -+rename a few files.
    -+
    -+* filter-branch is written in shell, which is kind of slow.  Naturally,
    -+it makes sense to want to rewrite that in some other language.  However,
    -+filter-branch documentation states that several additional shell
    -+functions are provided for users to call, e.g. 'map', 'skip_commit',
    -+'git_commit_non_empty_tree'.  If filter-branch itself isn't a shell
    -+script, then in order to make those shell functions available to the
    -+users' shell snippets you have to prepend the shell definitions of these
    -+functions to every one of the users' shell snippets and thus make these
    -+special shell functions be parsed with each and every commit.
    -+
    -+* filter-branch provides a --setup option which is a shell snippet that
    -+can be sourced to make shell functions and variables available to all
    -+other filters.  If filter-branch is a shell script, it can simply eval
    -+this shell snippet once at the beginning.  If you try to fix performance
    -+by making filter-branch not be a shell script, then you have to prepend
    -+the setup shell snippet to all other filters and parse it with every
    -+single commit.
    -+
    -+* filter-branch writes lots of files to $workdir/../map/ to keep a
    -+mapping of commits, which it uses pruning commits and remapping to
    -+ancestors and the map() command more generally.  Other files like
    -+$tempdir/backup-refs, $tempdir/raw-refs, $tempdir/heads,
    -+$tempdir/tree-state are all created internally too.  It is possible
    -+(though strongly discouraged) that users could have accessed any of
    -+these directly.  Users even had a pointer to follow in the form of
    -+Documentation that the 'map' command existed, which naturally uses the
    -+$workdir/../map/* files.  So, even if you don't have to edit files, for
    -+strict backward compatibility you need to still write a bunch of files
    -+to disk somewhere and keep them updated for every commit.  You can claim
    -+it was an implementation detail that users should not have depended
    -+upon, but the truth is they've had a decade where they could so.  So, if
    -+you want full compatibility, it has to be there.  Besides, the
    -+regression tests depend on at least one of these details, specifying an
    -+--index-filter that reaches down and grabs backup-refs from $tempdir,
    -+and thus provides resourceful users who do google searches an example
    -+that there are files there for them to read and grab and use.  (And if
    -+you want to pass the existing regression tests, you have to at least put
    -+the backup-refs file there even if it's irrelevant to your
    -+implementation otherwise.)
    -+
    -+All of that said, performance of filter-branch could be improved by
    -+reimplementing it in a non-shell language and taking a couple small
    -+liberties with backward compatibility (such as having it only run
    -+filters on files changed within each commit).  filter-repo provides a
    -+demo script named
    -+https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely]
    -+which does exactly that and which passes all the git-filter-branch
    -+regression tests.  It's much faster than git-filter-branch, though it
    -+suffers from all the same safety issues as git-filter-branch, and is
    -+still glacially slow compared to
    -+https://github.com/newren/git-filter-repo/[git filter-repo].
    ++where those filters can be run.  That's a significant setup.
    ++
    ++* Further, several additional files are created or updated per commit by
    ++git-filter-branch.  Some of these are for supporting the convenience
    ++functions provided by git-filter-branch (such as map()), while others
    ++are for keeping track of internal state (but could have also been
    ++accessed by user filters; one of git-filter-branch's regression tests
    ++does so).  This essentially amounts to using the filesystem as an IPC
    ++mechanism between git-filter-branch and the user-provided filters.
    ++Disks tend to be a slow IPC mechanism, and writing these files also
    ++effectively represents a forced synchronization point between separate
    ++processes that we hit with every commit.
    ++
    ++* The user-provided shell commands will likely involve a pipeline of
    ++commands, resulting in the creation of many processes per commit.
    ++Creating and running another process takes a widely varying amount of
    ++time between operating systems, but on any platform it is very slow
    ++relative to invoking a function.
    ++
    ++* git-filter-branch itself is written in shell, which is kind of slow.
    ++This is the one performance issue that could be backward-compatibly
    ++fixed, but compared to the above problems that are intrinsic to the
    ++design of git-filter-branch, the language of the tool itself is a
    ++relatively minor issue.
    ++
    ++  ** Side note: Unfortunately, people tend to fixate on the
    ++     written-in-shell aspect and periodically ask if git-filter-branch
    ++     could be rewritten in another language to fix the performance
    ++     issues.  Not only does that ignore the bigger intrinsic problems
    ++     with the design, it'd help less than you'd expect: if
    ++     git-filter-branch itself were not shell, then the convenience
    ++     functions (map(), skip_commit(), etc) and the `--setup` argument
    ++     could no longer be executed once at the beginning of the program
    ++     but would instead need to be prepended to every user filter (and
    ++     thus re-executed with every commit).
    ++
    ++The https://github.com/newren/git-filter-repo/[git filter-repo] tool is
    ++an alternative to git-filter-branch which does not suffer from these
    ++performance problems or the safety problems (mentioned below). For those
    ++with existing tooling which relies upon git-filter-branch, 'git
    ++filter-repo' also provides
    ++https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely],
    ++a drop-in git-filter-branch replacement (with a few caveats).  While
    ++filter-lamely suffers from all the same safety issues as
    ++git-filter-branch, it at least ameliorates the performance issues a
    ++little.
     +
     +[[SAFETY]]
     +SAFETY
     +------
     +
    -+filter-branch is riddled with gotchas resulting in various ways to
    ++git-filter-branch is riddled with gotchas resulting in various ways to
     +easily corrupt repos or end up with a mess worse than what you started
     +with:
     +
    @@ Documentation/git-filter-branch.txt: warned.
     +history is in use for quite a while, at which point it's really hard to
     +justify another flag-day for another rewrite.)
     +
    -+* Filenames with spaces (which are rare) are often mishandled by shell
    -+snippets since they cause problems for shell pipelines.  Not everyone is
    -+familiar with find -print0, xargs -0, ls-files -z, etc.  Even people who
    -+are familiar with these may assume such needs are not relevant because
    ++* Filenames with spaces are often mishandled by shell snippets since
    ++they cause problems for shell pipelines.  Not everyone is familiar with
    ++find -print0, xargs -0, git-ls-files -z, etc.  Even people who are
    ++familiar with these may assume such needs are not relevant because
     +someone else renamed any such files in their repo back before the person
     +doing the filtering joined the project.  And, often, even those familiar
     +with handling arguments with spaces may not do so just because they
     +aren't in the mindset of thinking about everything that could possibly
     +go wrong.
     +
    -+* Non-ascii filenames (which are rare) can be silently removed despite
    -+being in a desired directory.  The desire to select paths to keep often
    -+use pipelines like `git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.
    -+ls-files will only quote filenames if needed so folks may not notice
    -+that one of the files didn't match the regex, again until it's much too
    -+late.  Yes, someone who knows about core.quotePath can avoid this
    -+(unless they have other special characters like \t, \n, or "), and
    -+people who use ls-files -z with something other than grep can avoid
    -+this, but that doesn't mean they will.
    ++* Non-ascii filenames can be silently removed despite being in a desired
    ++directory.  The desire to select paths to keep often use pipelines like
    ++`git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.  ls-files will
    ++only quote filenames if needed so folks may not notice that one of the
    ++files didn't match the regex, again until it's much too late.  Yes,
    ++someone who knows about core.quotePath can avoid this (unless they have
    ++other special characters like \t, \n, or "), and people who use ls-files
    ++-z with something other than grep can avoid this, but that doesn't mean
    ++they will.
     +
     +* Similarly, when moving files around, one can find that filenames with
     +non-ascii or special characters end up in a different directory, one
    @@ Documentation/git-filter-branch.txt: warned.
     +that it can and has manifested as a problem.)
     +
     +* It's far too easy to accidentally mix up old and new history.  It's
    -+still possible with any tool, but filter-branch almost invites it.  If
    -+we're lucky, the only downside is users getting frustrated that they
    -+don't know how to shrink their repo and remove the old stuff.  If we're
    -+unlucky, they merge old and new history and end up with multiple
    -+"copies" of each commit, some of which have unwanted or sensitive files
    -+and others which don't.  This comes about in multiple different ways:
    ++still possible with any tool, but git-filter-branch almost invites it.
    ++If lucky, the only downside is users getting frustrated that they don't
    ++know how to shrink their repo and remove the old stuff.  If unlucky,
    ++they merge old and new history and end up with multiple "copies" of each
    ++commit, some of which have unwanted or sensitive files and others which
    ++don't.  This comes about in multiple different ways:
     +
     +  ** the default to only doing a partial history rewrite ('--all' is not
    -+     the default and over 80% of the examples in the manpage don't use
    -+     it)
    ++     the default and few examples show it)
     +
     +  ** the fact that there's no automatic post-run cleanup
     +
     +  ** the fact that --tag-name-filter (when used to rename tags) doesn't
    -+     remove the old tags but just adds new ones with the new name (this
    -+     manpage has documented this for a long time so it's presumably not
    -+     a "bug" even though it feels like it)
    ++     remove the old tags but just adds new ones with the new name
     +
     +  ** the fact that little educational information is provided to inform
     +     users of the ramifications of a rewrite and how to avoid mixing old
    @@ Documentation/git-filter-branch.txt: warned.
     +* Annotated tags can be accidentally converted to lightweight tags, due
     +to either of two issues:
     +
    -+  . Someone can do a history rewrite, realize they messed up, restore
    -+    from the backups in refs/original/, and then redo their
    -+    filter-branch command.  (The backup in refs/original/ is not a real
    -+    backup; it dereferences tags first.)
    ++  ** Someone can do a history rewrite, realize they messed up, restore
    ++     from the backups in refs/original/, and then redo their
    ++     git-filter-branch command.  (The backup in refs/original/ is not a
    ++     real backup; it dereferences tags first.)
     +
    -+  . Running filter-branch with either --tags or --all in your <rev-list
    -+    options>.  In order to retain annotated tags as annotated, you must
    -+    use --tag-name-filter (and must not have restored from
    -+    refs/original/ in a previously botched rewrite).
    ++  ** Running git-filter-branch with either --tags or --all in your
    ++     <rev-list options>.  In order to retain annotated tags as
    ++     annotated, you must use --tag-name-filter (and must not have
    ++     restored from refs/original/ in a previously botched rewrite).
     +
     +* Any commit messages that specify an encoding will become corrupted
    -+by the rewrite; filter-branch ignores the encoding, takes the original
    ++by the rewrite; git-filter-branch ignores the encoding, takes the original
     +bytes, and feeds it to commit-tree without telling it the proper
    -+encoding.  (This happens whether or not --msg-filter is used, though I
    -+suspect --msg-filter provides additional ways to really mess things
    -+up).
    ++encoding.  (This happens whether or not --msg-filter is used.)
     +
     +* Commit messages (even if they are all UTF-8) by default become
     +corrupted due to not being updated -- any references to other commit
    @@ Documentation/git-filter-branch.txt: warned.
     +authors and committers, missing taggers.
     +
     +* If the user provides a --tag-name-filter that maps multiple tags to
    -+the same name, no warning or error is provided; filter-branch simply
    ++the same name, no warning or error is provided; git-filter-branch simply
     +overwrites each tag in some undocumented pre-defined order resulting in
    -+only one tag at the end.  If you try to "fix" this bug in filter-branch
    -+and make it error out and warn the user instead, one of the
    -+filter-branch regression tests will fail.  (So, if you are trying to
    -+make a backward compatible reimplementation you have to add extra code
    -+to detect collisions and make sure that only the lexicographically last
    -+one is rewritten to avoid fast-import from seeing both since fast-import
    -+will naturally do the sane thing and error out if told to write the same
    -+tag more than once.)
    ++only one tag at the end.  (A git-filter-branch regression test requires
    ++this.)
     +
    -+Also, the poor performance of filter-branch often leads to safety issues:
    ++Also, the poor performance of git-filter-branch often leads to safety issues:
     +
     +* Coming up with the correct shell snippet to do the filtering you want
     +is sometimes difficult unless you're just doing a trivial modification
    -+such as deleting a couple files.  People have often come to me for help,
    -+so I should be practiced and an expert, but even for fairly simple cases
    -+I still sometimes take over 10 minutes and several iterations to get
    -+the right commands -- and that's assuming they are working on a tiny
    -+repository.  Unfortunately, people often learn if the snippet is right
    -+or wrong by trying it out, but the rightness or wrongness can vary
    -+depending on special circumstances (spaces in filenames, non-ascii
    -+filenames, funny author names or emails, invalid timezones, presence of
    -+grafts or replace objects, etc.), meaning they may have to wait a long
    -+time, hit an error, then restart.  The performance of filter-branch is
    -+so bad that this cycle is painful, reducing the time available to
    -+carefully re-check (to say nothing about what it does to the patience of
    -+the person doing the rewrite even if they do technically have more time
    -+available).  This problem is extra compounded because errors from broken
    -+filters may not be shown for a long time and/or get lost in a sea of
    -+output.  Even worse, broken filters often just result in silent
    -+incorrect rewrites.
    ++such as deleting a couple files.  Unfortunately, people often learn if
    ++the snippet is right or wrong by trying it out, but the rightness or
    ++wrongness can vary depending on special circumstances (spaces in
    ++filenames, non-ascii filenames, funny author names or emails, invalid
    ++timezones, presence of grafts or replace objects, etc.), meaning they
    ++may have to wait a long time, hit an error, then restart.  The
    ++performance of git-filter-branch is so bad that this cycle is painful,
    ++reducing the time available to carefully re-check (to say nothing about
    ++what it does to the patience of the person doing the rewrite even if
    ++they do technically have more time available).  This problem is extra
    ++compounded because errors from broken filters may not be shown for a
    ++long time and/or get lost in a sea of output.  Even worse, broken
    ++filters often just result in silent incorrect rewrites.
     +
     +* To top it all off, even when users finally find working commands, they
     +naturally want to share them.  But they may be unaware that their repo
4:  1dbca82408 = 4:  ca8e124cb3 t9902: use a non-deprecated command for testing
5:  762d63d6a5 < -:  ---------- Remove git-filter-branch, it is now external to git.git
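
For anyone who wants to see the non-ascii filename hazard from the
SAFETY section in action, here is a minimal sketch.  The scratch
repository, the file names, and the variable names are made up for
illustration and are not part of the series; WANTED_DIR is just the
placeholder used in the manpage text.

```shell
# Scratch repository; the path and file names are illustrative only.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
# Pin the default so the demonstration does not depend on user config.
git config core.quotePath true

mkdir WANTED_DIR
: >WANTED_DIR/plain.txt
# A name containing non-ascii bytes (UTF-8 for an accented "e").
: >"WANTED_DIR/$(printf 'caf\303\251.txt')"
git add WANTED_DIR

# Naive selection of "everything outside WANTED_DIR".  The non-ascii name
# is printed C-quoted, starting with a double quote, so it passes through
# the grep -v filter even though it lives in WANTED_DIR.
selected=$(git ls-files | grep -v '^WANTED_DIR/' || true)

# With -z, names are NUL-terminated and not quoted, so nothing inside
# WANTED_DIR slips through.
selected_z=$(git ls-files -z | tr '\0' '\n' | grep -v '^WANTED_DIR/' || true)

printf 'without -z: %s\n' "$selected"
printf 'with -z:    %s\n' "$selected_z"
```

Converting NULs to newlines with tr is only for display here; a real
filter would keep the NUL-terminated stream intact, since filenames
may themselves contain newlines.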
-- 
2.23.0.38.g892688c90e



* [PATCH v4 1/4] t6006: simplify and optimize empty message test
  2019-08-30  5:57             ` [PATCH v4 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
@ 2019-08-30  5:57               ` Elijah Newren
  2019-09-02 14:47                 ` Johannes Schindelin
  2019-08-30  5:57               ` [PATCH v4 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-30  5:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

Test t6006.71 ("oneline with empty message") was creating two commits
with simple commit messages, and then running filter-branch to rewrite
the commit messages to be empty.  This test was written this way because
the --allow-empty-message option to git commit did not exist at the
time.  Simplify this test and avoid the need to invoke filter-branch by
just using --allow-empty-message when creating the commit.

Despite only being one piece of the 71st test and there being 73 tests
overall, this small change to just this one test speeds up the overall
execution time of t6006 (as measured by the best of 3 runs of `time
./t6006-rev-list-format.sh`) by about 11% on Linux and by 13% on
Mac.
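
As a rough illustration of the replacement approach outside the test
harness (the scratch repository, the identity settings, and the explicit
`-m ""` are assumptions of this sketch; the test itself relies on the
harness's editor setup instead):

```shell
# Scratch repository; path and identity are illustrative only.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "initial"
# Create a commit with an empty message directly; no rewrite needed.
git commit -q --allow-empty --allow-empty-message -m ""

# %s prints the subject line, which is empty for the new commit.
subject=$(git log -1 --pretty=format:%s)
count=$(git rev-list --count HEAD)
printf 'subject=[%s] commits=%s\n' "$subject" "$count"
```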

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6006-rev-list-format.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index da113d975b..d30e41c9f7 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -501,9 +501,8 @@ test_expect_success 'reflog identity' '
 '
 
 test_expect_success 'oneline with empty message' '
-	git commit -m "dummy" --allow-empty &&
-	git commit -m "dummy" --allow-empty &&
-	git filter-branch --msg-filter "sed -e s/dummy//" HEAD^^.. &&
+	git commit --allow-empty --allow-empty-message &&
+	git commit --allow-empty --allow-empty-message &&
 	git rev-list --oneline HEAD >test.txt &&
 	test_line_count = 5 test.txt &&
 	git rev-list --oneline --graph HEAD >testg.txt &&
-- 
2.23.0.38.g892688c90e



* [PATCH v4 2/4] t3427: accelerate this test by using fast-export and fast-import
  2019-08-30  5:57             ` [PATCH v4 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-08-30  5:57               ` [PATCH v4 1/4] t6006: simplify and optimize empty message test Elijah Newren
@ 2019-08-30  5:57               ` Elijah Newren
  2019-09-02 14:45                 ` Johannes Schindelin
  2019-08-30  5:57               ` [PATCH v4 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
  2019-08-30  5:57               ` [PATCH v4 4/4] t9902: use a non-deprecated command for testing Elijah Newren
  3 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-30  5:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

fast-export and fast-import can easily handle the simple rewrite that
was being done by filter-branch, and should be significantly faster on
systems with a slow fork.  Timings from before and after on two laptops
that I have access to (measured via `time ./t3427-rebase-subtree.sh`,
i.e. including everything in this test -- not just the filter-branch or
fast-export/fast-import pair):

   Linux:  4.305s -> 3.684s (~17% speedup)
   Mac:   10.128s -> 7.038s (~44% speedup)

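For anyone wondering what the sed step in the pipeline does, here is a
minimal sketch run on a single made-up stream line rather than a real
repository.  The blob id and the path are hypothetical; the line only
imitates the "M <mode> <id> <path>" form that fast-export emits with
--no-data.

```shell
# Hypothetical filemodify line imitating `git fast-export --no-data`
# output: mode, 40-hex blob id, then the path.
line='M 100644 0123456789abcdef0123456789abcdef01234567 files_subtree/master4'

# Same sed expression as extract_files_subtree: drop the leading
# "files_subtree/" that directly follows the blob id.
rewritten=$(printf '%s\n' "$line" |
	sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%")
printf '%s\n' "$rewritten"
```

Since the expression keys on a 40-hex id followed by a space, it assumes
SHA-1 object names and paths that fast-export does not need to quote,
which holds for the simple files this test creates.
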
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3427-rebase-subtree.sh | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/t/t3427-rebase-subtree.sh b/t/t3427-rebase-subtree.sh
index d8640522a0..c1f6102921 100755
--- a/t/t3427-rebase-subtree.sh
+++ b/t/t3427-rebase-subtree.sh
@@ -7,10 +7,16 @@ This test runs git rebase and tests the subtree strategy.
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-rebase.sh
 
-commit_message() {
+commit_message () {
 	git log --pretty=format:%s -1 "$1"
 }
 
+extract_files_subtree () {
+	git fast-export --no-data HEAD -- files_subtree/ |
+		sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" |
+		git fast-import --force --quiet
+}
+
 test_expect_success 'setup' '
 	test_commit README &&
 	mkdir files &&
@@ -42,7 +48,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --preserve-merges --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-preserve-merges-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master4"
@@ -53,7 +59,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --preserve-merges --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-preserve-merges-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "files_subtree/master5"
@@ -64,7 +70,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
@@ -75,7 +81,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
@@ -86,7 +92,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto empty commit' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-empty master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "Empty commit"
@@ -96,7 +102,7 @@ test_expect_failure REBASE_P \
 test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-onto-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
@@ -106,7 +112,7 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
 test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-onto-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
@@ -115,7 +121,7 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
 test_expect_failure 'Rebase -Xsubtree --onto empty commit' '
 	reset_rebase &&
 	git checkout -b rebase-onto-empty master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "Empty commit"
-- 
2.23.0.38.g892688c90e


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v4 3/4] Recommend git-filter-repo instead of git-filter-branch
  2019-08-30  5:57             ` [PATCH v4 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-08-30  5:57               ` [PATCH v4 1/4] t6006: simplify and optimize empty message test Elijah Newren
  2019-08-30  5:57               ` [PATCH v4 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
@ 2019-08-30  5:57               ` Elijah Newren
  2019-08-30  5:57               ` [PATCH v4 4/4] t9902: use a non-deprecated command for testing Elijah Newren
  3 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-30  5:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

filter-branch suffers from a deluge of disguised dangers that disfigure
history rewrites (i.e. deviate from the deliberate changes).  Many of
these problems are unobtrusive and can easily go undiscovered until the
new repository is in use.  This can result in problems ranging from an
even messier history than what led folks to filter-branch in the first
place, to data loss or corruption.  These issues cannot be backward
compatibly fixed, so add a warning to both filter-branch and its manpage
recommending that another tool (such as filter-repo) be used instead.

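As one illustration of how unobtrusive these problems can be, here is a
small simulated sketch of the ls-files quoting pitfall.  The paths are
hypothetical and the printf only stands in for real `git ls-files`
output, assuming the default core.quotePath behavior of quoting and
escaping non-ASCII names.

```shell
# Simulated `git ls-files` output (hypothetical paths).  The non-ASCII
# name is shown the way git prints it by default: quoted and escaped.
listing=$(printf '%s\n' \
	'WANTED_DIR/plain.txt' \
	'"WANTED_DIR/na\303\257ve.txt"' \
	'other/junk.bin')

# The "keep only WANTED_DIR" filter; everything printed here is what a
# following `xargs git rm` would be asked to remove.
to_delete=$(printf '%s\n' "$listing" | grep -v '^WANTED_DIR/')
printf '%s\n' "$to_delete"
```

The quoted entry begins with a double quote rather than WANTED_DIR/, so
it survives the grep -v and lands in the removal list next to the file
that really was unwanted.
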
Also, update other manpages that referenced filter-branch.  Several of
these needed updates even if we could continue recommending
filter-branch, either due to implying that something was unique to
filter-branch when it applied more generally to all history rewriting
tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
something about filter-branch was used as an example despite other more
commonly known examples now existing.  Reword these sections to fix
these issues and to avoid recommending filter-branch.

Finally, remove the section explaining BFG Repo Cleaner as an
alternative to filter-branch.  I feel somewhat bad about this,
especially since I feel like I learned so much from BFG that I put to
good use in filter-repo (which is much more than I can say for
filter-branch), but keeping that section presented a few problems:
  * In order to recommend that people quit using filter-branch, we need
    to provide them a recommendation for something else to use that
    can handle all the same types of rewrites.  To my knowledge,
    filter-repo is the only such tool.  So it needs to be mentioned.
  * I don't want to give conflicting recommendations to users
  * If we recommend two tools, we shouldn't expect users to learn both
    and pick which one to use; we should explain which problems one
    can solve that the other can't or when one is much faster than
    the other.
  * BFG and filter-repo have similar performance
  * All filtering types that BFG can do, filter-repo can also do.  In
    fact, filter-repo comes with a reimplementation of BFG named
    bfg-ish which provides the same user-interface as BFG but with
    several bugfixes and new features that are hard to implement in
    BFG due to its technical underpinnings.
While I could still mention both tools, it seems like I would need to
provide some kind of comparison and I would ultimately just say that
filter-repo can do everything BFG can, so it seems better to just
remove that section altogether.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 272 +++++++++++++++++++++++++---
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   3 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |  10 +-
 Documentation/githooks.txt          |  10 +-
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                |  13 ++
 9 files changed, 286 insertions(+), 59 deletions(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index cc940eb9ad..784e934009 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped
 into 'git fast-import'.
 
 You can use it as a human-readable bundle replacement (see
-linkgit:git-bundle[1]), or as a kind of an interactive
-'git filter-branch'.
-
+linkgit:git-bundle[1]), or as a format that can be edited before being
+fed to 'git fast-import' in order to do history rewrites (an ability
+relied on by tools like 'git filter-repo').
 
 OPTIONS
 -------
diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 6b53dd7e06..c199f2ee20 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -16,6 +16,19 @@ SYNOPSIS
 	[--original <namespace>] [-d <directory>] [-f | --force]
 	[--state-branch <branch>] [--] [<rev-list options>...]
 
+WARNING
+-------
+'git filter-branch' has a plethora of pitfalls that can produce non-obvious
+manglings of the intended history rewrite (and can leave you with little
+time to investigate such problems since it has such abysmal performance).
+These safety and performance issues cannot be backward compatibly fixed and
+as such, its use is not recommended.  Please use an alternative history
+filtering tool such as https://github.com/newren/git-filter-repo/[git
+filter-repo].  If you still need to use 'git filter-branch', please
+carefully read <<SAFETY>> (and <<PERFORMANCE>>) to learn about the land
+mines of filter-branch, and then vigilantly avoid as many of the hazards
+listed there as reasonably possible.
+
 DESCRIPTION
 -----------
 Lets you rewrite Git revision history by rewriting the branches mentioned
@@ -445,36 +458,235 @@ warned.
   (or if your git-gc is not new enough to support arguments to
   `--prune`, use `git repack -ad; git prune` instead).
 
-NOTES
------
-
-git-filter-branch allows you to make complex shell-scripted rewrites
-of your Git history, but you probably don't need this flexibility if
-you're simply _removing unwanted data_ like large files or passwords.
-For those operations you may want to consider
-http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
-a JVM-based alternative to git-filter-branch, typically at least
-10-50x faster for those use-cases, and with quite different
-characteristics:
-
-* Any particular version of a file is cleaned exactly _once_. The BFG,
-  unlike git-filter-branch, does not give you the opportunity to
-  handle a file differently based on where or when it was committed
-  within your history. This constraint gives the core performance
-  benefit of The BFG, and is well-suited to the task of cleansing bad
-  data - you don't care _where_ the bad data is, you just want it
-  _gone_.
-
-* By default The BFG takes full advantage of multi-core machines,
-  cleansing commit file-trees in parallel. git-filter-branch cleans
-  commits sequentially (i.e. in a single-threaded manner), though it
-  _is_ possible to write filters that include their own parallelism,
-  in the scripts executed against each commit.
-
-* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
-  are much more restrictive than git-filter branch, and dedicated just
-  to the tasks of removing unwanted data- e.g:
-  `--strip-blobs-bigger-than 1M`.
+[[PERFORMANCE]]
+PERFORMANCE
+-----------
+
+The performance of git-filter-branch is glacially slow; its design makes it
+impossible for a backward-compatible implementation to ever be fast:
+
+* In editing files, git-filter-branch by design checks out each and
+every commit as it existed in the original repo.  If your repo has 10\^5
+files and 10\^5 commits, but each commit only modifies 5 files, then
+git-filter-branch will make you do 10\^10 modifications, despite only
+having (at most) 5*10^5 unique blobs.
+
+* If you try and cheat and try to make git-filter-branch only work on
+files modified in a commit, then two things happen:
+
+  ** you run into problems with deletions whenever the user is simply
+     trying to rename files (because attempting to delete files that
+     don't exist looks like a no-op; it takes some chicanery to remap
+     deletes across file renames when the renames happen via arbitrary
+     user-provided shell)
+
+  ** even if you succeed at the map-deletes-for-renames chicanery, you
+     still technically violate backward compatibility because users are
+     allowed to filter files in ways that depend upon topology of
+     commits instead of filtering solely based on file contents or names
+     (though this has not been observed in the wild).
+
+* Even if you don't need to edit files but only want to e.g. rename or
+remove some and thus can avoid checking out each file (i.e. you can use
+--index-filter), you still are passing shell snippets for your filters.
+This means that for every commit, you have to have a prepared git repo
+where those filters can be run.  That's a significant setup.
+
+* Further, several additional files are created or updated per commit by
+git-filter-branch.  Some of these are for supporting the convenience
+functions provided by git-filter-branch (such as map()), while others
+are for keeping track of internal state (but could have also been
+accessed by user filters; one of git-filter-branch's regression tests
+does so).  This essentially amounts to using the filesystem as an IPC
+mechanism between git-filter-branch and the user-provided filters.
+Disks tend to be a slow IPC mechanism, and writing these files also
+effectively represents a forced synchronization point between separate
+processes that we hit with every commit.
+
+* The user-provided shell commands will likely involve a pipeline of
+commands, resulting in the creation of many processes per commit.
+Creating and running another process takes a widely varying amount of
+time between operating systems, but on any platform it is very slow
+relative to invoking a function.
+
+* git-filter-branch itself is written in shell, which is kind of slow.
+This is the one performance issue that could be backward-compatibly
+fixed, but compared to the above problems that are intrinsic to the
+design of git-filter-branch, the language of the tool itself is a
+relatively minor issue.
+
+  ** Side note: Unfortunately, people tend to fixate on the
+     written-in-shell aspect and periodically ask if git-filter-branch
+     could be rewritten in another language to fix the performance
+     issues.  Not only does that ignore the bigger intrinsic problems
+     with the design, it'd help less than you'd expect: if
+     git-filter-branch itself were not shell, then the convenience
+     functions (map(), skip_commit(), etc) and the `--setup` argument
+     could no longer be executed once at the beginning of the program
+     but would instead need to be prepended to every user filter (and
+     thus re-executed with every commit).
+
+The https://github.com/newren/git-filter-repo/[git filter-repo] tool is
+an alternative to git-filter-branch which does not suffer from these
+performance problems or the safety problems (mentioned below). For those
+with existing tooling which relies upon git-filter-branch, 'git
+filter-repo' also provides
+https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely],
+a drop-in git-filter-branch replacement (with a few caveats).  While
+filter-lamely suffers from all the same safety issues as
+git-filter-branch, it at least ameliorates the performance issues a
+little.
+
+[[SAFETY]]
+SAFETY
+------
+
+git-filter-branch is riddled with gotchas resulting in various ways to
+easily corrupt repos or end up with a mess worse than what you started
+with:
+
+* Someone can have a set of "working and tested filters" which they
+document or provide to a coworker, who then runs them on a different OS
+where the same commands are not working/tested (some examples in the
+git-filter-branch manpage are also affected by this).  BSD vs. GNU
+userland differences can really bite.  If you're lucky, you get ugly
+error messages spewed.  But just as likely, the commands either don't do
+the filtering requested, or silently corrupt history by making some unwanted
+change.  The unwanted change may only affect a few commits, so it's not
+necessarily obvious either.  (The fact that problems won't necessarily
+be obvious means they are likely to go unnoticed until the rewritten
+history is in use for quite a while, at which point it's really hard to
+justify another flag-day for another rewrite.)
+
+* Filenames with spaces are often mishandled by shell snippets since
+they cause problems for shell pipelines.  Not everyone is familiar with
+find -print0, xargs -0, git-ls-files -z, etc.  Even people who are
+familiar with these may assume such needs are not relevant because
+someone else renamed any such files in their repo back before the person
+doing the filtering joined the project.  And, often, even those familiar
+with handling arguments with spaces may not do so just because they
+aren't in the mindset of thinking about everything that could possibly
+go wrong.
+
+* Non-ascii filenames can be silently removed despite being in a desired
+directory.  Attempts to select paths to keep often use pipelines like
+`git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.  ls-files will
+only quote filenames if needed so folks may not notice that one of the
+files didn't match the regex, again until it's much too late.  Yes,
+someone who knows about core.quotePath can avoid this (unless they have
+other special characters like \t, \n, or "), and people who use ls-files
+-z with something other than grep can avoid this, but that doesn't mean
+they will.
+
+* Similarly, when moving files around, one can find that filenames with
+non-ascii or special characters end up in a different directory, one
+that includes a double quote character.  (This is technically the same
+issue as above with quoting, but perhaps an interesting different way
+that it can and has manifested as a problem.)
+
+* It's far too easy to accidentally mix up old and new history.  It's
+still possible with any tool, but git-filter-branch almost invites it.
+If lucky, the only downside is users getting frustrated that they don't
+know how to shrink their repo and remove the old stuff.  If unlucky,
+they merge old and new history and end up with multiple "copies" of each
+commit, some of which have unwanted or sensitive files and others which
+don't.  This comes about in multiple different ways:
+
+  ** the default to only doing a partial history rewrite ('--all' is not
+     the default and few examples show it)
+
+  ** the fact that there's no automatic post-run cleanup
+
+  ** the fact that --tag-name-filter (when used to rename tags) doesn't
+     remove the old tags but just adds new ones with the new name
+
+  ** the fact that little educational information is provided to inform
+     users of the ramifications of a rewrite and how to avoid mixing old
+     and new history.  For example, this man page discusses how users
+     need to understand that they need to rebase their changes for all
+     their branches on top of new history (or delete and reclone), but
+     that's only one of multiple concerns to consider.  See the
+     "DISCUSSION" section of the git filter-repo manual page for more
+     details.
+
+* Annotated tags can be accidentally converted to lightweight tags, due
+to either of two issues:
+
+  ** Someone can do a history rewrite, realize they messed up, restore
+     from the backups in refs/original/, and then redo their
+     git-filter-branch command.  (The backup in refs/original/ is not a
+     real backup; it dereferences tags first.)
+
+  ** Running git-filter-branch with either --tags or --all in your
+     <rev-list options>.  In order to retain annotated tags as
+     annotated, you must use --tag-name-filter (and must not have
+     restored from refs/original/ in a previously botched rewrite).
+
+* Any commit messages that specify an encoding will become corrupted
+by the rewrite; git-filter-branch ignores the encoding, takes the original
+bytes, and feeds it to commit-tree without telling it the proper
+encoding.  (This happens whether or not --msg-filter is used.)
+
+* Commit messages (even if they are all UTF-8) by default become
+corrupted due to not being updated -- any references to other commit
+hashes in commit messages will now refer to no-longer-extant commits.
+
+* There are no facilities for helping users find what unwanted crud they
+should delete, which means they are much more likely to have incomplete
+or partial cleanups that sometimes result in confusion and people
+wasting time trying to understand.  (For example, folks tend to just
+look for big files to delete instead of big directories or extensions,
+and once they do so, then sometime later folks using the new repository
+who are going through history will notice a build artifact directory
+that has some files but not others, or a cache of dependencies
+(node_modules or similar) which couldn't have ever been functional since
+it's missing some files.)
+
+* If --prune-empty isn't specified, then the filtering process can
+create hordes of confusing empty commits.
+
+* If --prune-empty is specified, then intentionally placed empty
+commits from before the filtering operation are also pruned instead of
+just pruning commits that became empty due to filtering rules.
+
+* If --prune-empty is specified, sometimes empty commits are missed
+and left around anyway (a somewhat rare bug, but it happens...)
+
+* A minor issue, but users who have a goal to update all names and
+emails in a repository may be led to --env-filter which will only update
+authors and committers, missing taggers.
+
+* If the user provides a --tag-name-filter that maps multiple tags to
+the same name, no warning or error is provided; git-filter-branch simply
+overwrites each tag in some undocumented pre-defined order resulting in
+only one tag at the end.  (A git-filter-branch regression test requires
+this.)
+
+Also, the poor performance of git-filter-branch often leads to safety issues:
+
+* Coming up with the correct shell snippet to do the filtering you want
+is sometimes difficult unless you're just doing a trivial modification
+such as deleting a couple files.  Unfortunately, people often learn if
+the snippet is right or wrong by trying it out, but the rightness or
+wrongness can vary depending on special circumstances (spaces in
+filenames, non-ascii filenames, funny author names or emails, invalid
+timezones, presence of grafts or replace objects, etc.), meaning they
+may have to wait a long time, hit an error, then restart.  The
+performance of git-filter-branch is so bad that this cycle is painful,
+reducing the time available to carefully re-check (to say nothing about
+what it does to the patience of the person doing the rewrite even if
+they do technically have more time available).  This problem is extra
+compounded because errors from broken filters may not be shown for a
+long time and/or get lost in a sea of output.  Even worse, broken
+filters often just result in silent incorrect rewrites.
+
+* To top it all off, even when users finally find working commands, they
+naturally want to share them.  But they may be unaware that their repo
+didn't have some special cases that someone else's does.  So, when
+someone else with a different repository runs the same commands, they
+get hit by the problems above.  Or, the user just runs commands that
+really were vetted for special cases, but they run it on a different OS
+where it doesn't work, as noted above.
 
 GIT
 ---
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 247f765604..0c114ad1ca 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -115,15 +115,14 @@ NOTES
 -----
 
 'git gc' tries very hard not to delete objects that are referenced
-anywhere in your repository. In
-particular, it will keep not only objects referenced by your current set
-of branches and tags, but also objects referenced by the index,
-remote-tracking branches, refs saved by 'git filter-branch' in
-refs/original/, reflogs (which may reference commits in branches
-that were later amended or rewound), and anything else in the refs/* namespace.
-If you are expecting some objects to be deleted and they aren't, check
-all of those locations and decide whether it makes sense in your case to
-remove those references.
+anywhere in your repository. In particular, it will keep not only
+objects referenced by your current set of branches and tags, but also
+objects referenced by the index, remote-tracking branches, notes saved
+by 'git notes' under refs/notes/, reflogs (which may reference commits
+in branches that were later amended or rewound), and anything else in
+the refs/* namespace.  If you are expecting some objects to be deleted
+and they aren't, check all of those locations and decide whether it
+makes sense in your case to remove those references.
 
 On the other hand, when 'git gc' runs concurrently with another process,
 there is a risk of it deleting an object that the other process is using
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 6156609cf7..a8cfc0ad82 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -832,7 +832,8 @@ Hard case: The changes are not the same.::
 	This happens if the 'subsystem' rebase had conflicts, or used
 	`--interactive` to omit, edit, squash, or fixup commits; or
 	if the upstream used one of `commit --amend`, `reset`, or
-	`filter-branch`.
+	a full history rewriting command like
+	https://github.com/newren/git-filter-repo[`filter-repo`].
 
 
 The easy case
diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
index 246dc9943c..f271d758c3 100644
--- a/Documentation/git-replace.txt
+++ b/Documentation/git-replace.txt
@@ -123,10 +123,10 @@ The following format are available:
 CREATING REPLACEMENT OBJECTS
 ----------------------------
 
-linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and
-linkgit:git-rebase[1], among other git commands, can be used to create
-replacement objects from existing objects. The `--edit` option can
-also be used with 'git replace' to create a replacement object by
+linkgit:git-hash-object[1], linkgit:git-rebase[1], and
+https://github.com/newren/git-filter-repo[git-filter-repo], among other git commands, can be used to
+create replacement objects from existing objects. The `--edit` option
+can also be used with 'git replace' to create a replacement object by
 editing an existing object.
 
 If you want to replace many blobs, trees or commits that are part of a
@@ -148,13 +148,13 @@ pending objects.
 SEE ALSO
 --------
 linkgit:git-hash-object[1]
-linkgit:git-filter-branch[1]
 linkgit:git-rebase[1]
 linkgit:git-tag[1]
 linkgit:git-branch[1]
 linkgit:git-commit[1]
 linkgit:git-var[1]
 linkgit:git[1]
+https://github.com/newren/git-filter-repo[git-filter-repo]
 
 GIT
 ---
diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 30711625fd..53774f5b64 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -769,11 +769,11 @@ option for (hopefully) obvious reasons.
 +
 This option is NOT recommended as it makes it difficult to track down
 old references to SVN revision numbers in existing documentation, bug
-reports and archives.  If you plan to eventually migrate from SVN to Git
-and are certain about dropping SVN history, consider
-linkgit:git-filter-branch[1] instead.  filter-branch also allows
-reformatting of metadata for ease-of-reading and rewriting authorship
-info for non-"svn.authorsFile" users.
+reports, and archives.  If you plan to eventually migrate from SVN to
+Git and are certain about dropping SVN history, consider
+https://github.com/newren/git-filter-repo[git-filter-repo] instead.
+filter-repo also allows reformatting of metadata for ease-of-reading
+and rewriting authorship info for non-"svn.authorsFile" users.
 
 svn.useSvmProps::
 svn-remote.<name>.useSvmProps::
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 82cd573776..5a789c91df 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -425,10 +425,12 @@ post-rewrite
 
 This hook is invoked by commands that rewrite commits
 (linkgit:git-commit[1] when called with `--amend` and
-linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call
-it!).  Its first argument denotes the command it was invoked by:
-currently one of `amend` or `rebase`.  Further command-dependent
-arguments may be passed in the future.
+linkgit:git-rebase[1]; however, full-history (re)writing tools like
+linkgit:git-fast-import[1] or
+https://github.com/newren/git-filter-repo[git-filter-repo] typically
+do not call it!).  Its first argument denotes the command it was
+invoked by: currently one of `amend` or `rebase`.  Further
+command-dependent arguments may be passed in the future.
 
 The hook receives a list of the rewritten commits on stdin, in the
 format
diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index a3425f4770..19333fc8df 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -56,7 +56,7 @@ line.  This line has the form `git-svn-id: URL@REVNO UUID`.
 
 The resulting repository will generally require further processing
 to put each project in its own repository and to separate the history
-of each branch.  The 'git filter-branch --subdirectory-filter' command
+of each branch.  The 'git filter-repo --subdirectory-filter' command
 may be useful for this purpose.
 
 BUGS
@@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected.
 
 SEE ALSO
 --------
-git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
+git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
 https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5c5afa2b98..f805965d87 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -83,6 +83,19 @@ set_ident () {
 	finish_ident COMMITTER
 }
 
+if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
+     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then
+	cat <<EOF
+WARNING: git-filter-branch has a glut of gotchas generating mangled history
+         rewrites.  Please use an alternative filtering tool such as 'git
+         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
+         See the filter-branch manual page for more details; to squelch
+         this warning, set FILTER_BRANCH_SQUELCH_WARNING=1.
+
+EOF
+	sleep 5
+fi
+
 USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
 	[--tree-filter <command>] [--index-filter <command>]
 	[--parent-filter <command>] [--msg-filter <command>]
-- 
2.23.0.38.g892688c90e



* [PATCH v4 4/4] t9902: use a non-deprecated command for testing
  2019-08-30  5:57             ` [PATCH v4 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
                                 ` (2 preceding siblings ...)
  2019-08-30  5:57               ` [PATCH v4 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
@ 2019-08-30  5:57               ` Elijah Newren
  3 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-08-30  5:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

t9902 had a list of three random porcelain commands as a sanity check,
one of which was filter-branch.  Since we are recommending people not
use filter-branch, let's update this test to use rebase instead of
filter-branch.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t9902-completion.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 75512c3403..4e7f669c76 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -28,10 +28,10 @@ complete ()
 #
 # (2) A test makes sure that common subcommands are included in the
 #     completion for "git <TAB>", and a plumbing is excluded.  "add",
-#     "filter-branch" and "ls-files" are listed for this.
+#     "rebase" and "ls-files" are listed for this.
 
-GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr filter-branch ls-files'
-GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout filter-branch'
+GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr rebase ls-files'
+GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout rebase'
 
 . "$GIT_BUILD_DIR/contrib/completion/git-completion.bash"
 
@@ -1392,12 +1392,12 @@ test_expect_success 'basic' '
 	# built-in
 	grep -q "^add \$" out &&
 	# script
-	grep -q "^filter-branch \$" out &&
+	grep -q "^rebase \$" out &&
 	# plumbing
 	! grep -q "^ls-files \$" out &&
 
-	run_completion "git f" &&
-	! grep -q -v "^f" out
+	run_completion "git r" &&
+	! grep -q -v "^r" out
 '
 
 test_expect_success 'double dash "git" itself' '
-- 
2.23.0.38.g892688c90e



* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-28 17:16                 ` Elijah Newren
  2019-08-28 19:03                   ` Sergey Organov
@ 2019-08-30 20:40                   ` Johannes Schindelin
  2019-08-30 23:22                     ` Elijah Newren
  1 sibling, 1 reply; 73+ messages in thread
From: Johannes Schindelin @ 2019-08-30 20:40 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Sergey Organov, Eric Wong, Git Mailing List, Junio C Hamano,
	Derrick Stolee, Jeff King,
	Ævar Arnfjörð Bjarmason, Lars Schneider,
	Jonathan Nieder

Hi Elijah,


On Wed, 28 Aug 2019, Elijah Newren wrote:

> Hi Sergey,
>
> On Wed, Aug 28, 2019 at 1:52 AM Sergey Organov <sorganov@gmail.com> wrote:
> >
> > Elijah Newren <newren@gmail.com> writes:
> >
> > > On Tue, Aug 27, 2019 at 1:43 AM Sergey Organov <sorganov@gmail.com> wrote:
> > >>
> > >> Eric Wong <e@80x24.org> writes:
> > >>
> > >>
> > >> [...]
> > >>
> > >> > AFAIK, filter-branch is not causing support headaches for any
> > >> > git developers today.  With so many commands in git, it's
> > >> > unlikely newbies will ever get around to discover it :)
> > >> > So I don't think we should be in any rush to remove it.
> > >>
> > >> Nah, discovering it is simple. Just Google for "git change author". That
> > >> eventually leads to a script that uses "git filter-branch --env-filter"
> > >> to get the job done, and I'm afraid it is spread all over the world.
> > >>
> > >> See, e.g.:
> > >>
> > >> https://help.github.com/en/articles/changing-author-info
> > >
> > > Side note: Is the goal to "fix names and email addresses in this
> > > repository"?  If so, this guide fails: it doesn't update tagger names
> > > or email addresses.  Indeed, filter-branch doesn't provide a way to do
> > > that.  (Not to mention other problems like not updating references to
> > > commit hashes in commit messages when it is busy rewriting everything.)
> >
> > No. Maybe the original goal was like that, but I, personally, use a
> > modified version of this to change my "Author" credentials from
> > "internal" to "public" in branches that I'm going to send upstream, so
> > the actual aim is to change e-mail of particular Author from a@b to c@d
> > in all the commits in a (feature) branch.
>
> There's an interesting usecase I hadn't heard of or thought of before.

I'll throw in another use case that's kinda related: extracting the
history of one file (or subdirectory).

In my most recent instance of this, I wanted to publish the script I
used to use for submitting patch series to the Git mailing list,
maintaining tags for iterations and generating cover letters from branch
descriptions and interdiffs (this script eventually became GitGitGadget,
https://github.com/gitgitgadget/gitgitgadget/commits?after=6fb0ede48f86e729292ee1542729bc0f5a30cfa6+0
demonstrates this).

To do that, I ran a `git filter-branch` in the repository where I track
all the scripts I deem unsuitable for public consumption, to remove all
files but `mail-patch-series.sh`, then pushed it to
https://github.com/dscho/mail-patch-series

Please note that most crucially, I wanted to rewrite a newly-created
branch, and only that branch.

Could I have done the same using `git fast-export`, filtering the output
with a Perl script, then passing it to `git fast-import`? Sure, I was
really tempted to do that. In the end, it took less of _my_ time to just
let `git filter-branch` do its work with a not-too-complicated index
filter.

In another instance, a long, long time ago, I needed to restart a
repository which had included way too many files for its own good, then
rename the old repository and start with a fresh `master` that contained
but a single commit whose tree was identical to the previous `master`'s
tip commit. I simply grafted that commit, ran `git filter-branch` and
had precisely what I needed.

I would be _delighted_ if these kinds of use case (rewriting a branch,
or even just a commit range) became more of a first-class citizen with
`git filter-repo`.

Thanks,
Dscho


* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-30 20:40                   ` Johannes Schindelin
@ 2019-08-30 23:22                     ` Elijah Newren
  2019-09-02  9:29                       ` Johannes Schindelin
  0 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-08-30 23:22 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Sergey Organov, Eric Wong, Git Mailing List, Junio C Hamano,
	Derrick Stolee, Jeff King,
	Ævar Arnfjörð Bjarmason, Lars Schneider,
	Jonathan Nieder

Hi Dscho,

On Fri, Aug 30, 2019 at 1:40 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
>
> On Wed, 28 Aug 2019, Elijah Newren wrote:
>
> > Hi Sergey,
> >
> > On Wed, Aug 28, 2019 at 1:52 AM Sergey Organov <sorganov@gmail.com> wrote:
> > >
> > > Elijah Newren <newren@gmail.com> writes:
> > >
> > > > On Tue, Aug 27, 2019 at 1:43 AM Sergey Organov <sorganov@gmail.com> wrote:
> > > >>
> > > >> Eric Wong <e@80x24.org> writes:
> > > >>
> > > >>
> > > >> [...]
> > > >>
> > > >> > AFAIK, filter-branch is not causing support headaches for any
> > > >> > git developers today.  With so many commands in git, it's
> > > >> > unlikely newbies will ever get around to discover it :)
> > > >> > So I don't think we should be in any rush to remove it.
> > > >>
> > > >> Nah, discovering it is simple. Just Google for "git change author". That
> > > >> eventually leads to a script that uses "git filter-branch --env-filter"
> > > >> to get the job done, and I'm afraid it is spread all over the world.
> > > >>
> > > >> See, e.g.:
> > > >>
> > > >> https://help.github.com/en/articles/changing-author-info
> > > >
> > > > Side note: Is the goal to "fix names and email addresses in this
> > > > repository"?  If so, this guide fails: it doesn't update tagger names
> > > > or email addresses.  Indeed, filter-branch doesn't provide a way to do
> > > > that.  (Not to mention other problems like not updating references to
> > > > commit hashes in commit messages when it is busy rewriting everything.)
> > >
> > > No. Maybe the original goal was like that, but I, personally, use a
> > > modified version of this to change my "Author" credentials from
> > > "internal" to "public" in branches that I'm going to send upstream, so
> > > the actual aim is to change e-mail of particular Author from a@b to c@d
> > > in all the commits in a (feature) branch.
> >
> > There's an interesting usecase I hadn't heard of or thought of before.
>
> I'll throw in another use case that's kinda related: extracting the
> history of one file (or subdirectory).

Thanks for sending these along!  I do have some comments, and a bunch
of questions...

> In my most recent instance of this, I wanted to publish the script I
> used to use for submitting patch series to the Git mailing list,
> maintaining tags for iterations and generating cover letters from branch
> descriptions and interdiffs (this script eventually became GitGitGadget,
> https://github.com/gitgitgadget/gitgitgadget/commits?after=6fb0ede48f86e729292ee1542729bc0f5a30cfa6+0
> demonstrates this).
>
> To do that, I ran a `git filter-branch` in the repository where I track
> all the scripts I deem unsuitable for public consumption, to remove all
> files but `mail-patch-series.sh`, then pushed it to
> https://github.com/dscho/mail-patch-series
>
> Please note that most crucially, I wanted to rewrite a newly-created
> branch, and only that branch.
>
> Could I have done the same using `git fast-export`, filtering the output
> with a Perl script, then passing it to `git fast-import`? Sure, I was
> really tempted to do that. In the end, it took less of _my_ time to just
> let `git filter-branch` do its work with a not-too-complicated index
> filter.

Why a perl script?  Shouldn't
    git fast-export [--no-data] HEAD -- $PATH | git fast-import --force --quiet
do the trick?  And it's probably simpler and shorter than the index
filter you used.

That said, yeah it'd be nice to get automatic rewriting of commit
hashes in commit messages and other niceties from filter-repo (e.g.
future automatic reattaching of notes to the rewritten commits).  Some
questions:

  * What's the backup strategy in case you specify the wrong filters
(e.g. you have a typo in the pathnames)?  filter-repo encourages folks
to make a clone and then filter the fresh clone, because if anything
goes awry, you can just delete and restart.  (I am heavily opposed to
the refs/original/ backup mechanism used by filter-branch, for
multiple reasons.)  Is your safety stance just "If I mess up it's my
own fault; do the rewrite?"  Or are you okay with cloning before
filtering?
  * If you're okay with cloning before filtering...then is there an
issue with rewriting all branches, and just pushing the one you need?
(Is there an issue with "this branch is small, the others are huge,
and filter-branch is slow -- so rewriting one branch saves me lots of
time"?  Or are there other issues at play too?)
  * What if the user has auxiliary information for the branch in other
refs?  For example, git-notes pointing at any of the commits, or tags
in the history of the branch that might be relevant, or perhaps even
replace refs in combination with GIT_NO_REPLACE_OBJECTS=1?  Is this an
"I don't care, toss that stuff and just rewrite just this branch?"
  * filter-repo by default creates new replace references so that you
can refer to new commit IDs using old (unabbreviated) commit IDs.
Would that be considered helpful for this usecase?  unhelpful?
irrelevant, since you'll just push the branch you want somewhere and
nuke the temporary clone?


I'm not by any means ruling out the possibility of documenting --refs
and adjusting the defaults when it is used so the user can just run
something like
   git filter-repo --path $PATH --refs $MYBRANCH
but I feel like I need to understand answers to questions like the
above ones so that I can know how to phrase warnings and adjust
defaults and update the documentation.

> In another instance, a long, long time ago, I needed to restart a
> repository which had included way too many files for its own good, then
> rename the old repository and start with a fresh `master` that contained
> but a single commit whose tree was identical to the previous `master`'s
> tip commit. I simply grafted that commit, ran `git filter-branch` and
> had precisely what I needed.

filter-repo supports grafts and replace objects, the same as
filter-branch.  (Although, technically, I didn't have to do a thing to
support it; fast-export does the special handling of rewriting based
on grafts and replace objects.)  So, I'd say this is fully supported.

Side question: the git-replace documents suggest that the graft file
is deprecated.  Are there any timeframes or plans for phasing out
beyond the git-replace manpage existing?  Should I avoid documenting
the graft file support in filter-repo?  Should I include examples
using not just git-replace but also using the graft file?

> I would be _delighted_ if these kinds of use case (rewriting a branch,
> or even just a commit range) became more of a first-class citizen with
> `git filter-repo`.

I've got all the pieces for supporting a single branch or a commit
range (e.g. 'git filter-repo --path foo --refs ^master~4 ^stable~23
mybranch'), but the defaults (error out unless in a bare repo, move
refs/remotes/origin/* to refs/heads/*, disconnect origin remote,
expire reflogs & repack & prune, create new replace references so
folks can access new commits using old commit IDs) may be somewhat
friction-filled for this usecase.  Those defaults other than the new
replace refs happen to all be turned off with the combination of
--force and --target, so, assuming turning them off is what you need,
you could cheat and just specify 'git filter-repo --force --target .
--refs $MYBRANCH' today and perhaps get what you want, but that's a
really non-intuitive command line that is way too ugly to recommend.
And I don't want to tie myself to '--target .' being the magic sauce
in the future either.


* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-08-30 23:22                     ` Elijah Newren
@ 2019-09-02  9:29                       ` Johannes Schindelin
  2019-09-03 17:37                         ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Johannes Schindelin @ 2019-09-02  9:29 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Sergey Organov, Eric Wong, Git Mailing List, Junio C Hamano,
	Derrick Stolee, Jeff King,
	Ævar Arnfjörð Bjarmason, Lars Schneider,
	Jonathan Nieder

Hi Elijah,

On Fri, 30 Aug 2019, Elijah Newren wrote:

> On Fri, Aug 30, 2019 at 1:40 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > [...]
> > In my most recent instance of this, I wanted to publish the script I
> > used to use for submitting patch series to the Git mailing list,
> > maintaining tags for iterations and generating cover letters from branch
> > descriptions and interdiffs (this script eventually became GitGitGadget,
> > https://github.com/gitgitgadget/gitgitgadget/commits?after=6fb0ede48f86e729292ee1542729bc0f5a30cfa6+0
> > demonstrates this).
> >
> > To do that, I ran a `git filter-branch` in the repository where I track
> > all the scripts I deem unsuitable for public consumption, to remove all
> > files but `mail-patch-series.sh`, then pushed it to
> > https://github.com/dscho/mail-patch-series
> >
> > Please note that most crucially, I wanted to rewrite a newly-created
> > branch, and only that branch.
> >
> > Could I have done the same using `git fast-export`, filtering the output
> > with a Perl script, then passing it to `git fast-import`? Sure, I was
> > really tempted to do that. In the end, it took less of _my_ time to just
> > let `git filter-branch` do its work with a not-too-complicated index
> > filter.
>
> Why a perl script?  Shouldn't
>     git fast-export [--no-data] HEAD -- $PATH | git fast-import --force --quiet
> do the trick?  And it's probably simpler and shorter than the index
> filter you used.

Does that not keep the full `$PATH`? I wanted the resulting branch to
have the file in the top-level directory.

> That said, yeah it'd be nice to get automatic rewriting of commit
> hashes in commit messages and other niceties from filter-repo (e.g.
> future automatic reattaching of notes to the rewritten commits).  Some
> questions:
>
>   * What's the backup strategy in case you specify the wrong filters
> (e.g. you have a typo in the pathnames)?  filter-repo encourages folks
> to make a clone and then filter the fresh clone, because if anything
> goes awry, you can just delete and restart.  (I am heavily opposed to
> the refs/original/ backup mechanism used by filter-branch, for
> multiple reasons.)  Is your safety stance just "If I mess up it's my
> own fault; do the rewrite?"  Or are you okay with cloning before
> filtering?

Please note that the `refs/original/` refs should not have been written
at all anymore, not after reflogs were introduced.

Incidentally, that is my answer to your question: the reflog is my
backup.

>   * If you're okay with cloning before filtering...then is there an
> issue with rewriting all branches, and just pushing the one you need?
> (Is there an issue with "this branch is small, the others are huge,
> and filter-branch is slow -- so rewriting one branch saves me lots of
> time"?  Or are there other issues at play too?)

I am not okay with cloning before filtering.

First of all, it is wasteful.

Second of all, in my case it would have been *particularly* wasteful
because the repository in question also has quite a few quite large
blobs (hysterical raisins, don't ask).

>   * What if the user has auxiliary information for the branch in other
> refs?  For example, git-notes pointing at any of the commits, or tags
> in the history of the branch that might be relevant, or perhaps even
> replace refs in combination with GIT_NO_REPLACE_OBJECTS=1?  Is this an
> "I don't care, toss that stuff and just rewrite just this branch?"

In my case: there are no notes. The only time when I make heavy use of
notes is in GitGitGadget. I don't use that feature otherwise.

>   * filter-repo by default creates new replace references so that you
> can refer to new commit IDs using old (unabbreviated) commit IDs.
> Would that be considered helpful for this usecase?  unhelpful?
> irrelevant, since you'll just push the branch you want somewhere and
> nuke the temporary clone?

I definitely did not need that mapping in all of my `git filter-branch`
use cases.

Of course, I can see how it can come in handy in other circumstances,
just not in the ones I experienced so far.

> I'm not by any means ruling out the possibility of documenting --refs
> and adjusting the defaults when it is used so the user can just run
> something like
>    git filter-repo --path $PATH --refs $MYBRANCH
> but I feel like I need to understand answers to questions like the
> above ones so that I can know how to phrase warnings and adjust
> defaults and update the documentation.

In all the scenarios where I used `git filter-branch` (some dozen per
year, so not all *that* many), I needed to rewrite one particular
branch, typically a freshly-created one. I never, ever ever needed to
rewrite all the refs in the repository. Not once ;-)

> > In another instance, a long, long time ago, I needed to restart a
> > repository which had included way too many files for its own good, then
> > rename the old repository and start with a fresh `master` that contained
> > but a single commit whose tree was identical to the previous `master`'s
> > tip commit. I simply grafted that commit, ran `git filter-branch` and
> > had precisely what I needed.
>
> filter-repo supports grafts and replace objects, the same as
> filter-branch.  (Although, technically, I didn't have to do a thing to
> support it; fast-export does the special handling of rewriting based
> on grafts and replace objects.)  So, I'd say this is fully supported.
>
> Side question: the git-replace documents suggest that the graft file
> is deprecated.  Are there any timeframes or plans for phasing out
> beyond the git-replace manpage existing?  Should I avoid documenting
> the graft file support in filter-repo?  Should I include examples
> using not just git-replace but also using the graft file?

I had meant to prepare a patch series to remove `grafts` support that
Junio could carry in `pu` until the time he considers it appropriate to
merge to `master`, but it seems that this task fell under the rag.

The deprecation itself has been introduced in tags/v2.18.0-rc0~54^2~4,
i.e. it is official as of Git v2.18.0, which was released in mid-June
last year.

My personal gut feeling is that we should let it simmer for another year
before removing support for the `grafts` file (and we may want to update
the label "grafted" when `git log` shows a shallow commit before we
remove that support for `grafts`).

So I'll not work on that patch for now.

> > I would be _delighted_ if these kinds of use case (rewriting a branch,
> > or even just a commit range) became more of a first-class citizen with
> > `git filter-repo`.
>
> I've got all the pieces for supporting a single branch or a commit
> range (e.g. 'git filter-repo --path foo --refs ^master~4 ^stable~23
> mybranch'), but the defaults (error out unless in a bare repo, move
> refs/remotes/origin/* to refs/heads/*, disconnect origin remote,
> expire reflogs & repack & prune, create new replace references so
> folks can access new commits using old commit IDs) may be somewhat
> friction-filled for this usecase.  Those defaults other than the new
> replace refs happen to all be turned off with the combination of
> --force and --target, so, assuming turning them off is what you need,
> you could cheat and just specify 'git filter-repo --force --target .
> --refs $MYBRANCH' today and perhaps get what you want, but that's a
> really non-intuitive command line that is way too ugly to recommend.
> And I don't want to tie myself to '--target .' being the magic sauce
> in the future either.

I agree. I would love for my use cases to become more of first-class
citizens. Maybe `--branch <branch>` could serve as the knob?

What I also found really helpful in `git filter-branch` is that it was
possible to pass one-liner shell scripts directly to the command, giving
a lot of freedom about the transformations. I understand that Python
makes it hard to write spaghetti-code one-liners, so you cannot really
pass the snippet in via the command-line, but I hope there is a way to
script things in `git filter-repo`?
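
For context: the git-filter-repo manual describes callback options (for
example `--email-callback` and `--filename-callback`) whose argument is
the body of a Python function operating on byte strings.  Below is a
minimal sketch of what such a body amounts to, written as a standalone
function so it can be tried outside the tool.  The addresses a@b and c@d
come from Sergey's description above and are purely illustrative; whether
a given filter-repo version offers this exact option should be checked
against its manual.

```python
def email_callback(email):
    # Assumption: callbacks receive and return bytes, not str.
    # Map the "internal" address to the "public" one; leave others alone.
    return b"c@d" if email == b"a@b" else email
```

On the command line only the function body would be passed, roughly
`git filter-repo --email-callback 'return b"c@d" if email == b"a@b" else email'`,
which is about as close to a filter-branch one-liner as the Python
approach gets.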

Ciao,
Dscho


* Re: [PATCH v4 2/4] t3427: accelerate this test by using fast-export and fast-import
  2019-08-30  5:57               ` [PATCH v4 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
@ 2019-09-02 14:45                 ` Johannes Schindelin
  0 siblings, 0 replies; 73+ messages in thread
From: Johannes Schindelin @ 2019-09-02 14:45 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Lars Schneider,
	Jonathan Nieder, Eric Sunshine

Hi Elijah,

On Thu, 29 Aug 2019, Elijah Newren wrote:

> fast-export and fast-import can easily handle the simple rewrite that
> was being done by filter-branch, and should be significantly faster on
> systems with a slow fork.  Timings from before and after on two laptops
> that I have access to (measured via `time ./t3427-rebase-subtree.sh`,
> i.e. including everything in this test -- not just the filter-branch or
> fast-export/fast-import pair):
>
>    Linux:  4.305s -> 3.684s (~17% speedup)
>    Mac:   10.128s -> 7.038s (~30% speedup)

This patch seems to accelerate t3427 on my Windows laptop, too, from
~1m37s to ~1m17s, i.e. ~20%.

Ciao,
Dscho


* Re: [PATCH v4 1/4] t6006: simplify and optimize empty message test
  2019-08-30  5:57               ` [PATCH v4 1/4] t6006: simplify and optimize empty message test Elijah Newren
@ 2019-09-02 14:47                 ` Johannes Schindelin
  0 siblings, 0 replies; 73+ messages in thread
From: Johannes Schindelin @ 2019-09-02 14:47 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Junio C Hamano, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Lars Schneider,
	Jonathan Nieder, Eric Sunshine

Hi Elijah,

On Thu, 29 Aug 2019, Elijah Newren wrote:

> Despite only being one piece of the 71st test and there being 73 tests
> overall, this small change to just this one test speeds up the overall
> execution time of t6006 (as measured by the best of 3 runs of `time
> ./t6006-rev-list-format.sh`) by about 11% on Linux and by 13% on
> Mac.

A similar effect can be observed on my Windows laptop: from 25s to 21s,
i.e. ~15%.

Thanks,
Dscho


* Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere
  2019-09-02  9:29                       ` Johannes Schindelin
@ 2019-09-03 17:37                         ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-03 17:37 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Sergey Organov, Eric Wong, Git Mailing List, Junio C Hamano,
	Derrick Stolee, Jeff King,
	Ævar Arnfjörð Bjarmason, Lars Schneider,
	Jonathan Nieder

Hi Dscho,

On Mon, Sep 2, 2019 at 2:30 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> On Fri, 30 Aug 2019, Elijah Newren wrote:
>
> > On Fri, Aug 30, 2019 at 1:40 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> >
> > > [...]
> > > In my most recent instance of this, I wanted to publish the script I
> > > used to use for submitting patch series to the Git mailing list,
> > > maintaining tags for iterations and generating cover letters from branch
> > > descriptions and interdiffs (this script eventually became GitGitGadget,
> > > https://github.com/gitgitgadget/gitgitgadget/commits?after=6fb0ede48f86e729292ee1542729bc0f5a30cfa6+0
> > > demonstrates this).
> > >
> > > To do that, I ran a `git filter-branch` in the repository where I track
> > > all the scripts I deem unsuitable for public consumption, to remove all
> > > files but `mail-patch-series.sh`, then pushed it to
> > > https://github.com/dscho/mail-patch-series
> > >
> > > Please note that most crucially, I wanted to rewrite a newly-created
> > > branch, and only that branch.
> > >
> > > Could I have done the same using `git fast-export`, filtering the output
> > > with a Perl script, then passing it to `git fast-import`? Sure, I was
> > > really tempted to do that. In the end, it took less of _my_ time to just
> > > let `git filter-branch` do its work with a not-too-complicated index
> > > filter.
> >
> > Why a perl script?  Shouldn't
> >     git fast-export [--no-data] HEAD -- $PATH | git fast-import --force --quiet
> > do the trick?  And it's probably simpler and shorter than the index
> > filter you used.
>
> Does that not keep the full `$PATH`? I wanted the resulting branch to
> have the file in the top-level directory.

Ah, gotcha; I read your original description to suggest that the
script was already at the toplevel.
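
A stream filter of the kind discussed above, sitting between fast-export
and fast-import, could look roughly like the sketch below.  It is only an
illustration under stated assumptions: the function name
`move_to_toplevel` is made up, paths are assumed to be unquoted (no
special characters), and rename/copy lines (which fast-export only emits
with -M/-C) are not handled.

```python
import io


def move_to_toplevel(src, dst, prefix):
    """Copy a fast-export stream from src to dst, stripping `prefix`
    from the paths on filemodify (M) and filedelete (D) lines."""
    while True:
        line = src.readline()
        if not line:
            break
        if line.startswith(b"data "):
            # Commit messages and blobs are length-prefixed; copy the
            # payload verbatim so its bytes are never taken for commands.
            size = int(line[len(b"data "):].strip())
            dst.write(line)
            dst.write(src.read(size))
            continue
        if line.startswith(b"M "):
            # Format: M <mode> <dataref> <path>
            cmd, mode, dataref, path = line.split(b" ", 3)
            if path.startswith(prefix):
                path = path[len(prefix):]
            line = b" ".join([cmd, mode, dataref, path])
        elif line.startswith(b"D "):
            # Format: D <path>
            path = line[2:]
            if path.startswith(prefix):
                path = path[len(prefix):]
            line = b"D " + path
        dst.write(line)
```

In a real pipeline, src and dst would be the binary stdin and stdout of a
small script placed between `git fast-export` and `git fast-import`, with
prefix set to the directory being removed (including the trailing slash).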

> > That said, yeah it'd be nice to get automatic rewriting of commit
> > hashes in commit messages and other niceties from filter-repo (e.g.
> > future automatic reattaching of notes to the rewritten commits).  Some
> > questions:
> >
> >   * What's the backup strategy in case you specify the wrong filters
> > (e.g. you have a typo in the pathnames)?  filter-repo encourages folks
> > to make a clone and then filter the fresh clone, because if anything
> > goes awry, you can just delete and restart.  (I am heavily opposed to
> > the refs/original/ backup mechanism used by filter-branch, for
> > multiple reasons.)  Is your safety stance just "If I mess up it's my
> > own fault; do the rewrite?"  Or are you okay with cloning before
> > filtering?
>
> Please note that the `refs/original/` refs should not have been written
> at all anymore, not after reflogs were introduced.
>
> Incidentally, that is my answer to your question: the reflog is my
> backup.

The reflog is great, but while it works in your special case please note that:

1. Anyone filtering more than one ref may have some difficulty
restoring correctly (they have to look in several reflogs, and can't
script restoring from all older reflog versions).
2. Few folks have core.logAllRefUpdates set to 'always', meaning
they'll lack a backup for some refs if the reflog is relied upon.
3. If the filter specifies only keeping a list of files that happen to
not exist within one of the branches (perhaps a filename was typo'ed)
and if pruning empty commits, then the branch can be deleted, and git
doesn't have a mechanism for deleting a branch without deleting its
reflog as far as I know.

Point 1 is kind of minor, but points 2 and 3 are showstoppers in
regards to me recommending the reflogs as a reliable recovery
mechanism after general filtering operations, and this is true for
either filter-branch or filter-repo.  (That said, I can definitely
allow people to choose their risks and just provide some
here-be-dragons warnings.)

> >   * If you're okay with cloning before filtering...then is there an
> > issue with rewriting all branches, and just pushing the one you need?
> > (Is there an issue with "this branch is small, the others are huge,
> > and filter-branch is slow -- so rewriting one branch saves me lots of
> > time"?  Or are there other issues at play too?)
>
> I am not okay with cloning before filtering.
>
> First of all, it is wasteful.
>
> Second of all, in my case it would have been *particularly* wasteful
> because the repository in question also has quite a few quite large
> blobs (hysterical raisins, don't ask).
>
> >   * What if the user has auxiliary information for the branch in other
> > refs?  For example, git-notes pointing at any of the commits, or tags
> > in the history of the branch that might be relevant, or perhaps even
> > replace refs in combination with GIT_NO_REPLACE_OBJECTS=1?  Is this an
> > "I don't care, toss that stuff and just rewrite just this branch?"
>
> In my case: there are no notes. The only time when I make heavy use of
> notes is in GitGitGadget. I don't use that feature otherwise.
>
> >   * filter-repo by default creates new replace references so that you
> > can refer to new commit IDs using old (unabbreviated) commit IDs.
> > Would that be considered helpful for this usecase?  unhelpful?
> > irrelevant, since you'll just push the branch you want somewhere and
> > nuke the temporary clone?
>
> I definitely did not need that mapping in all of my `git filter-branch`
> use cases.
>
> Of course, I can see how it can come in handy in other circumstances,
> just not in the ones I experienced so far.
>
> > I'm not by any means ruling out the possibility of documenting --refs
> > and adjusting the defaults when it is used so the user can just run
> > something like
> >    git filter-repo --path $PATH --refs $MYBRANCH
> > but I feel like I need to understand answers to questions like the
> > above ones so that I can know how to phrase warnings and adjust
> > defaults and update the documentation.
>
> In all the scenarios where I used `git filter-branch` (some dozen per
> year, so not all *that* many), I needed to rewrite one particular
> branch, typically a freshly-created one. I never, ever ever needed to
> rewrite all the refs in the repository. Not once ;-)

Thanks for answering all these and providing the extra context.  Very helpful.

> > > In another instance, a long, long time ago, I needed to restart a
> > > repository which had included way too many files for its own good, then
> > > rename the old repository and start with a fresh `master` that contained
> > > but a single commit whose tree was identical to the previous `master`'s
> > > tip commit. I simply grafted that commit, ran `git filter-branch` and
> > > had precisely what I needed.
> >
> > filter-repo supports grafts and replace objects, the same as
> > filter-branch.  (Although, technically, I didn't have to do a thing to
> > support it; fast-export does the special handling of rewriting based
> > on grafts and replace objects.)  So, I'd say this is fully supported.
> >
> > Side question: the git-replace documents suggest that the graft file
> > is deprecated.  Are there any timeframes or plans for phasing out
> > beyond the git-replace manpage existing?  Should I avoid documenting
> > the graft file support in filter-repo?  Should I include examples
> > using not just git-replace but also using the graft file?
>
> I had meant to prepare a patch series to remove `grafts` support that
> Junio could carry in `pu` until the time he considers it appropriate to
> merge to `master`, but it seems that this task fell under the rag.
>
> The deprecation itself has been introduced in tags/v2.18.0-rc0~54^2~4,
> i.e. it is official as of Git v2.18.0, which was released in mid-June
> last year.
>
> My personal gut feeling is that we should let it simmer for another year
> before removing support for the `grafts` file (and we may want to update
> the label "grafted" when `git log` shows a shallow commit before we
> remove that support for `grafts`).
>
> So I'll not work on that patch for now.

Thanks for the extra history.

> > > I would be _delighted_ if these kinds of use case (rewriting a branch,
> > > or even just a commit range) became more of a first-class citizen with
> > > `git filter-repo`.
> >
> > I've got all the pieces for supporting a single branch or a commit
> > range (e.g. 'git filter-repo --path foo --refs ^master~4 ^stable~23
> > mybranch'), but the defaults (error out unless in a bare repo, move
> > refs/remotes/origin/* to refs/heads/*, disconnect origin remote,
> > expire reflogs & repack & prune, create new replace references so
> > folks can access new commits using old commit IDs) may be somewhat
> > friction-filled for this usecase.  Those defaults other than the new
> > replace refs happen to all be turned off with the combination of
> > --force and --target, so, assuming turning them off is what you need,
> > you could cheat and just specify 'git filter-repo --force --target .
> > --refs $MYBRANCH' today and perhaps get what you want, but that's a
> > really non-intuitive command line that is way too ugly to recommend.
> > And I don't want to tie myself to '--target .' being the magic sauce
> > in the future either.
>
> I agree. I would love for my use cases to become more of first-class
> citizens. Maybe `--branch <branch>` could serve as the knob?

I think I can put something together to make your usecases better.  I
dislike --branch, though, because:
  * It suggests it doesn't work for tags or other things outside of refs/heads/*
  * It suggests that revision ranges are unwelcome.
So, I'd prefer a more generic --refs which can potentially take
multiple arguments, e.g. any of
  --refs mybranch
  --refs mytag
  --refs HEAD~5..mybranch
  --refs ^origin/master ^origin/other-feature mybranch1 mybranch2

> What I also found really helpful in `git filter-branch` is that it was
> possible to pass one-liner shell scripts directly to the command, giving
> a lot of freedom about the transformations. I understand that Python
> makes it hard to write spaghetti-code one-liners, so you cannot really
> pass the snippet in via the command-line, but I hope there is a way to
> script things in `git filter-repo`?

I agree, having the ability to use a programming language snippet or
even full script for special cases is really nice. You can totally do
that with filter-repo, at three different levels:

1. Light-control for easier cases: You can use the command line
arguments --filename-callback, --message-callback, --name-callback,
--email-callback, or --refname-callback and provide python snippets
(usually one-liners) that return a new value.  Most of these are for
editing fields shared across multiple object types (e.g. the name
callback will be used to edit author && committer && tagger names).
However, the filename callback allows both editing the filename and
also filtering based upon filename (you can return the original
filename, OR a new name, OR you can return None to state that you want
files with that name filtered out of commits).

2. Moderate-control: You can use the command line arguments
--blob-callback, --commit-callback, --tag-callback, or
--reset-callback and provide python snippets (possibly one-liners but
more likely to be complex enough that you want newlines) that modify
these fast-import-stream objects.  These provide more control but tend
to be slightly more work. (For example, if you want to rename
branches, you need to worry about three callbacks: commit && tag &&
reset; by comparison, you'd only need to use the refname callback from
lighter control.  Also, if you're worried about filtering based on
filenames, you'll need to dig the filenames (and modes and change
types) out of the list commit.file_changes).

3. All-in: You can write a python script that imports filter-repo as a
python module and (among other things) set up your own
functions/classes as callbacks.  You have to do a bit more setup to
specify the options you are running with, list how many export and
import processes you want to run with, name all your callbacks, etc.,
but it allows you to do anything from just providing a slightly more
involved callback up to and including creating your own filtering tool
with a totally different user interface while still leveraging
filter-repo's capability.  There are multiple examples provided along
that range too (including bfg-ish and
filter-lamely/filter-branch-ish).
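
As a rough illustration of the lightest level, the body of a filename
callback might look like the Python sketch below.  It assumes that
filenames arrive as bytes and that returning None drops the file.  The
wrapper function and its name (filename_callback) are hypothetical and
exist only so the snippet can be run standalone, and the src/ and lib/
paths are made up for the example.

```python
# Hypothetical standalone wrapper around what would be the body of a
# --filename-callback snippet.  Assumption: filenames are bytes objects.
def filename_callback(filename):
    # Returning None asks for files with this name to be filtered out.
    if not filename.startswith(b"src/"):
        return None
    # Returning a different name renames the file: src/foo -> lib/foo.
    return b"lib/" + filename[len(b"src/"):]


if __name__ == "__main__":
    # Try the callback on a few made-up paths.
    for path in (b"src/main.c", b"docs/readme.txt", b"src/util/io.c"):
        print(path, "->", filename_callback(path))
```

Presumably only the function body, not the def line, would be given as
the argument to --filename-callback; the README linked below has the
exact form.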

For more details about all of these, see:
https://github.com/newren/git-filter-repo/blob/a6a6a1b0/README.md#callbacks
https://github.com/newren/git-filter-repo/blob/a6a6a1b0/README.md#using-filter-repo-as-a-library

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it
  2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
                             ` (4 preceding siblings ...)
  2019-08-29  0:06           ` [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
@ 2019-09-03 18:55           ` Elijah Newren
  2019-09-03 18:55             ` [PATCH v5 1/4] t6006: simplify and optimize empty message test Elijah Newren
                               ` (4 more replies)
  5 siblings, 5 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-03 18:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

It's been about 5 days with no further feedback, other than some timings
from Dscho for Windows showing that my fixes help there too.  So, I did
one last re-read, made a couple small wording tweaks, and am resending as
ready for inclusion.

Changes since v4:
  * Included the Windows timings from Dscho in the commit messages for
    the first two perf patches
  * A few slight wording tweaks to the manpage

Elijah Newren (4):
  t6006: simplify and optimize empty message test
  t3427: accelerate this test by using fast-export and fast-import
  Recommend git-filter-repo instead of git-filter-branch
  t9902: use a non-deprecated command for testing

 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 273 +++++++++++++++++++++++++---
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   3 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |  10 +-
 Documentation/githooks.txt          |  10 +-
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                |  13 ++
 t/t3427-rebase-subtree.sh           |  24 ++-
 t/t6006-rev-list-format.sh          |   5 +-
 t/t9902-completion.sh               |  12 +-
 12 files changed, 310 insertions(+), 77 deletions(-)

Range-diff:
1:  7ddbeea2ca ! 1:  ccea0e5846 t6006: simplify and optimize empty message test
    @@ Commit message
         Despite only being one piece of the 71st test and there being 73 tests
         overall, this small change to just this one test speeds up the overall
         execution time of t6006 (as measured by the best of 3 runs of `time
    -    ./t6006-rev-list-format.sh`) by about 11% on Linux and by 13% on
    -    Mac.
    +    ./t6006-rev-list-format.sh`) by about 11% on Linux, 13% on Mac, and
    +    about 15% on Windows.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
2:  e1e63189c1 ! 2:  6d73135006 t3427: accelerate this test by using fast-export and fast-import
    @@ Commit message
     
         fast-export and fast-import can easily handle the simple rewrite that
         was being done by filter-branch, and should be significantly faster on
    -    systems with a slow fork.  Timings from before and after on two laptops
    -    that I have access to (measured via `time ./t3427-rebase-subtree.sh`,
    -    i.e. including everything in this test -- not just the filter-branch or
    -    fast-export/fast-import pair):
    +    systems with a slow fork.  Timings from before and after on a few
    +    laptops that I or others measured on (measured via `time
    +    ./t3427-rebase-subtree.sh`, i.e. including everything in this test --
    +    not just the filter-branch or fast-export/fast-import pair):
     
    -       Linux:  4.305s -> 3.684s (~17% speedup)
    -       Mac:   10.128s -> 7.038s (~30% speedup)
    +       Linux:    4.305s -> 3.684s (~17% speedup)
    +       Mac:     10.128s -> 7.038s (~30% speedup)
    +       Windows:  1m 37s -> 1m 17s (~26% speedup)
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
3:  ed6505584f ! 3:  2f225c8697 Recommend git-filter-repo instead of git-filter-branch
    @@ Documentation/git-filter-branch.txt: warned.
     +document or provide to a coworker, who then runs them on a different OS
     +where the same commands are not working/tested (some examples in the
     +git-filter-branch manpage are also affected by this).  BSD vs. GNU
    -+userland differences can really bite.  If you're lucky, you get ugly
    -+error messages spewed.  But just as likely, the commands either don't do
    -+the filtering requested, or silently corrupt making some unwanted
    -+change.  The unwanted change may only affect a few commits, so it's not
    -+necessarily obvious either.  (The fact that problems won't necessarily
    -+be obvious means they are likely to go unnoticed until the rewritten
    -+history is in use for quite a while, at which point it's really hard to
    -+justify another flag-day for another rewrite.)
    ++userland differences can really bite.  If lucky, error messages are
    ++spewed.  But just as likely, the commands either don't do the filtering
    ++requested, or silently corrupt by making some unwanted change.  The
    ++unwanted change may only affect a few commits, so it's not necessarily
    ++obvious either.  (The fact that problems won't necessarily be obvious
    ++means they are likely to go unnoticed until the rewritten history is in
    ++use for quite a while, at which point it's really hard to justify
    ++another flag-day for another rewrite.)
     +
     +* Filenames with spaces are often mishandled by shell snippets since
     +they cause problems for shell pipelines.  Not everyone is familiar with
     +find -print0, xargs -0, git-ls-files -z, etc.  Even people who are
    -+familiar with these may assume such needs are not relevant because
    ++familiar with these may assume such flags are not relevant because
     +someone else renamed any such files in their repo back before the person
    -+doing the filtering joined the project.  And, often, even those familiar
    -+with handling arguments with spaces my not do so just because they
    ++doing the filtering joined the project.  And often, even those familiar
    ++with handling arguments with spaces may not do so just because they
     +aren't in the mindset of thinking about everything that could possibly
     +go wrong.
     +
     +* Non-ascii filenames can be silently removed despite being in a desired
    -+directory.  The desire to select paths to keep often use pipelines like
    ++directory.  Keeping only wanted paths is often done using pipelines like
     +`git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.  ls-files will
    -+only quote filenames if needed so folks may not notice that one of the
    -+files didn't match the regex, again until it's much too late.  Yes,
    -+someone who knows about core.quotePath can avoid this (unless they have
    -+other special characters like \t, \n, or "), and people who use ls-files
    -+-z with something other than grep can avoid this, but that doesn't mean
    -+they will.
    ++only quote filenames if needed, so folks may not notice that one of the
    ++files didn't match the regex (at least not until it's much too late).
    ++Yes, someone who knows about core.quotePath can avoid this (unless they
    ++have other special characters like \t, \n, or "), and people who use
    ++ls-files -z with something other than grep can avoid this, but that
    ++doesn't mean they will.
     +
     +* Similarly, when moving files around, one can find that filenames with
     +non-ascii or special characters end up in a different directory, one
    @@ Documentation/git-filter-branch.txt: warned.
     +the same name, no warning or error is provided; git-filter-branch simply
     +overwrites each tag in some undocumented pre-defined order resulting in
     +only one tag at the end.  (A git-filter-branch regression test requires
    -+this.)
    ++this surprising behavior.)
     +
    -+Also, the poor performance of git-filter-branch often leads to safety issues:
    ++Also, the poor performance of git-filter-branch often leads to safety
    ++issues:
     +
     +* Coming up with the correct shell snippet to do the filtering you want
     +is sometimes difficult unless you're just doing a trivial modification
4:  ca8e124cb3 = 4:  048eba375b t9902: use a non-deprecated command for testing
-- 
2.23.0.39.gf92d9de5c3


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 1/4] t6006: simplify and optimize empty message test
  2019-09-03 18:55           ` [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
@ 2019-09-03 18:55             ` Elijah Newren
  2019-09-03 21:08               ` Junio C Hamano
  2019-09-03 18:55             ` [PATCH v5 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
                               ` (3 subsequent siblings)
  4 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-03 18:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

Test t6006.71 ("oneline with empty message") was creating two commits
with simple commit messages, and then running filter-branch to rewrite
the commit messages to be empty.  This test was written this way because
the --allow-empty-message option to git commit did not exist at the
time.  Simplify this test and avoid the need to invoke filter-branch by
just using --allow-empty-message when creating the commit.

Despite only being one piece of the 71st test and there being 73 tests
overall, this small change to just this one test speeds up the overall
execution time of t6006 (as measured by the best of 3 runs of `time
./t6006-rev-list-format.sh`) by about 11% on Linux, 13% on Mac, and
about 15% on Windows.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6006-rev-list-format.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index da113d975b..d30e41c9f7 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -501,9 +501,8 @@ test_expect_success 'reflog identity' '
 '
 
 test_expect_success 'oneline with empty message' '
-	git commit -m "dummy" --allow-empty &&
-	git commit -m "dummy" --allow-empty &&
-	git filter-branch --msg-filter "sed -e s/dummy//" HEAD^^.. &&
+	git commit --allow-empty --allow-empty-message &&
+	git commit --allow-empty --allow-empty-message &&
 	git rev-list --oneline HEAD >test.txt &&
 	test_line_count = 5 test.txt &&
 	git rev-list --oneline --graph HEAD >testg.txt &&
-- 
2.23.0.39.gf92d9de5c3


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 2/4] t3427: accelerate this test by using fast-export and fast-import
  2019-09-03 18:55           ` [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-09-03 18:55             ` [PATCH v5 1/4] t6006: simplify and optimize empty message test Elijah Newren
@ 2019-09-03 18:55             ` Elijah Newren
  2019-09-03 21:26               ` Junio C Hamano
  2019-09-03 18:55             ` [PATCH v5 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
                               ` (2 subsequent siblings)
  4 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-03 18:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

fast-export and fast-import can easily handle the simple rewrite that
was being done by filter-branch, and should be significantly faster on
systems with a slow fork.  Timings from before and after on a few
laptops that I or others measured on (measured via `time
./t3427-rebase-subtree.sh`, i.e. including everything in this test --
not just the filter-branch or fast-export/fast-import pair):

   Linux:    4.305s -> 3.684s (~17% speedup)
   Mac:     10.128s -> 7.038s (~30% speedup)
   Windows:  1m 37s -> 1m 17s (~26% speedup)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t3427-rebase-subtree.sh | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/t/t3427-rebase-subtree.sh b/t/t3427-rebase-subtree.sh
index d8640522a0..c1f6102921 100755
--- a/t/t3427-rebase-subtree.sh
+++ b/t/t3427-rebase-subtree.sh
@@ -7,10 +7,16 @@ This test runs git rebase and tests the subtree strategy.
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-rebase.sh
 
-commit_message() {
+commit_message () {
 	git log --pretty=format:%s -1 "$1"
 }
 
+extract_files_subtree () {
+	git fast-export --no-data HEAD -- files_subtree/ |
+		sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" |
+		git fast-import --force --quiet
+}
+
 test_expect_success 'setup' '
 	test_commit README &&
 	mkdir files &&
@@ -42,7 +48,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --preserve-merges --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-preserve-merges-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master4"
@@ -53,7 +59,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --preserve-merges --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-preserve-merges-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "files_subtree/master5"
@@ -64,7 +70,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
@@ -75,7 +81,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
@@ -86,7 +92,7 @@ test_expect_failure REBASE_P \
 	'Rebase -Xsubtree --keep-empty --preserve-merges --onto empty commit' '
 	reset_rebase &&
 	git checkout -b rebase-keep-empty-empty master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --keep-empty --preserve-merges --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "Empty commit"
@@ -96,7 +102,7 @@ test_expect_failure REBASE_P \
 test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
 	reset_rebase &&
 	git checkout -b rebase-onto-4 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD~2)" = "files_subtree/master4"
@@ -106,7 +112,7 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 4' '
 test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
 	reset_rebase &&
 	git checkout -b rebase-onto-5 master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD~)" = "files_subtree/master5"
@@ -115,7 +121,7 @@ test_expect_failure 'Rebase -Xsubtree --onto commit 5' '
 test_expect_failure 'Rebase -Xsubtree --onto empty commit' '
 	reset_rebase &&
 	git checkout -b rebase-onto-empty master &&
-	git filter-branch --prune-empty -f --subdirectory-filter files_subtree &&
+	extract_files_subtree &&
 	git commit -m "Empty commit" --allow-empty &&
 	git rebase -Xsubtree=files_subtree --onto files-master master &&
 	verbose test "$(commit_message HEAD)" = "Empty commit"
-- 
2.23.0.39.gf92d9de5c3


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v5 3/4] Recommend git-filter-repo instead of git-filter-branch
  2019-09-03 18:55           ` [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-09-03 18:55             ` [PATCH v5 1/4] t6006: simplify and optimize empty message test Elijah Newren
  2019-09-03 18:55             ` [PATCH v5 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
@ 2019-09-03 18:55             ` Elijah Newren
  2019-09-03 21:40               ` Junio C Hamano
  2019-09-03 18:55             ` [PATCH v5 4/4] t9902: use a non-deprecated command for testing Elijah Newren
  2019-09-04 22:32             ` [PATCH v6 0/3] Warn about git-filter-branch usage and avoid it Elijah Newren
  4 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-03 18:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

filter-branch suffers from a deluge of disguised dangers that disfigure
history rewrites (i.e. deviate from the deliberate changes).  Many of
these problems are unobtrusive and can easily go undiscovered until the
new repository is in use.  This can result in problems ranging from an
even messier history than what led folks to filter-branch in the first
place, to data loss or corruption.  These issues cannot be backward
compatibly fixed, so add a warning to both filter-branch and its manpage
recommending that another tool (such as filter-repo) be used instead.

Also, update other manpages that referenced filter-branch.  Several of
these needed updates even if we could continue recommending
filter-branch, either due to implying that something was unique to
filter-branch when it applied more generally to all history rewriting
tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
something about filter-branch was used as an example despite other more
commonly known examples now existing.  Reword these sections to fix
these issues and to avoid recommending filter-branch.

Finally, remove the section explaining BFG Repo Cleaner as an
alternative to filter-branch.  I feel somewhat bad about this,
especially since I feel like I learned so much from BFG that I put to
good use in filter-repo (which is much more than I can say for
filter-branch), but keeping that section presented a few problems:
  * In order to recommend that people quit using filter-branch, we need
    to provide them a recommendation for something else to use that
    can handle all the same types of rewrites.  To my knowledge,
    filter-repo is the only such tool.  So it needs to be mentioned.
  * I don't want to give conflicting recommendations to users
  * If we recommend two tools, we shouldn't expect users to learn both
    and pick which one to use; we should explain which problems one
    can solve that the other can't or when one is much faster than
    the other.
  * BFG and filter-repo have similar performance
  * All filtering types that BFG can do, filter-repo can also do.  In
    fact, filter-repo comes with a reimplementation of BFG named
    bfg-ish which provides the same user-interface as BFG but with
    several bugfixes and new features that are hard to implement in
    BFG due to its technical underpinnings.
While I could still mention both tools, it seems like I would need to
provide some kind of comparison and I would ultimately just say that
filter-repo can do everything BFG can, so it seems better to just
remove that section altogether.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 273 +++++++++++++++++++++++++---
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   3 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |  10 +-
 Documentation/githooks.txt          |  10 +-
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                |  13 ++
 9 files changed, 287 insertions(+), 59 deletions(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index cc940eb9ad..784e934009 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped
 into 'git fast-import'.
 
 You can use it as a human-readable bundle replacement (see
-linkgit:git-bundle[1]), or as a kind of an interactive
-'git filter-branch'.
-
+linkgit:git-bundle[1]), or as a format that can be edited before being
+fed to 'git fast-import' in order to do history rewrites (an ability
+relied on by tools like 'git filter-repo').
 
 OPTIONS
 -------
diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 6b53dd7e06..5876598852 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -16,6 +16,19 @@ SYNOPSIS
 	[--original <namespace>] [-d <directory>] [-f | --force]
 	[--state-branch <branch>] [--] [<rev-list options>...]
 
+WARNING
+-------
+'git filter-branch' has a plethora of pitfalls that can produce non-obvious
+manglings of the intended history rewrite (and can leave you with little
+time to investigate such problems since it has such abysmal performance).
+These safety and performance issues cannot be backward compatibly fixed and
+as such, its use is not recommended.  Please use an alternative history
+filtering tool such as https://github.com/newren/git-filter-repo/[git
+filter-repo].  If you still need to use 'git filter-branch', please
+carefully read <<SAFETY>> (and <<PERFORMANCE>>) to learn about the land
+mines of filter-branch, and then vigilantly avoid as many of the hazards
+listed there as reasonably possible.
+
 DESCRIPTION
 -----------
 Lets you rewrite Git revision history by rewriting the branches mentioned
@@ -445,36 +458,236 @@ warned.
   (or if your git-gc is not new enough to support arguments to
   `--prune`, use `git repack -ad; git prune` instead).
 
-NOTES
------
-
-git-filter-branch allows you to make complex shell-scripted rewrites
-of your Git history, but you probably don't need this flexibility if
-you're simply _removing unwanted data_ like large files or passwords.
-For those operations you may want to consider
-http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
-a JVM-based alternative to git-filter-branch, typically at least
-10-50x faster for those use-cases, and with quite different
-characteristics:
-
-* Any particular version of a file is cleaned exactly _once_. The BFG,
-  unlike git-filter-branch, does not give you the opportunity to
-  handle a file differently based on where or when it was committed
-  within your history. This constraint gives the core performance
-  benefit of The BFG, and is well-suited to the task of cleansing bad
-  data - you don't care _where_ the bad data is, you just want it
-  _gone_.
-
-* By default The BFG takes full advantage of multi-core machines,
-  cleansing commit file-trees in parallel. git-filter-branch cleans
-  commits sequentially (i.e. in a single-threaded manner), though it
-  _is_ possible to write filters that include their own parallelism,
-  in the scripts executed against each commit.
-
-* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
-  are much more restrictive than git-filter branch, and dedicated just
-  to the tasks of removing unwanted data- e.g:
-  `--strip-blobs-bigger-than 1M`.
+[[PERFORMANCE]]
+PERFORMANCE
+-----------
+
+The performance of git-filter-branch is glacially slow; its design makes it
+impossible for a backward-compatible implementation to ever be fast:
+
+* In editing files, git-filter-branch by design checks out each and
+every commit as it existed in the original repo.  If your repo has 10\^5
+files and 10\^5 commits, but each commit only modifies 5 files, then
+git-filter-branch will make you do 10\^10 modifications, despite only
+having (at most) 5*10^5 unique blobs.
+
+* If you try to cheat by making git-filter-branch only work on
+files modified in a commit, then two things happen:
+
+  ** you run into problems with deletions whenever the user is simply
+     trying to rename files (because attempting to delete files that
+     don't exist looks like a no-op; it takes some chicanery to remap
+     deletes across file renames when the renames happen via arbitrary
+     user-provided shell)
+
+  ** even if you succeed at the map-deletes-for-renames chicanery, you
+     still technically violate backward compatibility because users are
+     allowed to filter files in ways that depend upon topology of
+     commits instead of filtering solely based on file contents or names
+     (though this has not been observed in the wild).
+
+* Even if you don't need to edit files but only want to e.g. rename or
+remove some and thus can avoid checking out each file (i.e. you can use
+--index-filter), you still are passing shell snippets for your filters.
+This means that for every commit, you have to have a prepared git repo
+where those filters can be run.  That's a significant setup.
+
+* Further, several additional files are created or updated per commit by
+git-filter-branch.  Some of these are for supporting the convenience
+functions provided by git-filter-branch (such as map()), while others
+are for keeping track of internal state (but could have also been
+accessed by user filters; one of git-filter-branch's regression tests
+does so).  This essentially amounts to using the filesystem as an IPC
+mechanism between git-filter-branch and the user-provided filters.
+Disks tend to be a slow IPC mechanism, and writing these files also
+effectively represents a forced synchronization point between separate
+processes that we hit with every commit.
+
+* The user-provided shell commands will likely involve a pipeline of
+commands, resulting in the creation of many processes per commit.
+Creating and running another process takes a widely varying amount of
+time between operating systems, but on any platform it is very slow
+relative to invoking a function.
+
+* git-filter-branch itself is written in shell, which is kind of slow.
+This is the one performance issue that could be backward-compatibly
+fixed, but compared to the above problems that are intrinsic to the
+design of git-filter-branch, the language of the tool itself is a
+relatively minor issue.
+
+  ** Side note: Unfortunately, people tend to fixate on the
+     written-in-shell aspect and periodically ask if git-filter-branch
+     could be rewritten in another language to fix the performance
+     issues.  Not only does that ignore the bigger intrinsic problems
+     with the design, it'd help less than you'd expect: if
+     git-filter-branch itself were not shell, then the convenience
+     functions (map(), skip_commit(), etc) and the `--setup` argument
+     could no longer be executed once at the beginning of the program
+     but would instead need to be prepended to every user filter (and
+     thus re-executed with every commit).
+
+The https://github.com/newren/git-filter-repo/[git filter-repo] tool is
+an alternative to git-filter-branch which does not suffer from these
+performance problems or the safety problems (mentioned below). For those
+with existing tooling which relies upon git-filter-branch, 'git
+filter-repo' also provides
+https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely],
+a drop-in git-filter-branch replacement (with a few caveats).  While
+filter-lamely suffers from all the same safety issues as
+git-filter-branch, it at least ameliorates the performance issues a
+little.
+
+[[SAFETY]]
+SAFETY
+------
+
+git-filter-branch is riddled with gotchas resulting in various ways to
+easily corrupt repos or end up with a mess worse than what you started
+with:
+
+* Someone can have a set of "working and tested filters" which they
+document or provide to a coworker, who then runs them on a different OS
+where the same commands are not working/tested (some examples in the
+git-filter-branch manpage are also affected by this).  BSD vs. GNU
+userland differences can really bite.  If lucky, error messages are
+spewed.  But just as likely, the commands either don't do the filtering
+requested, or silently corrupt by making some unwanted change.  The
+unwanted change may only affect a few commits, so it's not necessarily
+obvious either.  (The fact that problems won't necessarily be obvious
+means they are likely to go unnoticed until the rewritten history is in
+use for quite a while, at which point it's really hard to justify
+another flag-day for another rewrite.)
+
+* Filenames with spaces are often mishandled by shell snippets since
+they cause problems for shell pipelines.  Not everyone is familiar with
+find -print0, xargs -0, git-ls-files -z, etc.  Even people who are
+familiar with these may assume such flags are not relevant because
+someone else renamed any such files in their repo back before the person
+doing the filtering joined the project.  And often, even those familiar
+with handling arguments with spaces may not do so just because they
+aren't in the mindset of thinking about everything that could possibly
+go wrong.
+
+* Non-ascii filenames can be silently removed despite being in a desired
+directory.  Keeping only wanted paths is often done using pipelines like
+`git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.  ls-files will
+only quote filenames if needed, so folks may not notice that one of the
+files didn't match the regex (at least not until it's much too late).
+Yes, someone who knows about core.quotePath can avoid this (unless they
+have other special characters like \t, \n, or "), and people who use
+ls-files -z with something other than grep can avoid this, but that
+doesn't mean they will.
+
+* Similarly, when moving files around, one can find that filenames with
+non-ascii or special characters end up in a different directory, one
+that includes a double quote character.  (This is technically the same
+issue as above with quoting, but perhaps an interesting different way
+that it can and has manifested as a problem.)
+
+* It's far too easy to accidentally mix up old and new history.  It's
+still possible with any tool, but git-filter-branch almost invites it.
+If lucky, the only downside is users getting frustrated that they don't
+know how to shrink their repo and remove the old stuff.  If unlucky,
+they merge old and new history and end up with multiple "copies" of each
+commit, some of which have unwanted or sensitive files and others which
+don't.  This comes about in multiple different ways:
+
+  ** the default to only doing a partial history rewrite ('--all' is not
+     the default and few examples show it)
+
+  ** the fact that there's no automatic post-run cleanup
+
+  ** the fact that --tag-name-filter (when used to rename tags) doesn't
+     remove the old tags but just adds new ones with the new name
+
+  ** the fact that little educational information is provided to inform
+     users of the ramifications of a rewrite and how to avoid mixing old
+     and new history.  For example, this man page discusses how users
+     need to understand that they need to rebase their changes for all
+     their branches on top of new history (or delete and reclone), but
+     that's only one of multiple concerns to consider.  See the
+     "DISCUSSION" section of the git filter-repo manual page for more
+     details.
+
+* Annotated tags can be accidentally converted to lightweight tags, due
+to either of two issues:
+
+  ** Someone can do a history rewrite, realize they messed up, restore
+     from the backups in refs/original/, and then redo their
+     git-filter-branch command.  (The backup in refs/original/ is not a
+     real backup; it dereferences tags first.)
+
+  ** Running git-filter-branch with either --tags or --all in your
+     <rev-list options>.  In order to retain annotated tags as
+     annotated, you must use --tag-name-filter (and must not have
+     restored from refs/original/ in a previously botched rewrite).
+
+* Any commit messages that specify an encoding will become corrupted
+by the rewrite; git-filter-branch ignores the encoding, takes the original
+bytes, and feeds it to commit-tree without telling it the proper
+encoding.  (This happens whether or not --msg-filter is used.)
+
+* Commit messages (even if they are all UTF-8) by default become
+corrupted due to not being updated -- any references to other commit
+hashes in commit messages will now refer to no-longer-extant commits.
+
+* There are no facilities for helping users find what unwanted crud they
+should delete, which means they are much more likely to have incomplete
+or partial cleanups that sometimes result in confusion and people
+wasting time trying to understand.  (For example, folks tend to just
+look for big files to delete instead of big directories or extensions,
+and once they do so, then sometime later folks using the new repository
+who are going through history will notice a build artifact directory
+that has some files but not others, or a cache of dependencies
+(node_modules or similar) which couldn't have ever been functional since
+it's missing some files.)
+
+* If --prune-empty isn't specified, then the filtering process can
+create hordes of confusing empty commits.
+
+* If --prune-empty is specified, then intentionally placed empty
+commits from before the filtering operation are also pruned instead of
+just pruning commits that became empty due to filtering rules.
+
+* If --prune-empty is specified, sometimes empty commits are missed
+and left around anyway (a somewhat rare bug, but it happens...)
+
+* A minor issue, but users who have a goal to update all names and
+emails in a repository may be led to --env-filter which will only update
+authors and committers, missing taggers.
+
+* If the user provides a --tag-name-filter that maps multiple tags to
+the same name, no warning or error is provided; git-filter-branch simply
+overwrites each tag in some undocumented pre-defined order resulting in
+only one tag at the end.  (A git-filter-branch regression test requires
+this surprising behavior.)
+
+Also, the poor performance of git-filter-branch often leads to safety
+issues:
+
+* Coming up with the correct shell snippet to do the filtering you want
+is sometimes difficult unless you're just doing a trivial modification
+such as deleting a couple files.  Unfortunately, people often learn if
+the snippet is right or wrong by trying it out, but the rightness or
+wrongness can vary depending on special circumstances (spaces in
+filenames, non-ascii filenames, funny author names or emails, invalid
+timezones, presence of grafts or replace objects, etc.), meaning they
+may have to wait a long time, hit an error, then restart.  The
+performance of git-filter-branch is so bad that this cycle is painful,
+reducing the time available to carefully re-check (to say nothing about
+what it does to the patience of the person doing the rewrite even if
+they do technically have more time available).  This problem is extra
+compounded because errors from broken filters may not be shown for a
+long time and/or get lost in a sea of output.  Even worse, broken
+filters often just result in silent incorrect rewrites.
+
+* To top it all off, even when users finally find working commands, they
+naturally want to share them.  But they may be unaware that their repo
+didn't have some special cases that someone else's does.  So, when
+someone else with a different repository runs the same commands, they
+get hit by the problems above.  Or, the user just runs commands that
+really were vetted for special cases, but they run it on a different OS
+where it doesn't work, as noted above.
 
 GIT
 ---
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 247f765604..0c114ad1ca 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -115,15 +115,14 @@ NOTES
 -----
 
 'git gc' tries very hard not to delete objects that are referenced
-anywhere in your repository. In
-particular, it will keep not only objects referenced by your current set
-of branches and tags, but also objects referenced by the index,
-remote-tracking branches, refs saved by 'git filter-branch' in
-refs/original/, reflogs (which may reference commits in branches
-that were later amended or rewound), and anything else in the refs/* namespace.
-If you are expecting some objects to be deleted and they aren't, check
-all of those locations and decide whether it makes sense in your case to
-remove those references.
+anywhere in your repository. In particular, it will keep not only
+objects referenced by your current set of branches and tags, but also
+objects referenced by the index, remote-tracking branches, notes saved
+by 'git notes' under refs/notes/, reflogs (which may reference commits
+in branches that were later amended or rewound), and anything else in
+the refs/* namespace.  If you are expecting some objects to be deleted
+and they aren't, check all of those locations and decide whether it
+makes sense in your case to remove those references.
 
 On the other hand, when 'git gc' runs concurrently with another process,
 there is a risk of it deleting an object that the other process is using
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 6156609cf7..a8cfc0ad82 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -832,7 +832,8 @@ Hard case: The changes are not the same.::
 	This happens if the 'subsystem' rebase had conflicts, or used
 	`--interactive` to omit, edit, squash, or fixup commits; or
 	if the upstream used one of `commit --amend`, `reset`, or
-	`filter-branch`.
+	a full history rewriting command like
+	https://github.com/newren/git-filter-repo[`filter-repo`].
 
 
 The easy case
diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
index 246dc9943c..f271d758c3 100644
--- a/Documentation/git-replace.txt
+++ b/Documentation/git-replace.txt
@@ -123,10 +123,10 @@ The following format are available:
 CREATING REPLACEMENT OBJECTS
 ----------------------------
 
-linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and
-linkgit:git-rebase[1], among other git commands, can be used to create
-replacement objects from existing objects. The `--edit` option can
-also be used with 'git replace' to create a replacement object by
+linkgit:git-hash-object[1], linkgit:git-rebase[1], and
+https://github.com/newren/git-filter-repo[git-filter-repo], among other git commands, can be used to
+create replacement objects from existing objects. The `--edit` option
+can also be used with 'git replace' to create a replacement object by
 editing an existing object.
 
 If you want to replace many blobs, trees or commits that are part of a
@@ -148,13 +148,13 @@ pending objects.
 SEE ALSO
 --------
 linkgit:git-hash-object[1]
-linkgit:git-filter-branch[1]
 linkgit:git-rebase[1]
 linkgit:git-tag[1]
 linkgit:git-branch[1]
 linkgit:git-commit[1]
 linkgit:git-var[1]
 linkgit:git[1]
+https://github.com/newren/git-filter-repo[git-filter-repo]
 
 GIT
 ---
diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 30711625fd..53774f5b64 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -769,11 +769,11 @@ option for (hopefully) obvious reasons.
 +
 This option is NOT recommended as it makes it difficult to track down
 old references to SVN revision numbers in existing documentation, bug
-reports and archives.  If you plan to eventually migrate from SVN to Git
-and are certain about dropping SVN history, consider
-linkgit:git-filter-branch[1] instead.  filter-branch also allows
-reformatting of metadata for ease-of-reading and rewriting authorship
-info for non-"svn.authorsFile" users.
+reports, and archives.  If you plan to eventually migrate from SVN to
+Git and are certain about dropping SVN history, consider
+https://github.com/newren/git-filter-repo[git-filter-repo] instead.
+filter-repo also allows reformatting of metadata for ease-of-reading
+and rewriting authorship info for non-"svn.authorsFile" users.
 
 svn.useSvmProps::
 svn-remote.<name>.useSvmProps::
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 82cd573776..5a789c91df 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -425,10 +425,12 @@ post-rewrite
 
 This hook is invoked by commands that rewrite commits
 (linkgit:git-commit[1] when called with `--amend` and
-linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call
-it!).  Its first argument denotes the command it was invoked by:
-currently one of `amend` or `rebase`.  Further command-dependent
-arguments may be passed in the future.
+linkgit:git-rebase[1]; however, full-history (re)writing tools like
+linkgit:git-fast-import[1] or
+https://github.com/newren/git-filter-repo[git-filter-repo] typically
+do not call it!).  Its first argument denotes the command it was
+invoked by: currently one of `amend` or `rebase`.  Further
+command-dependent arguments may be passed in the future.
 
 The hook receives a list of the rewritten commits on stdin, in the
 format
diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index a3425f4770..19333fc8df 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -56,7 +56,7 @@ line.  This line has the form `git-svn-id: URL@REVNO UUID`.
 
 The resulting repository will generally require further processing
 to put each project in its own repository and to separate the history
-of each branch.  The 'git filter-branch --subdirectory-filter' command
+of each branch.  The 'git filter-repo --subdirectory-filter' command
 may be useful for this purpose.
 
 BUGS
@@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected.
 
 SEE ALSO
 --------
-git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
+git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
 https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5c5afa2b98..f805965d87 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -83,6 +83,19 @@ set_ident () {
 	finish_ident COMMITTER
 }
 
+if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
+     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then
+	cat <<EOF
+WARNING: git-filter-branch has a glut of gotchas generating mangled history
+         rewrites.  Please use an alternative filtering tool such as 'git
+         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
+         See the filter-branch manual page for more details; to squelch
+         this warning, set FILTER_BRANCH_SQUELCH_WARNING=1.
+
+EOF
+	sleep 5
+fi
+
 USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
 	[--tree-filter <command>] [--index-filter <command>]
 	[--parent-filter <command>] [--msg-filter <command>]
-- 
2.23.0.39.gf92d9de5c3



* [PATCH v5 4/4] t9902: use a non-deprecated command for testing
  2019-09-03 18:55           ` [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
                               ` (2 preceding siblings ...)
  2019-09-03 18:55             ` [PATCH v5 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
@ 2019-09-03 18:55             ` Elijah Newren
  2019-09-04 22:32             ` [PATCH v6 0/3] Warn about git-filter-branch usage and avoid it Elijah Newren
  4 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-03 18:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

t9902 had a list of three random porcelain commands as a sanity check,
one of which was filter-branch.  Since we are recommending people not
use filter-branch, let's update this test to use rebase instead of
filter-branch.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t9902-completion.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 75512c3403..4e7f669c76 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -28,10 +28,10 @@ complete ()
 #
 # (2) A test makes sure that common subcommands are included in the
 #     completion for "git <TAB>", and a plumbing is excluded.  "add",
-#     "filter-branch" and "ls-files" are listed for this.
+#     "rebase" and "ls-files" are listed for this.
 
-GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr filter-branch ls-files'
-GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout filter-branch'
+GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr rebase ls-files'
+GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout rebase'
 
 . "$GIT_BUILD_DIR/contrib/completion/git-completion.bash"
 
@@ -1392,12 +1392,12 @@ test_expect_success 'basic' '
 	# built-in
 	grep -q "^add \$" out &&
 	# script
-	grep -q "^filter-branch \$" out &&
+	grep -q "^rebase \$" out &&
 	# plumbing
 	! grep -q "^ls-files \$" out &&
 
-	run_completion "git f" &&
-	! grep -q -v "^f" out
+	run_completion "git r" &&
+	! grep -q -v "^r" out
 '
 
 test_expect_success 'double dash "git" itself' '
-- 
2.23.0.39.gf92d9de5c3



* Re: [PATCH v5 1/4] t6006: simplify and optimize empty message test
  2019-09-03 18:55             ` [PATCH v5 1/4] t6006: simplify and optimize empty message test Elijah Newren
@ 2019-09-03 21:08               ` Junio C Hamano
  2019-09-03 21:58                 ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2019-09-03 21:08 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine

Elijah Newren <newren@gmail.com> writes:

> Test t6006.71 ("oneline with empty message") was creating two commits
> with simple commit messages, and then running filter-branch to rewrite
> the commit messages to be empty.  This test was written this way because
> the --allow-empty-message option to git commit did not exist at the
> time.  Simplify this test and avoid the need to invoke filter-branch by
> just using --allow-empty-message when creating the commit.

The result of filter-branch seems to have one empty line as the body
(i.e. "echo X; git cat-file commit A; echo Y" will show two blank
lines between the committer line and Y), while "--allow-empty-message"
does not leave any body (i.e. the same will give you only one blank
line there).

Was this test verifying the right thing in the first place, I have
to wonder.

IOW,

	git commit --allow-empty --cleanup=verbatim -m "$LF" &&

would be more faithful conversion of the original (and hopefully
just as performant).
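
A minimal sketch of where that extra blank line comes from, assuming
filter-branch stores whatever the msg-filter writes as the new message
without further cleanup.  It needs no repository; it only compares what
the old test's sed filter emits for "dummy" with a truly empty message:

```shell
# What the msg-filter in the old test writes for the message "dummy":
# sed deletes the word but keeps the line terminator.
filtered_len=$(( $(printf 'dummy\n' | sed -e 's/dummy//' | wc -c) ))

# What a truly empty message looks like: no bytes at all.
empty_len=$(( $(printf '' | wc -c) ))

# The $(( ... )) wrapper strips the padding some wc implementations add.
echo "filtered message: $filtered_len byte(s)"
echo "empty message:    $empty_len byte(s)"
```

The filtered message is a single newline (1 byte) rather than 0 bytes,
which is the difference -m "$LF" with --cleanup=verbatim is meant to
reproduce.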

> Despite only being one piece of the 71st test and there being 73 tests
> overall, this small change to just this one test speeds up the overall
> execution time of t6006 (as measured by the best of 3 runs of `time
> ./t6006-rev-list-format.sh`) by about 11% on Linux, 13% on Mac, and
> about 15% on Windows.

Quite an improvement ;-)

>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  t/t6006-rev-list-format.sh | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
> index da113d975b..d30e41c9f7 100755
> --- a/t/t6006-rev-list-format.sh
> +++ b/t/t6006-rev-list-format.sh
> @@ -501,9 +501,8 @@ test_expect_success 'reflog identity' '
>  '
>  
>  test_expect_success 'oneline with empty message' '
> -	git commit -m "dummy" --allow-empty &&
> -	git commit -m "dummy" --allow-empty &&
> -	git filter-branch --msg-filter "sed -e s/dummy//" HEAD^^.. &&
> +	git commit --allow-empty --allow-empty-message &&
> +	git commit --allow-empty --allow-empty-message &&
>  	git rev-list --oneline HEAD >test.txt &&
>  	test_line_count = 5 test.txt &&
>  	git rev-list --oneline --graph HEAD >testg.txt &&


* Re: [PATCH v5 2/4] t3427: accelerate this test by using fast-export and fast-import
  2019-09-03 18:55             ` [PATCH v5 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
@ 2019-09-03 21:26               ` Junio C Hamano
  2019-09-03 22:46                 ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2019-09-03 21:26 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine

Elijah Newren <newren@gmail.com> writes:

> +extract_files_subtree () {
> +	git fast-export --no-data HEAD -- files_subtree/ |
> +		sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" |
> +		git fast-import --force --quiet
> +}

Clever, if a bit filthy ;-).  We expect to see something like

	M 100644 dead...beef files_subtree/bar
	M 100755 c0f.....fee files_subtree/foo

in the --no-data output, and the assumption here is that 40-hex
followed by " files_subtree/" would never appear anywhere in the
stream other than in these tree dumps, so the sed script can rewrite
the above to

	M 100644 dead...beef bar
	M 100755 c0f.....fee foo

by getting rid of the leading directory name (plus the slash at the
end).
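
The rewrite can be tried without a repository.  This is only a sketch:
the two input lines and their 40-hex object names are made up for
illustration, and only the sed expression is taken from the patch:

```shell
# Two sample lines shaped like the filemodify commands in
# "git fast-export --no-data" output; the object names are invented.
sample='M 100644 0123456789abcdef0123456789abcdef01234567 files_subtree/bar
M 100755 89abcdef0123456789abcdef0123456789abcdef files_subtree/foo'

# Same sed expression as in extract_files_subtree: keep the object name
# and the space after it (group 1), drop the leading directory.
rewritten=$(printf '%s\n' "$sample" |
	sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%")

printf '%s\n' "$rewritten"
```

Lines without a 40-hex name followed by " files_subtree/" pass through
unchanged, which is what the assumption above relies on.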

Thanks.


* Re: [PATCH v5 3/4] Recommend git-filter-repo instead of git-filter-branch
  2019-09-03 18:55             ` [PATCH v5 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
@ 2019-09-03 21:40               ` Junio C Hamano
  2019-09-04 20:30                 ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2019-09-03 21:40 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine

Elijah Newren <newren@gmail.com> writes:

> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 5c5afa2b98..f805965d87 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -83,6 +83,19 @@ set_ident () {
>  	finish_ident COMMITTER
>  }
>  
> +if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
> +     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then

This is probably the only place where [] instead of "test" is used
in our shell scripts.

if test -z "$FILTER_BRANCH_SQUELCH_WARNING$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS"
then
    ...
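
A self-contained sketch of why the single "test -z" is enough: the
concatenation is empty only when both variables are unset or empty.
The helper name should_warn is hypothetical and not part of the patch:

```shell
# Hypothetical helper: succeeds only when both variables are unset or
# empty.  Concatenating them lets one "test -z" cover both.
should_warn () {
	test -z "$FILTER_BRANCH_SQUELCH_WARNING$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS"
}

unset FILTER_BRANCH_SQUELCH_WARNING GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS
if should_warn
then
	echo "would warn"
fi

FILTER_BRANCH_SQUELCH_WARNING=1
if ! should_warn
then
	echo "warning squelched"
fi
```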

> +	cat <<EOF
> +WARNING: git-filter-branch has a glut of gotchas generating mangled history
> +         rewrites.  Please use an alternative filtering tool such as 'git
> +         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
> +         See the filter-branch manual page for more details; to squelch
> +         this warning, set FILTER_BRANCH_SQUELCH_WARNING=1.
> +
> +EOF
> +	sleep 5
> +fi

This should say it is "sleeping while showing the message and can
safely be killed before starting to do any harm"; alternatively it
should lose the "sleep".  The user would be afraid of typing ^C
to get out of a bulk history rewrite command, and the message itself
is making the fear worse.  If your goal is to discourage its use,
then it would be a good idea to make it clear when it is safe to
kill it before going and studying the alternative.  Otherwise, the
sleep does not help that much---the main complaint is that
filter-branch is too slow, so the user has plenty of time to read the
message anyway, right? ;-)
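
One possible shape for the first option, purely as an illustration: the
function name, the added sentence, and the delay parameter are all
hypothetical and not taken from the patch.

```shell
# Hypothetical rewording of the warning; the extra sentence, the
# function name and the delay argument are illustrative only.
warn_filter_branch () {
	delay=${1:-10}
	cat <<EOF
WARNING: git-filter-branch has a glut of gotchas generating mangled history
         rewrites.  Hit Ctrl-C before proceeding to abort; nothing has
         been rewritten yet.

EOF
	sleep "$delay"
}
```

Calling "warn_filter_branch 5" would print the text and pause for five
seconds; the point is only that the message says the pause is a safe
moment to interrupt.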

>  USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
>  	[--tree-filter <command>] [--index-filter <command>]
>  	[--parent-filter <command>] [--msg-filter <command>]


* Re: [PATCH v5 1/4] t6006: simplify and optimize empty message test
  2019-09-03 21:08               ` Junio C Hamano
@ 2019-09-03 21:58                 ` Elijah Newren
  2019-09-03 22:25                   ` Junio C Hamano
  0 siblings, 1 reply; 73+ messages in thread
From: Elijah Newren @ 2019-09-03 21:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine

On Tue, Sep 3, 2019 at 2:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Test t6006.71 ("oneline with empty message") was creating two commits
> > with simple commit messages, and then running filter-branch to rewrite
> > the commit messages to be empty.  This test was written this way because
> > the --allow-empty-message option to git commit did not exist at the
> > time.  Simplify this test and avoid the need to invoke filter-branch by
> > just using --allow-empty-message when creating the commit.
>
> The result of filter-branch seems to have one empty line as the body
> (i.e. "echo X; git cat-file commit A; echo Y" will show two blank
> lines between the committer line and Y), while "--allow-empty-message"
> does not leave any body (i.e. the same will give you only one blank
> line there).

Ah, good catch.  I checked out the commit before 1fb5fdd25f0
("rev-list: fix --pretty=oneline with empty message", 2010-03-21), to
try and see the error before that testcase was introduced.  I tried it
on a repo with both an actual empty commit message, and one with a
commit message consisting solely of a newline.  Both styles exhibited
the bug that the testcase was introduced to guard against.

> Was this test verifying the right thing in the first place, I have
> to wonder.
>
> IOW,
>
>         git commit --allow-empty --cleanup=verbatim -m "$LF" &&
>
> would be more faithful conversion of the original (and hopefully
> just as performant).

Yeah, it'd be a more faithful conversion of the original, though the
original didn't match the testcase description nor the commit message
(it claimed it was testing with an empty message).  Also, in terms of
future proofing, any code changes are more likely to omit a needed
trailing LF if the commit message doesn't have one than if it does, so
I think it's a more robust test with this change.

I can update the commit message to explain this, or, if you prefer, I
could duplicate the testcase and tweak the second as you suggest so we
test both with and without the LF.  What's your preference?

> > Despite only being one piece of the 71st test and there being 73 tests
> > overall, this small change to just this one test speeds up the overall
> > execution time of t6006 (as measured by the best of 3 runs of `time
> > ./t6006-rev-list-format.sh`) by about 11% on Linux, 13% on Mac, and
> > about 15% on Windows.
>
> Quite an improvement ;-)
>
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  t/t6006-rev-list-format.sh | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
> > index da113d975b..d30e41c9f7 100755
> > --- a/t/t6006-rev-list-format.sh
> > +++ b/t/t6006-rev-list-format.sh
> > @@ -501,9 +501,8 @@ test_expect_success 'reflog identity' '
> >  '
> >
> >  test_expect_success 'oneline with empty message' '
> > -     git commit -m "dummy" --allow-empty &&
> > -     git commit -m "dummy" --allow-empty &&
> > -     git filter-branch --msg-filter "sed -e s/dummy//" HEAD^^.. &&
> > +     git commit --allow-empty --allow-empty-message &&
> > +     git commit --allow-empty --allow-empty-message &&
> >       git rev-list --oneline HEAD >test.txt &&
> >       test_line_count = 5 test.txt &&
> >       git rev-list --oneline --graph HEAD >testg.txt &&

* Re: [PATCH v5 1/4] t6006: simplify and optimize empty message test
  2019-09-03 21:58                 ` Elijah Newren
@ 2019-09-03 22:25                   ` Junio C Hamano
  0 siblings, 0 replies; 73+ messages in thread
From: Junio C Hamano @ 2019-09-03 22:25 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine

Elijah Newren <newren@gmail.com> writes:

> Ah, good catch.  I checked out the commit before 1fb5fdd25f0
> ("rev-list: fix --pretty=oneline with empty message", 2010-03-21), to
> try and see the error before that testcase was introduced.  I tried it
> on a repo with both an actual empty commit message, and one with a
> commit message consisting solely of a newline.  Both styles exhibited
> the bug that the testcase was introduced to guard against.

That's a good thing to know to decide what is a reasonable
thing to do here.

As we are creating two commits, perhaps adding one with and another
without the extra blank line may give us more diversity, and
explaining why we are adding two slightly different ones
(i.e. because the original bug was there for both shapes of commits)
would help us avoid wasting the time we already spent discussing this
change ;-)

Of course, we can alternatively just keep the patch as-is and update
the explanation as to why we are testing with commits different from
the original when we are supposed to be making this change for
performance reasons (i.e. the symptom manifests either way, so why
not use the form that is easier to create?).

Thanks for working on this ;-)

* Re: [PATCH v5 2/4] t3427: accelerate this test by using fast-export and fast-import
  2019-09-03 21:26               ` Junio C Hamano
@ 2019-09-03 22:46                 ` Junio C Hamano
  2019-09-04 20:32                   ` Elijah Newren
  0 siblings, 1 reply; 73+ messages in thread
From: Junio C Hamano @ 2019-09-03 22:46 UTC (permalink / raw)
  To: Elijah Newren
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine

Junio C Hamano <gitster@pobox.com> writes:

> Elijah Newren <newren@gmail.com> writes:
>
>> +extract_files_subtree () {
>> +	git fast-export --no-data HEAD -- files_subtree/ |
>> +		sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" |
>> +		git fast-import --force --quiet
>> +}

This change has obvious interactions with Dscho's d51b771d ("t3427:
move the `filter-branch` invocation into the `setup` case",
2019-07-31) that is still in flight, but in a good way.  After that
step, the above helper function needs only a single callsite.
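
To make the sed step in the quoted helper easier to follow, here is a
minimal sketch that runs the same expression on a single hand-written
line.  The sample line imitates what fast-export emits for a file
modification under --no-data (blobs referenced by their 40-hex object
name); the hash and the path "master4" are made up for illustration.

```shell
#!/bin/sh
# Hypothetical object name (40 hex digits) and path, for illustration only.
sha=0123456789abcdef0123456789abcdef01234567
input="M 100644 $sha files_subtree/master4"

# Same sed expression as in the quoted helper: keep the hash and the
# following space (\1), drop the leading "files_subtree/" from the path.
output=$(printf '%s\n' "$input" |
	sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%")

printf '%s\n' "$output"
```

The expression only touches lines where a 40-hex hash immediately
precedes the directory name, so other lines in the stream pass through
unchanged.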

I think I'll discard this step from the "move us closer to deprecate
filter-branch" topic, and ask you and Dscho to work together to have
it or its moral equivalent included as part of js/rebase-r-strategy
topic.

Thanks.

* Re: [PATCH v5 3/4] Recommend git-filter-repo instead of git-filter-branch
  2019-09-03 21:40               ` Junio C Hamano
@ 2019-09-04 20:30                 ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-04 20:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine

On Tue, Sep 3, 2019 at 2:40 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> > index 5c5afa2b98..f805965d87 100755
> > --- a/git-filter-branch.sh
> > +++ b/git-filter-branch.sh
> > @@ -83,6 +83,19 @@ set_ident () {
> >       finish_ident COMMITTER
> >  }
> >
> > +if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
> > +     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then
>
> This is probably the only place where [] instead of "test" is used
> in our shell scripts.
>
> if test -z "$FILTER_BRANCH_SQUELCH_WARNING$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS"
> then
>     ...

Yeah, git-filter-branch.sh has approximately twice as many uses of []
as "test", so it seemed in line with its coding style.  I can switch
it over.
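
For reference, a small sketch of how the suggested single "test -z" on
the concatenated variables behaves.  The two variable names come from
the patch; the should_warn helper is hypothetical and only exists to
make the three cases easy to compare.

```shell
#!/bin/sh
# Hypothetical helper wrapping the condition suggested in review.
should_warn () {
	# The concatenation is empty only when both variables are empty or unset.
	if test -z "$FILTER_BRANCH_SQUELCH_WARNING$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS"
	then
		echo warn
	else
		echo quiet
	fi
}

unset FILTER_BRANCH_SQUELCH_WARNING GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS

# Each call runs in a command substitution, so the one-shot assignments
# do not leak into later calls.
neither_set=$(should_warn)
squelched=$(FILTER_BRANCH_SQUELCH_WARNING=1 should_warn)
in_tests=$(GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS=true should_warn)

echo "neither set: $neither_set"
echo "squelched:   $squelched"
echo "in tests:    $in_tests"
```

Setting either variable to a non-empty value is enough to silence the
warning, which matches what the original two-clause [ ... -a ... ] form
was checking.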

> > +     cat <<EOF
> > +WARNING: git-filter-branch has a glut of gotchas generating mangled history
> > +         rewrites.  Please use an alternative filtering tool such as 'git
> > +         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
> > +         See the filter-branch manual page for more details; to squelch
> > +         this warning, set FILTER_BRANCH_SQUELCH_WARNING=1.
> > +
> > +EOF
> > +     sleep 5
> > +fi
>
> This should say it is "sleeping while showing the message and can
> safely be killed before starting to do any harm"; alternatively it
> should lose the "sleep".  The user would have fear against typing ^C
> to get out of a bulk history rewrite command, and the message itself
> is making the fear worse.  If your goal is to discourage its use,
> then it would be a good idea to make it clear when it is safe to
> kill it before going and studying the alternative.  Otherwise, the
> sleep does not help that much---the main complaint is that filter
> branch is too slow, so the user has plenty of time to read the
> message anyway, right? ;-)

Makes sense; will fix.

* Re: [PATCH v5 2/4] t3427: accelerate this test by using fast-export and fast-import
  2019-09-03 22:46                 ` Junio C Hamano
@ 2019-09-04 20:32                   ` Elijah Newren
  0 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-04 20:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine

On Tue, Sep 3, 2019 at 3:46 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Elijah Newren <newren@gmail.com> writes:
> >
> >> +extract_files_subtree () {
> >> +    git fast-export --no-data HEAD -- files_subtree/ |
> >> +            sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%" |
> >> +            git fast-import --force --quiet
> >> +}
>
> This change has obvious interactions with Dscho's d51b771d ("t3427:
> move the `filter-branch` invocation into the `setup` case",
> 2019-07-31) that is still in flight, but in a good way.  After that
> step, the above helper function needs only a single callsite.
>
> I think I'll discard this step from the "move us closer to deprecate
> filter-branch" topic, and ask you and Dscho to work together to have
> it or its moral equivalent included as part of js/rebase-r-strategy
> topic.

Sounds good.  I'll resubmit it separately as a patch on top of his topic.

* [PATCH v6 0/3] Warn about git-filter-branch usage and avoid it
  2019-09-03 18:55           ` [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
                               ` (3 preceding siblings ...)
  2019-09-03 18:55             ` [PATCH v5 4/4] t9902: use a non-deprecated command for testing Elijah Newren
@ 2019-09-04 22:32             ` Elijah Newren
  2019-09-04 22:32               ` [PATCH v6 1/3] t6006: simplify, fix, and optimize empty message test Elijah Newren
                                 ` (2 more replies)
  4 siblings, 3 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-04 22:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

Changes since v5 (full range-diff below):
  * Dropped patch 3 (which was rebased on top of js/rebase-r-strategy and
    submitted separately)[1]
  * Updated t6006 to include both an empty commit message and a commit
    message with just a line feed
  * Made the two small tweaks Junio suggested to git-filter-branch.sh

[1] https://public-inbox.org/git/20190904214048.29331-1-newren@gmail.com/

Elijah Newren (3):
  t6006: simplify, fix, and optimize empty message test
  Recommend git-filter-repo instead of git-filter-branch
  t9902: use a non-deprecated command for testing

 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 273 +++++++++++++++++++++++++---
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   3 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |  10 +-
 Documentation/githooks.txt          |  10 +-
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                |  14 ++
 t/t6006-rev-list-format.sh          |   5 +-
 t/t9902-completion.sh               |  12 +-
 11 files changed, 296 insertions(+), 68 deletions(-)

Range-diff:
1:  ccea0e5846 ! 1:  d5370568a4 t6006: simplify and optimize empty message test
    @@ Metadata
     Author: Elijah Newren <newren@gmail.com>
     
      ## Commit message ##
    -    t6006: simplify and optimize empty message test
    +    t6006: simplify, fix, and optimize empty message test
     
         Test t6006.71 ("oneline with empty message") was creating two commits
         with simple commit messages, and then running filter-branch to rewrite
    -    the commit messages to be empty.  This test was written this way because
    -    the --allow-empty-message option to git commit did not exist at the
    -    time.  Simplify this test and avoid the need to invoke filter-branch by
    -    just using --allow-empty-message when creating the commit.
    +    the commit messages to be "empty".  This test was introduced in commit
    +    1fb5fdd25f0 ("rev-list: fix --pretty=oneline with empty message",
    +    2010-03-21) and written this way because the --allow-empty-message
    +    option to git commit did not exist at the time.
    +
    +    However, the filter-branch invocation used differed slightly from
    +    --allow-empty-message in that it would have a commit message consisting
    +    solely of a single newline, and as such was not testing what the
    +    original commit intended to test.  Since both a truly empty commit
    +    message and a commit message with a single linefeed could trigger the
    +    original bug, modify the test slightly to include an example of each.
     
         Despite only being one piece of the 71st test and there being 73 tests
         overall, this small change to just this one test speeds up the overall
    @@ t/t6006-rev-list-format.sh: test_expect_success 'reflog identity' '
     -	git commit -m "dummy" --allow-empty &&
     -	git commit -m "dummy" --allow-empty &&
     -	git filter-branch --msg-filter "sed -e s/dummy//" HEAD^^.. &&
    -+	git commit --allow-empty --allow-empty-message &&
    ++	git commit --allow-empty --cleanup=verbatim -m "$LF" &&
     +	git commit --allow-empty --allow-empty-message &&
      	git rev-list --oneline HEAD >test.txt &&
      	test_line_count = 5 test.txt &&
2:  6d73135006 < -:  ---------- t3427: accelerate this test by using fast-export and fast-import
3:  2f225c8697 ! 2:  8635410b88 Recommend git-filter-repo instead of git-filter-branch
    @@ git-filter-branch.sh: set_ident () {
      	finish_ident COMMITTER
      }
      
    -+if [ -z "$FILTER_BRANCH_SQUELCH_WARNING" -a \
    -+     -z "$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS" ]; then
    ++if test -z "$FILTER_BRANCH_SQUELCH_WARNING$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS"
    ++then
     +	cat <<EOF
     +WARNING: git-filter-branch has a glut of gotchas generating mangled history
    -+         rewrites.  Please use an alternative filtering tool such as 'git
    -+         filter-repo' (https://github.com/newren/git-filter-repo/) instead.
    -+         See the filter-branch manual page for more details; to squelch
    -+         this warning, set FILTER_BRANCH_SQUELCH_WARNING=1.
    -+
    ++	 rewrites.  Hit Ctrl-C before proceeding to abort, then use an
    ++	 alternative filtering tool such as 'git filter-repo'
    ++	 (https://github.com/newren/git-filter-repo/) instead.  See the
    ++	 filter-branch manual page for more details; to squelch this warning,
    ++	 set FILTER_BRANCH_SQUELCH_WARNING=1.
     +EOF
    -+	sleep 5
    ++	sleep 10
    ++	printf "Proceeding with filter-branch...\n\n"
     +fi
     +
      USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
4:  048eba375b = 3:  19edb94ec2 t9902: use a non-deprecated command for testing
-- 
2.23.0.3.g19edb94ec2


* [PATCH v6 1/3] t6006: simplify, fix, and optimize empty message test
  2019-09-04 22:32             ` [PATCH v6 0/3] Warn about git-filter-branch usage and avoid it Elijah Newren
@ 2019-09-04 22:32               ` Elijah Newren
  2019-09-04 22:32               ` [PATCH v6 2/3] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
  2019-09-04 22:32               ` [PATCH v6 3/3] t9902: use a non-deprecated command for testing Elijah Newren
  2 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-04 22:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

Test t6006.71 ("oneline with empty message") was creating two commits
with simple commit messages, and then running filter-branch to rewrite
the commit messages to be "empty".  This test was introduced in commit
1fb5fdd25f0 ("rev-list: fix --pretty=oneline with empty message",
2010-03-21) and written this way because the --allow-empty-message
option to git commit did not exist at the time.

However, the filter-branch invocation used differed slightly from
--allow-empty-message in that it would have a commit message consisting
solely of a single newline, and as such was not testing what the
original commit intended to test.  Since both a truly empty commit
message and a commit message with a single linefeed could trigger the
original bug, modify the test slightly to include an example of each.

Despite only being one piece of the 71st test and there being 73 tests
overall, this small change to just this one test speeds up the overall
execution time of t6006 (as measured by the best of 3 runs of `time
./t6006-rev-list-format.sh`) by about 11% on Linux, 13% on Mac, and
about 15% on Windows.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
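For anyone who wants to see the two commit shapes outside the test
suite, here is a minimal sketch.  It assumes git is installed and works
in a throwaway repository; the body_bytes helper and the identity
values are made up for illustration, and -m "" is used in place of the
test suite's no-op editor so that nothing interactive is launched.

```shell
#!/bin/sh
# Work in a scratch repository so nothing real is touched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

LF='
'

# Hypothetical helper: count the bytes that follow the blank line
# separating the commit headers from the message body.
body_bytes () {
	git cat-file commit "$1" | sed -e '1,/^$/d' | wc -c | tr -d ' '
}

# Shape 1: message consisting solely of a newline (what filter-branch produced).
git commit -q --allow-empty --cleanup=verbatim -m "$LF"
newline_only=$(body_bytes HEAD)

# Shape 2: truly empty message.
git commit -q --allow-empty --allow-empty-message -m ""
truly_empty=$(body_bytes HEAD)

echo "newline-only body: $newline_only byte(s)"
echo "truly empty body:  $truly_empty byte(s)"
```

The one-byte difference is the extra blank line that shows up in
"git cat-file commit" output for the first shape.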
 t/t6006-rev-list-format.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index da113d975b..cfb74d0e03 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -501,9 +501,8 @@ test_expect_success 'reflog identity' '
 '
 
 test_expect_success 'oneline with empty message' '
-	git commit -m "dummy" --allow-empty &&
-	git commit -m "dummy" --allow-empty &&
-	git filter-branch --msg-filter "sed -e s/dummy//" HEAD^^.. &&
+	git commit --allow-empty --cleanup=verbatim -m "$LF" &&
+	git commit --allow-empty --allow-empty-message &&
 	git rev-list --oneline HEAD >test.txt &&
 	test_line_count = 5 test.txt &&
 	git rev-list --oneline --graph HEAD >testg.txt &&
-- 
2.23.0.3.g19edb94ec2


* [PATCH v6 2/3] Recommend git-filter-repo instead of git-filter-branch
  2019-09-04 22:32             ` [PATCH v6 0/3] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-09-04 22:32               ` [PATCH v6 1/3] t6006: simplify, fix, and optimize empty message test Elijah Newren
@ 2019-09-04 22:32               ` Elijah Newren
  2019-09-04 22:32               ` [PATCH v6 3/3] t9902: use a non-deprecated command for testing Elijah Newren
  2 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-04 22:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

filter-branch suffers from a deluge of disguised dangers that disfigure
history rewrites (i.e. deviate from the deliberate changes).  Many of
these problems are unobtrusive and can easily go undiscovered until the
new repository is in use.  This can result in problems ranging from an
even messier history than what led folks to filter-branch in the first
place, to data loss or corruption.  These issues cannot be backward
compatibly fixed, so add a warning to both filter-branch and its manpage
recommending that another tool (such as filter-repo) be used instead.

Also, update other manpages that referenced filter-branch.  Several of
these needed updates even if we could continue recommending
filter-branch, either due to implying that something was unique to
filter-branch when it applied more generally to all history rewriting
tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
something about filter-branch was used as an example despite other more
commonly known examples now existing.  Reword these sections to fix
these issues and to avoid recommending filter-branch.

Finally, remove the section explaining BFG Repo Cleaner as an
alternative to filter-branch.  I feel somewhat bad about this,
especially since I feel like I learned so much from BFG that I put to
good use in filter-repo (which is much more than I can say for
filter-branch), but keeping that section presented a few problems:
  * In order to recommend that people quit using filter-branch, we need
    to provide them a recommendation for something else to use that
    can handle all the same types of rewrites.  To my knowledge,
    filter-repo is the only such tool.  So it needs to be mentioned.
  * I don't want to give conflicting recommendations to users.
  * If we recommend two tools, we shouldn't expect users to learn both
    and pick which one to use; we should explain which problems one
    can solve that the other can't or when one is much faster than
    the other.
  * BFG and filter-repo have similar performance.
  * All filtering types that BFG can do, filter-repo can also do.  In
    fact, filter-repo comes with a reimplementation of BFG named
    bfg-ish which provides the same user-interface as BFG but with
    several bugfixes and new features that are hard to implement in
    BFG due to its technical underpinnings.
While I could still mention both tools, I would need to provide some
kind of comparison, and I would end up just saying that filter-repo can
do everything BFG can, so it seems better to remove that section
altogether.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
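As a note for reviewers, the non-ASCII filename hazard described in the
SAFETY section of this patch can be sketched without a repository.  The
listing below is hand-written to imitate what "git ls-files" prints
with the default core.quotePath setting, where a path containing
non-ASCII bytes is wrapped in double quotes with octal escapes; the
file names themselves are made up.

```shell
#!/bin/sh
# Hand-written stand-in for `git ls-files` output (hypothetical paths).
# The second entry is quoted the way git quotes non-ASCII paths.
listing=$(printf '%s\n' \
	'WANTED_DIR/plain.txt' \
	'"WANTED_DIR/caf\303\251.txt"' \
	'other/junk.bin')

# The kind of filter the manpage warns about: everything that survives
# this grep would be handed to `git rm`.
to_remove=$(printf '%s\n' "$listing" | grep -v '^WANTED_DIR/')

printf 'would be removed:\n%s\n' "$to_remove"

# Count how many of the doomed entries actually live under WANTED_DIR.
wrongly_removed=$(printf '%s\n' "$to_remove" | grep -c 'WANTED_DIR/')
echo "entries under WANTED_DIR slated for removal: $wrongly_removed"
```

The quoted entry starts with a double quote rather than with
WANTED_DIR/, so the anchored pattern does not protect it.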
 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 273 +++++++++++++++++++++++++---
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   3 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |  10 +-
 Documentation/githooks.txt          |  10 +-
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                |  14 ++
 9 files changed, 288 insertions(+), 59 deletions(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index cc940eb9ad..784e934009 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped
 into 'git fast-import'.
 
 You can use it as a human-readable bundle replacement (see
-linkgit:git-bundle[1]), or as a kind of an interactive
-'git filter-branch'.
-
+linkgit:git-bundle[1]), or as a format that can be edited before being
+fed to 'git fast-import' in order to do history rewrites (an ability
+relied on by tools like 'git filter-repo').
 
 OPTIONS
 -------
diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
index 6b53dd7e06..5876598852 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -16,6 +16,19 @@ SYNOPSIS
 	[--original <namespace>] [-d <directory>] [-f | --force]
 	[--state-branch <branch>] [--] [<rev-list options>...]
 
+WARNING
+-------
+'git filter-branch' has a plethora of pitfalls that can produce non-obvious
+manglings of the intended history rewrite (and can leave you with little
+time to investigate such problems since it has such abysmal performance).
+These safety and performance issues cannot be backward compatibly fixed and
+as such, its use is not recommended.  Please use an alternative history
+filtering tool such as https://github.com/newren/git-filter-repo/[git
+filter-repo].  If you still need to use 'git filter-branch', please
+carefully read <<SAFETY>> (and <<PERFORMANCE>>) to learn about the land
+mines of filter-branch, and then vigilantly avoid as many of the hazards
+listed there as reasonably possible.
+
 DESCRIPTION
 -----------
 Lets you rewrite Git revision history by rewriting the branches mentioned
@@ -445,36 +458,236 @@ warned.
   (or if your git-gc is not new enough to support arguments to
   `--prune`, use `git repack -ad; git prune` instead).
 
-NOTES
------
-
-git-filter-branch allows you to make complex shell-scripted rewrites
-of your Git history, but you probably don't need this flexibility if
-you're simply _removing unwanted data_ like large files or passwords.
-For those operations you may want to consider
-http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
-a JVM-based alternative to git-filter-branch, typically at least
-10-50x faster for those use-cases, and with quite different
-characteristics:
-
-* Any particular version of a file is cleaned exactly _once_. The BFG,
-  unlike git-filter-branch, does not give you the opportunity to
-  handle a file differently based on where or when it was committed
-  within your history. This constraint gives the core performance
-  benefit of The BFG, and is well-suited to the task of cleansing bad
-  data - you don't care _where_ the bad data is, you just want it
-  _gone_.
-
-* By default The BFG takes full advantage of multi-core machines,
-  cleansing commit file-trees in parallel. git-filter-branch cleans
-  commits sequentially (i.e. in a single-threaded manner), though it
-  _is_ possible to write filters that include their own parallelism,
-  in the scripts executed against each commit.
-
-* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
-  are much more restrictive than git-filter branch, and dedicated just
-  to the tasks of removing unwanted data- e.g:
-  `--strip-blobs-bigger-than 1M`.
+[[PERFORMANCE]]
+PERFORMANCE
+-----------
+
+The performance of git-filter-branch is glacially slow; its design makes it
+impossible for a backward-compatible implementation to ever be fast:
+
+* In editing files, git-filter-branch by design checks out each and
+every commit as it existed in the original repo.  If your repo has 10\^5
+files and 10\^5 commits, but each commit only modifies 5 files, then
+git-filter-branch will make you do 10\^10 modifications, despite only
+having (at most) 5*10^5 unique blobs.
+
+* If you try and cheat and try to make git-filter-branch only work on
+files modified in a commit, then two things happen:
+
+  ** you run into problems with deletions whenever the user is simply
+     trying to rename files (because attempting to delete files that
+     don't exist looks like a no-op; it takes some chicanery to remap
+     deletes across file renames when the renames happen via arbitrary
+     user-provided shell)
+
+  ** even if you succeed at the map-deletes-for-renames chicanery, you
+     still technically violate backward compatibility because users are
+     allowed to filter files in ways that depend upon topology of
+     commits instead of filtering solely based on file contents or names
+     (though this has not been observed in the wild).
+
+* Even if you don't need to edit files but only want to e.g. rename or
+remove some and thus can avoid checking out each file (i.e. you can use
+--index-filter), you still are passing shell snippets for your filters.
+This means that for every commit, you have to have a prepared git repo
+where those filters can be run.  That's a significant setup.
+
+* Further, several additional files are created or updated per commit by
+git-filter-branch.  Some of these are for supporting the convenience
+functions provided by git-filter-branch (such as map()), while others
+are for keeping track of internal state (but could have also been
+accessed by user filters; one of git-filter-branch's regression tests
+does so).  This essentially amounts to using the filesystem as an IPC
+mechanism between git-filter-branch and the user-provided filters.
+Disks tend to be a slow IPC mechanism, and writing these files also
+effectively represents a forced synchronization point between separate
+processes that we hit with every commit.
+
+* The user-provided shell commands will likely involve a pipeline of
+commands, resulting in the creation of many processes per commit.
+Creating and running another process takes a widely varying amount of
+time between operating systems, but on any platform it is very slow
+relative to invoking a function.
+
+* git-filter-branch itself is written in shell, which is kind of slow.
+This is the one performance issue that could be backward-compatibly
+fixed, but compared to the above problems that are intrinsic to the
+design of git-filter-branch, the language of the tool itself is a
+relatively minor issue.
+
+  ** Side note: Unfortunately, people tend to fixate on the
+     written-in-shell aspect and periodically ask if git-filter-branch
+     could be rewritten in another language to fix the performance
+     issues.  Not only does that ignore the bigger intrinsic problems
+     with the design, it'd help less than you'd expect: if
+     git-filter-branch itself were not shell, then the convenience
+     functions (map(), skip_commit(), etc) and the `--setup` argument
+     could no longer be executed once at the beginning of the program
+     but would instead need to be prepended to every user filter (and
+     thus re-executed with every commit).
+
+The https://github.com/newren/git-filter-repo/[git filter-repo] tool is
+an alternative to git-filter-branch which does not suffer from these
+performance problems or the safety problems (mentioned below). For those
+with existing tooling which relies upon git-filter-branch, 'git
+filter-repo' also provides
+https://github.com/newren/git-filter-repo/blob/master/contrib/filter-repo-demos/filter-lamely[filter-lamely],
+a drop-in git-filter-branch replacement (with a few caveats).  While
+filter-lamely suffers from all the same safety issues as
+git-filter-branch, it at least ameliorates the performance issues a
+little.
+
+[[SAFETY]]
+SAFETY
+------
+
+git-filter-branch is riddled with gotchas resulting in various ways to
+easily corrupt repos or end up with a mess worse than what you started
+with:
+
+* Someone can have a set of "working and tested filters" which they
+document or provide to a coworker, who then runs them on a different OS
+where the same commands are not working/tested (some examples in the
+git-filter-branch manpage are also affected by this).  BSD vs. GNU
+userland differences can really bite.  If lucky, error messages are
+spewed.  But just as likely, the commands either don't do the filtering
+requested, or silently corrupt by making some unwanted change.  The
+unwanted change may only affect a few commits, so it's not necessarily
+obvious either.  (The fact that problems won't necessarily be obvious
+means they are likely to go unnoticed until the rewritten history is in
+use for quite a while, at which point it's really hard to justify
+another flag-day for another rewrite.)
+
+* Filenames with spaces are often mishandled by shell snippets since
+they cause problems for shell pipelines.  Not everyone is familiar with
+find -print0, xargs -0, git-ls-files -z, etc.  Even people who are
+familiar with these may assume such flags are not relevant because
+someone else renamed any such files in their repo back before the person
+doing the filtering joined the project.  And often, even those familiar
+with handling arguments with spaces may not do so just because they
+aren't in the mindset of thinking about everything that could possibly
+go wrong.
+
+* Non-ascii filenames can be silently removed despite being in a desired
+directory.  Keeping only wanted paths is often done using pipelines like
+`git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.  ls-files will
+only quote filenames if needed, so folks may not notice that one of the
+files didn't match the regex (at least not until it's much too late).
+Yes, someone who knows about core.quotePath can avoid this (unless they
+have other special characters like \t, \n, or "), and people who use
+ls-files -z with something other than grep can avoid this, but that
+doesn't mean they will.
+
+* Similarly, when moving files around, one can find that filenames with
+non-ascii or special characters end up in a different directory, one
+that includes a double quote character.  (This is technically the same
+issue as above with quoting, but perhaps an interesting different way
+that it can and has manifested as a problem.)
+
+* It's far too easy to accidentally mix up old and new history.  It's
+still possible with any tool, but git-filter-branch almost invites it.
+If lucky, the only downside is users getting frustrated that they don't
+know how to shrink their repo and remove the old stuff.  If unlucky,
+they merge old and new history and end up with multiple "copies" of each
+commit, some of which have unwanted or sensitive files and others which
+don't.  This comes about in multiple different ways:
+
+  ** the default to only doing a partial history rewrite ('--all' is not
+     the default and few examples show it)
+
+  ** the fact that there's no automatic post-run cleanup
+
+  ** the fact that --tag-name-filter (when used to rename tags) doesn't
+     remove the old tags but just adds new ones with the new name
+
+  ** the fact that little educational information is provided to inform
+     users of the ramifications of a rewrite and how to avoid mixing old
+     and new history.  For example, this man page discusses how users
+     need to understand that they need to rebase their changes for all
+     their branches on top of new history (or delete and reclone), but
+     that's only one of multiple concerns to consider.  See the
+     "DISCUSSION" section of the git filter-repo manual page for more
+     details.
+
+* Annotated tags can be accidentally converted to lightweight tags, due
+to either of two issues:
+
+  ** Someone can do a history rewrite, realize they messed up, restore
+     from the backups in refs/original/, and then redo their
+     git-filter-branch command.  (The backup in refs/original/ is not a
+     real backup; it dereferences tags first.)
+
+  ** Running git-filter-branch with either --tags or --all in your
+     <rev-list options>.  In order to retain annotated tags as
+     annotated, you must use --tag-name-filter (and must not have
+     restored from refs/original/ in a previously botched rewrite).
+
+* Any commit messages that specify an encoding will become corrupted
+by the rewrite; git-filter-branch ignores the encoding, takes the original
+bytes, and feeds them to commit-tree without telling it the proper
+encoding.  (This happens whether or not --msg-filter is used.)
+
+* Commit messages (even if they are all UTF-8) by default become
+corrupted due to not being updated -- any references to other commit
+hashes in commit messages will now refer to no-longer-extant commits.
+
+* There are no facilities for helping users find what unwanted crud they
+should delete, which means they are much more likely to have incomplete
+or partial cleanups that sometimes result in confusion and people
+wasting time trying to understand.  (For example, folks tend to just
+look for big files to delete instead of big directories or extensions,
+and once they do so, then sometime later folks using the new repository
+who are going through history will notice a build artifact directory
+that has some files but not others, or a cache of dependencies
+(node_modules or similar) which couldn't have ever been functional since
+it's missing some files.)
+
+* If --prune-empty isn't specified, then the filtering process can
+create hordes of confusing empty commits.
+
+* If --prune-empty is specified, then intentionally placed empty
+commits from before the filtering operation are also pruned instead of
+just pruning commits that became empty due to filtering rules.
+
+* If --prune-empty is specified, sometimes empty commits are missed
+and left around anyway (a somewhat rare bug, but it happens...)
+
+* A minor issue, but users who have a goal to update all names and
+emails in a repository may be led to --env-filter which will only update
+authors and committers, missing taggers.
+
+* If the user provides a --tag-name-filter that maps multiple tags to
+the same name, no warning or error is provided; git-filter-branch simply
+overwrites each tag in some undocumented pre-defined order resulting in
+only one tag at the end.  (A git-filter-branch regression test requires
+this surprising behavior.)
+
+Also, the poor performance of git-filter-branch often leads to safety
+issues:
+
+* Coming up with the correct shell snippet to do the filtering you want
+is sometimes difficult unless you're just doing a trivial modification
+such as deleting a couple files.  Unfortunately, people often learn if
+the snippet is right or wrong by trying it out, but the rightness or
+wrongness can vary depending on special circumstances (spaces in
+filenames, non-ascii filenames, funny author names or emails, invalid
+timezones, presence of grafts or replace objects, etc.), meaning they
+may have to wait a long time, hit an error, then restart.  The
+performance of git-filter-branch is so bad that this cycle is painful,
+reducing the time available to carefully re-check (to say nothing about
+what it does to the patience of the person doing the rewrite even if
+they do technically have more time available).  This problem is further
+compounded because errors from broken filters may not be shown for a
+long time and/or get lost in a sea of output.  Even worse, broken
+filters often just result in silent incorrect rewrites.
+
+* To top it all off, even when users finally find working commands, they
+naturally want to share them.  But they may be unaware that their repo
+didn't have some special cases that someone else's does.  So, when
+someone else with a different repository runs the same commands, they
+get hit by the problems above.  Or, the user just runs commands that
+really were vetted for special cases, but they run them on a different OS
+where it doesn't work, as noted above.
 
 GIT
 ---
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 247f765604..0c114ad1ca 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -115,15 +115,14 @@ NOTES
 -----
 
 'git gc' tries very hard not to delete objects that are referenced
-anywhere in your repository. In
-particular, it will keep not only objects referenced by your current set
-of branches and tags, but also objects referenced by the index,
-remote-tracking branches, refs saved by 'git filter-branch' in
-refs/original/, reflogs (which may reference commits in branches
-that were later amended or rewound), and anything else in the refs/* namespace.
-If you are expecting some objects to be deleted and they aren't, check
-all of those locations and decide whether it makes sense in your case to
-remove those references.
+anywhere in your repository. In particular, it will keep not only
+objects referenced by your current set of branches and tags, but also
+objects referenced by the index, remote-tracking branches, notes saved
+by 'git notes' under refs/notes/, reflogs (which may reference commits
+in branches that were later amended or rewound), and anything else in
+the refs/* namespace.  If you are expecting some objects to be deleted
+and they aren't, check all of those locations and decide whether it
+makes sense in your case to remove those references.
 
 On the other hand, when 'git gc' runs concurrently with another process,
 there is a risk of it deleting an object that the other process is using
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 6156609cf7..a8cfc0ad82 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -832,7 +832,8 @@ Hard case: The changes are not the same.::
 	This happens if the 'subsystem' rebase had conflicts, or used
 	`--interactive` to omit, edit, squash, or fixup commits; or
 	if the upstream used one of `commit --amend`, `reset`, or
-	`filter-branch`.
+	a full history rewriting command like
+	https://github.com/newren/git-filter-repo[`filter-repo`].
 
 
 The easy case
diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
index 246dc9943c..f271d758c3 100644
--- a/Documentation/git-replace.txt
+++ b/Documentation/git-replace.txt
@@ -123,10 +123,10 @@ The following format are available:
 CREATING REPLACEMENT OBJECTS
 ----------------------------
 
-linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and
-linkgit:git-rebase[1], among other git commands, can be used to create
-replacement objects from existing objects. The `--edit` option can
-also be used with 'git replace' to create a replacement object by
+linkgit:git-hash-object[1], linkgit:git-rebase[1], and
+https://github.com/newren/git-filter-repo[git-filter-repo], among other git commands, can be used to
+create replacement objects from existing objects. The `--edit` option
+can also be used with 'git replace' to create a replacement object by
 editing an existing object.
 
 If you want to replace many blobs, trees or commits that are part of a
@@ -148,13 +148,13 @@ pending objects.
 SEE ALSO
 --------
 linkgit:git-hash-object[1]
-linkgit:git-filter-branch[1]
 linkgit:git-rebase[1]
 linkgit:git-tag[1]
 linkgit:git-branch[1]
 linkgit:git-commit[1]
 linkgit:git-var[1]
 linkgit:git[1]
+https://github.com/newren/git-filter-repo[git-filter-repo]
 
 GIT
 ---
diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
index 30711625fd..53774f5b64 100644
--- a/Documentation/git-svn.txt
+++ b/Documentation/git-svn.txt
@@ -769,11 +769,11 @@ option for (hopefully) obvious reasons.
 +
 This option is NOT recommended as it makes it difficult to track down
 old references to SVN revision numbers in existing documentation, bug
-reports and archives.  If you plan to eventually migrate from SVN to Git
-and are certain about dropping SVN history, consider
-linkgit:git-filter-branch[1] instead.  filter-branch also allows
-reformatting of metadata for ease-of-reading and rewriting authorship
-info for non-"svn.authorsFile" users.
+reports, and archives.  If you plan to eventually migrate from SVN to
+Git and are certain about dropping SVN history, consider
+https://github.com/newren/git-filter-repo[git-filter-repo] instead.
+filter-repo also allows reformatting of metadata for ease-of-reading
+and rewriting authorship info for non-"svn.authorsFile" users.
 
 svn.useSvmProps::
 svn-remote.<name>.useSvmProps::
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 82cd573776..5a789c91df 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -425,10 +425,12 @@ post-rewrite
 
 This hook is invoked by commands that rewrite commits
 (linkgit:git-commit[1] when called with `--amend` and
-linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call
-it!).  Its first argument denotes the command it was invoked by:
-currently one of `amend` or `rebase`.  Further command-dependent
-arguments may be passed in the future.
+linkgit:git-rebase[1]; however, full-history (re)writing tools like
+linkgit:git-fast-import[1] or
+https://github.com/newren/git-filter-repo[git-filter-repo] typically
+do not call it!).  Its first argument denotes the command it was
+invoked by: currently one of `amend` or `rebase`.  Further
+command-dependent arguments may be passed in the future.
 
 The hook receives a list of the rewritten commits on stdin, in the
 format
diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index a3425f4770..19333fc8df 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -56,7 +56,7 @@ line.  This line has the form `git-svn-id: URL@REVNO UUID`.
 
 The resulting repository will generally require further processing
 to put each project in its own repository and to separate the history
-of each branch.  The 'git filter-branch --subdirectory-filter' command
+of each branch.  The 'git filter-repo --subdirectory-filter' command
 may be useful for this purpose.
 
 BUGS
@@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected.
 
 SEE ALSO
 --------
-git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
+git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
 https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5c5afa2b98..fea7964617 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -83,6 +83,20 @@ set_ident () {
 	finish_ident COMMITTER
 }
 
+if test -z "$FILTER_BRANCH_SQUELCH_WARNING$GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS"
+then
+	cat <<EOF
+WARNING: git-filter-branch has a glut of gotchas generating mangled history
+	 rewrites.  Hit Ctrl-C before proceeding to abort, then use an
+	 alternative filtering tool such as 'git filter-repo'
+	 (https://github.com/newren/git-filter-repo/) instead.  See the
+	 filter-branch manual page for more details; to squelch this warning,
+	 set FILTER_BRANCH_SQUELCH_WARNING=1.
+EOF
+	sleep 10
+	printf "Proceeding with filter-branch...\n\n"
+fi
+
 USAGE="[--setup <command>] [--subdirectory-filter <directory>] [--env-filter <command>]
 	[--tree-filter <command>] [--index-filter <command>]
 	[--parent-filter <command>] [--msg-filter <command>]
-- 
2.23.0.3.g19edb94ec2
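
The guard added to git-filter-branch.sh above prints its warning and
pauses only when both FILTER_BRANCH_SQUELCH_WARNING and
GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS are empty.  A minimal sketch of
the same pattern in isolation follows; the function name
`warn_unless_squelched` and the shortened message are hypothetical
(they are not part of the patch), and the sleep is left out so the
sketch returns immediately.

```shell
#!/bin/sh
# Hypothetical helper mirroring the guard in the patch: print a warning
# unless FILTER_BRANCH_SQUELCH_WARNING is set to a non-empty value.
# The ${VAR:-} form is an addition here so the sketch also works under
# "set -u"; the patch itself expands the variable directly.
warn_unless_squelched () {
	if test -z "${FILTER_BRANCH_SQUELCH_WARNING:-}"
	then
		echo "WARNING: consider an alternative such as 'git filter-repo'."
	fi
}

# Warning shown: the variable is unset inside this subshell.
(unset FILTER_BRANCH_SQUELCH_WARNING; warn_unless_squelched)

# Warning suppressed: the variable is non-empty inside this subshell.
(FILTER_BRANCH_SQUELCH_WARNING=1; warn_unless_squelched)
```

In the patch, the `sleep 10` sits inside the same `if`, so exporting
FILTER_BRANCH_SQUELCH_WARNING=1 before running git filter-branch skips
both the message and the pause.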



* [PATCH v6 3/3] t9902: use a non-deprecated command for testing
  2019-09-04 22:32             ` [PATCH v6 0/3] Warn about git-filter-branch usage and avoid it Elijah Newren
  2019-09-04 22:32               ` [PATCH v6 1/3] t6006: simplify, fix, and optimize empty message test Elijah Newren
  2019-09-04 22:32               ` [PATCH v6 2/3] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
@ 2019-09-04 22:32               ` Elijah Newren
  2 siblings, 0 replies; 73+ messages in thread
From: Elijah Newren @ 2019-09-04 22:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Derrick Stolee, Eric Wong, Jeff King,
	Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Lars Schneider, Jonathan Nieder, Eric Sunshine, Elijah Newren

t9902 had a list of three random porcelain commands as a sanity check,
one of which was filter-branch.  Since we are recommending people not
use filter-branch, let's update this test to use rebase instead of
filter-branch.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t9902-completion.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 75512c3403..4e7f669c76 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -28,10 +28,10 @@ complete ()
 #
 # (2) A test makes sure that common subcommands are included in the
 #     completion for "git <TAB>", and a plumbing is excluded.  "add",
-#     "filter-branch" and "ls-files" are listed for this.
+#     "rebase" and "ls-files" are listed for this.
 
-GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr filter-branch ls-files'
-GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout filter-branch'
+GIT_TESTING_ALL_COMMAND_LIST='add checkout check-attr rebase ls-files'
+GIT_TESTING_PORCELAIN_COMMAND_LIST='add checkout rebase'
 
 . "$GIT_BUILD_DIR/contrib/completion/git-completion.bash"
 
@@ -1392,12 +1392,12 @@ test_expect_success 'basic' '
 	# built-in
 	grep -q "^add \$" out &&
 	# script
-	grep -q "^filter-branch \$" out &&
+	grep -q "^rebase \$" out &&
 	# plumbing
 	! grep -q "^ls-files \$" out &&
 
-	run_completion "git f" &&
-	! grep -q -v "^f" out
+	run_completion "git r" &&
+	! grep -q -v "^r" out
 '
 
 test_expect_success 'double dash "git" itself' '
-- 
2.23.0.3.g19edb94ec2
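
The `! grep -q -v "^r" out` check in the test above succeeds only when
no line of `out` fails to start with "r".  A minimal sketch of that
idiom outside the test suite follows; the helper name
`all_lines_start_with` and the sample input are made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every line on stdin starts with
# the given prefix.  "grep -v" selects the lines that do NOT match, and
# "-q" makes grep exit 0 as soon as one such line exists, so negating
# the result means "no line lacks the prefix".
# The prefix is interpolated into a regular expression, so this sketch
# assumes it contains no regex metacharacters.
all_lines_start_with () {
	! grep -q -v "^$1"
}

if printf 'rebase \nreset \nrevert \n' | all_lines_start_with r
then
	echo "all candidates start with r"
fi
```

Note that empty input also passes, since there is then no line that
lacks the prefix; the test above guards against that with its earlier
positive `grep -q` checks on a separate completion run.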



Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
2019-08-22 18:26 RFC: Proposing git-filter-repo for inclusion in git.git Elijah Newren
2019-08-22 20:23 ` Junio C Hamano
2019-08-22 21:12   ` Elijah Newren
2019-08-22 21:34     ` Junio C Hamano
2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
2019-08-26 23:52         ` [RFC PATCH 1/5] t6006: simplify and optimize empty message test Elijah Newren
2019-08-27  1:23           ` Derrick Stolee
2019-08-26 23:52         ` [RFC PATCH 2/5] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-08-27  1:25           ` Derrick Stolee
2019-08-26 23:52         ` [RFC PATCH 3/5] git-sh-i18n: work with external scripts Elijah Newren
2019-08-27  1:28           ` Derrick Stolee
2019-08-26 23:52         ` [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation Elijah Newren
2019-08-27  1:32           ` Derrick Stolee
2019-08-27  6:23             ` Elijah Newren
2019-08-26 23:52         ` [RFC PATCH 5/5] Remove git-filter-branch, it is now external to git.git Elijah Newren
2019-08-27  1:39         ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Derrick Stolee
2019-08-27  6:17           ` Elijah Newren
2019-08-27  7:03         ` Eric Wong
2019-08-27  8:43           ` Sergey Organov
2019-08-27 22:18             ` Elijah Newren
2019-08-28  8:52               ` Sergey Organov
2019-08-28 17:16                 ` Elijah Newren
2019-08-28 19:03                   ` Sergey Organov
2019-08-30 20:40                   ` Johannes Schindelin
2019-08-30 23:22                     ` Elijah Newren
2019-09-02  9:29                       ` Johannes Schindelin
2019-09-03 17:37                         ` Elijah Newren
2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-08-28  0:22           ` [PATCH v2 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-08-28  0:22           ` [PATCH v2 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-08-28  6:00             ` Eric Sunshine
2019-08-28  0:22           ` [PATCH v2 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-08-28  6:17             ` Eric Sunshine
2019-08-28 21:48               ` Elijah Newren
2019-08-28  0:22           ` [RFC PATCH v2 4/4] Remove git-filter-branch, it is now external to git.git Elijah Newren
2019-08-29  0:06           ` [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-08-29  0:06             ` [PATCH v3 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-08-29  0:06             ` [PATCH v3 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-08-29  0:06             ` [PATCH v3 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-08-29 18:10               ` Eric Sunshine
2019-08-30  0:04                 ` Elijah Newren
2019-08-29  0:06             ` [PATCH v3 4/4] t9902: use a non-deprecated command for testing Elijah Newren
2019-08-30  5:57             ` [PATCH v4 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-08-30  5:57               ` [PATCH v4 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-09-02 14:47                 ` Johannes Schindelin
2019-08-30  5:57               ` [PATCH v4 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-09-02 14:45                 ` Johannes Schindelin
2019-08-30  5:57               ` [PATCH v4 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-08-30  5:57               ` [PATCH v4 4/4] t9902: use a non-deprecated command for testing Elijah Newren
2019-09-03 18:55           ` [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-09-03 18:55             ` [PATCH v5 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-09-03 21:08               ` Junio C Hamano
2019-09-03 21:58                 ` Elijah Newren
2019-09-03 22:25                   ` Junio C Hamano
2019-09-03 18:55             ` [PATCH v5 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-09-03 21:26               ` Junio C Hamano
2019-09-03 22:46                 ` Junio C Hamano
2019-09-04 20:32                   ` Elijah Newren
2019-09-03 18:55             ` [PATCH v5 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-09-03 21:40               ` Junio C Hamano
2019-09-04 20:30                 ` Elijah Newren
2019-09-03 18:55             ` [PATCH v5 4/4] t9902: use a non-deprecated command for testing Elijah Newren
2019-09-04 22:32             ` [PATCH v6 0/3] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-09-04 22:32               ` [PATCH v6 1/3] t6006: simplify, fix, and optimize empty message test Elijah Newren
2019-09-04 22:32               ` [PATCH v6 2/3] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-09-04 22:32               ` [PATCH v6 3/3] t9902: use a non-deprecated command for testing Elijah Newren
2019-08-23  3:00     ` RFC: Proposing git-filter-repo for inclusion in git.git Eric Wong
2019-08-23 18:06       ` Elijah Newren
2019-08-23 18:29         ` Elijah Newren
2019-08-28 11:09         ` Johannes Schindelin
2019-08-28 15:06           ` Junio C Hamano
2019-08-23 12:02     ` Derrick Stolee
2019-08-26 19:56   ` Jeff King
