git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren <newren@gmail.com>, git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Eric Wong" <e@80x24.org>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Lars Schneider" <larsxschneider@gmail.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>
Subject: Re: [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation
Date: Mon, 26 Aug 2019 21:32:58 -0400	[thread overview]
Message-ID: <e7df2ce3-f772-54b5-4e81-70510a897352@gmail.com> (raw)
In-Reply-To: <20190826235226.15386-5-newren@gmail.com>

On 8/26/2019 7:52 PM, Elijah Newren wrote:
> filter-branch suffers from a huge number of pitfalls that can result in
> incorrectly rewritten history, and many of the problems can easily go
> undetected until the new repository is in use.  This can result in
> problems ranging from an even messier history than what led folks to
> filter-branch in the first place, to data loss or corruption.  These
> issues cannot be backward compatibly fixed, so add a warning to the
> filter-branch manpage about this and recommand that another tool (such
> as filter-repo) be used instead.
> 
> Also, update other manpages that referenced filter-branch.  Several of
> these needed updates even if we could continue recommending
> filter-branch, either due to implying that something was unique to
> filter-branch when it applied more generally to all history rewriting
> tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
> something about filter-branch was used as an example despite other more
> commonly known examples now existing.  Reword these sections to fix
> these issues and to avoid recommending filter-branch.
> 
> Finally, remove the section explaining BFG Repo Cleaner as an
> alternative to filter-branch.  I feel somewhat bad about this,
> especially since I feel like I learned so much from BFG that I put to
> good use in filter-repo (which is much more than I can say for
> filter-branch), but keeping that section presented a few problems:
>   * In order to recommend that people quit using filter-branch, we need
>     to provide them a recomendation for something else to use that
>     can handle all the same types of rewrites.  To my knowledge,
>     filter-repo is the only such tool.  So it needs to be mentioned.
>   * I don't want to give conflicting recommendations to users
>   * If we recommend two tools, we shouldn't expect users to learn both
>     and pick which one to use; we should explain which problems one
>     can solve that the other can't or when one is much faster than
>     the other.
>   * BFG and filter-repo have similar performance
>   * All filtering types that BFG can do, filter-repo can also do.  In
>     fact, filter-repo comes with a reimplementation of BFG named
>     bfg-ish which provides the same user-interface as BFG but with
>     several bugfixes and new features that are hard to implement in
>     BFG due to its technical underpinnings.
> While I could still mention both tools, it seems like I would need to
> provide some kind of comparison and I would ultimately just say that
> filter-repo can do everything BFG can, so ultimately it seems that it
> is just better to remove that section altogether.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  Documentation/git-fast-export.txt   |  6 ++---
>  Documentation/git-filter-branch.txt | 42 ++++++++---------------------
>  Documentation/git-gc.txt            | 17 ++++++------
>  Documentation/git-rebase.txt        |  2 +-
>  Documentation/git-replace.txt       | 10 +++----
>  Documentation/git-svn.txt           |  4 +--
>  Documentation/githooks.txt          |  7 ++---
>  contrib/svn-fe/svn-fe.txt           |  4 +--
>  8 files changed, 36 insertions(+), 56 deletions(-)
> 
> diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
> index cc940eb9ad..784e934009 100644
> --- a/Documentation/git-fast-export.txt
> +++ b/Documentation/git-fast-export.txt
> @@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped
>  into 'git fast-import'.
>  
>  You can use it as a human-readable bundle replacement (see
> -linkgit:git-bundle[1]), or as a kind of an interactive
> -'git filter-branch'.
> -
> +linkgit:git-bundle[1]), or as a format that can be edited before being
> +fed to 'git fast-import' in order to do history rewrites (an ability
> +relied on by tools like 'git filter-repo').
>  
>  OPTIONS
>  -------
> diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
> index 6b53dd7e06..8c586eed55 100644
> --- a/Documentation/git-filter-branch.txt
> +++ b/Documentation/git-filter-branch.txt
> @@ -16,6 +16,17 @@ SYNOPSIS
>  	[--original <namespace>] [-d <directory>] [-f | --force]
>  	[--state-branch <branch>] [--] [<rev-list options>...]
>  
> +WARNING
> +-------
> +'git filter-branch' has a litany of gotchas that can and will cause
> +history to be rewritten incorrectly (in addition to abysmal
> +performance).  These issues cannot be backward compatibly fixed and as
> +such, its use is not recommended.  Please use an alternative history
> +filtering tool such as 'git filter-repo'.  If you still need to use
> +'git filter-branch', please carefully read the "Safety" section of
> +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/

Is it possible to present this URL as a hyperlink with a succinct
description? Maybe 'carefully read [the "Safety" section of this
message on the Git mailing list](url).' (I'm using Markdown notation
here as I don't know the equivalent for our docs.)

> +and avoid as many of the pitfalls listed there as reasonably possible.
> +
>  DESCRIPTION
>  -----------
>  Lets you rewrite Git revision history by rewriting the branches mentioned
> @@ -445,37 +456,6 @@ warned.
>    (or if your git-gc is not new enough to support arguments to
>    `--prune`, use `git repack -ad; git prune` instead).
>  
> -NOTES
> ------
> -
> -git-filter-branch allows you to make complex shell-scripted rewrites
> -of your Git history, but you probably don't need this flexibility if
> -you're simply _removing unwanted data_ like large files or passwords.
> -For those operations you may want to consider
> -http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
> -a JVM-based alternative to git-filter-branch, typically at least
> -10-50x faster for those use-cases, and with quite different
> -characteristics:
> -
> -* Any particular version of a file is cleaned exactly _once_. The BFG,
> -  unlike git-filter-branch, does not give you the opportunity to
> -  handle a file differently based on where or when it was committed
> -  within your history. This constraint gives the core performance
> -  benefit of The BFG, and is well-suited to the task of cleansing bad
> -  data - you don't care _where_ the bad data is, you just want it
> -  _gone_.
> -
> -* By default The BFG takes full advantage of multi-core machines,
> -  cleansing commit file-trees in parallel. git-filter-branch cleans
> -  commits sequentially (i.e. in a single-threaded manner), though it
> -  _is_ possible to write filters that include their own parallelism,
> -  in the scripts executed against each commit.
> -
> -* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
> -  are much more restrictive than git-filter branch, and dedicated just
> -  to the tasks of removing unwanted data- e.g:
> -  `--strip-blobs-bigger-than 1M`.
> -
>  GIT
>  ---
>  Part of the linkgit:git[1] suite
> diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> index 247f765604..0c114ad1ca 100644
> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -115,15 +115,14 @@ NOTES
>  -----
>  
>  'git gc' tries very hard not to delete objects that are referenced
> -anywhere in your repository. In
> -particular, it will keep not only objects referenced by your current set
> -of branches and tags, but also objects referenced by the index,
> -remote-tracking branches, refs saved by 'git filter-branch' in
> -refs/original/, reflogs (which may reference commits in branches
> -that were later amended or rewound), and anything else in the refs/* namespace.
> -If you are expecting some objects to be deleted and they aren't, check
> -all of those locations and decide whether it makes sense in your case to
> -remove those references.
> +anywhere in your repository. In particular, it will keep not only
> +objects referenced by your current set of branches and tags, but also
> +objects referenced by the index, remote-tracking branches, notes saved
> +by 'git notes' under refs/notes/, reflogs (which may reference commits
> +in branches that were later amended or rewound), and anything else in
> +the refs/* namespace.  If you are expecting some objects to be deleted
> +and they aren't, check all of those locations and decide whether it
> +makes sense in your case to remove those references.
>  
>  On the other hand, when 'git gc' runs concurrently with another process,
>  there is a risk of it deleting an object that the other process is using
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 6156609cf7..2f201d85d4 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -832,7 +832,7 @@ Hard case: The changes are not the same.::
>  	This happens if the 'subsystem' rebase had conflicts, or used
>  	`--interactive` to omit, edit, squash, or fixup commits; or
>  	if the upstream used one of `commit --amend`, `reset`, or
> -	`filter-branch`.
> +	a full history rewriting command like `filter-repo`.
>  
>  
>  The easy case
> diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
> index 246dc9943c..35595a2cd3 100644
> --- a/Documentation/git-replace.txt
> +++ b/Documentation/git-replace.txt
> @@ -123,10 +123,10 @@ The following format are available:
>  CREATING REPLACEMENT OBJECTS
>  ----------------------------
>  
> -linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and
> -linkgit:git-rebase[1], among other git commands, can be used to create
> -replacement objects from existing objects. The `--edit` option can
> -also be used with 'git replace' to create a replacement object by
> +linkgit:git-hash-object[1], linkgit:git-rebase[1], and
> +linkgit:git-filter-repo[1], among other git commands, can be used to
> +create replacement objects from existing objects. The `--edit` option
> +can also be used with 'git replace' to create a replacement object by
>  editing an existing object.
>  
>  If you want to replace many blobs, trees or commits that are part of a
> @@ -148,8 +148,8 @@ pending objects.
>  SEE ALSO
>  --------
>  linkgit:git-hash-object[1]
> -linkgit:git-filter-branch[1]
>  linkgit:git-rebase[1]
> +linkgit:git-filter-repo[1]
>  linkgit:git-tag[1]
>  linkgit:git-branch[1]
>  linkgit:git-commit[1]
> diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
> index 30711625fd..f2762dd5d4 100644
> --- a/Documentation/git-svn.txt
> +++ b/Documentation/git-svn.txt
> @@ -769,9 +769,9 @@ option for (hopefully) obvious reasons.
>  +
>  This option is NOT recommended as it makes it difficult to track down
>  old references to SVN revision numbers in existing documentation, bug
> -reports and archives.  If you plan to eventually migrate from SVN to Git
> +reports, and archives.  If you plan to eventually migrate from SVN to Git
>  and are certain about dropping SVN history, consider
> -linkgit:git-filter-branch[1] instead.  filter-branch also allows
> +linkgit:git-filter-repo[1] instead.  filter-repo also allows
>  reformatting of metadata for ease-of-reading and rewriting authorship
>  info for non-"svn.authorsFile" users.
>  
> diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
> index 82cd573776..997548f5ed 100644
> --- a/Documentation/githooks.txt
> +++ b/Documentation/githooks.txt
> @@ -425,9 +425,10 @@ post-rewrite
>  
>  This hook is invoked by commands that rewrite commits
>  (linkgit:git-commit[1] when called with `--amend` and
> -linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call
> -it!).  Its first argument denotes the command it was invoked by:
> -currently one of `amend` or `rebase`.  Further command-dependent
> +linkgit:git-rebase[1]; however, full-history (re)writing tools like
> +linkgit:git-fast-import[1] or linkgit:git-filter-repo[1] typically do
> +not call it!).  Its first argument denotes the command it was invoked
> +by: currently one of `amend` or `rebase`.  Further command-dependent
>  arguments may be passed in the future.
>  
>  The hook receives a list of the rewritten commits on stdin, in the
> diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
> index a3425f4770..19333fc8df 100644
> --- a/contrib/svn-fe/svn-fe.txt
> +++ b/contrib/svn-fe/svn-fe.txt
> @@ -56,7 +56,7 @@ line.  This line has the form `git-svn-id: URL@REVNO UUID`.
>  
>  The resulting repository will generally require further processing
>  to put each project in its own repository and to separate the history
> -of each branch.  The 'git filter-branch --subdirectory-filter' command
> +of each branch.  The 'git filter-repo --subdirectory-filter' command
>  may be useful for this purpose.
>  
>  BUGS
> @@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected.
>  
>  SEE ALSO
>  --------
> -git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
> +git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
>  https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
> 


  reply	other threads:[~2019-08-27  1:33 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-22 18:26 RFC: Proposing git-filter-repo for inclusion in git.git Elijah Newren
2019-08-22 20:23 ` Junio C Hamano
2019-08-22 21:12   ` Elijah Newren
2019-08-22 21:34     ` Junio C Hamano
2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
2019-08-26 23:52         ` [RFC PATCH 1/5] t6006: simplify and optimize empty message test Elijah Newren
2019-08-27  1:23           ` Derrick Stolee
2019-08-26 23:52         ` [RFC PATCH 2/5] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-08-27  1:25           ` Derrick Stolee
2019-08-26 23:52         ` [RFC PATCH 3/5] git-sh-i18n: work with external scripts Elijah Newren
2019-08-27  1:28           ` Derrick Stolee
2019-08-26 23:52         ` [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation Elijah Newren
2019-08-27  1:32           ` Derrick Stolee [this message]
2019-08-27  6:23             ` Elijah Newren
2019-08-26 23:52         ` [RFC PATCH 5/5] Remove git-filter-branch, it is now external to git.git Elijah Newren
2019-08-27  1:39         ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Derrick Stolee
2019-08-27  6:17           ` Elijah Newren
2019-08-27  7:03         ` Eric Wong
2019-08-27  8:43           ` Sergey Organov
2019-08-27 22:18             ` Elijah Newren
2019-08-28  8:52               ` Sergey Organov
2019-08-28 17:16                 ` Elijah Newren
2019-08-28 19:03                   ` Sergey Organov
2019-08-30 20:40                   ` Johannes Schindelin
2019-08-30 23:22                     ` Elijah Newren
2019-09-02  9:29                       ` Johannes Schindelin
2019-09-03 17:37                         ` Elijah Newren
2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-08-28  0:22           ` [PATCH v2 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-08-28  0:22           ` [PATCH v2 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-08-28  6:00             ` Eric Sunshine
2019-08-28  0:22           ` [PATCH v2 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-08-28  6:17             ` Eric Sunshine
2019-08-28 21:48               ` Elijah Newren
2019-08-28  0:22           ` [RFC PATCH v2 4/4] Remove git-filter-branch, it is now external to git.git Elijah Newren
2019-08-29  0:06           ` [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-08-29  0:06             ` [PATCH v3 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-08-29  0:06             ` [PATCH v3 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-08-29  0:06             ` [PATCH v3 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-08-29 18:10               ` Eric Sunshine
2019-08-30  0:04                 ` Elijah Newren
2019-08-29  0:06             ` [PATCH v3 4/4] t9902: use a non-deprecated command for testing Elijah Newren
2019-08-30  5:57             ` [PATCH v4 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-08-30  5:57               ` [PATCH v4 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-09-02 14:47                 ` Johannes Schindelin
2019-08-30  5:57               ` [PATCH v4 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-09-02 14:45                 ` Johannes Schindelin
2019-08-30  5:57               ` [PATCH v4 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-08-30  5:57               ` [PATCH v4 4/4] t9902: use a non-deprecated command for testing Elijah Newren
2019-09-03 18:55           ` [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-09-03 18:55             ` [PATCH v5 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-09-03 21:08               ` Junio C Hamano
2019-09-03 21:58                 ` Elijah Newren
2019-09-03 22:25                   ` Junio C Hamano
2019-09-03 18:55             ` [PATCH v5 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-09-03 21:26               ` Junio C Hamano
2019-09-03 22:46                 ` Junio C Hamano
2019-09-04 20:32                   ` Elijah Newren
2019-09-03 18:55             ` [PATCH v5 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-09-03 21:40               ` Junio C Hamano
2019-09-04 20:30                 ` Elijah Newren
2019-09-03 18:55             ` [PATCH v5 4/4] t9902: use a non-deprecated command for testing Elijah Newren
2019-09-04 22:32             ` [PATCH v6 0/3] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-09-04 22:32               ` [PATCH v6 1/3] t6006: simplify, fix, and optimize empty message test Elijah Newren
2019-09-04 22:32               ` [PATCH v6 2/3] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-09-04 22:32               ` [PATCH v6 3/3] t9902: use a non-deprecated command for testing Elijah Newren
2019-08-23  3:00     ` RFC: Proposing git-filter-repo for inclusion in git.git Eric Wong
2019-08-23 18:06       ` Elijah Newren
2019-08-23 18:29         ` Elijah Newren
2019-08-28 11:09         ` Johannes Schindelin
2019-08-28 15:06           ` Junio C Hamano
2019-08-23 12:02     ` Derrick Stolee
2019-08-26 19:56   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e7df2ce3-f772-54b5-4e81-70510a897352@gmail.com \
    --to=stolee@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=larsxschneider@gmail.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).