From: Elijah Newren <newren@gmail.com>
To: Git Mailing List <git@vger.kernel.org>
Subject: [ANNOUNCE] git_fast_filter
Date: Tue, 7 Apr 2009 21:35:10 -0600
Message-ID: <51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com>

Just thought I'd make this available, in case there are others with
niche needs who find it useful...


git_fast_filter assists with quickly rewriting the history of a repository
by making it easy to write scripts whose purpose is to serve as safe
filters between fast-export and fast-import.  git_fast_filter comes with
example programs, a basic test-suite, and a double your money back
satisfaction guarantee.  (I love free software.)  You can get it from

  git://gitorious.org/git_fast_filter/mainline.git

In more detail...

=== Purpose ===

git_fast_filter is designed to make it easy to filter or rewrite the
history of a repository.  As such, it fills the same role as
git-filter-branch, and was written primarily to overcome the sometimes
severe speed shortcomings of git-filter-branch.  In particular, using
git_fast_filter can avoid thousands or millions of new process forks, and
can allow you to rewrite the same file only one time instead of 50,000
times.  However, while using git_fast_filter is fairly simple and quick, it
is hard to beat writing a simple git-filter-branch one-liner for efficiency
of human time.  Also, the two tools use very different methods of rewriting
history and do not have exactly overlapping feature sets, so the best tool
for a particular job is going to be very problem-dependent.

As human time is often more important than computer time, especially for
one-shot rewrites, git-filter-branch will probably continue to be the more
common tool.  However, git_fast_filter is useful in cases where computer
time of a rewrite matters (particularly larger repositories and more
involved rewrites that need to be run and tested many times on large data
sets).  Also, git_fast_filter has a couple of features that may come in
handy in special cases (assisting with generating fast-export output from
scratch, interleaving commits from separate repositories, and bidirectional
collaboration between filtered and unfiltered repositories).

=== Idea ===

The way git_fast_filter works is by providing a simple Python library,
git_fast_filter.py.  This library can be used in simple Python scripts to
create a filter for the output of git-fast-export.  Thus, the typical
calling convention is of the form:

    git fast-export | filter_script.py | git fast-import
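
For anyone who would rather drive that pipeline from Python than from the
shell, here is a minimal sketch using only the standard subprocess module.
It is not taken from git_fast_filter.py; run_pipeline is a hypothetical
helper name, and Python 3 is assumed.

```python
import subprocess
import sys

def run_pipeline(stages):
    """Run commands chained stdout -> stdin, like a shell pipeline.

    `stages` is a list of (argv, cwd) pairs; cwd may be None to stay in
    the current directory.  Returns one exit code per stage.
    """
    procs = []
    prev_stdout = None
    for i, (argv, cwd) in enumerate(stages):
        is_last = (i == len(stages) - 1)
        proc = subprocess.Popen(
            argv,
            cwd=cwd,
            stdin=prev_stdout,  # None for the first stage: inherit stdin
            stdout=None if is_last else subprocess.PIPE)
        if prev_stdout is not None:
            # Drop our copy of the pipe so the upstream process sees a
            # closed pipe if the downstream process exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    return [p.wait() for p in procs]

# Assumed mapping onto the calling convention above (paths are placeholders):
#   run_pipeline([
#       (["git", "fast-export", "--all"], "/PATH/LEADING/TO/source"),
#       (["/PATH/TO/filter_script.py"], None),
#       (["git", "fast-import"], "target"),
#   ])
```

Each stage gets its own working directory, which is what lets the export
run in the source repository and the import run in the target.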

=== Example ===

An example script that renames the 'master' branch to 'other' is shown
below (this is similar to the example in the git-fast-export manpage, but
is safe against the string 'refs/heads/master' appearing in some file or
commit message in the repository):

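  #!/usr/bin/python

  from git_fast_filter import Commit, FastExportFilter

  # Called for each commit in the stream; modifies the commit in place.
  def my_commit_callback(commit):
    if commit.branch == "refs/heads/master":
      commit.branch = "refs/heads/other"

  # With no arguments, run() filters fast-export output from stdin to stdout.
  filter = FastExportFilter(commit_callback = my_commit_callback)
  filter.run()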

The user can then run this script with:
  $ mkdir target && cd target && git init
  $ (cd /PATH/LEADING/TO/source && git fast-export --all) \
       | /PATH/TO/filter_script.py | git fast-import

(Note: The user can have the script take care of the git init, the cd's,
and the invocations of git fast-export and git fast-import by just passing
directory names to FastExportFilter.run; however, writing out the details
explicitly as in the above example makes it clearer what is going on.)
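
To illustrate why parsing the stream is safer than a textual search and
replace, here is a rough standalone sketch of the same rename that does not
use git_fast_filter at all.  It is not how git_fast_filter.py is written;
rename_branch is a hypothetical name, Python 3 is assumed, and the sketch
assumes the stream uses only the exact byte count form 'data <count>' for
payloads (the delimited 'data <<' form is not handled).  It only rewrites
'commit' and 'reset' command lines, so it is far less complete than the
library.

```python
import io

def rename_branch(instream, outstream, old, new):
    """Copy a fast-export stream, renaming ref `old` to `new`.

    Only 'commit <ref>' and 'reset <ref>' command lines are rewritten.
    Payloads introduced by 'data <count>' (file contents, commit and tag
    messages) are copied byte for byte without being inspected.
    """
    while True:
        line = instream.readline()
        if not line:
            break  # end of stream
        if line.startswith(b"data "):
            # Copy the header, then exactly <count> raw bytes of payload.
            outstream.write(line)
            size = int(line[len(b"data "):].decode("ascii"))
            outstream.write(instream.read(size))
            continue
        for cmd in (b"commit ", b"reset "):
            if line == cmd + old + b"\n":
                line = cmd + new + b"\n"
        outstream.write(line)

# As a filter script one would call (binary streams, so no decoding):
#   rename_branch(sys.stdin.buffer, sys.stdout.buffer,
#                 b"refs/heads/master", b"refs/heads/other")
```

Because payloads are skipped by length rather than scanned, a file or commit
message that happens to contain the line 'commit refs/heads/master' passes
through unchanged.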


Elijah
