From: Elijah Newren <newren@gmail.com>
To: Git Mailing List <git@vger.kernel.org>
Subject: [ANNOUNCE] git_fast_filter
Date: Tue, 7 Apr 2009 21:35:10 -0600
Message-ID: <51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com>

Just thought I'd make this available, in case there's others with niche
needs that find it useful...

git_fast_filter assists with quickly rewriting the history of a
repository by making it easy to write scripts whose purpose is to serve
as safe filters between fast-export and fast-import.  git_fast_filter
comes with example programs, a basic test-suite, and a double your money
back satisfaction guarantee.  (I love free software.)  You can get it
from

  git://gitorious.org/git_fast_filter/mainline.git

In more detail...

=== Purpose ===

git_fast_filter is designed to make it easy to filter or rewrite the
history of a repository.  As such, it fills the same role as
git-filter-branch, and was written primarily to overcome the sometimes
severe speed shortcomings of git-filter-branch.  In particular, using
git_fast_filter can avoid thousands or millions of new process forks,
and can allow you to rewrite the same file only one time instead of
50,000 times.

However, while using git_fast_filter is fairly simple and quick, it is
hard to beat writing a simple git-filter-branch one-liner for efficiency
of human time.  Also, the two tools use very different methods of
rewriting history and do not have exactly overlapping feature sets, so
the best tool for a particular job is going to be very problem
dependent.

As human time is often more important than computer time, especially for
one-shot rewrites, git-filter-branch will probably continue to be the
more common tool.  However, git_fast_filter is useful in cases where
computer time of a rewrite matters (particularly larger repositories and
more involved rewrites that need to be run and tested many times on
large data sets).
Also, git_fast_filter has a couple of features that may come in handy in
special cases (assisting with generating fast-export output from
scratch, interleaving commits from separate repositories, and
bidirectional collaboration between filtered and unfiltered
repositories).

=== Idea ===

The way git_fast_filter works is by providing a simple python library,
git_fast_filter.py.  This library can be used in simple python scripts
to create a filter for the output of git-fast-export.  Thus, the typical
calling convention is of the form:

  git fast-export | filter_script.py | git fast-import

=== Example ===

An example script that renames the 'master' branch to 'other' is shown
below (this is similar to the example in the git-fast-export manpage,
but is safe against the string 'refs/heads/master' appearing in some
file or commit message in the repository):

  #!/usr/bin/python

  from git_fast_filter import Commit, FastExportFilter

  # Invoked for each commit in the stream; only the branch the commit
  # is recorded on is touched, never file contents or messages.
  def my_commit_callback(commit):
    if commit.branch == "refs/heads/master":
      commit.branch = "refs/heads/other"

  filter = FastExportFilter(commit_callback = my_commit_callback)
  filter.run()

The user can then run this script by:

  $ mkdir target && cd target && git init
  $ (cd /PATH/LEADING/TO/source && git fast-export --all) \
     | /PATH/TO/filter_script.py | git fast-import

(Note: The user can have the script take care of the git init, the cd's,
and the invocations of git fast-export and git fast-import by just
passing directory names to FastExportFilter.run; however, writing out
the details explicitly as in the above example makes it clearer what is
going on.)

Elijah
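
To illustrate why a filter that understands the stream is safer than a
textual substitution, here is a minimal standalone sketch.  It is not
git_fast_filter's implementation and does not use its API; the names
rename_ref, data_block and sample are hypothetical, and it assumes the
stream only uses the counted 'data <n>' form.  The idea is to rewrite a
ref only on 'commit' and 'reset' command lines and to copy every data
payload byte-for-byte, so a file or commit message that happens to
contain 'refs/heads/master' is left alone.

```python
import io


def rename_ref(stream_in, stream_out, old, new):
    """Copy a fast-export stream, renaming ref `old` to `new` only on
    'commit' and 'reset' command lines.  Data payloads are copied verbatim."""
    while True:
        line = stream_in.readline()
        if not line:
            break
        if line.startswith(b"data "):
            # 'data <n>' is followed by exactly n raw bytes; skip over them
            # without inspecting their contents.
            size = int(line[5:].strip())
            stream_out.write(line)
            stream_out.write(stream_in.read(size))
            continue
        for cmd in (b"commit ", b"reset "):
            if line == cmd + old + b"\n":
                line = cmd + new + b"\n"
        stream_out.write(line)


def data_block(payload):
    # Hypothetical helper: build a counted data command for the sample below.
    return b"data %d\n" % len(payload) + payload


# A hand-written stream whose blob and commit message both mention the ref.
sample = (
    b"blob\nmark :1\n"
    + data_block(b"commit refs/heads/master\n")
    + b"commit refs/heads/master\nmark :2\n"
    + b"committer A U Thor <author@example.com> 1239161710 -0600\n"
    + data_block(b"mention refs/heads/master\n")
    + b"M 100644 :1 notes.txt\n\n"
)

out = io.BytesIO()
rename_ref(io.BytesIO(sample), out, b"refs/heads/master", b"refs/heads/other")
```

Only the real commit command is renamed; the blob contents and the
commit message keep the original string, and the byte counts on the
data lines stay valid because the payloads are untouched.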