git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Johannes Sixt <j.sixt@viscovery.net>,
	Ilya Basin <basinilya@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions
Date: Mon, 12 Nov 2012 14:44:34 -0500
Message-ID: <20121112194434.GB4623@sigill.intra.peff.net>
In-Reply-To: <7vpq3ik97i.fsf@alter.siamese.dyndns.org>

On Mon, Nov 12, 2012 at 09:44:01AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The only ident-parsing function we currently provide is
> > get_author_ident_from_commit. This is not very
> > flexible for two reasons:
> >
> >   1. It takes a commit as an argument, and can't read from
> >      commit headers saved on disk.
> >
> >   2. It will only parse authors, not committers.
> >
> > This patch provides a more flexible interface which will
> > parse multiple idents from a commit provided on stdin. We can
> > easily use it as a building block for the current function
> > to retain compatibility.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > Since we are counting processes in this series, I should note that this
> > actually adds a subshell invocation for each call, since it went from:
> >
> >   script='...'
> >   sed $script
> >
> > to:
> >
> >   sed "$(make_script)"
> >
> > For filter-branch, which is really the only high-performance caller we
> > have, this is negated by the fact that it will do author and committer
> > at the same time, saving us an extra subshell (in addition to an extra
> > sed invocation).
> 
> Given that pick-ident-script is a const function, a caller that
> repeatedly calls it could call it once and use it in a variable, no?

The problem is that it is a helper called from parse_ident_from_commit.
And that function just passes along its arguments, so it does not know
that it is being called repeatedly with the same arguments. So you'd
have to either change the interface or memoize internally.

I don't think memoization is a good option for two reasons:

  1. Storing the arguments to compare to later is complex. You don't
     want to just store "$*" from the last run and see if we got the
     same arguments. You'd have to quote your delimiter (e.g., you would
     not want to confuse ("foo", "bar") with ("foo bar")). Though in this
     instance, we know that our args do not have spaces, so we could get
     away with that.

  2. If you are in a subshell, or even a while loop on the receiving
     end of a pipe (which most shells run in a subshell), your memoized
     variable will not be retained.
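
To make the first point concrete, here is a minimal sketch of the
collision. The helper name naive_key is hypothetical and only stands in
for whatever a naive memoizer would store as its cache key:

```shell
#!/bin/sh
# Hypothetical helper: the cache key a naive memoizer would store.
# "$*" joins all arguments with the first character of IFS (a space
# by default), so argument boundaries are lost.
naive_key () {
	echo "$*"
}

key_two=$(naive_key foo bar)      # two arguments
key_one=$(naive_key "foo bar")    # one argument containing a space

if test "$key_two" = "$key_one"
then
	echo "collision: both calls produce the same key"
fi
```

A real memoizer would need to quote or escape the delimiter before
storing the key, which is the extra complexity mentioned above.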

So unless somebody has some clever scheme for memoizing shell functions
without any process overhead, it is probably not worth it.
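
The subshell problem from point 2 can be seen in a small sketch. The
function and variable names here (memoized_script, cached) are
hypothetical, and the string is just a stand-in for the generated sed
script:

```shell
#!/bin/sh
# Hypothetical memoizing wrapper around an "expensive" computation.
cached=
memoized_script () {
	if test -z "$cached"
	then
		cached="expensive result"   # stand-in for building the script
	fi
	echo "$cached"
}

# Command substitution runs the function in a subshell environment,
# so the assignment to $cached is discarded when the subshell exits.
result=$(memoized_script)

echo "result: $result"
echo "cached in parent: '$cached'"
```

Because a shell function can only hand back a string via stdout, the
caller almost always wraps it in $(...), and so the cache is never
populated in the shell that would need it on the next call.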

Changing the interface for get_author_ident_from_commit would be a pain,
but if we just wanted to help filter-branch, we could do something like
this:

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5314249..7a693ba 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -74,7 +74,7 @@ finish_ident() {
 }
 
 set_ident () {
-	parse_ident_from_commit author AUTHOR committer COMMITTER
+	parse_ident_from_commit_via_script "$ident_script"
 	finish_ident AUTHOR
 	finish_ident COMMITTER
 }
@@ -93,6 +93,7 @@ if [ "$(is_bare_repository)" = false ]; then
 	require_clean_work_tree 'rewrite branches'
 fi
 
+ident_script=$(pick_ident_script author AUTHOR committer COMMITTER)
 tempdir=.git-rewrite
 filter_env=
 filter_tree=
diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index 22f0aed..1e20e17 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -225,10 +225,17 @@ pick_ident_script () {
 	echo '/^$/q'
 }
 
+# Feed a pick_ident_script return value to sed. Use this instead of
+# parse_ident_from_commit below if you are going to be parsing commits in a
+# tight loop and want to save a process.
+parse_ident_from_commit_via_script() {
+	LANG=C LC_ALL=C sed -ne "$1"
+}
+
 # Create a pick-script as above and feed it to sed. Stdout is suitable for
 # feeding to eval.
 parse_ident_from_commit () {
-	LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")"
+	parse_ident_from_commit_via_script "$(pick_ident_script "$@")"
 }
 
 # Parse the author from a commit given as an argument. Stdout is suitable for
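
For illustration only, the pattern the patch above is aiming at (build
the script once, reuse it for every commit) might look like the sketch
below. The toy_* functions are hypothetical stand-ins and are much
simpler than the real pick_ident_script; they only extract the author
name:

```shell
#!/bin/sh
# Hypothetical stand-in for pick_ident_script: emits a sed script that
# prints the name from the "author" header line and quits at the first
# blank line (the end of the commit headers).
toy_ident_script () {
	echo '/^author /s/^author \([^<]*\) <.*$/\1/p'
	echo '/^$/q'
}

# Hypothetical stand-in for parse_ident_from_commit_via_script: feeds
# an already-built script to sed, reading the commit from stdin.
toy_parse_via_script () {
	LANG=C LC_ALL=C sed -ne "$1"
}

script=$(toy_ident_script)   # the subshell cost is paid once, up front

for name in "A U Thor" "C O Mitter"
do
	printf 'author %s <x@example.com> 1234567890 +0000\n\nbody\n' "$name" |
	toy_parse_via_script "$script"
done
```

Each loop iteration then costs one sed process and no extra subshell
for script generation.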

