git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
@ 2012-10-17  6:47 Ilya Basin
  2012-10-17  7:18 ` Jeff King
  2012-10-17  7:23 ` Johannes Sixt
  0 siblings, 2 replies; 24+ messages in thread
From: Ilya Basin @ 2012-10-17  6:47 UTC (permalink / raw)
  To: git

The filter-branch command, the contents of ~/.gitconfig and the tree
are the same.
The command succeeds on cygwin, but fails on Solaris due to
unset GIT_AUTHOR_NAME and GIT_COMMITTER_NAME :

$ git filter-branch --tree-filter "env | grep GIT_ ; $CMD" b416b9bfc5e71531f2f05af4c396bb0ba7560741..HEAD
Rewrite 214efc6eec82b015aefe23b2280979f05b351396 (1/16)GIT_DIR=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git
GIT_INDEX_FILE=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/t/../index
GIT_WORK_TREE=.
GIT_AUTHOR_NAME=
GIT_COMMITTER_NAME=
GIT_COMMIT=214efc6eec82b015aefe23b2280979f05b351396
fatal: empty ident  <my@email.com> not allowed
could not write rewritten commit

If I explicitly set these 2 variables, filter-branch succeeds, but
other commit attributes like commit date aren't preserved.

I use git 1.7.6, from sunfreeware.

I hope there is some simple thing that needs to be configured.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
  2012-10-17  6:47 What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? Ilya Basin
@ 2012-10-17  7:18 ` Jeff King
  2012-10-17  7:23 ` Johannes Sixt
  1 sibling, 0 replies; 24+ messages in thread
From: Jeff King @ 2012-10-17  7:18 UTC (permalink / raw)
  To: Ilya Basin; +Cc: git

On Wed, Oct 17, 2012 at 10:47:29AM +0400, Ilya Basin wrote:

> The filter-branch command, the contents of ~/.gitconfig and the tree
> are the same.
> The command succeeds on cygwin, but fails on Solaris due to
> unset GIT_AUTHOR_NAME and GIT_COMMITTER_NAME :

That shouldn't happen. The likely culprit is that the sed magic in the
set_ident function of git-filter-branch is not portable to your version
of sed.

What happens if you run this:

	echo 'author Your Name <you@example.com> 1350408529 -0400' >commit
	set -- author
	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
        pick_id_script='
                /^'$lid' /{
                        s/'\''/'\''\\'\'\''/g
                        h
                        s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/
                        s/'\''/'\''\'\'\''/g
                        s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p

                        g
                        s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/
                        s/'\''/'\''\'\'\''/g
                        s/.*/GIT_'$uid'_EMAIL='\''&'\''; export GIT_'$uid'_EMAIL/p

                        g
                        s/^'$lid' [^<]* <[^>]*> \(.*\)$/@\1/
                        s/'\''/'\''\'\'\''/g
                        s/.*/GIT_'$uid'_DATE='\''&'\''; export GIT_'$uid'_DATE/p

                        q
                }
        '
        LANG=C LC_ALL=C sed -ne "$pick_id_script" <commit

in your shell? You should get:

  GIT_AUTHOR_NAME='Your Name'; export GIT_AUTHOR_NAME
  GIT_AUTHOR_EMAIL='you@example.com'; export GIT_AUTHOR_EMAIL
  GIT_AUTHOR_DATE='@1350408529 -0400'; export GIT_AUTHOR_DATE

> I use git 1.7.6, from sunfreeware.

It might also be worth testing v1.7.12, but reading the logs, I don't
think there has been any meaningful update to filter-branch since then.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
  2012-10-17  6:47 What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? Ilya Basin
  2012-10-17  7:18 ` Jeff King
@ 2012-10-17  7:23 ` Johannes Sixt
  2012-10-17  8:58   ` Re[2]: " Ilya Basin
  1 sibling, 1 reply; 24+ messages in thread
From: Johannes Sixt @ 2012-10-17  7:23 UTC (permalink / raw)
  To: Ilya Basin; +Cc: git

Am 10/17/2012 8:47, schrieb Ilya Basin:
> The filter-branch command, the contents of ~/.gitconfig and the tree
> are the same.
> The command succeeds on cygwin, but fails on Solaris due to
> unset GIT_AUTHOR_NAME and GIT_COMMITTER_NAME :
> 
> $ git filter-branch --tree-filter "env | grep GIT_ ; $CMD" b416b9bfc5e71531f2f05af4c396bb0ba7560741..HEAD
> Rewrite 214efc6eec82b015aefe23b2280979f05b351396 (1/16)GIT_DIR=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git
> GIT_INDEX_FILE=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/t/../index
> GIT_WORK_TREE=.
> GIT_AUTHOR_NAME=
> GIT_COMMITTER_NAME=
> GIT_COMMIT=214efc6eec82b015aefe23b2280979f05b351396
> fatal: empty ident  <my@email.com> not allowed
> could not write rewritten commit

Most likely, your sed has problems with a sed script in function
get_author_ident_from_commit. I tested it like this:

$ sh -c '. $(git --exec-path)/git-sh-setup;
		get_author_ident_from_commit HEAD'
GIT_AUTHOR_NAME='Johannes Sixt'
GIT_AUTHOR_EMAIL='j6t@kdbg.org'
GIT_AUTHOR_DATE='@1350025129 +0200'

-- Hannes

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re[2]: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
  2012-10-17  7:23 ` Johannes Sixt
@ 2012-10-17  8:58   ` Ilya Basin
  2012-10-17 10:36     ` Re[3]: " Ilya Basin
  2012-10-17 22:09     ` Jeff King
  0 siblings, 2 replies; 24+ messages in thread
From: Ilya Basin @ 2012-10-17  8:58 UTC (permalink / raw)
  To: Johannes Sixt, Jeff King; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 554 bytes --]

JS> Most likely, your sed has problems with a sed script in function
JS> get_author_ident_from_commit. I tested it like this:

JS> $ sh -c '. $(git --exec-path)/git-sh-setup;
JS>                 get_author_ident_from_commit HEAD'
JS> GIT_AUTHOR_NAME='Johannes Sixt'
JS> GIT_AUTHOR_EMAIL='j6t@kdbg.org'
JS> GIT_AUTHOR_DATE='@1350025129 +0200'

JS> -- Hannes

Both systems have GNU sed 4.2.1 installed. I wrote a wrapper script wor sed.
It's output attached.
The difference is letter case in sed input data:
Solaris:
  /^AUTHOR /
Windows:
  /^author /

-- 

[-- Attachment #2: git-sed-good.txt --]
[-- Type: text/plain, Size: 4048 bytes --]

$ git filter-branch -f --tree-filter "env | grep GIT_; true" HEAD~1..HEAD

SED BEGIN
SED ARGUMENTS: -e s/-/ /
SED STDIN BEGIN
git-filter-branch

SED STDIN END

SED OUTPUT BEGIN
git filter-branch

SED OUTPUT END
SED EXIT CODE: 0
SED END

SED BEGIN
SED ARGUMENTS: -e /^^/d /cygdrive/c/sicap/rap/gitcvs/RAP27/.git-rewrite/raw-heads
SED INPUT FILE BEGIN: /cygdrive/c/sicap/rap/gitcvs/RAP27/.git-rewrite/raw-heads
refs/heads/master

SED INPUT FILE   END: /cygdrive/c/sicap/rap/gitcvs/RAP27/.git-rewrite/raw-heads

SED OUTPUT BEGIN
refs/heads/master

SED OUTPUT END
SED EXIT CODE: 0
SED END
Rewrite acd1d2bb1984c96630d5070497590307151c4682 (1/1)
SED BEGIN
SED ARGUMENTS: -ne
                /^author /{
                        s/'/'\\''/g
                        h
                        s/^author \([^<]*\) <[^>]*> .*$/\1/
                        s/'/'\''/g
                        s/.*/GIT_AUTHOR_NAME='&'; export GIT_AUTHOR_NAME/p

                        g
                        s/^author [^<]* <\([^>]*\)> .*$/\1/
                        s/'/'\''/g
                        s/.*/GIT_AUTHOR_EMAIL='&'; export GIT_AUTHOR_EMAIL/p

                        g
                        s/^author [^<]* <[^>]*> \(.*\)$/\1/
                        s/'/'\''/g
                        s/.*/GIT_AUTHOR_DATE='&'; export GIT_AUTHOR_DATE/p

                        q
                }

SED STDIN BEGIN
tree 969f563d319049bb6dabc12054d67671499a6f55
parent c4734950e37c09ca7d3e3088f6f31d866dbb5077
author Ilya Basin <basinilya@gmail.com> 1350401059 +0400
committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400

temp

SED STDIN END

SED OUTPUT BEGIN
GIT_AUTHOR_NAME='Ilya Basin'; export GIT_AUTHOR_NAME
GIT_AUTHOR_EMAIL='basinilya@gmail.com'; export GIT_AUTHOR_EMAIL
GIT_AUTHOR_DATE='1350401059 +0400'; export GIT_AUTHOR_DATE

SED OUTPUT END
SED EXIT CODE: 0
SED END

SED BEGIN
SED ARGUMENTS: -ne
                /^committer /{
                        s/'/'\\''/g
                        h
                        s/^committer \([^<]*\) <[^>]*> .*$/\1/
                        s/'/'\''/g
                        s/.*/GIT_COMMITTER_NAME='&'; export GIT_COMMITTER_NAME/p

                        g
                        s/^committer [^<]* <\([^>]*\)> .*$/\1/
                        s/'/'\''/g
                        s/.*/GIT_COMMITTER_EMAIL='&'; export GIT_COMMITTER_EMAIL/p

                        g
                        s/^committer [^<]* <[^>]*> \(.*\)$/\1/
                        s/'/'\''/g
                        s/.*/GIT_COMMITTER_DATE='&'; export GIT_COMMITTER_DATE/p

                        q
                }

SED STDIN BEGIN
tree 969f563d319049bb6dabc12054d67671499a6f55
parent c4734950e37c09ca7d3e3088f6f31d866dbb5077
author Ilya Basin <basinilya@gmail.com> 1350401059 +0400
committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400

temp

SED STDIN END

SED OUTPUT BEGIN
GIT_COMMITTER_NAME='Ilya Basin'; export GIT_COMMITTER_NAME
GIT_COMMITTER_EMAIL='basinilya@gmail.com'; export GIT_COMMITTER_EMAIL
GIT_COMMITTER_DATE='1350405585 +0400'; export GIT_COMMITTER_DATE

SED OUTPUT END
SED EXIT CODE: 0
SED END
GIT_DIR=/cygdrive/c/sicap/rap/gitcvs/RAP27/.git
GIT_AUTHOR_DATE=1350401059 +0400
GIT_INDEX_FILE=/cygdrive/c/sicap/rap/gitcvs/RAP27/.git-rewrite/t/../index
GIT_WORK_TREE=.
GIT_AUTHOR_NAME=Ilya Basin
GIT_COMMITTER_NAME=Ilya Basin
GIT_COMMIT=acd1d2bb1984c96630d5070497590307151c4682
GIT_COMMITTER_EMAIL=basinilya@gmail.com
GIT_COMMITTER_DATE=1350405585 +0400
GIT_AUTHOR_EMAIL=basinilya@gmail.com

SED BEGIN
SED ARGUMENTS: -e 1,/^$/d
SED STDIN BEGIN
tree 969f563d319049bb6dabc12054d67671499a6f55
parent c4734950e37c09ca7d3e3088f6f31d866dbb5077
author Ilya Basin <basinilya@gmail.com> 1350401059 +0400
committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400

temp

SED STDIN END

SED OUTPUT BEGIN
temp

SED OUTPUT END
SED EXIT CODE: 0
SED END

WARNING: Ref 'refs/heads/master' is unchanged

[-- Attachment #3: git-sed-sol.txt --]
[-- Type: text/plain, Size: 3715 bytes --]

-bash-3.00$ git filter-branch -f --tree-filter "env | grep GIT_; true" HEAD~1..HEAD

SED BEGIN
SED ARGUMENTS: -e s/-/ /
SED STDIN BEGIN
git-filter-branch

SED STDIN END

SED OUTPUT BEGIN
git filter-branch

SED OUTPUT END
SED EXIT CODE: 0
SED END

SED BEGIN
SED ARGUMENTS: -e /^^/d /home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/raw-heads
SED INPUT FILE BEGIN: /home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/raw-heads
refs/heads/master

SED INPUT FILE   END: /home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/raw-heads

SED OUTPUT BEGIN
refs/heads/master

SED OUTPUT END
SED EXIT CODE: 0
SED END
Rewrite acd1d2bb1984c96630d5070497590307151c4682 (1/1)
SED BEGIN
SED ARGUMENTS: -ne
                /^AUTHOR /{
                        s/'/'\\''/g
                        h
                        s/^AUTHOR \([^<]*\) <[^>]*> .*$/\1/
                        s/'/'\''/g
                        s/.*/GIT_AUTHOR_NAME='&'; export GIT_AUTHOR_NAME/p

                        g
                        s/^AUTHOR [^<]* <\([^>]*\)> .*$/\1/
                        s/'/'\''/g
                        s/.*/GIT_AUTHOR_EMAIL='&'; export GIT_AUTHOR_EMAIL/p

                        g
                        s/^AUTHOR [^<]* <[^>]*> \(.*\)$/\1/
                        s/'/'\''/g
                        s/.*/GIT_AUTHOR_DATE='&'; export GIT_AUTHOR_DATE/p

                        q
                }

SED STDIN BEGIN
tree 969f563d319049bb6dabc12054d67671499a6f55
parent c4734950e37c09ca7d3e3088f6f31d866dbb5077
author Ilya Basin <basinilya@gmail.com> 1350401059 +0400
committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400

temp

SED STDIN END

SED OUTPUT BEGIN

SED OUTPUT END
SED EXIT CODE: 0
SED END

SED BEGIN
SED ARGUMENTS: -ne
                /^COMMITTER /{
                        s/'/'\\''/g
                        h
                        s/^COMMITTER \([^<]*\) <[^>]*> .*$/\1/
                        s/'/'\''/g
                        s/.*/GIT_COMMITTER_NAME='&'; export GIT_COMMITTER_NAME/p

                        g
                        s/^COMMITTER [^<]* <\([^>]*\)> .*$/\1/
                        s/'/'\''/g
                        s/.*/GIT_COMMITTER_EMAIL='&'; export GIT_COMMITTER_EMAIL/p

                        g
                        s/^COMMITTER [^<]* <[^>]*> \(.*\)$/\1/
                        s/'/'\''/g
                        s/.*/GIT_COMMITTER_DATE='&'; export GIT_COMMITTER_DATE/p

                        q
                }

SED STDIN BEGIN
tree 969f563d319049bb6dabc12054d67671499a6f55
parent c4734950e37c09ca7d3e3088f6f31d866dbb5077
author Ilya Basin <basinilya@gmail.com> 1350401059 +0400
committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400

temp

SED STDIN END

SED OUTPUT BEGIN

SED OUTPUT END
SED EXIT CODE: 0
SED END
GIT_DIR=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git
GIT_INDEX_FILE=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/t/../index
GIT_WORK_TREE=.
GIT_AUTHOR_NAME=
GIT_COMMITTER_NAME=
GIT_COMMIT=acd1d2bb1984c96630d5070497590307151c4682

SED BEGIN
SED ARGUMENTS: -e 1,/^$/d
SED STDIN BEGIN
tree 969f563d319049bb6dabc12054d67671499a6f55
parent c4734950e37c09ca7d3e3088f6f31d866dbb5077
author Ilya Basin <basinilya@gmail.com> 1350401059 +0400
committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400

temp

SED STDIN END

SED OUTPUT BEGIN
temp

SED OUTPUT END
SED EXIT CODE: 0
SED END
fatal: empty ident  <basinilya@gmail.com> not allowed
could not write rewritten commit
-bash-3.00$

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re[3]: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
  2012-10-17  8:58   ` Re[2]: " Ilya Basin
@ 2012-10-17 10:36     ` Ilya Basin
  2012-10-17 22:13       ` Jeff King
  2012-10-17 22:09     ` Jeff King
  1 sibling, 1 reply; 24+ messages in thread
From: Ilya Basin @ 2012-10-17 10:36 UTC (permalink / raw)
  To: Johannes Sixt, Jeff King; +Cc: git

JS>> Most likely, your sed has problems with a sed script in function
JS>> get_author_ident_from_commit. I tested it like this:

JS>> $ sh -c '. $(git --exec-path)/git-sh-setup;
JS>>                 get_author_ident_from_commit HEAD'
JS>> GIT_AUTHOR_NAME='Johannes Sixt'
JS>> GIT_AUTHOR_EMAIL='j6t@kdbg.org'
JS>> GIT_AUTHOR_DATE='@1350025129 +0200'

JS>> -- Hannes

IB> Both systems have GNU sed 4.2.1 installed. I wrote a wrapper script wor sed.
IB> It's output attached.
IB> The difference is letter case in sed input data:
IB> Solaris:
IB>   /^AUTHOR /
IB> Windows:
IB>   /^author /

The culprit is bad $PATH :
When git-filter-branch runs, for some reason two new entries precede
/usr/bin in it:
     /tmp/777/.ilya-sparc/bin
     /home/tester/.ilya/opt/SNiFF-3.2.1/bin
     /export/home/testora/app/testora/product/11.2.0/client_32/bin
    +/usr/xpg6/bin
    +/usr/xpg4/bin
     /usr/bin
     /home/tester/apache-ant-1.7.1/bin
     /usr/jdk/instances/jdk1.5.0//bin

And /usr/xpg6/bin/tr fails to make "AUTHOR" lowercase.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
  2012-10-17  8:58   ` Re[2]: " Ilya Basin
  2012-10-17 10:36     ` Re[3]: " Ilya Basin
@ 2012-10-17 22:09     ` Jeff King
  2012-10-18  5:31       ` Johannes Sixt
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff King @ 2012-10-17 22:09 UTC (permalink / raw)
  To: Ilya Basin; +Cc: Johannes Sixt, git

On Wed, Oct 17, 2012 at 12:58:47PM +0400, Ilya Basin wrote:

> JS> Most likely, your sed has problems with a sed script in function
> JS> get_author_ident_from_commit. I tested it like this:
> 
> JS> $ sh -c '. $(git --exec-path)/git-sh-setup;
> JS>                 get_author_ident_from_commit HEAD'
> JS> GIT_AUTHOR_NAME='Johannes Sixt'
> JS> GIT_AUTHOR_EMAIL='j6t@kdbg.org'
> JS> GIT_AUTHOR_DATE='@1350025129 +0200'
> 
> JS> -- Hannes
> 
> Both systems have GNU sed 4.2.1 installed. I wrote a wrapper script wor sed.
> It's output attached.
> The difference is letter case in sed input data:
> Solaris:
>   /^AUTHOR /
> Windows:
>   /^author /

Ah, so it's tr that is the culprit. We've had problems with Solaris tr
before, but usually around NULs or the use of brackets. But according to
40a7ce6 (tr portability fixes, 2008-03-12), filter-branch is already
doing it the portable way.

If you apply this patch, does your filter-branch work?

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 178e453..58b1d69 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -68,8 +68,8 @@ set_ident () {
 # "author" or "committer
 
 set_ident () {
-	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
-	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
+	lid="$(echo "$1" | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz)"
+	uid="$(echo "$1" | tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ)"
 	pick_id_script='
 		/^'$lid' /{
 			s/'\''/'\''\\'\'\''/g

That seems like crazy overkill, but it at least will let us double-check
that the tr sequences are the problem.

-Peff

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
  2012-10-17 10:36     ` Re[3]: " Ilya Basin
@ 2012-10-17 22:13       ` Jeff King
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff King @ 2012-10-17 22:13 UTC (permalink / raw)
  To: Ilya Basin; +Cc: Johannes Sixt, git

On Wed, Oct 17, 2012 at 02:36:23PM +0400, Ilya Basin wrote:

> The culprit is bad $PATH :
> When git-filter-branch runs, for some reason two new entries precede
> /usr/bin in it:
>      /tmp/777/.ilya-sparc/bin
>      /home/tester/.ilya/opt/SNiFF-3.2.1/bin
>      /export/home/testora/app/testora/product/11.2.0/client_32/bin
>     +/usr/xpg6/bin
>     +/usr/xpg4/bin
>      /usr/bin
>      /home/tester/apache-ant-1.7.1/bin
>      /usr/jdk/instances/jdk1.5.0//bin
> 
> And /usr/xpg6/bin/tr fails to make "AUTHOR" lowercase.

Hmph. Those are controlled by SANE_TOOL_PATH at git's build time, with
the intent that the xpg tools are less terrible than the ones in
/usr/bin on Solaris. But it sounds like that may not be the case. Yuck.

I don't have a Solaris box handy. Is there a way to make sequences like
A-Z work sanely with /usr/xpg6/bin/tr?

Do you have any LANG or locale settings? Sometimes those can affect
sequences. What does:

  echo AUTHOR | LANG=C LC_ALL=C /usr/xpg6/bin/tr '[A-Z]' '[a-z]'

do?

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
  2012-10-17 22:09     ` Jeff King
@ 2012-10-18  5:31       ` Johannes Sixt
  2012-10-18  5:36         ` Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Johannes Sixt @ 2012-10-18  5:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Ilya Basin, git

Am 10/18/2012 0:09, schrieb Jeff King:
> -	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
> -	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
> +	lid="$(echo "$1" | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz)"
> +	uid="$(echo "$1" | tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ)"
> 
> That seems like crazy overkill, but it at least will let us double-check
> that the tr sequences are the problem.

Right. But we should really be doing something like this instead to save a
few subprocesses.

-- Hannes

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 178e453..018e56e 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -68,8 +68,8 @@ eval "$functions"
 # "author" or "committer

 set_ident () {
-	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
-	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
+	uid=$1
+	lid=$2
 	pick_id_script='
 		/^'$lid' /{
 			s/'\''/'\''\\'\'\''/g
@@ -320,9 +320,9 @@ while read commit parents; do
 	git cat-file commit "$commit" >../commit ||
 		die "Cannot read commit $commit"

-	eval "$(set_ident AUTHOR <../commit)" ||
+	eval "$(set_ident AUTHOR author <../commit)" ||
 		die "setting author failed for commit $commit"
-	eval "$(set_ident COMMITTER <../commit)" ||
+	eval "$(set_ident COMMITTER committer <../commit)" ||
 		die "setting committer failed for commit $commit"
 	eval "$filter_env" < /dev/null ||
 		die "env filter failed: $filter_env"

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
  2012-10-18  5:31       ` Johannes Sixt
@ 2012-10-18  5:36         ` Jeff King
  2012-10-18  6:06           ` Junio C Hamano
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2012-10-18  5:36 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Ilya Basin, git

On Thu, Oct 18, 2012 at 07:31:35AM +0200, Johannes Sixt wrote:

> Right. But we should really be doing something like this instead to save a
> few subprocesses.
> [...]
> -	eval "$(set_ident AUTHOR <../commit)" ||
> +	eval "$(set_ident AUTHOR author <../commit)" ||

I cringe a little at losing DRY-ness to avoid processes. But the
repetition is pretty straightforward and obvious, and I know that some
platforms are really hurt by extra processes (and this is being called
for every commit).

Speaking of repetition, this seems like almost the exact same parsing
that happens in git-sh-setup's get_author_ident_from_commit. Maybe it's
worth merging them. I suspect you could also avoid another process
by parsing out both author and committer information in the same sed
invocation.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
  2012-10-18  5:36         ` Jeff King
@ 2012-10-18  6:06           ` Junio C Hamano
  2012-10-18  6:08             ` Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2012-10-18  6:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Ilya Basin, git

Jeff King <peff@peff.net> writes:

> On Thu, Oct 18, 2012 at 07:31:35AM +0200, Johannes Sixt wrote:
>
>> Right. But we should really be doing something like this instead to save a
>> few subprocesses.
>> [...]
>> -	eval "$(set_ident AUTHOR <../commit)" ||
>> +	eval "$(set_ident AUTHOR author <../commit)" ||
>
> I cringe a little at losing DRY-ness to avoid processes.

Well, the header field token "author" and the middle word of the
variable GIT_AUTHOR_NAME _happen_ to be the same modulo case, but
they did not have to be, so you could argue the updated set_ident
implementation is more generally useful (you could even argue that
we should spell the first parameter out as "GIT_AUTHOR_NAME" and
"GIT_AUTHOR_EMAIL", two separate parameters).

> Speaking of repetition, this seems like almost the exact same parsing
> that happens in git-sh-setup's get_author_ident_from_commit. Maybe it's
> worth merging them. I suspect you could also avoid another process
> by parsing out both author and committer information in the same sed
> invocation.

Yes, yes and yes.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris?
  2012-10-18  6:06           ` Junio C Hamano
@ 2012-10-18  6:08             ` Jeff King
  2012-10-18  7:22               ` [PATCH 0/2] clean up filter-branch ident parsing Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2012-10-18  6:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git

On Wed, Oct 17, 2012 at 11:06:11PM -0700, Junio C Hamano wrote:

> >> -	eval "$(set_ident AUTHOR <../commit)" ||
> >> +	eval "$(set_ident AUTHOR author <../commit)" ||
> >
> > I cringe a little at losing DRY-ness to avoid processes.
> 
> Well, the header field token "author" and the middle word of the
> variable GIT_AUTHOR_NAME _happen_ to be the same modulo case, but
> they did not have to be, so you could argue the updated set_ident
> implementation is more generally useful (you could even argue that
> we should spell the first parameter out as "GIT_AUTHOR_NAME" and
> "GIT_AUTHOR_EMAIL", two separate parameters).

True, though that is even more work for the caller (and *_DATE, too). We
could make it "GIT_AUTHOR", but I don't think there is much point in
being that level of half-way general. The caller can always pick it out
of the variables if they really want to do something tricky.

> > Speaking of repetition, this seems like almost the exact same parsing
> > that happens in git-sh-setup's get_author_ident_from_commit. Maybe it's
> > worth merging them. I suspect you could also avoid another process
> > by parsing out both author and committer information in the same sed
> > invocation.
> 
> Yes, yes and yes.

Working on it now. git-sh-setup works, but chasing an annoying bug in
filter-branch. I'm sure it's something silly and stupid.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/2] clean up filter-branch ident parsing
  2012-10-18  6:08             ` Jeff King
@ 2012-10-18  7:22               ` Jeff King
  2012-10-18  7:25                 ` [PATCH 1/2] git-sh-setup: refactor ident-parsing functions Jeff King
  2012-10-18  7:25                 ` [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions Jeff King
  0 siblings, 2 replies; 24+ messages in thread
From: Jeff King @ 2012-10-18  7:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git

On Thu, Oct 18, 2012 at 02:08:47AM -0400, Jeff King wrote:

> Working on it now. git-sh-setup works, but chasing an annoying bug in
> filter-branch. I'm sure it's something silly and stupid.

Ugh. I was being caught by crazy dash-versus-bash stuff.

Try this:

  $ bash -c 'echo "\\\\"'
  \\
  $ dash -c 'echo "\\\\"'
  \

It's the "echo will automatically do backslash escaping" magic we have
so often enjoyed. I solved it with printf.

Patches to follow.

My primary motivation was cleanup, but it also has a net reduction of 5
fork+execs for each commit that filter-branch processes. This dropped
the run-time of "git filter-branch HEAD -1000" on my linux box from 62s
to 47s. A real filter-branch would do more work in the filters, of
course, but that translates to 7.5 minutes of time saved if you are
filtering all 30,000 commits of git.git.

  [1/2]: git-sh-setup: refactor ident-parsing functions
  [2/2]: filter-branch: use git-sh-setup's ident parsing functions

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/2] git-sh-setup: refactor ident-parsing functions
  2012-10-18  7:22               ` [PATCH 0/2] clean up filter-branch ident parsing Jeff King
@ 2012-10-18  7:25                 ` Jeff King
  2012-11-12 17:44                   ` Junio C Hamano
  2012-10-18  7:25                 ` [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions Jeff King
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff King @ 2012-10-18  7:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git

The only ident-parsing function we currently provide is
get_author_ident_from_commit. This is not very
flexible for two reasons:

  1. It takes a commit as an argument, and can't read from
     commit headers saved on disk.

  2. It will only parse authors, not committers.

This patch provides a more flexible interface which will
parse multiple idents from a commit provide on stdin. We can
easily use it as a building block for the current function
to retain compatibility.

Signed-off-by: Jeff King <peff@peff.net>
---
Since we are counting processes in this series, I should note that this
actually adds a subshell invocation for each call, since it went from:

  script='...'
  sed $script

to:

  sed "$(make_script)"

For filter-branch, which is really the only high-performance caller we
have, this is negated by the fact that it will do author and committer
at the same time, saving us an extra subshell (in addition to an extra
sed invocation).

 git-sh-setup.sh | 62 +++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index ee0e0bc..22f0aed 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -191,28 +191,52 @@ get_author_ident_from_commit () {
 	fi
 }
 
+# Generate a sed script to parse identities from a commit.
+#
+# Reads the commit from stdin, which should be in raw format (e.g., from
+# cat-file or "--pretty=raw").
+#
+# The first argument specifies the ident line to parse (e.g., "author"), and
+# the second specifies the environment variable to put it in (e.g., "AUTHOR"
+# for "GIT_AUTHOR_*"). Multiple pairs can be given to parse author and
+# committer.
+pick_ident_script () {
+	while test $# -gt 0
+	do
+		lid=$1; shift
+		uid=$1; shift
+		printf '%s' "
+		/^$lid /{
+			s/'/'\\\\''/g
+			h
+			s/^$lid "'\([^<]*\) <[^>]*> .*$/\1/'"
+			s/.*/GIT_${uid}_NAME='&'/p
+
+			g
+			s/^$lid "'[^<]* <\([^>]*\)> .*$/\1/'"
+			s/.*/GIT_${uid}_EMAIL='&'/p
+
+			g
+			s/^$lid "'[^<]* <[^>]*> \(.*\)$/@\1/'"
+			s/.*/GIT_${uid}_DATE='&'/p
+		}
+		"
+	done
+	echo '/^$/q'
+}
+
+# Create a pick-script as above and feed it to sed. Stdout is suitable for
+# feeding to eval.
+parse_ident_from_commit () {
+	LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")"
+}
+
+# Parse the author from a commit given as an argument. Stdout is suitable for
+# feeding to eval to set the usual GIT_* ident variables.
 get_author_ident_from_commit () {
-	pick_author_script='
-	/^author /{
-		s/'\''/'\''\\'\'\''/g
-		h
-		s/^author \([^<]*\) <[^>]*> .*$/\1/
-		s/.*/GIT_AUTHOR_NAME='\''&'\''/p
-
-		g
-		s/^author [^<]* <\([^>]*\)> .*$/\1/
-		s/.*/GIT_AUTHOR_EMAIL='\''&'\''/p
-
-		g
-		s/^author [^<]* <[^>]*> \(.*\)$/@\1/
-		s/.*/GIT_AUTHOR_DATE='\''&'\''/p
-
-		q
-	}
-	'
 	encoding=$(git config i18n.commitencoding || echo UTF-8)
 	git show -s --pretty=raw --encoding="$encoding" "$1" -- |
-	LANG=C LC_ALL=C sed -ne "$pick_author_script"
+	parse_ident_from_commit author AUTHOR
 }
 
 # Clear repo-local GIT_* environment variables. Useful when switching to
-- 
1.8.0.rc3.3.gba630e1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions
  2012-10-18  7:22               ` [PATCH 0/2] clean up filter-branch ident parsing Jeff King
  2012-10-18  7:25                 ` [PATCH 1/2] git-sh-setup: refactor ident-parsing functions Jeff King
@ 2012-10-18  7:25                 ` Jeff King
  2012-10-18  7:49                   ` Johannes Sixt
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff King @ 2012-10-18  7:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git

This saves us some code, but it also reduces the number of
processes we start for each filtered commit. Since we can
parse both author and committer in the same sed invocation,
we save one process. And since the new interface avoids tr,
we save 4 processes.

It also avoids using "tr", which has had some odd
portability problems reported with from Solaris's xpg6
version.

Signed-off-by: Jeff King <peff@peff.net>
---
 git-filter-branch.sh | 42 +++++++++---------------------------------
 1 file changed, 9 insertions(+), 33 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 178e453..69406ae 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -64,37 +64,15 @@ set_ident () {
 
 eval "$functions"
 
-# When piped a commit, output a script to set the ident of either
-# "author" or "committer
+# Ensure non-empty id name.
+fallback_name() {
+	echo "case \"\$GIT_$1_NAME\" in \"\") GIT_$1_NAME=\"\${GIT_$1_EMAIL%%@*}\" && export GIT_$1_NAME;; esac"
+}
 
 set_ident () {
-	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
-	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
-	pick_id_script='
-		/^'$lid' /{
-			s/'\''/'\''\\'\'\''/g
-			h
-			s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/
-			s/'\''/'\''\'\'\''/g
-			s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p
-
-			g
-			s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/
-			s/'\''/'\''\'\'\''/g
-			s/.*/GIT_'$uid'_EMAIL='\''&'\''; export GIT_'$uid'_EMAIL/p
-
-			g
-			s/^'$lid' [^<]* <[^>]*> \(.*\)$/@\1/
-			s/'\''/'\''\'\'\''/g
-			s/.*/GIT_'$uid'_DATE='\''&'\''; export GIT_'$uid'_DATE/p
-
-			q
-		}
-	'
-
-	LANG=C LC_ALL=C sed -ne "$pick_id_script"
-	# Ensure non-empty id name.
-	echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac"
+	parse_ident_from_commit author AUTHOR committer COMMITTER
+	fallback_name AUTHOR
+	fallback_name COMMITTER
 }
 
 USAGE="[--env-filter <command>] [--tree-filter <command>]
@@ -320,10 +298,8 @@ while read commit parents; do
 	git cat-file commit "$commit" >../commit ||
 		die "Cannot read commit $commit"
 
-	eval "$(set_ident AUTHOR <../commit)" ||
-		die "setting author failed for commit $commit"
-	eval "$(set_ident COMMITTER <../commit)" ||
-		die "setting committer failed for commit $commit"
+	eval "$(set_ident <../commit)" ||
+		die "setting author/committer failed for commit $commit"
 	eval "$filter_env" < /dev/null ||
 		die "env filter failed: $filter_env"
 
-- 
1.8.0.rc3.3.gba630e1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions
  2012-10-18  7:25                 ` [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions Jeff King
@ 2012-10-18  7:49                   ` Johannes Sixt
  2012-10-18  7:54                     ` Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Johannes Sixt @ 2012-10-18  7:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Ilya Basin, git

Am 10/18/2012 9:25, schrieb Jeff King:
> -# When piped a commit, output a script to set the ident of either
> -# "author" or "committer
> +# Ensure non-empty id name.
> +fallback_name() {
> +	echo "case \"\$GIT_$1_NAME\" in \"\") GIT_$1_NAME=\"\${GIT_$1_EMAIL%%@*}\" && export GIT_$1_NAME;; esac"
> +}
>  
>  set_ident () {
> -	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
> -	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
> -	pick_id_script='
> -		/^'$lid' /{
> -			s/'\''/'\''\\'\'\''/g
> -			h
> -			s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/
> -			s/'\''/'\''\'\'\''/g
> -			s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p
> -
> -			g
> -			s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/
> -			s/'\''/'\''\'\'\''/g
> -			s/.*/GIT_'$uid'_EMAIL='\''&'\''; export GIT_'$uid'_EMAIL/p
> -
> -			g
> -			s/^'$lid' [^<]* <[^>]*> \(.*\)$/@\1/
> -			s/'\''/'\''\'\'\''/g
> -			s/.*/GIT_'$uid'_DATE='\''&'\''; export GIT_'$uid'_DATE/p
> -
> -			q
> -		}
> -	'
> -
> -	LANG=C LC_ALL=C sed -ne "$pick_id_script"
> -	# Ensure non-empty id name.
> -	echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac"
> +	parse_ident_from_commit author AUTHOR committer COMMITTER
> +	fallback_name AUTHOR
> +	fallback_name COMMITTER
>  }

Didn't you lose the export GIT_$uid_{NAME,EMAIL,DATE} parts somewhere on
the way?

Thanks for working on this!

-- Hannes

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions
  2012-10-18  7:49                   ` Johannes Sixt
@ 2012-10-18  7:54                     ` Jeff King
  2012-10-18 10:22                       ` Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2012-10-18  7:54 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Ilya Basin, git

On Thu, Oct 18, 2012 at 09:49:04AM +0200, Johannes Sixt wrote:

> > -			s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p
> 
> Didn't you lose the export GIT_$uid_{NAME,EMAIL,DATE} parts somewhere on
> the way?

Yikes, you're right. I didn't even notice, as the test suite still
passes. I can see how the env filter would still be able to see the
variables, but the commit-tree call wouldn't. I guess it happens to work
because we do not test alternate idents in our filter branch tests (IOW,
we are silently rewriting each commit during the filter-branch, but it
happens to have the same identities).

I'll investigate.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions
  2012-10-18  7:54                     ` Jeff King
@ 2012-10-18 10:22                       ` Jeff King
  2012-10-18 10:26                         ` Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2012-10-18 10:22 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Ilya Basin, git

On Thu, Oct 18, 2012 at 03:54:29AM -0400, Jeff King wrote:

> On Thu, Oct 18, 2012 at 09:49:04AM +0200, Johannes Sixt wrote:
> 
> > > -			s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p
> > 
> > Didn't you lose the export GIT_$uid_{NAME,EMAIL,DATE} parts somewhere on
> > the way?
> 
> Yikes, you're right. I didn't even notice, as the test suite still
> passes. I can see how the env filter would still be able to see the
> variables, but the commit-tree call wouldn't. I guess it happens to work
> because we do not test alternate idents in our filter branch tests (IOW,
> we are silently rewriting each commit during the filter-branch, but it
> happens to have the same identities).

Hrm. We _do_ test this in t7003. Weirder, if I instrument filter-branch
like this:

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 69406ae..1b504ce 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -298,8 +298,13 @@ while read commit parents; do
 	git cat-file commit "$commit" >../commit ||
 		die "Cannot read commit $commit"
 
+	echo >&2 pre: $GIT_AUTHOR_NAME
+	sh -c 'echo >&2 pre, subshell: $GIT_AUTHOR_NAME'
+
 	eval "$(set_ident <../commit)" ||
 		die "setting author/committer failed for commit $commit"
+	echo >&2 post: $GIT_AUTHOR_NAME
+	sh -c 'echo >&2 post, subshell: $GIT_AUTHOR_NAME'
 	eval "$filter_env" < /dev/null ||
 		die "env filter failed: $filter_env"
 
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index 4d13e10..ce57fc5 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -174,6 +174,8 @@ test_expect_success 'author information is preserved' '
 	test 1 = $(git rev-list --author="B V Uips" preserved-author | wc -l)
 '
 
+test_done
+
 test_expect_success "remove a certain author's commits" '
 	echo i > i &&
 	test_tick &&

and run t7003, it shows that the variable is properly exported to the
sub-process! But I can't seem to figure out why. Confused...

-Peff

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions
  2012-10-18 10:22                       ` Jeff King
@ 2012-10-18 10:26                         ` Jeff King
  2012-10-18 10:33                           ` [PATCHv2 " Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2012-10-18 10:26 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Ilya Basin, git

On Thu, Oct 18, 2012 at 06:22:17AM -0400, Jeff King wrote:

> Hrm. We _do_ test this in t7003. Weirder, if I instrument filter-branch
> like this:
> [...]
> and run t7003, it shows that the variable is properly exported to the
> sub-process! But I can't seem to figure out why. Confused...

Oh, I see. The variables are already exported by test-lib.sh. You can
see the breakage with:

diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index 4d13e10..1e7a209 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -167,10 +167,11 @@ test_expect_success 'author information is preserved' '
 	test_tick &&
 	GIT_AUTHOR_NAME="B V Uips" git commit -m bvuips &&
 	git branch preserved-author &&
-	git filter-branch -f --msg-filter "cat; \
+	(sane_unset GIT_AUTHOR_NAME &&
+	 git filter-branch -f --msg-filter "cat; \
 			test \$GIT_COMMIT != $(git rev-parse master) || \
 			echo Hallo" \
-		preserved-author &&
+		preserved-author) &&
 	test 1 = $(git rev-list --author="B V Uips" preserved-author | wc -l)
 '
 

-Peff

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCHv2 2/2] filter-branch: use git-sh-setup's ident parsing functions
  2012-10-18 10:26                         ` Jeff King
@ 2012-10-18 10:33                           ` Jeff King
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff King @ 2012-10-18 10:33 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Ilya Basin, git

This saves us some code, but it also reduces the number of
processes we start for each filtered commit. Since we can
parse both author and committer in the same sed invocation,
we save one process. And since the new interface avoids tr,
we save 4 processes.

It also avoids using "tr", which has had some odd
portability problems reported with from Solaris's xpg6
version.

We also tweak one of the tests in t7003 to double-check that
we are properly exporting the variables (because test-lib.sh
exports GIT_AUTHOR_NAME, it will be automatically exported
in subprograms. We override this to make sure that
filter-branch handles it properly itself).

Signed-off-by: Jeff King <peff@peff.net>
---
This fixes the missing exports from v1. There's no changes needed to
patch 1.

 git-filter-branch.sh     | 46 +++++++++++++---------------------------------
 t/t7003-filter-branch.sh |  5 +++--
 2 files changed, 16 insertions(+), 35 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 178e453..5314249 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -64,37 +64,19 @@ set_ident () {
 
 eval "$functions"
 
-# When piped a commit, output a script to set the ident of either
-# "author" or "committer
+finish_ident() {
+	# Ensure non-empty id name.
+	echo "case \"\$GIT_$1_NAME\" in \"\") GIT_$1_NAME=\"\${GIT_$1_EMAIL%%@*}\" && export GIT_$1_NAME;; esac"
+	# And make sure everything is exported.
+	echo "export GIT_$1_NAME"
+	echo "export GIT_$1_EMAIL"
+	echo "export GIT_$1_DATE"
+}
 
 set_ident () {
-	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
-	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
-	pick_id_script='
-		/^'$lid' /{
-			s/'\''/'\''\\'\'\''/g
-			h
-			s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/
-			s/'\''/'\''\'\'\''/g
-			s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p
-
-			g
-			s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/
-			s/'\''/'\''\'\'\''/g
-			s/.*/GIT_'$uid'_EMAIL='\''&'\''; export GIT_'$uid'_EMAIL/p
-
-			g
-			s/^'$lid' [^<]* <[^>]*> \(.*\)$/@\1/
-			s/'\''/'\''\'\'\''/g
-			s/.*/GIT_'$uid'_DATE='\''&'\''; export GIT_'$uid'_DATE/p
-
-			q
-		}
-	'
-
-	LANG=C LC_ALL=C sed -ne "$pick_id_script"
-	# Ensure non-empty id name.
-	echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac"
+	parse_ident_from_commit author AUTHOR committer COMMITTER
+	finish_ident AUTHOR
+	finish_ident COMMITTER
 }
 
 USAGE="[--env-filter <command>] [--tree-filter <command>]
@@ -320,10 +302,8 @@ while read commit parents; do
 	git cat-file commit "$commit" >../commit ||
 		die "Cannot read commit $commit"
 
-	eval "$(set_ident AUTHOR <../commit)" ||
-		die "setting author failed for commit $commit"
-	eval "$(set_ident COMMITTER <../commit)" ||
-		die "setting committer failed for commit $commit"
+	eval "$(set_ident <../commit)" ||
+		die "setting author/committer failed for commit $commit"
 	eval "$filter_env" < /dev/null ||
 		die "env filter failed: $filter_env"
 
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index 4d13e10..1e7a209 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -167,10 +167,11 @@ test_expect_success 'author information is preserved' '
 	test_tick &&
 	GIT_AUTHOR_NAME="B V Uips" git commit -m bvuips &&
 	git branch preserved-author &&
-	git filter-branch -f --msg-filter "cat; \
+	(sane_unset GIT_AUTHOR_NAME &&
+	 git filter-branch -f --msg-filter "cat; \
 			test \$GIT_COMMIT != $(git rev-parse master) || \
 			echo Hallo" \
-		preserved-author &&
+		preserved-author) &&
 	test 1 = $(git rev-list --author="B V Uips" preserved-author | wc -l)
 '
 
-- 
1.8.0.rc3.3.gba630e1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions
  2012-10-18  7:25                 ` [PATCH 1/2] git-sh-setup: refactor ident-parsing functions Jeff King
@ 2012-11-12 17:44                   ` Junio C Hamano
  2012-11-12 19:44                     ` Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2012-11-12 17:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Ilya Basin, git

Jeff King <peff@peff.net> writes:

> The only ident-parsing function we currently provide is
> get_author_ident_from_commit. This is not very
> flexible for two reasons:
>
>   1. It takes a commit as an argument, and can't read from
>      commit headers saved on disk.
>
>   2. It will only parse authors, not committers.
>
> This patch provides a more flexible interface which will
> parse multiple idents from a commit provide on stdin. We can
> easily use it as a building block for the current function
> to retain compatibility.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Since we are counting processes in this series, I should note that this
> actually adds a subshell invocation for each call, since it went from:
>
>   script='...'
>   sed $script
>
> to:
>
>   sed "$(make_script)"
>
> For filter-branch, which is really the only high-performance caller we
> have, this is negated by the fact that it will do author and committer
> at the same time, saving us an extra subshell (in addition to an extra
> sed invocation).

Given that pick-ident-script is a const function, a caller that
repeatedly call is could call it once and use it in a variable, no?

>
>  git-sh-setup.sh | 62 +++++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 43 insertions(+), 19 deletions(-)
>
> diff --git a/git-sh-setup.sh b/git-sh-setup.sh
> index ee0e0bc..22f0aed 100644
> --- a/git-sh-setup.sh
> +++ b/git-sh-setup.sh
> @@ -191,28 +191,52 @@ get_author_ident_from_commit () {
>  	fi
>  }
>  
> +# Generate a sed script to parse identities from a commit.
> +#
> +# Reads the commit from stdin, which should be in raw format (e.g., from
> +# cat-file or "--pretty=raw").
> +#
> +# The first argument specifies the ident line to parse (e.g., "author"), and
> +# the second specifies the environment variable to put it in (e.g., "AUTHOR"
> +# for "GIT_AUTHOR_*"). Multiple pairs can be given to parse author and
> +# committer.
> +pick_ident_script () {
> +	while test $# -gt 0
> +	do
> +		lid=$1; shift
> +		uid=$1; shift
> +		printf '%s' "
> +		/^$lid /{
> +			s/'/'\\\\''/g
> +			h
> +			s/^$lid "'\([^<]*\) <[^>]*> .*$/\1/'"
> +			s/.*/GIT_${uid}_NAME='&'/p
> +
> +			g
> +			s/^$lid "'[^<]* <\([^>]*\)> .*$/\1/'"
> +			s/.*/GIT_${uid}_EMAIL='&'/p
> +
> +			g
> +			s/^$lid "'[^<]* <[^>]*> \(.*\)$/@\1/'"
> +			s/.*/GIT_${uid}_DATE='&'/p
> +		}
> +		"
> +	done
> +	echo '/^$/q'
> +}
> +
> +# Create a pick-script as above and feed it to sed. Stdout is suitable for
> +# feeding to eval.
> +parse_ident_from_commit () {
> +	LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")"
> +}
> +
> +# Parse the author from a commit given as an argument. Stdout is suitable for
> +# feeding to eval to set the usual GIT_* ident variables.
>  get_author_ident_from_commit () {
> -	pick_author_script='
> -	/^author /{
> -		s/'\''/'\''\\'\'\''/g
> -		h
> -		s/^author \([^<]*\) <[^>]*> .*$/\1/
> -		s/.*/GIT_AUTHOR_NAME='\''&'\''/p
> -
> -		g
> -		s/^author [^<]* <\([^>]*\)> .*$/\1/
> -		s/.*/GIT_AUTHOR_EMAIL='\''&'\''/p
> -
> -		g
> -		s/^author [^<]* <[^>]*> \(.*\)$/@\1/
> -		s/.*/GIT_AUTHOR_DATE='\''&'\''/p
> -
> -		q
> -	}
> -	'
>  	encoding=$(git config i18n.commitencoding || echo UTF-8)
>  	git show -s --pretty=raw --encoding="$encoding" "$1" -- |
> -	LANG=C LC_ALL=C sed -ne "$pick_author_script"
> +	parse_ident_from_commit author AUTHOR
>  }
>  
>  # Clear repo-local GIT_* environment variables. Useful when switching to

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions
  2012-11-12 17:44                   ` Junio C Hamano
@ 2012-11-12 19:44                     ` Jeff King
  2012-11-12 20:08                       ` Junio C Hamano
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2012-11-12 19:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git

On Mon, Nov 12, 2012 at 09:44:01AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The only ident-parsing function we currently provide is
> > get_author_ident_from_commit. This is not very
> > flexible for two reasons:
> >
> >   1. It takes a commit as an argument, and can't read from
> >      commit headers saved on disk.
> >
> >   2. It will only parse authors, not committers.
> >
> > This patch provides a more flexible interface which will
> > parse multiple idents from a commit provide on stdin. We can
> > easily use it as a building block for the current function
> > to retain compatibility.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > Since we are counting processes in this series, I should note that this
> > actually adds a subshell invocation for each call, since it went from:
> >
> >   script='...'
> >   sed $script
> >
> > to:
> >
> >   sed "$(make_script)"
> >
> > For filter-branch, which is really the only high-performance caller we
> > have, this is negated by the fact that it will do author and committer
> > at the same time, saving us an extra subshell (in addition to an extra
> > sed invocation).
> 
> Given that pick-ident-script is a const function, a caller that
> repeatedly call is could call it once and use it in a variable, no?

The problem is that it is a helper called from parse_ident_from_commit.
And that function just passes along its arguments, so it does not know
that it is being called repeatedly with the same arguments. So you'd
have to either change the interface or memoize internally.

I don't think memoization is a good option for two reasons:

  1. Storing the arguments to compare to later is complex. You don't
     want to just store "$*" from the last run and see if we got the
     same arguments. You'd have to quote your delimiter (e.g., you would
     not want to confuse ("foo", "bar") with ("foo bar"). Though in this
     instance, we know that our args do not have spaces, so we could get
     away with that.

  2. If you are in a subshell or even a while loop, your memoized
     variable will not be retained.

So unless somebody has some clever scheme for memoizing shell functions
without any process overhead, it is probably not worth it.

Changing the interface for get_author_ident_from_commit would be a pain,
but if we just wanted to help filter-branch, we could do something like
this:

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5314249..7a693ba 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -74,7 +74,7 @@ finish_ident() {
 }
 
 set_ident () {
-	parse_ident_from_commit author AUTHOR committer COMMITTER
+	parse_ident_from_commit_via_script "$ident_script"
 	finish_ident AUTHOR
 	finish_ident COMMITTER
 }
@@ -93,6 +93,7 @@ if [ "$(is_bare_repository)" = false ]; then
 	require_clean_work_tree 'rewrite branches'
 fi
 
+ident_script=$(pick_ident_script author AUTHOR committer COMMITTER)
 tempdir=.git-rewrite
 filter_env=
 filter_tree=
diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index 22f0aed..1e20e17 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -225,10 +225,17 @@ pick_ident_script () {
 	echo '/^$/q'
 }
 
+# Feed a pick_ident_script return value to sed. Use this instead of
+# parse_ident_from_commit below if you are going to be parsing commits in a
+# tight loop and want to save a process.
+parse_ident_from_commit_via_script() {
+	LANG=C LC_ALL=C sed -ne "$1"
+}
+
 # Create a pick-script as above and feed it to sed. Stdout is suitable for
 # feeding to eval.
 parse_ident_from_commit () {
-	LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")"
+	parse_ident_from_commit_via_script "$(pick_ident_script "$@")"
 }
 
 # Parse the author from a commit given as an argument. Stdout is suitable for

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions
  2012-11-12 19:44                     ` Jeff King
@ 2012-11-12 20:08                       ` Junio C Hamano
  2012-11-12 20:12                         ` Jeff King
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2012-11-12 20:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Ilya Basin, git

Jeff King <peff@peff.net> writes:

> Changing the interface for get_author_ident_from_commit would be a pain,
> but if we just wanted to help filter-branch, we could do something like
> this:

Yes, that is the direction I was alluding to.

Callers of get_author_ident_from_commit can also do the same and
avoid rebuilding the same $pick_author_script over and over again,
or get_author_ident_from_commit can do so for its callers.

> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 5314249..7a693ba 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -74,7 +74,7 @@ finish_ident() {
>  }
>  
>  set_ident () {
> -	parse_ident_from_commit author AUTHOR committer COMMITTER
> +	parse_ident_from_commit_via_script "$ident_script"
>  	finish_ident AUTHOR
>  	finish_ident COMMITTER
>  }
> @@ -93,6 +93,7 @@ if [ "$(is_bare_repository)" = false ]; then
>  	require_clean_work_tree 'rewrite branches'
>  fi
>  
> +ident_script=$(pick_ident_script author AUTHOR committer COMMITTER)
>  tempdir=.git-rewrite
>  filter_env=
>  filter_tree=
> diff --git a/git-sh-setup.sh b/git-sh-setup.sh
> index 22f0aed..1e20e17 100644
> --- a/git-sh-setup.sh
> +++ b/git-sh-setup.sh
> @@ -225,10 +225,17 @@ pick_ident_script () {
>  	echo '/^$/q'
>  }
>  
> +# Feed a pick_ident_script return value to sed. Use this instead of
> +# parse_ident_from_commit below if you are going to be parsing commits in a
> +# tight loop and want to save a process.
> +parse_ident_from_commit_via_script() {
> +	LANG=C LC_ALL=C sed -ne "$1"
> +}
> +
>  # Create a pick-script as above and feed it to sed. Stdout is suitable for
>  # feeding to eval.
>  parse_ident_from_commit () {
> -	LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")"
> +	parse_ident_from_commit_via_script "$(pick_ident_script "$@")"
>  }
>  
>  # Parse the author from a commit given as an argument. Stdout is suitable for

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions
  2012-11-12 20:08                       ` Junio C Hamano
@ 2012-11-12 20:12                         ` Jeff King
  2012-11-12 20:32                           ` Junio C Hamano
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2012-11-12 20:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git

On Mon, Nov 12, 2012 at 12:08:37PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Changing the interface for get_author_ident_from_commit would be a pain,
> > but if we just wanted to help filter-branch, we could do something like
> > this:
> 
> Yes, that is the direction I was alluding to.
> 
> Callers of get_author_ident_from_commit can also do the same and
> avoid rebuilding the same $pick_author_script over and over again,
> or get_author_ident_from_commit can do so for its callers.

I don't think get_author_ident_from_commit can do so on demand due to
the subshell issue I mentioned. So you'd have to generate the pick
script on inclusion of git-sh-setup, whether the includer wants to call
the function or not.

I wonder if we should simply generate these at build time and store them
in the script.

I'm also not sure that saving one process invocation is worth spending a
lot of thought cycles on. Maybe somebody on Windows could run a
filter-branch with the patch I just sent and measure whether it even
makes a difference.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions
  2012-11-12 20:12                         ` Jeff King
@ 2012-11-12 20:32                           ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2012-11-12 20:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Ilya Basin, git

Jeff King <peff@peff.net> writes:

> On Mon, Nov 12, 2012 at 12:08:37PM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > Changing the interface for get_author_ident_from_commit would be a pain,
>> > but if we just wanted to help filter-branch, we could do something like
>> > this:
>> 
>> Yes, that is the direction I was alluding to.
>> 
>> Callers of get_author_ident_from_commit can also do the same and
>> avoid rebuilding the same $pick_author_script over and over again,
>> or get_author_ident_from_commit can do so for its callers.
>
> I don't think get_author_ident_from_commit can do so on demand due to
> the subshell issue I mentioned.

OK, that is what I obviously missed.

> I wonder if we should simply generate these at build time and store them
> in the script.

Yeah, my knee-jerk reaction was that it was wasteful to run script
at runtime to build a constant string, but the engineering cost may
not be worth it.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2012-11-12 20:32 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-17  6:47 What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? Ilya Basin
2012-10-17  7:18 ` Jeff King
2012-10-17  7:23 ` Johannes Sixt
2012-10-17  8:58   ` Re[2]: " Ilya Basin
2012-10-17 10:36     ` Re[3]: " Ilya Basin
2012-10-17 22:13       ` Jeff King
2012-10-17 22:09     ` Jeff King
2012-10-18  5:31       ` Johannes Sixt
2012-10-18  5:36         ` Jeff King
2012-10-18  6:06           ` Junio C Hamano
2012-10-18  6:08             ` Jeff King
2012-10-18  7:22               ` [PATCH 0/2] clean up filter-branch ident parsing Jeff King
2012-10-18  7:25                 ` [PATCH 1/2] git-sh-setup: refactor ident-parsing functions Jeff King
2012-11-12 17:44                   ` Junio C Hamano
2012-11-12 19:44                     ` Jeff King
2012-11-12 20:08                       ` Junio C Hamano
2012-11-12 20:12                         ` Jeff King
2012-11-12 20:32                           ` Junio C Hamano
2012-10-18  7:25                 ` [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions Jeff King
2012-10-18  7:49                   ` Johannes Sixt
2012-10-18  7:54                     ` Jeff King
2012-10-18 10:22                       ` Jeff King
2012-10-18 10:26                         ` Jeff King
2012-10-18 10:33                           ` [PATCHv2 " Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).