* What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? @ 2012-10-17 6:47 Ilya Basin 2012-10-17 7:18 ` Jeff King 2012-10-17 7:23 ` Johannes Sixt 0 siblings, 2 replies; 24+ messages in thread From: Ilya Basin @ 2012-10-17 6:47 UTC (permalink / raw) To: git The filter-branch command, the contents of ~/.gitconfig and the tree are the same. The command succeeds on cygwin, but fails on Solaris due to unset GIT_AUTHOR_NAME and GIT_COMMITTER_NAME : $ git filter-branch --tree-filter "env | grep GIT_ ; $CMD" b416b9bfc5e71531f2f05af4c396bb0ba7560741..HEAD Rewrite 214efc6eec82b015aefe23b2280979f05b351396 (1/16)GIT_DIR=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git GIT_INDEX_FILE=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/t/../index GIT_WORK_TREE=. GIT_AUTHOR_NAME= GIT_COMMITTER_NAME= GIT_COMMIT=214efc6eec82b015aefe23b2280979f05b351396 fatal: empty ident <my@email.com> not allowed could not write rewritten commit If I explicitly set these 2 variables, filter-branch succeeds, but other commit attributes like commit date aren't preserved. I use git 1.7.6, from sunfreeware. I hope there is some simple thing that needs to be configured. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? 2012-10-17 6:47 What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? Ilya Basin @ 2012-10-17 7:18 ` Jeff King 2012-10-17 7:23 ` Johannes Sixt 1 sibling, 0 replies; 24+ messages in thread From: Jeff King @ 2012-10-17 7:18 UTC (permalink / raw) To: Ilya Basin; +Cc: git On Wed, Oct 17, 2012 at 10:47:29AM +0400, Ilya Basin wrote: > The filter-branch command, the contents of ~/.gitconfig and the tree > are the same. > The command succeeds on cygwin, but fails on Solaris due to > unset GIT_AUTHOR_NAME and GIT_COMMITTER_NAME : That shouldn't happen. The likely culprit is that the sed magic in the set_ident function of git-filter-branch is not portable to your version of sed. What happens if you run this: echo 'author Your Name <you@example.com> 1350408529 -0400' >commit set -- author lid="$(echo "$1" | tr "[A-Z]" "[a-z]")" uid="$(echo "$1" | tr "[a-z]" "[A-Z]")" pick_id_script=' /^'$lid' /{ s/'\''/'\''\\'\'\''/g h s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/ s/'\''/'\''\'\'\''/g s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p g s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/ s/'\''/'\''\'\'\''/g s/.*/GIT_'$uid'_EMAIL='\''&'\''; export GIT_'$uid'_EMAIL/p g s/^'$lid' [^<]* <[^>]*> \(.*\)$/@\1/ s/'\''/'\''\'\'\''/g s/.*/GIT_'$uid'_DATE='\''&'\''; export GIT_'$uid'_DATE/p q } ' LANG=C LC_ALL=C sed -ne "$pick_id_script" <commit in your shell? You should get: GIT_AUTHOR_NAME='Your Name'; export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL='you@example.com'; export GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE='@1350408529 -0400'; export GIT_AUTHOR_DATE > I use git 1.7.6, from sunfreeware. It might also be worth testing v1.7.12, but reading the logs, I don't think there has been any meaningful update to filter-branch since then. -Peff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? 2012-10-17 6:47 What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? Ilya Basin 2012-10-17 7:18 ` Jeff King @ 2012-10-17 7:23 ` Johannes Sixt 2012-10-17 8:58 ` Re[2]: " Ilya Basin 1 sibling, 1 reply; 24+ messages in thread From: Johannes Sixt @ 2012-10-17 7:23 UTC (permalink / raw) To: Ilya Basin; +Cc: git Am 10/17/2012 8:47, schrieb Ilya Basin: > The filter-branch command, the contents of ~/.gitconfig and the tree > are the same. > The command succeeds on cygwin, but fails on Solaris due to > unset GIT_AUTHOR_NAME and GIT_COMMITTER_NAME : > > $ git filter-branch --tree-filter "env | grep GIT_ ; $CMD" b416b9bfc5e71531f2f05af4c396bb0ba7560741..HEAD > Rewrite 214efc6eec82b015aefe23b2280979f05b351396 (1/16)GIT_DIR=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git > GIT_INDEX_FILE=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/t/../index > GIT_WORK_TREE=. > GIT_AUTHOR_NAME= > GIT_COMMITTER_NAME= > GIT_COMMIT=214efc6eec82b015aefe23b2280979f05b351396 > fatal: empty ident <my@email.com> not allowed > could not write rewritten commit Most likely, your sed has problems with a sed script in function get_author_ident_from_commit. I tested it like this: $ sh -c '. $(git --exec-path)/git-sh-setup; get_author_ident_from_commit HEAD' GIT_AUTHOR_NAME='Johannes Sixt' GIT_AUTHOR_EMAIL='j6t@kdbg.org' GIT_AUTHOR_DATE='@1350025129 +0200' -- Hannes ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re[2]: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? 2012-10-17 7:23 ` Johannes Sixt @ 2012-10-17 8:58 ` Ilya Basin 2012-10-17 10:36 ` Re[3]: " Ilya Basin 2012-10-17 22:09 ` Jeff King 0 siblings, 2 replies; 24+ messages in thread From: Ilya Basin @ 2012-10-17 8:58 UTC (permalink / raw) To: Johannes Sixt, Jeff King; +Cc: git [-- Attachment #1: Type: text/plain, Size: 554 bytes --] JS> Most likely, your sed has problems with a sed script in function JS> get_author_ident_from_commit. I tested it like this: JS> $ sh -c '. $(git --exec-path)/git-sh-setup; JS> get_author_ident_from_commit HEAD' JS> GIT_AUTHOR_NAME='Johannes Sixt' JS> GIT_AUTHOR_EMAIL='j6t@kdbg.org' JS> GIT_AUTHOR_DATE='@1350025129 +0200' JS> -- Hannes Both systems have GNU sed 4.2.1 installed. I wrote a wrapper script wor sed. It's output attached. The difference is letter case in sed input data: Solaris: /^AUTHOR / Windows: /^author / -- [-- Attachment #2: git-sed-good.txt --] [-- Type: text/plain, Size: 4048 bytes --] $ git filter-branch -f --tree-filter "env | grep GIT_; true" HEAD~1..HEAD SED BEGIN SED ARGUMENTS: -e s/-/ / SED STDIN BEGIN git-filter-branch SED STDIN END SED OUTPUT BEGIN git filter-branch SED OUTPUT END SED EXIT CODE: 0 SED END SED BEGIN SED ARGUMENTS: -e /^^/d /cygdrive/c/sicap/rap/gitcvs/RAP27/.git-rewrite/raw-heads SED INPUT FILE BEGIN: /cygdrive/c/sicap/rap/gitcvs/RAP27/.git-rewrite/raw-heads refs/heads/master SED INPUT FILE END: /cygdrive/c/sicap/rap/gitcvs/RAP27/.git-rewrite/raw-heads SED OUTPUT BEGIN refs/heads/master SED OUTPUT END SED EXIT CODE: 0 SED END Rewrite acd1d2bb1984c96630d5070497590307151c4682 (1/1) SED BEGIN SED ARGUMENTS: -ne /^author /{ s/'/'\\''/g h s/^author \([^<]*\) <[^>]*> .*$/\1/ s/'/'\''/g s/.*/GIT_AUTHOR_NAME='&'; export GIT_AUTHOR_NAME/p g s/^author [^<]* <\([^>]*\)> .*$/\1/ s/'/'\''/g s/.*/GIT_AUTHOR_EMAIL='&'; export GIT_AUTHOR_EMAIL/p g s/^author [^<]* <[^>]*> \(.*\)$/\1/ s/'/'\''/g s/.*/GIT_AUTHOR_DATE='&'; export GIT_AUTHOR_DATE/p q } SED STDIN BEGIN tree 969f563d319049bb6dabc12054d67671499a6f55 parent c4734950e37c09ca7d3e3088f6f31d866dbb5077 author Ilya Basin <basinilya@gmail.com> 1350401059 +0400 committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400 temp SED STDIN END SED OUTPUT BEGIN GIT_AUTHOR_NAME='Ilya Basin'; export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL='basinilya@gmail.com'; export GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE='1350401059 +0400'; export GIT_AUTHOR_DATE SED OUTPUT END SED EXIT CODE: 0 SED END SED BEGIN SED ARGUMENTS: -ne /^committer /{ s/'/'\\''/g h s/^committer \([^<]*\) <[^>]*> .*$/\1/ s/'/'\''/g s/.*/GIT_COMMITTER_NAME='&'; export GIT_COMMITTER_NAME/p g s/^committer [^<]* <\([^>]*\)> .*$/\1/ s/'/'\''/g s/.*/GIT_COMMITTER_EMAIL='&'; export GIT_COMMITTER_EMAIL/p g s/^committer [^<]* <[^>]*> \(.*\)$/\1/ s/'/'\''/g s/.*/GIT_COMMITTER_DATE='&'; export GIT_COMMITTER_DATE/p q } SED STDIN BEGIN tree 969f563d319049bb6dabc12054d67671499a6f55 parent c4734950e37c09ca7d3e3088f6f31d866dbb5077 author Ilya Basin <basinilya@gmail.com> 1350401059 +0400 committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400 temp SED STDIN END SED OUTPUT BEGIN GIT_COMMITTER_NAME='Ilya Basin'; export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL='basinilya@gmail.com'; export GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE='1350405585 +0400'; export GIT_COMMITTER_DATE SED OUTPUT END SED EXIT CODE: 0 SED END GIT_DIR=/cygdrive/c/sicap/rap/gitcvs/RAP27/.git GIT_AUTHOR_DATE=1350401059 +0400 GIT_INDEX_FILE=/cygdrive/c/sicap/rap/gitcvs/RAP27/.git-rewrite/t/../index GIT_WORK_TREE=. GIT_AUTHOR_NAME=Ilya Basin GIT_COMMITTER_NAME=Ilya Basin GIT_COMMIT=acd1d2bb1984c96630d5070497590307151c4682 GIT_COMMITTER_EMAIL=basinilya@gmail.com GIT_COMMITTER_DATE=1350405585 +0400 GIT_AUTHOR_EMAIL=basinilya@gmail.com SED BEGIN SED ARGUMENTS: -e 1,/^$/d SED STDIN BEGIN tree 969f563d319049bb6dabc12054d67671499a6f55 parent c4734950e37c09ca7d3e3088f6f31d866dbb5077 author Ilya Basin <basinilya@gmail.com> 1350401059 +0400 committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400 temp SED STDIN END SED OUTPUT BEGIN temp SED OUTPUT END SED EXIT CODE: 0 SED END WARNING: Ref 'refs/heads/master' is unchanged [-- Attachment #3: git-sed-sol.txt --] [-- Type: text/plain, Size: 3715 bytes --] -bash-3.00$ git filter-branch -f --tree-filter "env | grep GIT_; true" HEAD~1..HEAD SED BEGIN SED ARGUMENTS: -e s/-/ / SED STDIN BEGIN git-filter-branch SED STDIN END SED OUTPUT BEGIN git filter-branch SED OUTPUT END SED EXIT CODE: 0 SED END SED BEGIN SED ARGUMENTS: -e /^^/d /home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/raw-heads SED INPUT FILE BEGIN: /home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/raw-heads refs/heads/master SED INPUT FILE END: /home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/raw-heads SED OUTPUT BEGIN refs/heads/master SED OUTPUT END SED EXIT CODE: 0 SED END Rewrite acd1d2bb1984c96630d5070497590307151c4682 (1/1) SED BEGIN SED ARGUMENTS: -ne /^AUTHOR /{ s/'/'\\''/g h s/^AUTHOR \([^<]*\) <[^>]*> .*$/\1/ s/'/'\''/g s/.*/GIT_AUTHOR_NAME='&'; export GIT_AUTHOR_NAME/p g s/^AUTHOR [^<]* <\([^>]*\)> .*$/\1/ s/'/'\''/g s/.*/GIT_AUTHOR_EMAIL='&'; export GIT_AUTHOR_EMAIL/p g s/^AUTHOR [^<]* <[^>]*> \(.*\)$/\1/ s/'/'\''/g s/.*/GIT_AUTHOR_DATE='&'; export GIT_AUTHOR_DATE/p q } SED STDIN BEGIN tree 969f563d319049bb6dabc12054d67671499a6f55 parent c4734950e37c09ca7d3e3088f6f31d866dbb5077 author Ilya Basin <basinilya@gmail.com> 1350401059 +0400 committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400 temp SED STDIN END SED OUTPUT BEGIN SED OUTPUT END SED EXIT CODE: 0 SED END SED BEGIN SED ARGUMENTS: -ne /^COMMITTER /{ s/'/'\\''/g h s/^COMMITTER \([^<]*\) <[^>]*> .*$/\1/ s/'/'\''/g s/.*/GIT_COMMITTER_NAME='&'; export GIT_COMMITTER_NAME/p g s/^COMMITTER [^<]* <\([^>]*\)> .*$/\1/ s/'/'\''/g s/.*/GIT_COMMITTER_EMAIL='&'; export GIT_COMMITTER_EMAIL/p g s/^COMMITTER [^<]* <[^>]*> \(.*\)$/\1/ s/'/'\''/g s/.*/GIT_COMMITTER_DATE='&'; export GIT_COMMITTER_DATE/p q } SED STDIN BEGIN tree 969f563d319049bb6dabc12054d67671499a6f55 parent c4734950e37c09ca7d3e3088f6f31d866dbb5077 author Ilya Basin <basinilya@gmail.com> 1350401059 +0400 committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400 temp SED STDIN END SED OUTPUT BEGIN SED OUTPUT END SED EXIT CODE: 0 SED END GIT_DIR=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git GIT_INDEX_FILE=/home/tester/.ilya/builds/makepkg.rap_0.1-1_sparc.XXXXXX/src/rap/.git-rewrite/t/../index GIT_WORK_TREE=. GIT_AUTHOR_NAME= GIT_COMMITTER_NAME= GIT_COMMIT=acd1d2bb1984c96630d5070497590307151c4682 SED BEGIN SED ARGUMENTS: -e 1,/^$/d SED STDIN BEGIN tree 969f563d319049bb6dabc12054d67671499a6f55 parent c4734950e37c09ca7d3e3088f6f31d866dbb5077 author Ilya Basin <basinilya@gmail.com> 1350401059 +0400 committer Ilya Basin <basinilya@gmail.com> 1350405585 +0400 temp SED STDIN END SED OUTPUT BEGIN temp SED OUTPUT END SED EXIT CODE: 0 SED END fatal: empty ident <basinilya@gmail.com> not allowed could not write rewritten commit -bash-3.00$ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re[3]: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? 2012-10-17 8:58 ` Re[2]: " Ilya Basin @ 2012-10-17 10:36 ` Ilya Basin 2012-10-17 22:13 ` Jeff King 2012-10-17 22:09 ` Jeff King 1 sibling, 1 reply; 24+ messages in thread From: Ilya Basin @ 2012-10-17 10:36 UTC (permalink / raw) To: Johannes Sixt, Jeff King; +Cc: git JS>> Most likely, your sed has problems with a sed script in function JS>> get_author_ident_from_commit. I tested it like this: JS>> $ sh -c '. $(git --exec-path)/git-sh-setup; JS>> get_author_ident_from_commit HEAD' JS>> GIT_AUTHOR_NAME='Johannes Sixt' JS>> GIT_AUTHOR_EMAIL='j6t@kdbg.org' JS>> GIT_AUTHOR_DATE='@1350025129 +0200' JS>> -- Hannes IB> Both systems have GNU sed 4.2.1 installed. I wrote a wrapper script wor sed. IB> It's output attached. IB> The difference is letter case in sed input data: IB> Solaris: IB> /^AUTHOR / IB> Windows: IB> /^author / The culprit is bad $PATH : When git-filter-branch runs, for some reason two new entries precede /usr/bin in it: /tmp/777/.ilya-sparc/bin /home/tester/.ilya/opt/SNiFF-3.2.1/bin /export/home/testora/app/testora/product/11.2.0/client_32/bin +/usr/xpg6/bin +/usr/xpg4/bin /usr/bin /home/tester/apache-ant-1.7.1/bin /usr/jdk/instances/jdk1.5.0//bin And /usr/xpg6/bin/tr fails to make "AUTHOR" lowercase. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? 2012-10-17 10:36 ` Re[3]: " Ilya Basin @ 2012-10-17 22:13 ` Jeff King 0 siblings, 0 replies; 24+ messages in thread From: Jeff King @ 2012-10-17 22:13 UTC (permalink / raw) To: Ilya Basin; +Cc: Johannes Sixt, git On Wed, Oct 17, 2012 at 02:36:23PM +0400, Ilya Basin wrote: > The culprit is bad $PATH : > When git-filter-branch runs, for some reason two new entries precede > /usr/bin in it: > /tmp/777/.ilya-sparc/bin > /home/tester/.ilya/opt/SNiFF-3.2.1/bin > /export/home/testora/app/testora/product/11.2.0/client_32/bin > +/usr/xpg6/bin > +/usr/xpg4/bin > /usr/bin > /home/tester/apache-ant-1.7.1/bin > /usr/jdk/instances/jdk1.5.0//bin > > And /usr/xpg6/bin/tr fails to make "AUTHOR" lowercase. Hmph. Those are controlled by SANE_TOOL_PATH at git's build time, with the intent that the xpg tools are less terrible than the ones in /usr/bin on Solaris. But it sounds like that may not be the case. Yuck. I don't have a Solaris box handy. Is there a way to make sequences like A-Z work sanely with /usr/xpg6/bin/tr? Do you have any LANG or locale settings? Sometimes those can affect sequences. What does: echo AUTHOR | LANG=C LC_ALL=C /usr/xpg6/bin/tr '[A-Z]' '[a-z]' do? -Peff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? 2012-10-17 8:58 ` Re[2]: " Ilya Basin 2012-10-17 10:36 ` Re[3]: " Ilya Basin @ 2012-10-17 22:09 ` Jeff King 2012-10-18 5:31 ` Johannes Sixt 1 sibling, 1 reply; 24+ messages in thread From: Jeff King @ 2012-10-17 22:09 UTC (permalink / raw) To: Ilya Basin; +Cc: Johannes Sixt, git On Wed, Oct 17, 2012 at 12:58:47PM +0400, Ilya Basin wrote: > JS> Most likely, your sed has problems with a sed script in function > JS> get_author_ident_from_commit. I tested it like this: > > JS> $ sh -c '. $(git --exec-path)/git-sh-setup; > JS> get_author_ident_from_commit HEAD' > JS> GIT_AUTHOR_NAME='Johannes Sixt' > JS> GIT_AUTHOR_EMAIL='j6t@kdbg.org' > JS> GIT_AUTHOR_DATE='@1350025129 +0200' > > JS> -- Hannes > > Both systems have GNU sed 4.2.1 installed. I wrote a wrapper script wor sed. > It's output attached. > The difference is letter case in sed input data: > Solaris: > /^AUTHOR / > Windows: > /^author / Ah, so it's tr that is the culprit. We've had problems with Solaris tr before, but usually around NULs or the use of brackets. But according to 40a7ce6 (tr portability fixes, 2008-03-12), filter-branch is already doing it the portable way. If you apply this patch, does your filter-branch work? diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 178e453..58b1d69 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -68,8 +68,8 @@ set_ident () { # "author" or "committer set_ident () { - lid="$(echo "$1" | tr "[A-Z]" "[a-z]")" - uid="$(echo "$1" | tr "[a-z]" "[A-Z]")" + lid="$(echo "$1" | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz)" + uid="$(echo "$1" | tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ)" pick_id_script=' /^'$lid' /{ s/'\''/'\''\\'\'\''/g That seems like crazy overkill, but it at least will let us double-check that the tr sequences are the problem. -Peff ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? 2012-10-17 22:09 ` Jeff King @ 2012-10-18 5:31 ` Johannes Sixt 2012-10-18 5:36 ` Jeff King 0 siblings, 1 reply; 24+ messages in thread From: Johannes Sixt @ 2012-10-18 5:31 UTC (permalink / raw) To: Jeff King; +Cc: Ilya Basin, git Am 10/18/2012 0:09, schrieb Jeff King: > - lid="$(echo "$1" | tr "[A-Z]" "[a-z]")" > - uid="$(echo "$1" | tr "[a-z]" "[A-Z]")" > + lid="$(echo "$1" | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz)" > + uid="$(echo "$1" | tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ)" > > That seems like crazy overkill, but it at least will let us double-check > that the tr sequences are the problem. Right. But we should really be doing something like this instead to save a few subprocesses. -- Hannes diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 178e453..018e56e 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -68,8 +68,8 @@ eval "$functions" # "author" or "committer set_ident () { - lid="$(echo "$1" | tr "[A-Z]" "[a-z]")" - uid="$(echo "$1" | tr "[a-z]" "[A-Z]")" + uid=$1 + lid=$2 pick_id_script=' /^'$lid' /{ s/'\''/'\''\\'\'\''/g @@ -320,9 +320,9 @@ while read commit parents; do git cat-file commit "$commit" >../commit || die "Cannot read commit $commit" - eval "$(set_ident AUTHOR <../commit)" || + eval "$(set_ident AUTHOR author <../commit)" || die "setting author failed for commit $commit" - eval "$(set_ident COMMITTER <../commit)" || + eval "$(set_ident COMMITTER committer <../commit)" || die "setting committer failed for commit $commit" eval "$filter_env" < /dev/null || die "env filter failed: $filter_env" ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? 2012-10-18 5:31 ` Johannes Sixt @ 2012-10-18 5:36 ` Jeff King 2012-10-18 6:06 ` Junio C Hamano 0 siblings, 1 reply; 24+ messages in thread From: Jeff King @ 2012-10-18 5:36 UTC (permalink / raw) To: Johannes Sixt; +Cc: Ilya Basin, git On Thu, Oct 18, 2012 at 07:31:35AM +0200, Johannes Sixt wrote: > Right. But we should really be doing something like this instead to save a > few subprocesses. > [...] > - eval "$(set_ident AUTHOR <../commit)" || > + eval "$(set_ident AUTHOR author <../commit)" || I cringe a little at losing DRY-ness to avoid processes. But the repetition is pretty straightforward and obvious, and I know that some platforms are really hurt by extra processes (and this is being called for every commit). Speaking of repetition, this seems like almost the exact same parsing that happens in git-sh-setup's get_author_ident_from_commit. Maybe it's worth merging them. I suspect you could also avoid another process by parsing out both author and committer information in the same sed invocation. -Peff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? 2012-10-18 5:36 ` Jeff King @ 2012-10-18 6:06 ` Junio C Hamano 2012-10-18 6:08 ` Jeff King 0 siblings, 1 reply; 24+ messages in thread From: Junio C Hamano @ 2012-10-18 6:06 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Sixt, Ilya Basin, git Jeff King <peff@peff.net> writes: > On Thu, Oct 18, 2012 at 07:31:35AM +0200, Johannes Sixt wrote: > >> Right. But we should really be doing something like this instead to save a >> few subprocesses. >> [...] >> - eval "$(set_ident AUTHOR <../commit)" || >> + eval "$(set_ident AUTHOR author <../commit)" || > > I cringe a little at losing DRY-ness to avoid processes. Well, the header field token "author" and the middle word of the variable GIT_AUTHOR_NAME _happen_ to be the same modulo case, but they did not have to be, so you could argue the updated set_ident implementation is more generally useful (you could even argue that we should spell the first parameter out as "GIT_AUTHOR_NAME" and "GIT_AUTHOR_EMAIL", two separate parameters). > Speaking of repetition, this seems like almost the exact same parsing > that happens in git-sh-setup's get_author_ident_from_commit. Maybe it's > worth merging them. I suspect you could also avoid another process > by parsing out both author and committer information in the same sed > invocation. Yes, yes and yes. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? 2012-10-18 6:06 ` Junio C Hamano @ 2012-10-18 6:08 ` Jeff King 2012-10-18 7:22 ` [PATCH 0/2] clean up filter-branch ident parsing Jeff King 0 siblings, 1 reply; 24+ messages in thread From: Jeff King @ 2012-10-18 6:08 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git On Wed, Oct 17, 2012 at 11:06:11PM -0700, Junio C Hamano wrote: > >> - eval "$(set_ident AUTHOR <../commit)" || > >> + eval "$(set_ident AUTHOR author <../commit)" || > > > > I cringe a little at losing DRY-ness to avoid processes. > > Well, the header field token "author" and the middle word of the > variable GIT_AUTHOR_NAME _happen_ to be the same modulo case, but > they did not have to be, so you could argue the updated set_ident > implementation is more generally useful (you could even argue that > we should spell the first parameter out as "GIT_AUTHOR_NAME" and > "GIT_AUTHOR_EMAIL", two separate parameters). True, though that is even more work for the caller (and *_DATE, too). We could make it "GIT_AUTHOR", but I don't think there is much point in being that level of half-way general. The caller can always pick it out of the variables if they really want to do something tricky. > > Speaking of repetition, this seems like almost the exact same parsing > > that happens in git-sh-setup's get_author_ident_from_commit. Maybe it's > > worth merging them. I suspect you could also avoid another process > > by parsing out both author and committer information in the same sed > > invocation. > > Yes, yes and yes. Working on it now. git-sh-setup works, but chasing an annoying bug in filter-branch. I'm sure it's something silly and stupid. -Peff ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 0/2] clean up filter-branch ident parsing 2012-10-18 6:08 ` Jeff King @ 2012-10-18 7:22 ` Jeff King 2012-10-18 7:25 ` [PATCH 1/2] git-sh-setup: refactor ident-parsing functions Jeff King 2012-10-18 7:25 ` [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions Jeff King 0 siblings, 2 replies; 24+ messages in thread From: Jeff King @ 2012-10-18 7:22 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git On Thu, Oct 18, 2012 at 02:08:47AM -0400, Jeff King wrote: > Working on it now. git-sh-setup works, but chasing an annoying bug in > filter-branch. I'm sure it's something silly and stupid. Ugh. I was being caught by crazy dash-versus-bash stuff. Try this: $ bash -c 'echo "\\\\"' \\ $ dash -c 'echo "\\\\"' \ It's the "echo will automatically do backslash escaping" magic we have so often enjoyed. I solved it with printf. Patches to follow. My primary motivation was cleanup, but it also has a net reduction of 5 fork+execs for each commit that filter-branch processes. This dropped the run-time of "git filter-branch HEAD -1000" on my linux box from 62s to 47s. A real filter-branch would do more work in the filters, of course, but that translates to 7.5 minutes of time saved if you are filtering all 30,000 commits of git.git. [1/2]: git-sh-setup: refactor ident-parsing functions [2/2]: filter-branch: use git-sh-setup's ident parsing functions -Peff ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 1/2] git-sh-setup: refactor ident-parsing functions 2012-10-18 7:22 ` [PATCH 0/2] clean up filter-branch ident parsing Jeff King @ 2012-10-18 7:25 ` Jeff King 2012-11-12 17:44 ` Junio C Hamano 2012-10-18 7:25 ` [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions Jeff King 1 sibling, 1 reply; 24+ messages in thread From: Jeff King @ 2012-10-18 7:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git The only ident-parsing function we currently provide is get_author_ident_from_commit. This is not very flexible for two reasons: 1. It takes a commit as an argument, and can't read from commit headers saved on disk. 2. It will only parse authors, not committers. This patch provides a more flexible interface which will parse multiple idents from a commit provide on stdin. We can easily use it as a building block for the current function to retain compatibility. Signed-off-by: Jeff King <peff@peff.net> --- Since we are counting processes in this series, I should note that this actually adds a subshell invocation for each call, since it went from: script='...' sed $script to: sed "$(make_script)" For filter-branch, which is really the only high-performance caller we have, this is negated by the fact that it will do author and committer at the same time, saving us an extra subshell (in addition to an extra sed invocation). git-sh-setup.sh | 62 +++++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 43 insertions(+), 19 deletions(-) diff --git a/git-sh-setup.sh b/git-sh-setup.sh index ee0e0bc..22f0aed 100644 --- a/git-sh-setup.sh +++ b/git-sh-setup.sh @@ -191,28 +191,52 @@ get_author_ident_from_commit () { fi } +# Generate a sed script to parse identities from a commit. +# +# Reads the commit from stdin, which should be in raw format (e.g., from +# cat-file or "--pretty=raw"). +# +# The first argument specifies the ident line to parse (e.g., "author"), and +# the second specifies the environment variable to put it in (e.g., "AUTHOR" +# for "GIT_AUTHOR_*"). Multiple pairs can be given to parse author and +# committer. +pick_ident_script () { + while test $# -gt 0 + do + lid=$1; shift + uid=$1; shift + printf '%s' " + /^$lid /{ + s/'/'\\\\''/g + h + s/^$lid "'\([^<]*\) <[^>]*> .*$/\1/'" + s/.*/GIT_${uid}_NAME='&'/p + + g + s/^$lid "'[^<]* <\([^>]*\)> .*$/\1/'" + s/.*/GIT_${uid}_EMAIL='&'/p + + g + s/^$lid "'[^<]* <[^>]*> \(.*\)$/@\1/'" + s/.*/GIT_${uid}_DATE='&'/p + } + " + done + echo '/^$/q' +} + +# Create a pick-script as above and feed it to sed. Stdout is suitable for +# feeding to eval. +parse_ident_from_commit () { + LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")" +} + +# Parse the author from a commit given as an argument. Stdout is suitable for +# feeding to eval to set the usual GIT_* ident variables. get_author_ident_from_commit () { - pick_author_script=' - /^author /{ - s/'\''/'\''\\'\'\''/g - h - s/^author \([^<]*\) <[^>]*> .*$/\1/ - s/.*/GIT_AUTHOR_NAME='\''&'\''/p - - g - s/^author [^<]* <\([^>]*\)> .*$/\1/ - s/.*/GIT_AUTHOR_EMAIL='\''&'\''/p - - g - s/^author [^<]* <[^>]*> \(.*\)$/@\1/ - s/.*/GIT_AUTHOR_DATE='\''&'\''/p - - q - } - ' encoding=$(git config i18n.commitencoding || echo UTF-8) git show -s --pretty=raw --encoding="$encoding" "$1" -- | - LANG=C LC_ALL=C sed -ne "$pick_author_script" + parse_ident_from_commit author AUTHOR } # Clear repo-local GIT_* environment variables. Useful when switching to -- 1.8.0.rc3.3.gba630e1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions 2012-10-18 7:25 ` [PATCH 1/2] git-sh-setup: refactor ident-parsing functions Jeff King @ 2012-11-12 17:44 ` Junio C Hamano 2012-11-12 19:44 ` Jeff King 0 siblings, 1 reply; 24+ messages in thread From: Junio C Hamano @ 2012-11-12 17:44 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Sixt, Ilya Basin, git Jeff King <peff@peff.net> writes: > The only ident-parsing function we currently provide is > get_author_ident_from_commit. This is not very > flexible for two reasons: > > 1. It takes a commit as an argument, and can't read from > commit headers saved on disk. > > 2. It will only parse authors, not committers. > > This patch provides a more flexible interface which will > parse multiple idents from a commit provide on stdin. We can > easily use it as a building block for the current function > to retain compatibility. > > Signed-off-by: Jeff King <peff@peff.net> > --- > Since we are counting processes in this series, I should note that this > actually adds a subshell invocation for each call, since it went from: > > script='...' > sed $script > > to: > > sed "$(make_script)" > > For filter-branch, which is really the only high-performance caller we > have, this is negated by the fact that it will do author and committer > at the same time, saving us an extra subshell (in addition to an extra > sed invocation). Given that pick-ident-script is a const function, a caller that repeatedly call is could call it once and use it in a variable, no? > > git-sh-setup.sh | 62 +++++++++++++++++++++++++++++++++++++++------------------ > 1 file changed, 43 insertions(+), 19 deletions(-) > > diff --git a/git-sh-setup.sh b/git-sh-setup.sh > index ee0e0bc..22f0aed 100644 > --- a/git-sh-setup.sh > +++ b/git-sh-setup.sh > @@ -191,28 +191,52 @@ get_author_ident_from_commit () { > fi > } > > +# Generate a sed script to parse identities from a commit. > +# > +# Reads the commit from stdin, which should be in raw format (e.g., from > +# cat-file or "--pretty=raw"). > +# > +# The first argument specifies the ident line to parse (e.g., "author"), and > +# the second specifies the environment variable to put it in (e.g., "AUTHOR" > +# for "GIT_AUTHOR_*"). Multiple pairs can be given to parse author and > +# committer. > +pick_ident_script () { > + while test $# -gt 0 > + do > + lid=$1; shift > + uid=$1; shift > + printf '%s' " > + /^$lid /{ > + s/'/'\\\\''/g > + h > + s/^$lid "'\([^<]*\) <[^>]*> .*$/\1/'" > + s/.*/GIT_${uid}_NAME='&'/p > + > + g > + s/^$lid "'[^<]* <\([^>]*\)> .*$/\1/'" > + s/.*/GIT_${uid}_EMAIL='&'/p > + > + g > + s/^$lid "'[^<]* <[^>]*> \(.*\)$/@\1/'" > + s/.*/GIT_${uid}_DATE='&'/p > + } > + " > + done > + echo '/^$/q' > +} > + > +# Create a pick-script as above and feed it to sed. Stdout is suitable for > +# feeding to eval. > +parse_ident_from_commit () { > + LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")" > +} > + > +# Parse the author from a commit given as an argument. Stdout is suitable for > +# feeding to eval to set the usual GIT_* ident variables. > get_author_ident_from_commit () { > - pick_author_script=' > - /^author /{ > - s/'\''/'\''\\'\'\''/g > - h > - s/^author \([^<]*\) <[^>]*> .*$/\1/ > - s/.*/GIT_AUTHOR_NAME='\''&'\''/p > - > - g > - s/^author [^<]* <\([^>]*\)> .*$/\1/ > - s/.*/GIT_AUTHOR_EMAIL='\''&'\''/p > - > - g > - s/^author [^<]* <[^>]*> \(.*\)$/@\1/ > - s/.*/GIT_AUTHOR_DATE='\''&'\''/p > - > - q > - } > - ' > encoding=$(git config i18n.commitencoding || echo UTF-8) > git show -s --pretty=raw --encoding="$encoding" "$1" -- | > - LANG=C LC_ALL=C sed -ne "$pick_author_script" > + parse_ident_from_commit author AUTHOR > } > > # Clear repo-local GIT_* environment variables. Useful when switching to ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions 2012-11-12 17:44 ` Junio C Hamano @ 2012-11-12 19:44 ` Jeff King 2012-11-12 20:08 ` Junio C Hamano 0 siblings, 1 reply; 24+ messages in thread From: Jeff King @ 2012-11-12 19:44 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git On Mon, Nov 12, 2012 at 09:44:01AM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > The only ident-parsing function we currently provide is > > get_author_ident_from_commit. This is not very > > flexible for two reasons: > > > > 1. It takes a commit as an argument, and can't read from > > commit headers saved on disk. > > > > 2. It will only parse authors, not committers. > > > > This patch provides a more flexible interface which will > > parse multiple idents from a commit provide on stdin. We can > > easily use it as a building block for the current function > > to retain compatibility. > > > > Signed-off-by: Jeff King <peff@peff.net> > > --- > > Since we are counting processes in this series, I should note that this > > actually adds a subshell invocation for each call, since it went from: > > > > script='...' > > sed $script > > > > to: > > > > sed "$(make_script)" > > > > For filter-branch, which is really the only high-performance caller we > > have, this is negated by the fact that it will do author and committer > > at the same time, saving us an extra subshell (in addition to an extra > > sed invocation). > > Given that pick-ident-script is a const function, a caller that > repeatedly call is could call it once and use it in a variable, no? The problem is that it is a helper called from parse_ident_from_commit. And that function just passes along its arguments, so it does not know that it is being called repeatedly with the same arguments. So you'd have to either change the interface or memoize internally. I don't think memoization is a good option for two reasons: 1. Storing the arguments to compare to later is complex. You don't want to just store "$*" from the last run and see if we got the same arguments. You'd have to quote your delimiter (e.g., you would not want to confuse ("foo", "bar") with ("foo bar"). Though in this instance, we know that our args do not have spaces, so we could get away with that. 2. If you are in a subshell or even a while loop, your memoized variable will not be retained. So unless somebody has some clever scheme for memoizing shell functions without any process overhead, it is probably not worth it. Changing the interface for get_author_ident_from_commit would be a pain, but if we just wanted to help filter-branch, we could do something like this: diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 5314249..7a693ba 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -74,7 +74,7 @@ finish_ident() { } set_ident () { - parse_ident_from_commit author AUTHOR committer COMMITTER + parse_ident_from_commit_via_script "$ident_script" finish_ident AUTHOR finish_ident COMMITTER } @@ -93,6 +93,7 @@ if [ "$(is_bare_repository)" = false ]; then require_clean_work_tree 'rewrite branches' fi +ident_script=$(pick_ident_script author AUTHOR committer COMMITTER) tempdir=.git-rewrite filter_env= filter_tree= diff --git a/git-sh-setup.sh b/git-sh-setup.sh index 22f0aed..1e20e17 100644 --- a/git-sh-setup.sh +++ b/git-sh-setup.sh @@ -225,10 +225,17 @@ pick_ident_script () { echo '/^$/q' } +# Feed a pick_ident_script return value to sed. Use this instead of +# parse_ident_from_commit below if you are going to be parsing commits in a +# tight loop and want to save a process. +parse_ident_from_commit_via_script() { + LANG=C LC_ALL=C sed -ne "$1" +} + # Create a pick-script as above and feed it to sed. Stdout is suitable for # feeding to eval. parse_ident_from_commit () { - LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")" + parse_ident_from_commit_via_script "$(pick_ident_script "$@")" } # Parse the author from a commit given as an argument. Stdout is suitable for ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions 2012-11-12 19:44 ` Jeff King @ 2012-11-12 20:08 ` Junio C Hamano 2012-11-12 20:12 ` Jeff King 0 siblings, 1 reply; 24+ messages in thread From: Junio C Hamano @ 2012-11-12 20:08 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Sixt, Ilya Basin, git Jeff King <peff@peff.net> writes: > Changing the interface for get_author_ident_from_commit would be a pain, > but if we just wanted to help filter-branch, we could do something like > this: Yes, that is the direction I was alluding to. Callers of get_author_ident_from_commit can also do the same and avoid rebuilding the same $pick_author_script over and over again, or get_author_ident_from_commit can do so for its callers. > diff --git a/git-filter-branch.sh b/git-filter-branch.sh > index 5314249..7a693ba 100755 > --- a/git-filter-branch.sh > +++ b/git-filter-branch.sh > @@ -74,7 +74,7 @@ finish_ident() { > } > > set_ident () { > - parse_ident_from_commit author AUTHOR committer COMMITTER > + parse_ident_from_commit_via_script "$ident_script" > finish_ident AUTHOR > finish_ident COMMITTER > } > @@ -93,6 +93,7 @@ if [ "$(is_bare_repository)" = false ]; then > require_clean_work_tree 'rewrite branches' > fi > > +ident_script=$(pick_ident_script author AUTHOR committer COMMITTER) > tempdir=.git-rewrite > filter_env= > filter_tree= > diff --git a/git-sh-setup.sh b/git-sh-setup.sh > index 22f0aed..1e20e17 100644 > --- a/git-sh-setup.sh > +++ b/git-sh-setup.sh > @@ -225,10 +225,17 @@ pick_ident_script () { > echo '/^$/q' > } > > +# Feed a pick_ident_script return value to sed. Use this instead of > +# parse_ident_from_commit below if you are going to be parsing commits in a > +# tight loop and want to save a process. > +parse_ident_from_commit_via_script() { > + LANG=C LC_ALL=C sed -ne "$1" > +} > + > # Create a pick-script as above and feed it to sed. Stdout is suitable for > # feeding to eval. > parse_ident_from_commit () { > - LANG=C LC_ALL=C sed -ne "$(pick_ident_script "$@")" > + parse_ident_from_commit_via_script "$(pick_ident_script "$@")" > } > > # Parse the author from a commit given as an argument. Stdout is suitable for ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions 2012-11-12 20:08 ` Junio C Hamano @ 2012-11-12 20:12 ` Jeff King 2012-11-12 20:32 ` Junio C Hamano 0 siblings, 1 reply; 24+ messages in thread From: Jeff King @ 2012-11-12 20:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git On Mon, Nov 12, 2012 at 12:08:37PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Changing the interface for get_author_ident_from_commit would be a pain, > > but if we just wanted to help filter-branch, we could do something like > > this: > > Yes, that is the direction I was alluding to. > > Callers of get_author_ident_from_commit can also do the same and > avoid rebuilding the same $pick_author_script over and over again, > or get_author_ident_from_commit can do so for its callers. I don't think get_author_ident_from_commit can do so on demand due to the subshell issue I mentioned. So you'd have to generate the pick script on inclusion of git-sh-setup, whether the includer wants to call the function or not. I wonder if we should simply generate these at build time and store them in the script. I'm also not sure that saving one process invocation is worth spending a lot of thought cycles on. Maybe somebody on Windows could run a filter-branch with the patch I just sent and measure whether it even makes a difference. -Peff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] git-sh-setup: refactor ident-parsing functions 2012-11-12 20:12 ` Jeff King @ 2012-11-12 20:32 ` Junio C Hamano 0 siblings, 0 replies; 24+ messages in thread From: Junio C Hamano @ 2012-11-12 20:32 UTC (permalink / raw) To: Jeff King; +Cc: Johannes Sixt, Ilya Basin, git Jeff King <peff@peff.net> writes: > On Mon, Nov 12, 2012 at 12:08:37PM -0800, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >> > Changing the interface for get_author_ident_from_commit would be a pain, >> > but if we just wanted to help filter-branch, we could do something like >> > this: >> >> Yes, that is the direction I was alluding to. >> >> Callers of get_author_ident_from_commit can also do the same and >> avoid rebuilding the same $pick_author_script over and over again, >> or get_author_ident_from_commit can do so for its callers. > > I don't think get_author_ident_from_commit can do so on demand due to > the subshell issue I mentioned. OK, that is what I obviously missed. > I wonder if we should simply generate these at build time and store them > in the script. Yeah, my knee-jerk reaction was that it was wasteful to run script at runtime to build a constant string, but the engineering cost may not be worth it. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions 2012-10-18 7:22 ` [PATCH 0/2] clean up filter-branch ident parsing Jeff King 2012-10-18 7:25 ` [PATCH 1/2] git-sh-setup: refactor ident-parsing functions Jeff King @ 2012-10-18 7:25 ` Jeff King 2012-10-18 7:49 ` Johannes Sixt 1 sibling, 1 reply; 24+ messages in thread From: Jeff King @ 2012-10-18 7:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes Sixt, Ilya Basin, git This saves us some code, but it also reduces the number of processes we start for each filtered commit. Since we can parse both author and committer in the same sed invocation, we save one process. And since the new interface avoids tr, we save 4 processes. It also avoids using "tr", which has had some odd portability problems reported with from Solaris's xpg6 version. Signed-off-by: Jeff King <peff@peff.net> --- git-filter-branch.sh | 42 +++++++++--------------------------------- 1 file changed, 9 insertions(+), 33 deletions(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 178e453..69406ae 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -64,37 +64,15 @@ set_ident () { eval "$functions" -# When piped a commit, output a script to set the ident of either -# "author" or "committer +# Ensure non-empty id name. +fallback_name() { + echo "case \"\$GIT_$1_NAME\" in \"\") GIT_$1_NAME=\"\${GIT_$1_EMAIL%%@*}\" && export GIT_$1_NAME;; esac" +} set_ident () { - lid="$(echo "$1" | tr "[A-Z]" "[a-z]")" - uid="$(echo "$1" | tr "[a-z]" "[A-Z]")" - pick_id_script=' - /^'$lid' /{ - s/'\''/'\''\\'\'\''/g - h - s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/ - s/'\''/'\''\'\'\''/g - s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p - - g - s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/ - s/'\''/'\''\'\'\''/g - s/.*/GIT_'$uid'_EMAIL='\''&'\''; export GIT_'$uid'_EMAIL/p - - g - s/^'$lid' [^<]* <[^>]*> \(.*\)$/@\1/ - s/'\''/'\''\'\'\''/g - s/.*/GIT_'$uid'_DATE='\''&'\''; export GIT_'$uid'_DATE/p - - q - } - ' - - LANG=C LC_ALL=C sed -ne "$pick_id_script" - # Ensure non-empty id name. - echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac" + parse_ident_from_commit author AUTHOR committer COMMITTER + fallback_name AUTHOR + fallback_name COMMITTER } USAGE="[--env-filter <command>] [--tree-filter <command>] @@ -320,10 +298,8 @@ while read commit parents; do git cat-file commit "$commit" >../commit || die "Cannot read commit $commit" - eval "$(set_ident AUTHOR <../commit)" || - die "setting author failed for commit $commit" - eval "$(set_ident COMMITTER <../commit)" || - die "setting committer failed for commit $commit" + eval "$(set_ident <../commit)" || + die "setting author/committer failed for commit $commit" eval "$filter_env" < /dev/null || die "env filter failed: $filter_env" -- 1.8.0.rc3.3.gba630e1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions 2012-10-18 7:25 ` [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions Jeff King @ 2012-10-18 7:49 ` Johannes Sixt 2012-10-18 7:54 ` Jeff King 0 siblings, 1 reply; 24+ messages in thread From: Johannes Sixt @ 2012-10-18 7:49 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Ilya Basin, git Am 10/18/2012 9:25, schrieb Jeff King: > -# When piped a commit, output a script to set the ident of either > -# "author" or "committer > +# Ensure non-empty id name. > +fallback_name() { > + echo "case \"\$GIT_$1_NAME\" in \"\") GIT_$1_NAME=\"\${GIT_$1_EMAIL%%@*}\" && export GIT_$1_NAME;; esac" > +} > > set_ident () { > - lid="$(echo "$1" | tr "[A-Z]" "[a-z]")" > - uid="$(echo "$1" | tr "[a-z]" "[A-Z]")" > - pick_id_script=' > - /^'$lid' /{ > - s/'\''/'\''\\'\'\''/g > - h > - s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/ > - s/'\''/'\''\'\'\''/g > - s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p > - > - g > - s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/ > - s/'\''/'\''\'\'\''/g > - s/.*/GIT_'$uid'_EMAIL='\''&'\''; export GIT_'$uid'_EMAIL/p > - > - g > - s/^'$lid' [^<]* <[^>]*> \(.*\)$/@\1/ > - s/'\''/'\''\'\'\''/g > - s/.*/GIT_'$uid'_DATE='\''&'\''; export GIT_'$uid'_DATE/p > - > - q > - } > - ' > - > - LANG=C LC_ALL=C sed -ne "$pick_id_script" > - # Ensure non-empty id name. > - echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac" > + parse_ident_from_commit author AUTHOR committer COMMITTER > + fallback_name AUTHOR > + fallback_name COMMITTER > } Didn't you lose the export GIT_$uid_{NAME,EMAIL,DATE} parts somewhere on the way? Thanks for working on this! -- Hannes ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions 2012-10-18 7:49 ` Johannes Sixt @ 2012-10-18 7:54 ` Jeff King 2012-10-18 10:22 ` Jeff King 0 siblings, 1 reply; 24+ messages in thread From: Jeff King @ 2012-10-18 7:54 UTC (permalink / raw) To: Johannes Sixt; +Cc: Junio C Hamano, Ilya Basin, git On Thu, Oct 18, 2012 at 09:49:04AM +0200, Johannes Sixt wrote: > > - s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p > > Didn't you lose the export GIT_$uid_{NAME,EMAIL,DATE} parts somewhere on > the way? Yikes, you're right. I didn't even notice, as the test suite still passes. I can see how the env filter would still be able to see the variables, but the commit-tree call wouldn't. I guess it happens to work because we do not test alternate idents in our filter branch tests (IOW, we are silently rewriting each commit during the filter-branch, but it happens to have the same identities). I'll investigate. -Peff ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions 2012-10-18 7:54 ` Jeff King @ 2012-10-18 10:22 ` Jeff King 2012-10-18 10:26 ` Jeff King 0 siblings, 1 reply; 24+ messages in thread From: Jeff King @ 2012-10-18 10:22 UTC (permalink / raw) To: Johannes Sixt; +Cc: Junio C Hamano, Ilya Basin, git On Thu, Oct 18, 2012 at 03:54:29AM -0400, Jeff King wrote: > On Thu, Oct 18, 2012 at 09:49:04AM +0200, Johannes Sixt wrote: > > > > - s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p > > > > Didn't you lose the export GIT_$uid_{NAME,EMAIL,DATE} parts somewhere on > > the way? > > Yikes, you're right. I didn't even notice, as the test suite still > passes. I can see how the env filter would still be able to see the > variables, but the commit-tree call wouldn't. I guess it happens to work > because we do not test alternate idents in our filter branch tests (IOW, > we are silently rewriting each commit during the filter-branch, but it > happens to have the same identities). Hrm. We _do_ test this in t7003. Weirder, if I instrument filter-branch like this: diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 69406ae..1b504ce 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -298,8 +298,13 @@ while read commit parents; do git cat-file commit "$commit" >../commit || die "Cannot read commit $commit" + echo >&2 pre: $GIT_AUTHOR_NAME + sh -c 'echo >&2 pre, subshell: $GIT_AUTHOR_NAME' + eval "$(set_ident <../commit)" || die "setting author/committer failed for commit $commit" + echo >&2 post: $GIT_AUTHOR_NAME + sh -c 'echo >&2 post, subshell: $GIT_AUTHOR_NAME' eval "$filter_env" < /dev/null || die "env filter failed: $filter_env" diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh index 4d13e10..ce57fc5 100755 --- a/t/t7003-filter-branch.sh +++ b/t/t7003-filter-branch.sh @@ -174,6 +174,8 @@ test_expect_success 'author information is preserved' ' test 1 = $(git rev-list --author="B V Uips" preserved-author | wc -l) ' +test_done + test_expect_success "remove a certain author's commits" ' echo i > i && test_tick && and run t7003, it shows that the variable is properly exported to the sub-process! But I can't seem to figure out why. Confused... -Peff ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions 2012-10-18 10:22 ` Jeff King @ 2012-10-18 10:26 ` Jeff King 2012-10-18 10:33 ` [PATCHv2 " Jeff King 0 siblings, 1 reply; 24+ messages in thread From: Jeff King @ 2012-10-18 10:26 UTC (permalink / raw) To: Johannes Sixt; +Cc: Junio C Hamano, Ilya Basin, git On Thu, Oct 18, 2012 at 06:22:17AM -0400, Jeff King wrote: > Hrm. We _do_ test this in t7003. Weirder, if I instrument filter-branch > like this: > [...] > and run t7003, it shows that the variable is properly exported to the > sub-process! But I can't seem to figure out why. Confused... Oh, I see. The variables are already exported by test-lib.sh. You can see the breakage with: diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh index 4d13e10..1e7a209 100755 --- a/t/t7003-filter-branch.sh +++ b/t/t7003-filter-branch.sh @@ -167,10 +167,11 @@ test_expect_success 'author information is preserved' ' test_tick && GIT_AUTHOR_NAME="B V Uips" git commit -m bvuips && git branch preserved-author && - git filter-branch -f --msg-filter "cat; \ + (sane_unset GIT_AUTHOR_NAME && + git filter-branch -f --msg-filter "cat; \ test \$GIT_COMMIT != $(git rev-parse master) || \ echo Hallo" \ - preserved-author && + preserved-author) && test 1 = $(git rev-list --author="B V Uips" preserved-author | wc -l) ' -Peff ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCHv2 2/2] filter-branch: use git-sh-setup's ident parsing functions 2012-10-18 10:26 ` Jeff King @ 2012-10-18 10:33 ` Jeff King 0 siblings, 0 replies; 24+ messages in thread From: Jeff King @ 2012-10-18 10:33 UTC (permalink / raw) To: Johannes Sixt; +Cc: Junio C Hamano, Ilya Basin, git This saves us some code, but it also reduces the number of processes we start for each filtered commit. Since we can parse both author and committer in the same sed invocation, we save one process. And since the new interface avoids tr, we save 4 processes. It also avoids using "tr", which has had some odd portability problems reported with from Solaris's xpg6 version. We also tweak one of the tests in t7003 to double-check that we are properly exporting the variables (because test-lib.sh exports GIT_AUTHOR_NAME, it will be automatically exported in subprograms. We override this to make sure that filter-branch handles it properly itself). Signed-off-by: Jeff King <peff@peff.net> --- This fixes the missing exports from v1. There's no changes needed to patch 1. git-filter-branch.sh | 46 +++++++++++++--------------------------------- t/t7003-filter-branch.sh | 5 +++-- 2 files changed, 16 insertions(+), 35 deletions(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 178e453..5314249 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -64,37 +64,19 @@ set_ident () { eval "$functions" -# When piped a commit, output a script to set the ident of either -# "author" or "committer +finish_ident() { + # Ensure non-empty id name. + echo "case \"\$GIT_$1_NAME\" in \"\") GIT_$1_NAME=\"\${GIT_$1_EMAIL%%@*}\" && export GIT_$1_NAME;; esac" + # And make sure everything is exported. + echo "export GIT_$1_NAME" + echo "export GIT_$1_EMAIL" + echo "export GIT_$1_DATE" +} set_ident () { - lid="$(echo "$1" | tr "[A-Z]" "[a-z]")" - uid="$(echo "$1" | tr "[a-z]" "[A-Z]")" - pick_id_script=' - /^'$lid' /{ - s/'\''/'\''\\'\'\''/g - h - s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/ - s/'\''/'\''\'\'\''/g - s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p - - g - s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/ - s/'\''/'\''\'\'\''/g - s/.*/GIT_'$uid'_EMAIL='\''&'\''; export GIT_'$uid'_EMAIL/p - - g - s/^'$lid' [^<]* <[^>]*> \(.*\)$/@\1/ - s/'\''/'\''\'\'\''/g - s/.*/GIT_'$uid'_DATE='\''&'\''; export GIT_'$uid'_DATE/p - - q - } - ' - - LANG=C LC_ALL=C sed -ne "$pick_id_script" - # Ensure non-empty id name. - echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac" + parse_ident_from_commit author AUTHOR committer COMMITTER + finish_ident AUTHOR + finish_ident COMMITTER } USAGE="[--env-filter <command>] [--tree-filter <command>] @@ -320,10 +302,8 @@ while read commit parents; do git cat-file commit "$commit" >../commit || die "Cannot read commit $commit" - eval "$(set_ident AUTHOR <../commit)" || - die "setting author failed for commit $commit" - eval "$(set_ident COMMITTER <../commit)" || - die "setting committer failed for commit $commit" + eval "$(set_ident <../commit)" || + die "setting author/committer failed for commit $commit" eval "$filter_env" < /dev/null || die "env filter failed: $filter_env" diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh index 4d13e10..1e7a209 100755 --- a/t/t7003-filter-branch.sh +++ b/t/t7003-filter-branch.sh @@ -167,10 +167,11 @@ test_expect_success 'author information is preserved' ' test_tick && GIT_AUTHOR_NAME="B V Uips" git commit -m bvuips && git branch preserved-author && - git filter-branch -f --msg-filter "cat; \ + (sane_unset GIT_AUTHOR_NAME && + git filter-branch -f --msg-filter "cat; \ test \$GIT_COMMIT != $(git rev-parse master) || \ echo Hallo" \ - preserved-author && + preserved-author) && test 1 = $(git rev-list --author="B V Uips" preserved-author | wc -l) ' -- 1.8.0.rc3.3.gba630e1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
end of thread, other threads:[~2012-11-12 20:32 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-10-17 6:47 What can cause empty GIT_AUTHOR_NAME for 'git filter-branch --tree-filter' on Solaris? Ilya Basin 2012-10-17 7:18 ` Jeff King 2012-10-17 7:23 ` Johannes Sixt 2012-10-17 8:58 ` Re[2]: " Ilya Basin 2012-10-17 10:36 ` Re[3]: " Ilya Basin 2012-10-17 22:13 ` Jeff King 2012-10-17 22:09 ` Jeff King 2012-10-18 5:31 ` Johannes Sixt 2012-10-18 5:36 ` Jeff King 2012-10-18 6:06 ` Junio C Hamano 2012-10-18 6:08 ` Jeff King 2012-10-18 7:22 ` [PATCH 0/2] clean up filter-branch ident parsing Jeff King 2012-10-18 7:25 ` [PATCH 1/2] git-sh-setup: refactor ident-parsing functions Jeff King 2012-11-12 17:44 ` Junio C Hamano 2012-11-12 19:44 ` Jeff King 2012-11-12 20:08 ` Junio C Hamano 2012-11-12 20:12 ` Jeff King 2012-11-12 20:32 ` Junio C Hamano 2012-10-18 7:25 ` [PATCH 2/2] filter-branch: use git-sh-setup's ident parsing functions Jeff King 2012-10-18 7:49 ` Johannes Sixt 2012-10-18 7:54 ` Jeff King 2012-10-18 10:22 ` Jeff King 2012-10-18 10:26 ` Jeff King 2012-10-18 10:33 ` [PATCHv2 " Jeff King
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).