git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] Maintenance script for l10n files and commits
From: Jiang Xin @ 2012-03-07 18:47 UTC
  To: Junio C Hamano; +Cc: Git List, avarab, Jiang Xin

Usage of this script:

 * rake commits      : Check commit logs written with non-ASCII chars,
                       but without the correct encoding settings.
                       Always report non-ASCII in subject line as error.

 * rake pot          : Print the summary of the update of git.pot file

 * rake XX.po        : Create or update XX.po from the git.pot template file

 * rake check[XX.po] : Syntax check on XX.po

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
---
 po/Rakefile |  157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 157 insertions(+)
 create mode 100644 po/Rakefile

diff --git a/po/Rakefile b/po/Rakefile
new file mode 100644
index 00000..e581b
--- /dev/null
+++ b/po/Rakefile
@@ -0,0 +1,157 @@
+require 'tempfile'
+
+POTFILE="git.pot"
+
+class NonAsciiInSubjectError < StandardError
+end
+
+class BadEncodingError < StandardError
+end
+
+def shellout(cmd)
+    pipe = IO.popen(cmd)
+    pipe.readlines
+end
+
+desc "Syntax check on XX.po, or all .po files if nothing provided."
+task :check, :po_file do |t, args|
+    if args[:po_file]
+        if File.exist? args[:po_file]
+            system("msgfmt -o /dev/null --check --statistics #{args[:po_file]}")
+        else
+            $stderr.puts "File #{args[:po_file]} does not exist."
+        end
+    else
+        FileList["*.po"].each do |po_file|
+            puts "=" * 72
+            puts "Check #{po_file}..."
+            system("msgfmt -o /dev/null --check --statistics #{po_file}")
+        end
+    end
+end
+
+desc "Show summary of updates of git.pot"
+task :pot do
+    status = shellout("git status --porcelain -- #{POTFILE}")
+    new = []
+    dropped = []
+    tmpfile = Tempfile.new('git.pot')
+    if status.empty?
+        puts "Nothing changed."
+    else
+        ENV["LANGUAGE"] = "C"
+        system("git show HEAD:./git.pot > #{tmpfile.path}")
+        msgcmp = shellout("msgcmp -N --use-untranslated #{tmpfile.path} #{POTFILE} 2>&1")
+        msgcmp.each do |line|
+            if m = /^.*:([0-9]+): this message is used but not defined in/.match(line)
+                new << m[1]
+            elsif m = /^.*:([0-9]+): warning: this message is not used/.match(line)
+                dropped << m[1]
+            end
+        end
+        puts "Update of #{POTFILE}:"
+        puts
+        if not new.empty?
+            puts " * Add #{new.count} new l10n string#{new.count>1 ? "s":""}" +
+                 " in the new generated \"git.pot\" file at" +
+                 " line#{new.count>1? "s":""}:"
+            puts "   " + new.join(", ")
+            puts
+        end
+        if not dropped.empty?
+            puts " * Remove #{dropped.count} l10n string#{dropped.count>1 ?
+                 "s":""} from the old \"git.pot\" file at line" +
+                 "#{dropped.count>1 ? "s":""}:"
+            puts "   " + dropped.join(", ")
+        end
+    end
+end
+
+# raise Exception if commit has bad encoding setting
+def verify_commit_encoding(commit, log)
+    subject = 0
+    non_ascii = nil
+    encoding = nil
+    log.each do |line|
+        if line.tap(&:chomp!).empty?
+            # next line would be the commit log subject line,
+            # if no previous empty line found.
+            subject += 1
+            next
+        end
+        if subject == 0 and line =~ /^encoding /
+            encoding = line.chomp.sub(/^encoding /, '')
+        end
+        # non-ascii found in commit log
+        if match = /([^[:alnum:][:punct:][:space:]]+)/.match(line)
+            non_ascii = "#{line} << #{match[1][0..9]}"
+            # subject must be written in english
+            raise NonAsciiInSubjectError.new(non_ascii) if subject == 1
+        end
+        # subject has only one line
+        subject += 1 if subject == 1
+        # break if there are non-asciis and has already checked subject line
+        break if non_ascii && subject > 0
+    end
+
+    return if not non_ascii
+
+    encoding = 'UTF-8' if not encoding
+    cmd = "python -c \"s='''#{
+              log.collect!{
+                |x| x.chomp.gsub(/['"]/, "")
+              }.join(' - ')}'''; s.decode('#{encoding}')\" 2>/dev/null"
+    raise BadEncodingError.new(non_ascii) if not system(cmd)
+end
+
+desc "Check commits for bad encoding settings."
+task :commits, :from, :to do |t, args|
+    from = args[:from] || 'origin/master'
+    to = args[:to] || 'HEAD'
+    commits = shellout("git rev-list #{from}..#{to}")
+    commits.each do |c|
+        c.chomp!
+        log = shellout("git cat-file commit #{c}")
+        begin
+            verify_commit_encoding(c, log)
+        rescue BadEncodingError => e
+            $stderr.puts "=" * 78
+            $stderr.puts "Error: Bad encoding setting found in commit #{c[0,7]}:"
+            $stderr.puts "       >> #{e.message}"
+            $stderr.puts
+            log.each {|line| puts "\t" + line.chomp}
+        rescue NonAsciiInSubjectError => e
+            $stderr.puts "=" * 78
+            $stderr.puts "Error: Non-ASCII found in subject in commit #{c[0,7]}:"
+            $stderr.puts "       >> #{e.message}"
+            $stderr.puts
+            log.each {|line| puts "\t" + line.chomp}
+        end
+    end
+end
+
+
+desc "Create or update XX.po file from git.pot"
+task "XX.po" do
+    $stderr.puts "Use your real locale file, such as zh_CN.po"
+end
+
+# Update XX.po even if timestamp of XX.po is newer
+FileList["*.po"].each do |t|
+    task t => POTFILE
+end
+
+rule '.po' => POTFILE do |t|
+    if File.exist?(t.name)
+        system("msgmerge --add-location --backup=off -U #{t.name} #{t.source}")
+    else
+        system("msginit -i #{t.source} --locale=#{t.name.sub(/\.po$/, '')}")
+    end
+    mofile="build/locale/#{t.name.sub(/\.po$/, '')}/LC_MESSAGES/git.mo"
+    FileUtils.mkdir_p File.dirname(mofile)
+    system("msgfmt -o #{mofile} --check --statistics #{t.name}")
+end
+
+task :default do
+    system("rake -T")
+end
-- 
1.7.9.2.330.gaa956.dirty


* Re: [PATCH] Maintenance script for l10n files and commits
From: Junio C Hamano @ 2012-03-07 19:17 UTC
  To: Jiang Xin; +Cc: Git List, avarab

Jiang Xin <worldhello.net@gmail.com> writes:

> Usage of this script:
>
>  * rake commits      : Check commit logs written with non-ASCII chars,
>                        but without the correct encoding settings.
>                        Always report non-ASCII in subject line as error.
>
>  * rake pot          : Print the summary of the update of git.pot file
>
>  * rake XX.po        : Create or update XX.po from the git.pot template file
>
>  * rake check[XX.po] : Syntax check on XX.po

I would really prefer not to add another language dependency to the
system.  Are you doing anything that cannot be done with what we
already use, e.g. make and shell?


* [PATCH] Maintenance script for l10n files and commits
From: Jiang Xin @ 2012-03-08 16:05 UTC
  To: Junio C Hamano; +Cc: Git List, avarab, Jiang Xin

Usage of this script:

 * po-helper.sh XX.po   : Init or update XX.po from git.pot

 * po-helper.sh check [XX.po]
                        : Perform all the checks for XX.po

 * po-helper.sh commits [since] [til]
                        : Check non-ascii chars in commit logs

                          Don't write commit log with non-ascii chars
                          without proper encoding settings.

                          Subject of commit log must be written in English.

                          Don't change files outside this directory (po/)

 * po-helper.sh pot     : Display summary of updates of git.pot file

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
---
 po/po-helper.sh |  271 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 271 insertions(+)
 create mode 100755 po/po-helper.sh

diff --git a/po/po-helper.sh b/po/po-helper.sh
new file mode 100755
index 0000000..dd370a5
--- /dev/null
+++ b/po/po-helper.sh
@@ -0,0 +1,271 @@
+#!/bin/bash
+#
+# Copyright (c) 2012 Jiang Xin
+
+POTFILE=git.pot
+
+usage()
+{
+    cat <<-END_OF_USAGE
+Maintenance script for l10n files and commits.
+
+Usage:
+
+ * po-helper.sh XX.po   : Init or update XX.po from git.pot
+
+ * po-helper.sh check [XX.po]
+                        : Perform all the checks for XX.po
+
+ * po-helper.sh commits [since] [til]
+                        : Check non-ascii chars in commit logs
+
+                          Don't write commit log with non-ascii chars
+                          without proper encoding settings.
+
+                          Subject of commit log must be written in English.
+
+                          Don't change files outside this directory (po/)
+
+ * po-helper.sh pot     : Display summary of updates of git.pot file
+
+END_OF_USAGE
+
+    exit 1
+}
+
+# Display summary of updates of git.pot
+show_pot_update_summary()
+{
+    pnew="^.*:([0-9]+): this message is used but not defined in"
+    pdel="^.*:([0-9]+): warning: this message is not used"
+    new_count=0
+    del_count=0
+    new_lineno=""
+    del_lineno=""
+
+    status=$(git status --porcelain -- $POTFILE)
+    if [ -z "$status" ]; then
+        echo "Nothing changed."
+    else
+        tmpfile=$(mktemp /tmp/git.po.XXXX)
+        LANGUAGE=C git show HEAD:./git.pot > $tmpfile
+        LANGUAGE=C msgcmp -N --use-untranslated $tmpfile $POTFILE 2>&1 |
+        {    while read line; do
+                if [[ $line =~ $pnew ]]; then
+                    new_count=$(( new_count + 1 ))
+                    if [ -z "$new_lineno" ]; then
+                        new_lineno="${BASH_REMATCH[1]}"
+                    else
+                        new_lineno="${new_lineno}, ${BASH_REMATCH[1]}"
+                    fi
+                fi
+                if [[ $line =~ $pdel ]]; then
+                    del_count=$(( del_count + 1 ))
+                    if [ -z "$del_lineno" ]; then
+                        del_lineno="${BASH_REMATCH[1]}"
+                    else
+                        del_lineno="${del_lineno}, ${BASH_REMATCH[1]}"
+                    fi
+                fi
+            done
+            [ $new_count -gt 1 ] && new_plur="s" || new_plur=""
+            [ $del_count -gt 1 ] && del_plur="s" || del_plur=""
+            echo "Updates of $POTFILE since last update:"
+            echo
+            echo " * Add ${new_count} new l10n message${new_plur}" \
+                 "in the new generated \"git.pot\" file at" \
+                 "line${new_plur}:"
+            echo "   ${new_lineno}"
+            echo
+
+            echo " * Remove ${del_count} l10n message${del_plur}" \
+                 "from the old \"git.pot\" file at line${del_plur}:"
+            echo "   ${del_lineno}"
+        }
+        rm $tmpfile
+    fi
+}
+
+# Syntax check on XX.po, or all .po files if nothing provided
+check_po()
+{
+    if [ $# -eq 0 ]; then
+        ls *.po | while read f; do
+            echo "============================================================"
+            echo "Check $f..."
+            check_po $f
+        done
+    fi
+    while [ $# -gt 0 ]; do
+        po=$1
+        shift
+        if [ -f $po ]; then
+            msgfmt -o /dev/null --check --statistics $po
+        else
+            echo "Error: File $po does not exist." >&2
+        fi
+    done
+}
+
+# Create or update XX.po file from git.pot
+create_or_update_po()
+{
+    if [ $# -eq 0 ]; then
+        usage
+    fi
+    while [ $# -gt 0 ]; do
+        po=$1
+        shift
+        if [ -f $po ]; then
+            msgmerge --add-location --backup=off -U $po $POTFILE
+        else
+            msginit -i $POTFILE --locale=${po%.po}
+            perl -pi -e 's/(?<="Project-Id-Version: )PACKAGE VERSION/Git/' $po
+        fi
+        mo="build/locale/${po%.po}/LC_MESSAGES/git.mo"
+        mkdir -p $(dirname $mo)
+        msgfmt -o $mo --check --statistics $po
+    done
+}
+
+
+verify_commit_encoding()
+{
+    c=$1
+    subject=0
+    non_ascii=""
+    encoding=""
+    log=""
+
+    git cat-file commit $c |
+    {
+        while read line; do
+            log="$log - $line"
+            # next line would be the commit log subject line,
+            # if no previous empty line found.
+            if [ -z "$line" ]; then
+                subject=$((subject + 1))
+                continue
+            fi
+            pencoding="^encoding (.+)"
+            if [ $subject -eq 0 ] && [[ $line =~ $pencoding ]]; then
+                encoding="${BASH_REMATCH[1]}"
+            fi
+            # non-ascii found in commit log
+            pnonascii="([^[:alnum:][:space:][:punct:]]+)"
+            if [[ $line =~ $pnonascii ]]; then
+                non_ascii="${BASH_REMATCH[1]} >> $line <<"
+                if [ $subject -eq 1 ]; then
+                    report_nonascii_in_subject $c "$non_ascii"
+                    return
+                fi
+            fi
+            # subject has only one line
+            [ $subject -eq 1 ] && subject=$((subject += 1))
+            # break if there are non-asciis and has already checked subject line
+            if [ -n "$non_ascii" ] && [ $subject -gt 0 ]; then
+                break
+            fi
+        done
+        if [ -n "$non_ascii" ]; then
+            if [ -z "$encoding" ]; then
+                echo $line | iconv -f UTF-8 -t UTF-8 -s >/dev/null ||
+                report_bad_encoding "$c" "$non_ascii"
+            else
+                echo $line | iconv -f $encoding -t UTF-8 -s >/dev/null ||
+                report_bad_encoding "$c" "$non_ascii" "$encoding"
+            fi
+        fi
+    }
+}
+
+report_nonascii_in_subject()
+{
+    c=$1
+    non_ascii=$2
+
+    echo "============================================================" >&2
+    echo "Error: Non-ASCII in subject of commit $c:"                    >&2
+    echo "       ${non_ascii}"                                          >&2
+    echo                                                                >&2
+    git cat-file commit $c | head -15 | while read line; do
+        printf '\t%s\n' "$line"                                         >&2
+    done
+}
+
+report_bad_encoding()
+{
+    c=$1
+    non_ascii=$2
+    encoding=$3
+
+    echo "============================================================" >&2
+    if [ -z "$encoding" ]; then
+        echo "Error: No encoding setting for commit $c:"                >&2
+    else
+        echo "Error: Wrong encoding ($encoding) for commit $c:"         >&2
+    fi
+    echo "       ${non_ascii}"                                          >&2
+    echo                                                                >&2
+    git cat-file commit $c | head -15 | while read line; do
+        printf '\t%s\n' "$line"                                         >&2
+    done
+}
+
+# Check commit logs for bad encoding settings
+check_commits()
+{
+    if [ $# -gt 2 ]; then
+        usage
+    fi
+    since=${1:-origin/master}
+    til=${2:-HEAD}
+
+    git diff-tree -r $since $til | awk '{print $6}' |
+    while read f; do
+        if [[ ! $f =~ ^po/ ]]; then
+            echo "Error: changed files ($f...) outside po directory!"   >&2
+            echo "       reference: git diff-tree -r $since $til"       >&2
+        fi
+    done
+
+    count=0
+    git rev-list ${since}..${til} |
+    {   while read c; do
+            verify_commit_encoding $c
+            count=$(( count + 1 ))
+        done
+        echo
+        echo "$count commits check complete."
+    }
+}
+
+
+[ $# -eq 0 ] && usage
+
+while [ $# -ne 0 ]; do
+    case "$1" in
+    pot|git.pot)
+        show_pot_update_summary
+        ;;
+    *.po)
+        create_or_update_po $1
+        ;;
+    check)
+        shift
+        check_po "$@"
+        exit 0
+        ;;
+    commit|commits)
+        shift
+        check_commits "$@"
+        exit 0
+        ;;
+    *)
+        usage
+        ;;
+    esac
+    shift
+done
+
+# vim: et ts=4 sw=4
-- 
1.7.9.2.330.gaa956.dirty


* Re: [PATCH] Maintenance script for l10n files and commits
From: Junio C Hamano @ 2012-03-08 20:41 UTC
  To: Jiang Xin; +Cc: Git List, avarab

Jiang Xin <worldhello.net@gmail.com> writes:

> Usage of this script:
>
>  * po-helper.sh XX.po   : Init or update XX.po from git.pot
>
>  * po-helper.sh check [XX.po]
>                         : Perform all the checks for XX.po
>
>  * po-helper.sh commits [since] [til]
>                         : Check non-ascii chars in commit logs
>
>                           Don't write commit log with non-ascii chars
>                           without proper encoding settings.
>
>                           Subject of commit log must be written in English.
>
>                           Don't change files outside this directory (po/)
>
>  * po-helper.sh pot     : Display summary of updates of git.pot file

That's quite a style deviation from our norm in the commit log
messages, don't you think (see "git log --no-merges -100", for
example)?  State the problem you are attempting to solve first, and
explain the solution to the problem, in separate paragraphs for
readability, perhaps like this:

	There are routine tasks translators need to perform that can
	be automated.

	Help them to

         (1) initialize or update the message files;
         (2) check errors in the message files they edited;
         (3) check errors in their commits; and
         (4) review recent updates to the message template file
             they base their translations on.

        by adding a helper script.

> Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
> ---
>  po/po-helper.sh |  271 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 271 insertions(+)
>  create mode 100755 po/po-helper.sh
>
> diff --git a/po/po-helper.sh b/po/po-helper.sh
> new file mode 100755
> index 0000000..dd370a5
> --- /dev/null
> +++ b/po/po-helper.sh
> @@ -0,0 +1,271 @@
> +#!/bin/bash

Is there any bash-ism in this script?  Otherwise please start this
with "#!/bin/sh" to allow people who do not use bash to get involved
in the project.

> +#
> +# Copyright (c) 2012 Jiang Xin
> +
> +POTFILE=git.pot
> +
> +usage()
> +{

Style:

	usage () {

In general, it is safe to mimic the style "git-am.sh" is written in,
although some cruft seems to have slipped in with recent updates.

> +    cat <<-END_OF_USAGE

Style: unless you have substitutions ($var etc.) inside the here
text, please quote the end token to make it clear that readers do
not have to scan and look for what is substituted, i.e.

	cat <<-\END_OF_USAGE
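
The effect of quoting the end token can be seen in a minimal sketch
like the one below. The variable `name` and the token `EOF` are
placeholders chosen for illustration, and plain `<<` is used instead
of `<<-` so that the sketch does not depend on tab indentation.

```shell
#!/bin/sh
# Hypothetical variable, used only to show the difference.
name=world

# Unquoted end token: $name is expanded inside the here-document.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted end token (\EOF): the text is taken literally, so a reader
# knows at a glance that nothing in the body is substituted.
literal=$(cat <<\EOF
hello $name
EOF
)

printf '%s\n' "$expanded"
printf '%s\n' "$literal"
```

With a quoted token the usage text can contain "$" or backquotes
without any risk of accidental expansion.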

> +Maintenance script for l10n files and commits.
> +
> +Usage:
> +
> + * po-helper.sh XX.po   : Init or update XX.po from git.pot

Will we later regret that we didn't give a command word for this
one?  Two common sources of such risks are:

 (1) it turns out XX.po matches the pattern we would want to use as
     a command; and

 (2) it turns out "init/update" is not the most often used action.

I do not think (1) is likely. I do not think anybody can decide
about (2) at this point yet.
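
One way to keep both options open is to accept an explicit command
word while still recognizing a bare XX.po argument. The following is
only a sketch of that idea; the function name `classify_command` and
the labels it prints are hypothetical and do not come from the patch.

```shell
#!/bin/sh
# classify_command ARG
# Print which (hypothetical) action an argument would select.
classify_command () {
	case "$1" in
	init|update)
		# explicit command word
		echo "$1"
		;;
	*.po)
		# bare XX.po kept as a shorthand for init/update
		echo "init-or-update"
		;;
	*)
		echo "usage"
		;;
	esac
}

classify_command update
classify_command zh_CN.po
classify_command bogus
```

Because the command words are matched before the `*.po` pattern, a
future command would only clash with the shorthand if its name ended
in ".po".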

> + * po-helper.sh check [XX.po]
> +                        : Perform all the checks for XX.po
> +
> + * po-helper.sh commits [since] [til]
> +                        : Check non-ascii chars in commit logs
> +
> +                          Don't write commit log with non-ascii chars
> +                          without proper encoding settings.
> +
> +                          Subject of commit log must be written in English.
> +
> +                          Don't change files outside this directory (po/)
> +
> + * po-helper.sh pot     : Display summary of updates of git.pot file
> +
> +END_OF_USAGE

Do you really want the blank line output at the end?

> +
> +    exit 1
> +}
> +
> +# Display summary of updates of git.pot
> +show_pot_update_summary()
> +{
> +    pnew="^.*:([0-9]+): this message is used but not defined in"
> +    pdel="^.*:([0-9]+): warning: this message is not used"
> +    new_count=0
> +    del_count=0
> +    new_lineno=""
> +    del_lineno=""
> +
> +    status=$(git status --porcelain -- $POTFILE)
> +    if [ -z "$status" ]; then
> +        echo "Nothing changed."
> +    else
> +        tmpfile=$(mktemp /tmp/git.po.XXXX)
> +        LANGUAGE=C git show HEAD:./git.pot > $tmpfile
> +        LANGUAGE=C msgcmp -N --use-untranslated $tmpfile $POTFILE 2>&1 |
> +        {    while read line; do
> +                if [[ $line =~ $pnew ]]; then

That sounds like a blatant bash-ism to me.
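
A portable replacement for `[[ ... =~ ... ]]` with BASH_REMATCH could
look roughly like the sketch below, which lets sed both test the line
and print the captured group. The helper name `extract_lineno` and
the sample input line are made up for illustration; the pattern is
the one from the patch rewritten as a basic regular expression.

```shell
#!/bin/sh
# extract_lineno LINE PATTERN
# Print the first \(...\) group of PATTERN (a basic regular
# expression without "/") if LINE matches; print nothing otherwise.
extract_lineno () {
	printf '%s\n' "$1" | sed -n -e "s/$2/\1/p"
}

pnew='^.*:\([0-9][0-9]*\): this message is used but not defined in.*'

# Hypothetical line, shaped like the msgcmp output the patch parses.
line='git.pot:42: this message is used but not defined in old.pot'

extract_lineno "$line" "$pnew"
```

An empty result means "no match", so the caller can use `test -n` on
the output in place of the bash-only conditional.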

> +...
> +}
> +
> +# Syntax check on XX.po, or all .po files if nothing provided
> +check_po()
> +{
> +    if [ $# -eq 0 ]; then
> +        ls *.po | while read f; do
> +            echo "============================================================"

Style:

	if test $# = 0
	then
		ls *.po |
		while read f
		do
			...

Also indentation is done with HT, not runs of SP.

> +# Create or update XX.po file from git.pot
> +create_or_update_po()
> +{
> +    if [ $# -eq 0 ]; then
> +        usage
> +    fi
> +    while [ $# -gt 0 ]; do
> +        po=$1
> +        shift
> +        if [ -f $po ]; then
> +            msgmerge --add-location --backup=off -U $po $POTFILE
> +        else
> +            msginit -i $POTFILE --locale=${po%.po}
> +            perl -pi -e 's/(?<="Project-Id-Version: )PACKAGE VERSION/Git/' $po

Ah ;-) show_pot_update_summary() can be written so that this script
does not have to rely on bash-ism at all, as you are going to use
Perl anyway.

> +verify_commit_encoding()
> +{
> ...
> +}

I am not sure if the protection these checks offer is worth the
complexity of the script, but it is primarily to reduce back and
forth between the l10n coordinator and the language teams, so I
won't complain.

> +# vim: et ts=4 sw=4

I prefer not to see scripts forcing author's personal preference.
Especially, wouldn't ts=4 make it hard to avoid indenting with runs
of spaces by mistake?


* Re: [PATCH] Maintenance script for l10n files and commits
From: Jiang Xin @ 2012-03-09 0:57 UTC
  To: Junio C Hamano; +Cc: Git List, avarab

2012/3/9 Junio C Hamano <gitster@pobox.com>:
> That's quite a style deviation from our norm in the commit log
> messages, don't you think (see "git log --no-merges -100", for
> example)?  State the problem you are attempting to solve first, and
> explain the solution to the problem, in separate paragraphs for
> readability, perhaps like this:
>
>        There are routine tasks translators need to perform that can
>        be automated.
>
>        Help them to
>
>         (1) initialize or update the message files;
>         (2) check errors in the message files they edited;
>         (3) check errors in their commits; and
>         (4) review recent updates to the message template file
>             they base their translations on.
>
>        by adding a helper script.

Thank you for providing a better commit log; I learned a lot
from it. Writing in English is still a big issue for me, and
it is an obstacle for many l10n contributors as well. So your
decision that l10n contributors can write commit logs in
their native language is very helpful, yet it carries the
risk of wrong character encodings in commit logs.
That is why I need to write a helper for l10n team leaders,
especially for myself, to detect bad commit logs,
because nobody knows all languages and encodings.
Changes outside of the "po/" directory should also be
checked regularly.
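
The encoding check described here can be approximated by asking iconv
whether the text decodes under the declared encoding. This is a
minimal sketch under that assumption; the function name
`valid_in_encoding` and the sample byte strings are hypothetical.

```shell
#!/bin/sh
# valid_in_encoding ENCODING
# Read text on stdin; succeed if iconv can decode it as ENCODING.
valid_in_encoding () {
	iconv -f "$1" -t UTF-8 >/dev/null 2>&1
}

# "café" encoded as UTF-8 (\303\251) and as ISO-8859-1 (\351).
printf 'caf\303\251\n' | valid_in_encoding UTF-8 &&
	echo "UTF-8 bytes decode as UTF-8"
printf 'caf\351\n' | valid_in_encoding UTF-8 ||
	echo "ISO-8859-1 bytes do not decode as UTF-8"
printf 'caf\351\n' | valid_in_encoding ISO-8859-1 &&
	echo "ISO-8859-1 bytes decode as ISO-8859-1"
```

A successful decode does not prove that the declared encoding is the
one the author used, only that the bytes are not invalid for it.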


>> @@ -0,0 +1,271 @@
>> +#!/bin/bash
>
> Is there any bash-ism in this script?  Otherwise please start this
> with "#!/bin/sh" to allow people who do not use bash to get involved
> in the project.

There are several regex match expressions written in bash style,
which is not dash compatible. I will try to use grep and sed instead.
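
One of those matches is the non-ASCII test on each log line. As an
alternative to a character-class regex, a sketch using tr in the C
locale is shown below; the function names `non_ascii_bytes` and
`has_non_ascii` are hypothetical and not taken from the patch.

```shell
#!/bin/sh
# non_ascii_bytes
# Copy stdin to stdout, dropping every byte in the ASCII range
# (octal 000-177), so only non-ASCII bytes remain.
non_ascii_bytes () {
	LC_ALL=C tr -d '\000-\177'
}

# has_non_ascii STRING
# Succeed if STRING contains at least one non-ASCII byte.
has_non_ascii () {
	test -n "$(printf '%s' "$1" | non_ascii_bytes)"
}

has_non_ascii 'l10n: update zh_CN.po' || echo "subject is plain ASCII"
has_non_ascii "$(printf 'caf\303\251')" && echo "non-ASCII bytes found"
```

Working on bytes in the C locale keeps the result independent of the
locale the translator happens to run the script in.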

> Will we later regret that we didn't give a command word for this
> one?  Two common sources of such risks are:
>
>  (1) it turns out XX.po matches the pattern we would want to use as
>     a command; and
>
>  (2) it turns out "init/update" is not the most often used action.
>
> I do not think (1) is likely. I do not think anybody can decide
> about (2) at this point yet.

The style of the arguments comes from the previous Rakefile implementation.
I will make XX.po an alias for the init/update subcommands.

-- 
Jiang Xin


* [PATCH v2] Maintenance script for l10n files and commits
From: Jiang Xin @ 2012-03-09 6:08 UTC
  To: Junio C Hamano; +Cc: Git List, avarab, Jiang Xin

There are routine tasks translators need to perform that can be automated.
Add a helper script which can help:

 - Initialize or update the message files;

 - Check errors in the message files they edited;

 - Check errors in their commits; and

 - Review recent updates to the message template file
   they base their translations on.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
---
 po/po-helper.sh |  387 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 387 insertions(+)
 create mode 100755 po/po-helper.sh

diff --git a/po/po-helper.sh b/po/po-helper.sh
new file mode 100755
index 0000000..166ebb7
--- /dev/null
+++ b/po/po-helper.sh
@@ -0,0 +1,387 @@
+#!/bin/sh
+#
+# Copyright (c) 2012 Jiang Xin
+
+GIT=git
+PODIR=$($GIT rev-parse --show-cdup)po
+POTFILE=$PODIR/git.pot
+
+usage() {
+	cat <<-\END_OF_USAGE
+Maintenance script for l10n files and commits.
+
+Usage:
+
+ * po-helper.sh init XX.po
+       Create the initial XX.po file in the po/ directory, where
+       XX is the locale, e.g. "de", "is", "pt_BR", "zh_CN", etc.
+
+ * po-helper.sh update XX.po
+       Update XX.po file(s) to the new git.pot template
+
+ * po-helper.sh check XX.po
+       Perform syntax check on XX.po file(s)
+
+ * po-helper.sh check pot
+       Display summary of updates/modifications of git.pot template
+
+ * po-helper.sh check commits [since] [til]
+       Check non-ascii chars in commit logs
+
+       - don't write commit log with non-ascii chars without proper
+         encoding settings;
+
+       - subject of commit log must be written in English; and
+
+       - don't change files outside this directory (po/)
+END_OF_USAGE
+
+	if test $# -gt 0
+	then
+		echo
+		echo "Error: $*"
+		exit 1
+	else
+		exit 0
+	fi
+}
+
+# Init or update XX.po file from git.pot
+update_po() {
+	if test $# -eq 0
+	then
+		usage "init/update needs at least one argument"
+	fi
+	while test $# -gt 0
+	do
+		locale=$(basename $1)
+		locale=${locale%.po}
+		po=$PODIR/$locale.po
+		mo=$PODIR/build/locale/$locale/LC_MESSAGES/git.mo
+		if test -f $po
+		then
+			msgmerge --add-location --backup=off -U $po $POTFILE
+		else
+			msginit -i $POTFILE --locale=$locale -o $po
+			perl -pi -e 's/(?<="Project-Id-Version: )PACKAGE VERSION/Git/' $po
+		fi
+		mkdir -p $(dirname $mo)
+		msgfmt -o $mo --check --statistics $po
+		shift
+	done
+}
+
+notes_for_l10n_team_leader() {
+	locale=$(basename $1)
+	locale=${locale%.po}
+
+	cat <<-END_OF_NOTES
+	============================================================
+	Notes for l10n team leader:
+
+	    Since you created an initial locale file, you are
+	    likely to be the leader of the $locale l10n team.
+
+	    You can add your team information in the "po/TEAMS"
+	    file, and update it when necessary.
+
+	    Please read the file "po/README" first to understand
+	    the workflow of Git l10n maintenance.
+	============================================================
+	END_OF_NOTES
+}
+
+# Check po, pot, commits
+check() {
+	if test $# -eq 0
+	then
+		ls $PODIR/*.po |
+		while read f
+		do
+			echo "============================================================"
+			echo "Check $(basename $f)..."
+			check $f
+		done
+
+		echo "============================================================"
+		echo "Check updates of git.pot..."
+		check pot
+
+		echo "============================================================"
+		echo "Check commits..."
+		check commits
+	fi
+	while test $# -gt 0
+	do
+		case "$1" in
+		*.po)
+			check_po $1
+			;;
+		*pot)
+			check_pot
+			;;
+		commit|commits)
+			shift
+			check_commits "$@"
+			break
+			;;
+		*)
+			usage "Unknown task \"$1\" for check"
+			;;
+		esac
+		shift
+	done
+}
+
+# Syntax check on XX.po, or all .po files if nothing provided
+check_po() {
+	while test $# -gt 0
+	do
+		locale=$(basename $1)
+		locale=${locale%.po}
+		po=$PODIR/$locale.po
+		mo=$PODIR/build/locale/$locale/LC_MESSAGES/git.mo
+		if test -f $po
+		then
+			mkdir -p $(dirname $mo)
+			msgfmt -o $mo --check --statistics $po
+		else
+			echo "Error: File $po does not exist." >&2
+		fi
+		shift
+	done
+}
+
+# Display summary of updates of git.pot
+check_pot() {
+	pnew="^.*:\([0-9]*\): this message is used but not defined in.*"
+	pdel="^.*:\([0-9]*\): warning: this message is not used.*"
+	new_count=0
+	del_count=0
+	new_lineno=""
+	del_lineno=""
+
+	status=$(cd $PODIR; $GIT status --porcelain -- $(basename $POTFILE))
+	if test -z "$status"
+	then
+		echo "Nothing changed."
+	else
+		tmpfile=$(mktemp /tmp/git.po.XXXX)
+		(cd $PODIR; LANGUAGE=C $GIT show HEAD:./$(basename $POTFILE) > $tmpfile )
+		LANGUAGE=C msgcmp -N --use-untranslated $tmpfile $POTFILE 2>&1 | {
+			while read line
+			do
+				m=$(echo $line | grep "$pnew" | sed -e "s/$pnew/\1/")
+				if test -n "$m"
+				then
+					new_count=$(( new_count + 1 ))
+					if test -z "$new_lineno"
+					then
+						new_lineno="$m"
+					else
+						new_lineno="${new_lineno}, $m"
+					fi
+					continue
+				fi
+
+				m=$(echo $line | grep "$pdel" | sed -e "s/$pdel/\1/")
+				if test -n "$m"
+				then
+					del_count=$(( del_count + 1 ))
+					if test -z "$del_lineno"
+					then
+						del_lineno="$m"
+					else
+						del_lineno="${del_lineno}, $m"
+					fi
+				fi
+			done
+			test $new_count -gt 1 && new_plur="s" || new_plur=""
+			test $del_count -gt 1 && del_plur="s" || del_plur=""
+			echo "Updates of po/git.pot since last update:"
+			echo
+			echo " * Add ${new_count} new l10n message${new_plur}" \
+				 "in the new generated \"git.pot\" file at" \
+				 "line${new_plur}:"
+			echo "   ${new_lineno}"
+			echo
+
+			echo " * Remove ${del_count} l10n message${del_plur}" \
+				 "from the old \"git.pot\" file at line${del_plur}:"
+			echo "   ${del_lineno}"
+		}
+		rm $tmpfile
+	fi
+}
+
+verify_commit_encoding() {
+	c=$1
+	subject=0
+	non_ascii=""
+	encoding=""
+	log=""
+
+	$GIT cat-file commit $c | {
+		while read line
+		do
+			log="$log - $line"
+			# next line would be the commit log subject line,
+			# if no previous empty line found.
+			if test -z "$line"
+			then
+				subject=$((subject + 1))
+				continue
+			fi
+			if test $subject -eq 0
+			then
+				if echo $line | grep -q "^encoding "
+				then
+					encoding=${line#encoding }
+				fi
+			fi
+			# non-ascii found in commit log
+			m=$(echo $line | sed -e "s/\([[:alnum:][:space:][:punct:]]\)//g")
+			if test -n "$m"
+			then
+				non_ascii="$m >> $line <<"
+				if test $subject -eq 1
+				then
+					report_nonascii_in_subject $c "$non_ascii"
+					return
+				fi
+			fi
+			# subject has only one line
+			test $subject -eq 1 && subject=$((subject += 1))
+			# break if there are non-asciis and has already checked subject line
+			if test -n "$non_ascii" && test $subject -gt 0
+			then
+				break
+			fi
+		done
+		if test -n "$non_ascii"
+		then
+			if test -z "$encoding"
+			then
+				echo $line | iconv -f UTF-8 -t UTF-8 -s >/dev/null ||
+					report_bad_encoding "$c" "$non_ascii"
+			else
+				echo $line | iconv -f $encoding -t UTF-8 -s >/dev/null ||
+					report_bad_encoding "$c" "$non_ascii" "$encoding"
+			fi
+		fi
+	}
+}
+
+report_nonascii_in_subject() {
+	c=$1
+	non_ascii=$2
+
+	echo "============================================================" >&2
+	echo "Error: Non-ASCII in subject of commit $c:"                    >&2
+	echo "	   ${non_ascii}"                                            >&2
+	echo
+	$GIT cat-file commit $c | head -15 |
+	while read line
+	do
+		echo "\t$line"                                              >&2
+	done
+	echo
+}
+
+report_bad_encoding() {
+	c=$1
+	non_ascii=$2
+	encoding=$3
+
+	echo "============================================================" >&2
+	if test -z "$encoding"
+	then
+		echo "Error: Not have encoding setting for commit $c:"      >&2
+	else
+		echo "Error: Wrong encoding ($encoding) for commit $c:"     >&2
+	fi
+	echo "	   ${non_ascii}"                                            >&2
+	echo                                                                >&2
+	$GIT cat-file commit $c | head -15 |
+	while read line
+	do
+		echo "\t$line"                                              >&2
+	done
+	echo
+}
+
+# Check commit logs for bad encoding settings
+check_commits() {
+	if test $# -gt 2
+	then
+		usage "check commits only needs 2 arguments"
+	fi
+	since=${1:-origin/master}
+	til=${2:-HEAD}
+
+	if $GIT diff-tree -r $since $til | awk '{print $6}' | grep -qv "^po/"
+	then
+		echo "============================================================" >&2
+		echo "Error: changed files outside po directory!"           >&2
+		echo "	   reference: git diff-tree -r $since $til"         >&2
+	fi
+
+	count=0
+	$GIT rev-list ${since}..${til} | {
+		while read c
+		do
+			verify_commit_encoding $c
+			count=$(( count + 1 ))
+		done
+		echo "$count commits checked complete."
+	}
+}
+
+
+test $# -eq 0 && usage
+
+if ! test -f $POTFILE
+then
+	echo "Cannot find git.pot in your workspace. Are you in the workspace of git project?"
+	exit 1
+fi
+
+while test $# -ne 0
+do
+	case "$1" in
+	init)
+		shift
+		update_po $1
+		notes_for_l10n_team_leader $1
+		break
+		;;
+	update)
+		shift
+		update_po $@
+		break
+		;;
+	check)
+		shift
+		check $@
+		break
+		;;
+	*.po)
+		update_po $1
+		;;
+	pot|git.pot)
+		check pot
+		;;
+	commit|commits)
+		shift
+		check commits $@
+		break
+		;;
+	-h|--help)
+		usage
+		;;
+	*)
+		usage "Unknown command"
+		;;
+	esac
+	shift
+done
-- 
1.7.9.2.330.gaa956.dirty

* Re: [PATCH v2] Maintaince script for l10n files and commits
  2012-03-09  6:08       ` [PATCH v2] " Jiang Xin
@ 2012-03-09  6:20         ` David Aguilar
  2012-03-09  6:31           ` Jiang Xin
  2012-03-10  0:40         ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: David Aguilar @ 2012-03-09  6:20 UTC (permalink / raw)
  To: Jiang Xin; +Cc: Junio C Hamano, Git List, avarab

On Thu, Mar 8, 2012 at 10:08 PM, Jiang Xin <worldhello.net@gmail.com> wrote:
> diff --git a/po/po-helper.sh b/po/po-helper.sh
> new file mode 100755
> index 0000000..166ebb7
> --- /dev/null
> +++ b/po/po-helper.sh
> @@ -0,0 +1,387 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2012 Jiang Xin
> +
> +GIT=git

I think it's customary to just write `git` in shell scripts.  It looks
nicer than seeing $GIT everywhere, IMO.  I guess this could be helpful
in the future but I don't see GIT being reassigned anywhere.

Cheers,
-- 
David

* Re: [PATCH v2] Maintaince script for l10n files and commits
  2012-03-09  6:20         ` David Aguilar
@ 2012-03-09  6:31           ` Jiang Xin
  0 siblings, 0 replies; 10+ messages in thread
From: Jiang Xin @ 2012-03-09  6:31 UTC (permalink / raw)
  To: David Aguilar; +Cc: Junio C Hamano, Git List, avarab

2012/3/9 David Aguilar <davvid@gmail.com>:
>> +
>> +GIT=git
>
> I think it's customary to just write `git` in shell scripts.  It looks
> nicer than seeing $GIT everywhere, IMO.  I guess this could be helpful
> in the future but I don't see GIT being reassigned anywhere.

It is only there to make the git commands being called easy to find, nothing else.

-- 
Jiang Xin

* Re: [PATCH v2] Maintaince script for l10n files and commits
  2012-03-09  6:08       ` [PATCH v2] " Jiang Xin
  2012-03-09  6:20         ` David Aguilar
@ 2012-03-10  0:40         ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2012-03-10  0:40 UTC (permalink / raw)
  To: Jiang Xin; +Cc: Git List, avarab

Jiang Xin <worldhello.net@gmail.com> writes:

> diff --git a/po/po-helper.sh b/po/po-helper.sh
> new file mode 100755
> index 0000000..166ebb7
> --- /dev/null
> +++ b/po/po-helper.sh
> @@ -0,0 +1,387 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2012 Jiang Xin
> +
> +GIT=git
> +PODIR=$($GIT rev-parse --show-cdup)po

David already pointed out that $GIT is bad style.

> +POTFILE=$PODIR/git.pot
> +
> +usage() {

Style:

	usage () {

I won't repeat this anymore in this message but there are others.

> +# Init or update XX.po file from git.pot
> +update_po() {
> +	if test $# -eq 0
> +	then
> +		usage "init/update needs at least one argument"
> +	fi
> +	while test $# -gt 0
> +	do
> +		locale=$(basename $1)
> +		locale=${locale%.po}

This is bad for two reasons. (1) Your $1 that directly comes from
the end user's command line may have $IFS characters; (2) You do not
have to spawn a separate process to run basename.

But if I were writing this loop, I would probably have avoided referring
to "$1" and shifting at the end by starting it like so:

	for locale
	do
		locale=${locale##*/}
		locale=${locale%.po}
		...
	done
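
As a rough illustration of the two expansions used above: "##*/" drops
everything up to the last slash (what basename would do) and "%.po" drops
the suffix, both without spawning a process.  The helper name locale_of
and the sample paths are made up for this sketch and are not part of the
patch.

```sh
# locale_of is a hypothetical helper, used only for this illustration.
locale_of () {
	name=${1##*/}      # remove the longest prefix ending in "/"
	name=${name%.po}   # remove a trailing ".po", if there is one
	printf '%s\n' "$name"
}

# Quoting "$arg" keeps a path with a space in it as one word.
for arg in po/zh_CN.po de.po "my dir/pt_BR.po"
do
	locale_of "$arg"
done
```

This should print zh_CN, de and pt_BR, one per line.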

> +		po=$PODIR/$locale.po
> +		mo=$PODIR/build/locale/$locale/LC_MESSAGES/git.mo
> +		if test -f $po

It is safer to say

	if test -f "$po"

here, even though it is not needed in the current form of this
script. Later somebody may change the definition of PODIR above to
something else that may contain $IFS.
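
A small sketch of what goes wrong without the quotes, assuming the
default $IFS and a made-up PODIR value that contains a space.  The helper
count_args exists only for this illustration.

```sh
# count_args is a hypothetical helper: it reports how many words it received.
count_args () {
	echo $#
}

PODIR="my po"            # assumed value, chosen to contain a space
po=$PODIR/de.po          # no quotes needed on the right side of an assignment

count_args $po           # unquoted: split at the space, prints 2
count_args "$po"         # quoted: stays one word, prints 1
```

With the unquoted form, "test -f" would see two operands instead of one
path and fail for a file that does exist.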

> +		then
> +			msgmerge --add-location --backup=off -U $po $POTFILE

Likewise, both for "$po" and "$POTFILE" (I won't repeat this anymore
in this message but there are others).

> +}
> +
> +notes_for_l10n_team_leader() {
> +	locale=$(basename $1)
> +	locale=${locale%.po}

Likewise. I won't repeat this anymore in this message but there are others.

> +# Check po, pot, commits
> +check() {
> +	if test $# -eq 0
> +	then
> +		ls $PODIR/*.po |
> +		while read f
> +		do
> +			echo "============================================================"
> +			echo "Check $(basename $f)..."
> +			check $f
> +		done
> +
> +		echo "============================================================"
> +		echo "Check updates of git.pot..."
> +		check pot
> +
> +		echo "============================================================"
> +		echo "Check commits..."
> +		check commits
> +	fi
> +	while test $# -gt 0
> +	do
> +		case "$1" in
> +		*.po)
> +			check_po $1
> +			;;
> +		*pot)

This is yucky.

		pot | git.pot)

would have made it easier to understand what is going on.  Either we
reached here from "check pot" when this function was called without
arguments, or we were fed git.pot from the command line.

> +			check_pot
> +			;;
> +		commit|commits)
> +			shift
> +			check_commits $@
> +			break
> +			;;

Perhaps you meant to say

	check_commits "$@"
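
To see the difference when arguments are passed on to another function,
here is a minimal sketch.  show_args stands in for check_commits, and the
two forward_* wrappers and the sample arguments are invented for the
illustration.

```sh
# show_args is a hypothetical stand-in for check_commits; it prints each
# argument it receives inside brackets.
show_args () {
	for a
	do
		printf '[%s]' "$a"
	done
	printf '\n'
}

forward_unquoted () {
	show_args $@      # arguments are re-split on $IFS
}

forward_quoted () {
	show_args "$@"    # each original argument is passed through intact
}

forward_unquoted "a b" HEAD   # prints [a][b][HEAD]
forward_quoted "a b" HEAD     # prints [a b][HEAD]
```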

> +		*)
> +			usage "Unkown task \"$1\" for check"
> +			;;

I think we tend to do

	usage "Unknown task '$1' for check"

> +# Syntax check on XX.po, or all .po files if nothing provided
> +check_po() {
> +	while test $# -gt 0
> +	do
> +		locale=$(basename $1)
> +		locale=${locale%.po}
> +		po=$PODIR/$locale.po
> +		mo=$PODIR/build/locale/$locale/LC_MESSAGES/git.mo
> +		if test -f $po
> +		then
> +			mkdir -p $(dirname $mo)
> +			msgfmt -o $mo --check --statistics $po
> +		else
> +			echo "Error: File $po does not exist." >&2

It would make it more obvious that this error message does go to
the standard error stream if you said it like this:

	echo >&2 "Error: File '$po' does not exist."

> +		fi
> +		shift
> +	done
> +}

Again,

	for locale
	do
		...
	done

would have made the loop easier (less risk of forgetting to shift or
of shifting too many times).

> +# Display summary of updates of git.pot
> +check_pot() {
> +	pnew="^.*:\([0-9]*\): this message is used but not defined in.*"
> +	pdel="^.*:\([0-9]*\): warning: this message is not used.*"
> +	new_count=0
> +	del_count=0
> +	new_lineno=""
> +	del_lineno=""
> +
> +	status=$(cd $PODIR; $GIT status --porcelain -- $(basename $POTFILE))
> +	if test -z "$status"
> +	then
> +		echo "Nothing changed."
> +	else
> +		tmpfile=$(mktemp /tmp/git.po.XXXX)

(optional) perhaps set a trap to remove this here, instead of an
explicit "rm" at the end?
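
One possible shape for the trap, shown in a subshell so that the cleanup
can be observed afterwards.  The six-X template and the use of $TMPDIR
are assumptions of this sketch, not something the patch does.

```sh
# Run the work in a subshell; its EXIT trap (signal 0) removes the file
# no matter how the subshell ends.
leftover=$(
	tmpfile=$(mktemp "${TMPDIR:-/tmp}/git.po.XXXXXX") || exit 1
	trap 'rm -f "$tmpfile"' 0
	echo "scratch data" >"$tmpfile"
	printf '%s\n' "$tmpfile"
	exit 1   # simulate an early failure; the trap still runs
)

if test -e "$leftover"
then
	echo "temporary file was left behind"
else
	echo "temporary file was removed"
fi
```

The advantage over an explicit "rm" at the end is that error paths that
leave the function early do not need their own cleanup.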

> +		(cd $PODIR; LANGUAGE=C $GIT show HEAD:./$(basename $POTFILE) > $tmpfile )

		... show HEAD:./${POTFILE##*/} >"$tmpfile"

Look for "Redirection operators" in Documentation/CodingGuidelines.

> +		LANGUAGE=C msgcmp -N --use-untranslated $tmpfile $POTFILE 2>&1 | {
> +			while read line
> +			do
> +				m=$(echo $line | grep "$pnew" | sed -e "s/$pnew/\1/")

Make it a habit to suspect that you are using one process too many,
whenever you see grep and sed in the same pipe.

	... | sed -ne "s/$pnew/\1/p"

would be sufficient, no?
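
A sketch of the single-process form, using the pattern from the patch.
Because -n suppresses the default output, the "p" flag on the
substitution is what prints the lines that matched.  extract_lineno is a
hypothetical wrapper and the sample input only imitates the msgcmp
message format.

```sh
pnew="^.*:\([0-9]*\): this message is used but not defined in.*"

# extract_lineno is a hypothetical helper: it prints the line number when
# the input matches $pnew and prints nothing otherwise.
extract_lineno () {
	printf '%s\n' "$1" | sed -ne "s/$pnew/\1/p"
}

extract_lineno "git.pot:123: this message is used but not defined in /tmp/git.po.XXXX"
extract_lineno "some unrelated line"
```

The first call should print 123 and the second should print nothing, so
the "test -n" check on the result keeps working as before.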

> +				if test -n "$m"
> +				then
> +					new_count=$(( new_count + 1 ))
> +					if test -z "$new_lineno"
> +					then
> +						new_lineno="$m"
> +					else
> +						new_lineno="${new_lineno}, $m"
> +					fi
> +					continue
> +				fi
> +
> +				m=$(echo $line | grep "$pdel" | sed -e "s/$pdel/\1/")
> +				if test -n "$m"
> +				then
> +					del_count=$(( del_count + 1 ))
> +					if test -z "$del_lineno"
> +					then
> +						del_lineno="$m"
> +					else
> +						del_lineno="${del_lineno}, $m"
> +					fi
> +				fi
> +			done
> +			test $new_count -gt 1 && new_plur="s" || new_plur=""
> +			test $del_count -gt 1 && del_plur="s" || del_plur=""

Aren't the plural forms "0 lines, 1 line, 2 lines,..."?  We have this in
our "gettext.h" that says "use singular form only when n is 1", not
"when n is less than 2":

	#define ngettext(s, p, n) ((n == 1) ? (s) : (p))
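
The same rule in shell could look roughly like this.  The function name
plural is made up for the illustration, and it only handles nouns that
take a plain "s".

```sh
# plural COUNT NOUN: hypothetical helper that appends "s" to NOUN unless
# COUNT is exactly 1, mirroring the (n == 1) test in ngettext.
plural () {
	if test "$1" -eq 1
	then
		printf '%s %s\n' "$1" "$2"
	else
		printf '%s %ss\n' "$1" "$2"
	fi
}

for n in 0 1 2
do
	plural "$n" line
done
```

This should print "0 lines", "1 line" and "2 lines".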

> +verify_commit_encoding() {
> +	c=$1
> +	subject=0
> +	non_ascii=""
> +	encoding=""
> +	log=""
> +
> +	$GIT cat-file commit $c | {
> +		while read line

At least, you should read with "read -r", if you were to write this
as a shell script.
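
A quick sketch of why it matters: without -r, "read" treats a backslash
as an escape character and drops it, so a log line that contains one is
silently altered.  The sample input is invented.

```sh
input='path\tname'    # single quotes: a literal backslash followed by "t"

plain=$(printf '%s\n' "$input" | { read line; printf '%s' "$line"; })
raw=$(printf '%s\n' "$input" | { read -r line; printf '%s' "$line"; })

printf 'read:    %s\n' "$plain"   # the backslash is gone: pathtname
printf 'read -r: %s\n' "$raw"     # the line is kept as is: path\tname
```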

> +		do
> +			log="$log - $line"
> +			# next line would be the commit log subject line,
> +			# if no previous empty line found.
> +			if test -z "$line"
> +			then
> +				subject=$((subject + 1))

Even though POSIX allows the above, this is preferred:

	subject=$(( $subject + 1 ))

as some shells were reported to barf without $ in front of the
variable names.

> +				continue
> +			fi
> +			if test $subject -eq 0
> +			then
> +				if echo $line | grep -q "^encoding "
> +				then
> +					encoding=${line#encoding }
> +				fi

	case "$line" in
	encoding' '*)
		encoding=${...}
		;;
	esac
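
Spelled out as a self-contained sketch, with the "${line#encoding }"
expansion taken from the patch itself.  encoding_of is a hypothetical
wrapper and the header lines are sample data.

```sh
# encoding_of is a hypothetical helper: it prints the value of an
# "encoding" header line and prints nothing for any other line.
encoding_of () {
	case "$1" in
	encoding' '*)
		printf '%s\n' "${1#encoding }"
		;;
	esac
}

encoding_of "encoding ISO-8859-1"
encoding_of "author A U Thor <author@example.com> 1331280000 +0800"
```

Only the first call should print anything (ISO-8859-1), and no grep
process is spawned for either line.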

> +			fi
> +			# non-ascii found in commit log
> +			m=$(echo $line | sed -e "s/\([[:alnum:][:space:][:punct:]]\)//g")
> +			if test -n "$m"
> +			then
> +				non_ascii="$m >> $line <<"
> +				if test $subject -eq 1
> +				then
> +					report_nonascii_in_subject $c "$non_ascii"
> +					return
> +				fi
> +			fi
> +			# subject has only one line
> +			test $subject -eq 1 && subject=$((subject += 1))

subject=$(( $subject + 1 ))

> +			# break if there are non-asciis and has already checked subject line
> +			if test -n "$non_ascii" && test $subject -gt 0
> +			then
> +				break

Is it OK for the body to have a good line followed by a bad line?

> +			fi
> +		done
> +		if test -n "$non_ascii"
> +		then
> +			if test -z "$encoding"
> +			then
> +				echo $line | iconv -f UTF-8 -t UTF-8 -s >/dev/null ||
> +					report_bad_encoding "$c" "$non_ascii"
> +			else
> +				echo $line | iconv -f $encoding -t UTF-8 -s >/dev/null ||
> +					report_bad_encoding "$c" "$non_ascii" "$encoding"
> +			fi
> +		fi
> +	}
> +}

* [PATCH v3] Maintaince script for l10n files and commits
  2012-03-08 20:41     ` Junio C Hamano
  2012-03-09  0:57       ` Jiang Xin
  2012-03-09  6:08       ` [PATCH v2] " Jiang Xin
@ 2012-03-10  9:17       ` Jiang Xin
  2 siblings, 0 replies; 10+ messages in thread
From: Jiang Xin @ 2012-03-10  9:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, avarab, Jiang Xin

There are routine tasks translators need to perform that can be automated.
Add a helper script which can help:

 - Initialize or update the message files;

 - Check errors in the message files they edited;

 - Check errors in their commits; and

 - Review recent updates to the message template file
   they base their translations on.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
---
 po/po-helper.sh |  411 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 411 insertions(+)
 create mode 100755 po/po-helper.sh

diff --git a/po/po-helper.sh b/po/po-helper.sh
new file mode 100755
index 0000000..2398f52
--- /dev/null
+++ b/po/po-helper.sh
@@ -0,0 +1,411 @@
+#!/bin/sh
+#
+# Copyright (c) 2012 Jiang Xin
+
+PODIR=$(git rev-parse --show-cdup)po
+POTFILE=$PODIR/git.pot
+
+usage () {
+	cat <<-\END_OF_USAGE
+Maintenance script for l10n files and commits.
+
+Usage:
+
+ * po-helper.sh init XX.po
+       Create the initial XX.po file in the po/ directory, where
+       XX is the locale, e.g. "de", "is", "pt_BR", "zh_CN", etc.
+
+ * po-helper.sh update XX.po ...
+       Update XX.po file(s) from the new git.pot template
+
+ * po-helper.sh check XX.po ...
+       Perform syntax check on XX.po file(s)
+
+ * po-helper.sh check commits [ <since> <til> ]
+       Check proper encoding of non-ascii chars in commit logs
+
+       - don't write commit log with non-ascii chars without proper
+         encoding settings;
+
+       - subject of commit log must be written in English; and
+
+       - don't change files outside this directory (po/)
+
+ * po-helper.sh diff [ <old> <new> ]
+       Show difference between old and new po/pot files.
+       By default, show changes of git.pot since last update.
+END_OF_USAGE
+
+	if test $# -gt 0
+	then
+		echo >&2
+		echo >&2 "Error: $*"
+		exit 1
+	else
+		exit 0
+	fi
+}
+
+# Init or update XX.po file from git.pot
+update_po () {
+	if test $# -eq 0
+	then
+		usage "init/update needs at least one argument"
+	fi
+	for locale
+	do
+		locale=${locale##*/}
+		locale=${locale%.po}
+		po=$PODIR/$locale.po
+		mo=$PODIR/build/locale/$locale/LC_MESSAGES/git.mo
+		if test -n "$locale"
+		then
+			if test -f "$po"
+			then
+				msgmerge --add-location --backup=off -U "$po" "$POTFILE"
+				mkdir -p "${mo%/*}"
+				msgfmt -o "$mo" --check --statistics "$po"
+			else
+				msginit -i "$POTFILE" --locale="$locale" -o "$po"
+				perl -pi -e 's/(?<="Project-Id-Version: )PACKAGE VERSION/Git/' "$po"
+				notes_for_l10n_team_leader "$locale"
+			fi
+		fi
+	done
+}
+
+notes_for_l10n_team_leader () {
+	cat <<-END_OF_NOTES
+	============================================================
+	Notes for l10n team leader:
+
+	    Since you created an initial locale file, you are
+	    likely to be the leader of the $1 l10n team.
+
+	    You can add your team information in the "po/TEAMS"
+	    file, and update it when necessary.
+
+	    Please read the file "po/README" first to understand
+	    the workflow of Git l10n maintenance.
+	============================================================
+	END_OF_NOTES
+}
+
+# Check po files and commits. Run all checks if no argument is given.
+check () {
+	if test $# -eq 0
+	then
+		ls $PODIR/*.po |
+		while read f
+		do
+			echo "============================================================"
+			echo "Check ${f##*/}..."
+			check "$f"
+		done
+
+		echo "============================================================"
+		echo "Show updates of git.pot..."
+		show_diff
+
+		echo "============================================================"
+		echo "Check commits..."
+		check commits
+	fi
+	while test $# -gt 0
+	do
+		case "$1" in
+		*.po)
+			check_po "$1"
+			;;
+		commit | commits)
+			shift
+			check_commits "$@"
+			break
+			;;
+		*)
+			usage "Unknown task '$1' for check"
+			;;
+		esac
+		shift
+	done
+}
+
+# Syntax check on XX.po
+check_po () {
+	for locale
+	do
+		locale=${locale##*/}
+		locale=${locale%.po}
+		po=$PODIR/$locale.po
+		mo=$PODIR/build/locale/$locale/LC_MESSAGES/git.mo
+		if test -n "$locale"
+		then
+			if test -f "$po"
+			then
+				mkdir -p "${mo%/*}"
+				msgfmt -o "$mo" --check --statistics "$po"
+			else
+				echo >&2 "Error: File $po does not exist."
+			fi
+		fi
+	done
+}
+
+# Show summary of updates of git.pot or difference between two po/pot files.
+show_diff () {
+	pnew="^.*:\([0-9]*\): this message is used but not defined in.*"
+	pdel="^.*:\([0-9]*\): warning: this message is not used.*"
+	new_count=0
+	del_count=0
+	new_lines=
+	del_lines=
+
+	case $# in
+	0)
+		status=$(cd $PODIR; git status --porcelain -- ${POTFILE##*/})
+		if test -z "$status"
+		then
+			echo "Nothing changed."
+			return 0
+		fi
+		tmpfile=$(mktemp /tmp/git.po.XXXX)
+		(cd $PODIR; LANGUAGE=C git show HEAD:./${POTFILE##*/} >"$tmpfile")
+		oldpot=$tmpfile
+		newpot=$POTFILE
+		oldtitle="the old 'git.pot' file"
+		newtitle="the new generated 'git.pot' file"
+		# Remove tmpfile on exit
+		trap 'rm -f "$tmpfile"' 0
+		echo "Changes of po/git.pot since last update:"
+		;;
+	2)
+		oldpot=$1
+		newpot=$2
+		oldtitle=${oldpot##*/}
+		newtitle=${newpot##*/}
+		echo "Difference between $oldtitle and $newtitle:"
+		;;
+	*)
+		usage "show_diff takes 0 or 2 arguments."
+		;;
+	esac
+
+	LANGUAGE=C msgcmp -N --use-untranslated "$oldpot" "$newpot" 2>&1 | {
+		while read line
+		do
+			# Extract line number "NNN" from output, like:
+			#     git.pot:NNN: this message is used but not defined in /tmp/git.po.XXXX
+			m=$(echo $line | grep "$pnew" | sed -e "s/$pnew/\1/")
+			if test -n "$m"
+			then
+				new_count=$(( new_count + 1 ))
+				if test -z "$new_lines"
+				then
+					new_lines="$m"
+				else
+					new_lines="${new_lines}, $m"
+				fi
+				continue
+			fi
+
+			# Extract line number "NNN" from output, like:
+			#     /tmp/git.po.XXXX:NNN: warning: this message is not used
+			m=$(echo $line | grep "$pdel" | sed -e "s/$pdel/\1/")
+			if test -n "$m"
+			then
+				del_count=$(( del_count + 1 ))
+				if test -z "$del_lines"
+				then
+					del_lines="$m"
+				else
+					del_lines="${del_lines}, $m"
+				fi
+			fi
+		done
+		if test $new_count -eq 0 && test $del_count -eq 0
+		then
+			echo "Nothing changed."
+			return 0
+		fi
+		if test $new_count -gt 0
+		then
+			test $new_count -ne 1 && new_plur="s" || new_plur=""
+			echo
+			echo " * Add ${new_count} new l10n message${new_plur}" \
+				 "in $newtitle at" \
+				 "line${new_plur}:"
+			echo "   ${new_lines}"
+		fi
+		if test $del_count -gt 0
+		then
+			test $del_count -ne 1 && del_plur="s" || del_plur=""
+			echo
+			echo " * Remove ${del_count} l10n message${del_plur}" \
+				 "from $oldtitle at line${del_plur}:"
+			echo "   ${del_lines}"
+		fi
+	}
+}
+
+verify_commit_encoding () {
+	c=$1
+	subject=0
+	non_ascii=""
+	encoding=""
+	log=""
+
+	git cat-file commit $c | {
+		while read line
+		do
+			log="$log - $line"
+			# next line would be the commit log subject line,
+			# if no previous empty line found.
+			if test -z "$line"
+			then
+				subject=$(( subject + 1 ))
+				continue
+			fi
+			if test $subject -eq 0
+			then
+				if echo $line | grep -q "^encoding "
+				then
+					encoding=${line#encoding }
+				fi
+			fi
+			# non-ascii found in commit log
+			m=$(echo $line | sed -e "s/\([[:alnum:][:space:][:punct:]]\)//g")
+			if test -n "$m"
+			then
+				non_ascii="$m >> $line <<"
+				if test $subject -eq 1
+				then
+					report_nonascii_in_subject $c "$non_ascii"
+					return
+				fi
+			fi
+			# subject has only one line
+			test $subject -eq 1 && subject=$(( subject + 1 ))
+			# break if there are non-asciis and has already checked subject line
+			if test -n "$non_ascii" && test $subject -gt 0
+			then
+				break
+			fi
+		done
+		if test -n "$non_ascii"
+		then
+			if test -z "$encoding"
+			then
+				echo $line | iconv -f UTF-8 -t UTF-8 -s >/dev/null ||
+					report_bad_encoding "$c" "$non_ascii"
+			else
+				echo $line | iconv -f $encoding -t UTF-8 -s >/dev/null ||
+					report_bad_encoding "$c" "$non_ascii" "$encoding"
+			fi
+		fi
+	}
+}
+
+report_nonascii_in_subject () {
+	c=$1
+	non_ascii=$2
+
+	echo >&2 "============================================================"
+	echo >&2 "Error: Non-ASCII in subject of commit $c:"
+	echo >&2 "       ${non_ascii}"
+	echo >&2
+	git cat-file commit "$c" | head -15 |
+	while read line
+	do
+		printf >&2 '\t%s\n' "$line"
+	done
+	echo >&2
+}
+
+report_bad_encoding () {
+	c=$1
+	non_ascii=$2
+	encoding=$3
+
+	echo >&2 "============================================================"
+	if test -z "$encoding"
+	then
+		echo >&2 "Error: No encoding setting for commit $c:"
+	else
+		echo >&2 "Error: Wrong encoding ($encoding) for commit $c:"
+	fi
+	echo >&2 "       ${non_ascii}"
+	echo >&2
+	git cat-file commit "$c" | head -15 |
+	while read line
+	do
+		printf >&2 '\t%s\n' "$line"
+	done
+	echo >&2
+}
+
+# Check commit logs for bad encoding settings
+check_commits () {
+	if test $# -gt 2
+	then
+		usage "check commits takes at most 2 arguments"
+	fi
+	since=${1:-origin/master}
+	til=${2:-HEAD}
+
+	if git diff-tree -r "$since" "$til" | awk '{print $6}' | grep -qv "^po/"
+	then
+		echo >&2 "============================================================"
+		echo >&2 "Error: changed files outside po directory!"
+		echo >&2 "       reference: git diff-tree -r $since $til"
+	fi
+
+	count=0
+	git rev-list ${since}..${til} | {
+		while read c
+		do
+			verify_commit_encoding $c
+			count=$(( count + 1 ))
+		done
+		echo "$count commits checked."
+	}
+}
+
+
+test $# -eq 0 && usage
+
+if ! test -f "$POTFILE"
+then
+	echo "Cannot find git.pot in your workspace. Are you in the workspace of git project?"
+	exit 1
+fi
+
+while test $# -ne 0
+do
+	case "$1" in
+	init | update)
+		shift
+		update_po "$@"
+		break
+		;;
+	check)
+		shift
+		check "$@"
+		break
+		;;
+	diff)
+		shift
+		show_diff "$@"
+		break
+		;;
+	*.po)
+		update_po "$1"
+		;;
+	-h | --help)
+		usage
+		;;
+	*)
+		usage "Unknown command '$1'."
+		;;
+	esac
+	shift
+done
-- 
1.7.9.2.330.gaa956.dirty

Thread overview: 10+ messages
2012-03-07 18:47 [PATCH] Maintaince script for l10n files and commits Jiang Xin
2012-03-07 19:17 ` Junio C Hamano
2012-03-08 16:05   ` Jiang Xin
2012-03-08 20:41     ` Junio C Hamano
2012-03-09  0:57       ` Jiang Xin
2012-03-09  6:08       ` [PATCH v2] " Jiang Xin
2012-03-09  6:20         ` David Aguilar
2012-03-09  6:31           ` Jiang Xin
2012-03-10  0:40         ` Junio C Hamano
2012-03-10  9:17       ` [PATCH v3] " Jiang Xin