git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Jeff King" <peff@peff.net>
Subject: Re: [PATCH 00/11] [RFH] Introduce support for GETTEXT_POISON=rot13
Date: Tue, 12 Jan 2021 14:32:40 +0100	[thread overview]
Message-ID: <87mtxe5lhj.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <pull.836.git.1610441262.gitgitgadget@gmail.com>


On Tue, Jan 12 2021, Johannes Schindelin via GitGitGadget wrote:

> This patch series is incomplete because I lack the time to finish it, and I
> hope that there are interested developers who can help with it.
>
> In https://lore.kernel.org/git/20181022153633.31757-1-pclouds@gmail.com/,
> Duy proposed introducing a fake language they called "Ook" to be able to
> test translations better. "Ook" would "translate" messages without any
> printf format specifiers to "Eek", otherwise convert all upper-case letters
> to "Ook" and lower-case letters to "ook".
>
> Gábor replied with a different proposal that has the advantage of being
> bijective, i.e. it is possible to reconstruct the untranslated message from
> the translated one.
>
> Ævar suggested recently
> [https://lore.kernel.org/git/xmqqim836v6m.fsf@gitster.c.googlers.com/T/#m6fdc43d4f1eb3f20f841096c59e985b69c84875e]
> that this whole GETTEXT_POISON business is totally useless.

To clarify what I really mean, but admittedly am not always the best at
articulating: I do not think GETTEXT_POISON is useless, it's useful.

The interesting question is "useful enough to bother?", or in economics
terms "is this investment in time/money worth the opportunity cost?".

Any hoops we make people jump through when writing tests takes away time
and attention from other things.

My motivation here is that I feel this whole poison business is all my
fault. Every time some newbie submits a patch and needs to re-roll a v2
because they used "grep" instead of "test_i18ngrep" in a case that
clearly didn't involve a plumbing string I cringe a little about the
technical debt.

So while I still think my series (just nuke it) would be better overall,
I'm secretly rooting for yours. If the consensus is that this is worth
keeping and improving perhaps we haven't been mostly wasting our time :)

On to discussing this series:

> I do not believe that it is useless. To back up my belief with something
> tangible, I implemented a GETTEXT_POISON=rot13 mode and ran the test suite
> to see whether we erroneously expect translated messages where they aren't,
> and related problems.

I agree that rot13ing it makes it much more useful in the sense that
before we could over-tag things with test_i18n* and we'd just "return 0"
there in the poison mode, now we actually check if the string is
poisoned.

We could get most of the way there by checking e.g. if "GETTEXT POISON"
is in the output, but it wouldn't assert that the message is 100%
translated as this WIP series does.

In our off-list discussion you maintained that
"GIT_TEST_GETTEXT_POISON=false <cmd>" was an anti-pattern. Was it
because you thought "test_i18n*" et al was actually doing some "is
poison?" comparison as we now do, or just because you didn't
conceptually like a pattern like:

    GIT_TEST_GETTEXT_POISON=false git cmd >out &&
    grep string out

Being changed later to:

    GIT_TEST_GETTEXT_POISON=false git cmd >out &&
    grep string out &&
    grep plumbing-string out

Or whatever, as opposed to:

    git cmd >out &&
    test_i18ngrep string out &&
    test_i18ngrep plumbing-string out

I get the aesthetic preference, I'm just wondering if I'm missing
something else about it...

> And the experiment delivered. It is just a demonstration (I only addressed a
> handful of the test suite failures, 124 test scripts still need to be
> inspected to find out why they fail), of course. Yet I think that finding
> e.g. the missing translations of sha1dc-cb and parse-options show that it
> would be very useful to complete this patch series and then use the rot13
> mode in our CI jobs (instead of the current GETTEXT_POISON=true mode, which
> really is less useful).

The following patch on top cuts down on the failures by more than half:

-----------------
diff --git a/git-sh-i18n.sh b/git-sh-i18n.sh
index 8eef60b43f..f9239d2917 100644
--- a/git-sh-i18n.sh
+++ b/git-sh-i18n.sh
@@ -17,7 +17,10 @@ export TEXTDOMAINDIR
 
 # First decide what scheme to use...
 GIT_INTERNAL_GETTEXT_SH_SCHEME=fallthrough
-if test -n "$GIT_TEST_GETTEXT_POISON" &&
+if test -n "$GIT_TEST_GETTEXT_POISON" -a "$GIT_TEST_GETTEXT_POISON" = "rot13"
+then
+	GIT_INTERNAL_GETTEXT_SH_SCHEME=rot13poison
+elif  test -n "$GIT_TEST_GETTEXT_POISON" &&
 	    git env--helper --type=bool --default=0 --exit-code \
 		GIT_TEST_GETTEXT_POISON
 then
@@ -63,6 +66,21 @@ gettext_without_eval_gettext)
 		)
 	}
 	;;
+rot13poison)
+	# Emit garbage so that tests that incorrectly rely on translatable
+	# strings will fail.
+	gettext () {
+		printf "%s" "# SHELL GETTEXT POISON #"
+	}
+
+	eval_gettext () {
+		printf "%s" "# SHELL GETTEXT POISON #"
+	}
+
+	eval_ngettext () {
+		printf "%s" "# SHELL GETTEXT POISON #"
+	}
+	;;
 poison)
 	# Emit garbage so that tests that incorrectly rely on translatable
 	# strings will fail.
diff --git a/t/helper/test-i18n.c b/t/helper/test-i18n.c
index 82efc66f1f..1f0fa3d041 100644
--- a/t/helper/test-i18n.c
+++ b/t/helper/test-i18n.c
@@ -1,6 +1,8 @@
 #include "test-tool.h"
 #include "cache.h"
 #include "grep.h"
+#include "config.h"
 
 static const char *usage_msg = "\n"
 "  test-tool i18n cmp <file1> <file2>\n"
@@ -34,8 +36,12 @@ static size_t unrot13(char *buf)
 
 		p += strlen("<rot13>");
 		end = strstr(p, "</rot13>");
-		if (!end)
-			BUG("could not find </rot13> in\n%s", buf);
+		if (!end) {
+			if (git_env_bool("GIT_TEST_GETTEXT_POISON_PEDANTIC", 0))
+				BUG("could not find </rot13> in\n%s", buf);
+			else
+				break;
+		}
 
 		while (p != end)
 			*(q++) = do_rot13(*(p++));
@@ -67,8 +73,12 @@ static int i18n_cmp(const char **argv)
 
 	if (strbuf_read_file(&a, path1, 0) < 0)
 		die_errno("could not read %s", path1);
+	if (strstr(a.buf, "# SHELL GETTEXT POISON #"))
+		return 0;
 	if (strbuf_read_file(&b, path2, 0) < 0)
 		die_errno("could not read %s", path2);
+	if (strstr(b.buf, "# SHELL GETTEXT POISON #"))
+		return 0;
 	unrot13_strbuf(&b);
 
 	if (a.len != b.len || memcmp(a.buf, b.buf, a.len))
@@ -79,7 +89,6 @@ static int i18n_cmp(const char **argv)
 
 static int i18n_grep(const char **argv)
 {
-	int dont_match = 0;
 	const char *pattern, *path;
 
 	struct grep_opt opt;
@@ -87,11 +96,6 @@ static int i18n_grep(const char **argv)
 	struct strbuf buf = STRBUF_INIT;
 	int hit;
 
-	if (*argv && !strcmp("!", *argv)) {
-		dont_match = 1;
-		argv++;
-	}
-
 	pattern = *(argv++);
 	path = *(argv++);
 
@@ -104,13 +108,15 @@ static int i18n_grep(const char **argv)
 
 	if (strbuf_read_file(&buf, path, 0) < 0)
 		die_errno("could not read %s", path);
+	if (strstr(buf.buf, "# SHELL GETTEXT POISON #"))
+		return 0;
 	unrot13_strbuf(&buf);
 	grep_source_init(&source, GREP_SOURCE_BUF, path, path, path);
 	source.buf = buf.buf;
 	source.size = buf.len;
 	hit = grep_source(&opt, &source);
 	strbuf_release(&buf);
-	return dont_match ^ !hit;
+	return !hit;
 }
 
 int cmd__i18n(int argc, const char **argv)
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 394fd2ef5b..6c580a3000 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1019,14 +1019,12 @@ test_i18ngrep () {
 		BUG "too few parameters to test_i18ngrep"
 	fi
 
-	if test_have_prereq !C_LOCALE_OUTPUT
+	grep_cmd=grep
+	if test "$GIT_TEST_GETTEXT_POISON" == "rot13"
+	then
+		grep_cmd="test-tool i18n grep"
+	elif test_have_prereq !C_LOCALE_OUTPUT
 	then
-		if test rot13 = "$GIT_TEST_GETTEXT_POISON"
-		then
-			test-tool i18n grep "$@"
-			return $?
-		fi
-
 		# pretend success
 		return 0
 	fi
@@ -1034,13 +1032,13 @@ test_i18ngrep () {
 	if test "x!" = "x$1"
 	then
 		shift
-		! grep "$@" && return 0
+		! $grep_cmd "$@" && return 0
 
-		echo >&4 "error: '! grep $@' did find a match in:"
+		echo >&4 "error: '! $grep_cmd $@' did find a match in:"
 	else
-		grep "$@" && return 0
+		$grep_cmd "$@" && return 0
 
-		echo >&4 "error: 'grep $@' didn't find a match in:"
+		echo >&4 "error: '$grep_cmd $@' didn't find a match in:"
 	fi
 
 	if test -s "$last_arg"
-----------------

I.e. most of this was because you didn't add any support for this to the
shellscript translations.

We still punt on that here, it should ideally preserve the shell
variables like the format specifiers, but I think that's not worth the
effort with us actively cutting down our shell code to nothing.

We still have failures e.g. due to "test_i18ngrep -F fixed file" not
being supported. Wouldn't a better/simpler approach be to just have a
light rot13 helper, and then call the actual "cmp"/"grep" with the
munged input/output?

  parent reply	other threads:[~2021-01-12 13:35 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-12  8:47 [PATCH 00/11] [RFH] Introduce support for GETTEXT_POISON=rot13 Johannes Schindelin via GitGitGadget
2021-01-12  8:47 ` [PATCH 01/11] tests: remove unnecessary GIT_TEST_GETTEXT_POISON=false constructs Johannes Schindelin via GitGitGadget
2021-01-12  9:07   ` SZEDER Gábor
2021-01-12  8:47 ` [PATCH 02/11] Support GIT_TEST_GETTEXT_POISON=rot13 Johannes Schindelin via GitGitGadget
2021-01-12 19:37   ` Junio C Hamano
2021-01-12  8:47 ` [PATCH 03/11] parse-options: add forgotten translations Johannes Schindelin via GitGitGadget
2021-01-12  8:47 ` [PATCH 04/11] sha1dc: mark forgotten message for translation Johannes Schindelin via GitGitGadget
2021-01-12 11:37   ` Jeff King
2021-01-12 19:40     ` Junio C Hamano
2021-01-15 15:43     ` Johannes Schindelin
2021-01-15 16:29       ` Jeff King
2021-01-18 14:26         ` Johannes Schindelin
2021-01-18 21:06           ` Junio C Hamano
2021-01-19 15:52             ` Johannes Schindelin
2021-01-12  8:47 ` [PATCH 05/11] t0006: use `test_i18ncmp` only for C locales Johannes Schindelin via GitGitGadget
2021-01-12  8:47 ` [PATCH 06/11] t0041: stop using `test_i18ngrep` on untranslated output Johannes Schindelin via GitGitGadget
2021-01-12  8:47 ` [PATCH 07/11] t0027: mark a function as requiring the C locale Johannes Schindelin via GitGitGadget
2021-01-12  8:47 ` [PATCH 08/11] t6301: do not expect the output of `for-each-ref` to be translated Johannes Schindelin via GitGitGadget
2021-01-12  8:47 ` [PATCH 09/11] GETTEXT_POISON=rot13: do compare the output in `test_i18ncmp` Johannes Schindelin via GitGitGadget
2021-01-12 19:47   ` Junio C Hamano
2021-01-15 15:44     ` Johannes Schindelin
2021-01-15 19:58       ` Junio C Hamano
2021-01-18 14:36         ` Johannes Schindelin
2021-01-12  8:47 ` [PATCH 10/11] GETTEXT_POISON=rot13: perform actual checks in `test_i18ngrep` Johannes Schindelin via GitGitGadget
2021-01-12  8:47 ` [PATCH 11/11] test-tool i18n: do not complain about empty messages Johannes Schindelin via GitGitGadget
2021-01-12 11:34 ` [PATCH 00/11] [RFH] Introduce support for GETTEXT_POISON=rot13 Jeff King
2021-01-12 19:50   ` Junio C Hamano
2021-01-13  7:32     ` Jeff King
2021-01-16  6:38     ` Junio C Hamano
2021-01-12 13:32 ` Ævar Arnfjörð Bjarmason [this message]
2021-01-15 16:13   ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mtxe5lhj.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).