git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] t8005: Nobody writes Russian in shift_jis
@ 2009-06-19  2:18 Junio C Hamano
  2009-06-19 10:25 ` Alexander Gavrilov
  2009-06-19 14:54 ` Brandon Casey
  0 siblings, 2 replies; 4+ messages in thread
From: Junio C Hamano @ 2009-06-19  2:18 UTC
  To: git; +Cc: Alexander Gavrilov, Brandon Casey

The second and third tests of this script expected that Russian strings
are converted between ISO-8859-5 and Shift_JIS in the "blame --porcelain"
format output correctly.

Sure, many platforms may convert between such a combination, but that is
only because one of the base character sets of Shift_JIS, JIS X 0208,
defines codepoints for Russian characters (among others); I do not think
anybody uses Shift_JIS when seriously writing Russian, and it is perfectly
understandable if iconv() libraries on some platforms fail converting
between this combination, as it does not matter in reality.

This patch changes the test to verify Japanese strings are converted
correctly between EUC-JP and Shift_JIS in the same procedure.  The point
of the test is not about verifying the platform's iconv() library, but to
see if "git blame" makes correct iconv() library calls when it should.

We could instead use ISO-8859-5 and KOI8-R as the combination, because
they are both meant to represent Russian, in order to make this test
meaningful on more platforms, but we already use Shift_JIS vs EUC-JP
combinations to test other programs in our test suite, so this combination
is safer from a portability point of view.  Besides, I neither
read nor write Russian; sorry ;-)

This change allows tests to pass on my (friend's) Solaris 5.11 box.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * I am Cc'ing Alexander because he originally wrote this test using
   cp1251 and shift_jis, and I could be wrong in saying that nobody sane
   writes Russian in shift_jis.

   To allow 7-bit mailpath to pass this patch through, I tentatively
   dropped this in my t/t8005 directory (the file is not tracked):

	$ echo '*.txt binary' >t/t8005/.gitattributes
   
   before running format-patch on this commit.
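
   The attribute trick described above can be tried in a scratch
   repository.  The following is only a sketch: the function name, the
   temporary repository, the sample file and the committer identity are
   all made up for illustration.  It relies on the "binary" attribute
   making format-patch emit a "GIT binary patch" hunk instead of raw
   8-bit text.

```shell
#!/bin/sh
# Sketch only: make_binary_patch, sample.txt and the identity are hypothetical.
make_binary_patch () {
	repo=$(mktemp -d) || return 1
	(
		cd "$repo" &&
		git init -q . &&
		git config user.name Example &&
		git config user.email example@localhost &&
		git commit -q --allow-empty -m base &&
		# Untracked attributes file: treat every *.txt as binary.
		echo '*.txt binary' >.gitattributes &&
		# A few non-ASCII bytes that a 7-bit mail path could mangle.
		printf 'NAME=\343\201\202\n' >sample.txt &&
		git add sample.txt &&
		git commit -q -m 'add sample' &&
		# The hunk for sample.txt is written in the 7-bit-safe binary form.
		git format-patch -1 --stdout
	)
	status=$?
	rm -rf "$repo"
	return $status
}
```

   Piping the output of make_binary_patch through grep for "GIT binary
   patch" is one way to confirm that the encoded form was used.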

 t/t8005-blame-i18n.sh |   26 +++++++++++++-------------
 t/t8005/euc-japan.txt |  Bin 0 -> 66 bytes
 t/t8005/iso8859-5.txt |  Bin 74 -> 0 bytes
 t/t8005/sjis.txt      |  Bin 100 -> 56 bytes
 t/t8005/utf8.txt      |  Bin 100 -> 71 bytes
 5 files changed, 13 insertions(+), 13 deletions(-)
 create mode 100644 t/t8005/euc-japan.txt
 delete mode 100644 t/t8005/iso8859-5.txt

diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
index 9cca14d..cb39055 100755
--- a/t/t8005-blame-i18n.sh
+++ b/t/t8005-blame-i18n.sh
@@ -4,7 +4,7 @@ test_description='git blame encoding conversion'
 . ./test-lib.sh
 
 . "$TEST_DIRECTORY"/t8005/utf8.txt
-. "$TEST_DIRECTORY"/t8005/iso8859-5.txt
+. "$TEST_DIRECTORY"/t8005/euc-japan.txt
 . "$TEST_DIRECTORY"/t8005/sjis.txt
 
 test_expect_success 'setup the repository' '
@@ -13,10 +13,10 @@ test_expect_success 'setup the repository' '
 	git add file &&
 	git commit --author "$UTF8_NAME <utf8@localhost>" -m "$UTF8_MSG" &&
 
-	echo "ISO-8859-5 LINE" >> file &&
+	echo "EUC-JAPAN LINE" >> file &&
 	git add file &&
-	git config i18n.commitencoding ISO8859-5 &&
-	git commit --author "$ISO8859_5_NAME <iso8859-5@localhost>" -m "$ISO8859_5_MSG" &&
+	git config i18n.commitencoding eucJP &&
+	git commit --author "$EUC_JAPAN_NAME <euc-japan@localhost>" -m "$EUC_JAPAN_MSG" &&
 
 	echo "SJIS LINE" >> file &&
 	git add file &&
@@ -41,17 +41,17 @@ test_expect_success \
 '
 
 cat >expected <<EOF
-author $ISO8859_5_NAME
-summary $ISO8859_5_MSG
-author $ISO8859_5_NAME
-summary $ISO8859_5_MSG
-author $ISO8859_5_NAME
-summary $ISO8859_5_MSG
+author $EUC_JAPAN_NAME
+summary $EUC_JAPAN_MSG
+author $EUC_JAPAN_NAME
+summary $EUC_JAPAN_MSG
+author $EUC_JAPAN_NAME
+summary $EUC_JAPAN_MSG
 EOF
 
 test_expect_success \
 	'blame respects i18n.logoutputencoding' '
-	git config i18n.logoutputencoding ISO8859-5 &&
+	git config i18n.logoutputencoding eucJP &&
 	git blame --incremental file | \
 		egrep "^(author|summary) " > actual &&
 	test_cmp actual expected
@@ -76,8 +76,8 @@ test_expect_success \
 cat >expected <<EOF
 author $SJIS_NAME
 summary $SJIS_MSG
-author $ISO8859_5_NAME
-summary $ISO8859_5_MSG
+author $EUC_JAPAN_NAME
+summary $EUC_JAPAN_MSG
 author $UTF8_NAME
 summary $UTF8_MSG
 EOF
diff --git a/t/t8005/euc-japan.txt b/t/t8005/euc-japan.txt
new file mode 100644
index 0000000000000000000000000000000000000000..288f040c99f6b61559e3ad964a1247d4b9fd62a3
GIT binary patch
literal 66
zcmZ<_b&mIP3~=;|_jB}hwN=`^`REaaLkG_9QsQ!jOZf)7+bS)+w)D-yJxd=fIk)uK
R(w$3BEIGbp=fcHGTmX#39_;`C

literal 0
HcmV?d00001

diff --git a/t/t8005/iso8859-5.txt b/t/t8005/iso8859-5.txt
deleted file mode 100644
index 2e4b80c8df4da30722561049c46cca778e49cd2f..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 74
zcmeYa_P4MwwTw57_jB}hwN=`2>B3!w{Z}77xOeHsbA^L9uG|B%l(;<M%6x;}ZIupP
Zefa3!rF&Nu9^Sim@#WRKH?Asi0RTCvCqV!J

diff --git a/t/t8005/sjis.txt b/t/t8005/sjis.txt
index 2ccfbad207c6e96b1f4f528031d9e4938d364b92..bbdefeaced4b54f98e5d9a85ddd8e0d7346fe7e3 100644
GIT binary patch
literal 56
zcmWIc@(hmmbM$q!Rq6|xoUAZ$-;78lu3(U;Z?L<qQgdl@Ph)g*L(`e&)aHoh^roXt
J+Z&yfxBxx66{!FK

literal 100
zcmWIc@(hmmbM$q!Rci5UDQYQbsZ(ePXen)JX=!R{018yLbSkt20jUxo7c8X26%5kk
k8|)6$6AV<^3{(tK+R##}0OT|PVPQ)*P@)c~tyGB%00x;U@Bjb+

diff --git a/t/t8005/utf8.txt b/t/t8005/utf8.txt
index f46cfc56d80797740c3ec15e166add052f905fcb..4d00dbea7659ee27fda283e7e45cfb2d5f6ea4d1 100644
GIT binary patch
literal 71
zcmWFyakGf`bM$q!ReHK{<MSyS6rL_w^|HB7i7ON&;~VU5tMs^e+T-RmkDK>AZeH-X
baoywQw#Q97A2)YAZe0GjapvQOCM7Na%AzF~

literal 100
zcmWFyakGf`bM$q!Rk|?a!lnxwF6>pfF#p2Vi%l0BF6;ve?6}yjaADzv9T&D-*as0(
t;tB<6@(p$e>RAL-+IX=EtaRUntqK<#fy{juHeT$!u=T=Tpth|_TmU(XJDmUk

-- 
1.6.3.2.316.gda4e4


* Re: [PATCH] t8005: Nobody writes Russian in shift_jis
  2009-06-19  2:18 [PATCH] t8005: Nobody writes Russian in shift_jis Junio C Hamano
@ 2009-06-19 10:25 ` Alexander Gavrilov
  2009-06-19 14:54 ` Brandon Casey
  1 sibling, 0 replies; 4+ messages in thread
From: Alexander Gavrilov @ 2009-06-19 10:25 UTC
  To: Junio C Hamano; +Cc: git, Brandon Casey

On Fri, Jun 19, 2009 at 6:18 AM, Junio C Hamano<gitster@pobox.com> wrote:
>  * I am Cc'ing Alexander because he originally wrote this test using
>   cp1251 and shift_jis, and I could be wrong in saying that nobody sane
>   writes Russian in shift_jis.

Well, certainly not intentionally, but I've managed to send a few
work-related emails in sjis accidentally (resulting in much confusion
for the people on the other side), and thought it is a bit funny :)

I'd guess that almost nobody uses iso8859-5 as well, though. Nobody
that I know, anyway.

Alexander


* Re: [PATCH] t8005: Nobody writes Russian in shift_jis
  2009-06-19  2:18 [PATCH] t8005: Nobody writes Russian in shift_jis Junio C Hamano
  2009-06-19 10:25 ` Alexander Gavrilov
@ 2009-06-19 14:54 ` Brandon Casey
  2009-06-21 10:07   ` Junio C Hamano
  1 sibling, 1 reply; 4+ messages in thread
From: Brandon Casey @ 2009-06-19 14:54 UTC
  To: Junio C Hamano; +Cc: git, Alexander Gavrilov, Brandon Casey

Junio C Hamano wrote:
> The second and third tests of this script expected that Russian strings
> are converted between ISO-8859-5 and Shift_JIS in the "blame --porcelain"
> format output correctly.
> 
> Sure, many platforms may convert between such a combination, but that is
> only because one of the base character sets of Shift_JIS, JIS X 0208,
> defines codepoints for Russian characters (among others); I do not think
> anybody uses Shift_JIS when seriously writing Russian, and it is perfectly
> understandable if iconv() libraries on some platforms fail converting
> between this combination, as it does not matter in reality.
> 
> This patch changes the test to verify Japanese strings are converted
> correctly between EUC-JP and Shift_JIS in the same procedure.  The point
> of the test is not about verifying the platform's iconv() library, but to
> see if "git blame" makes correct iconv() library calls when it should.
> 
> We could instead use ISO-8859-5 and KOI8-R as the combination, because
> they are both meant to represent Russian, in order to make this test
> meaningful on more platforms, but we already use Shift_JIS vs EUC-JP
> combinations to test other programs in our test suite, so this combination
> is safer from a portability point of view.  Besides, I neither
> read nor write Russian; sorry ;-)
> 
> This change allows tests to pass on my (friend's) Solaris 5.11 box.

No change on my systems.  I can convert eucJP and SJIS from/to UTF-8, but
I cannot convert between eucJP and SJIS.  So tests 2 and 3 still fail for
me.  Nothing was broken, though.  The fourth test, which converts each of
the encodings to UTF-8, still passes.  So this patch is fine with me.

-brandon
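
The situation described above (conversion to and from UTF-8 works, direct
conversion between the two Japanese encodings does not) can be probed with
the iconv command-line tool.  This is a rough sketch: can_convert is a
hypothetical helper, not something in git's test suite, and the encoding
names are the ones used in this thread, which may be spelled differently
on other platforms.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the platform's iconv accepts a
# direct conversion from encoding $1 to encoding $2 (empty input is enough
# to find out whether the conversion can be set up at all).
can_convert () {
	iconv -f "$1" -t "$2" </dev/null >/dev/null 2>&1
}

# Report which of the conversions discussed here are available.
for pair in "eucJP UTF-8" "SJIS UTF-8" "eucJP SJIS" "SJIS eucJP"
do
	set -- $pair
	if can_convert "$1" "$2"
	then
		echo "$1 -> $2: supported"
	else
		echo "$1 -> $2: not supported"
	fi
done
```

A probe like this only shows that the conversion can be opened; it does
not show that every character survives the round trip.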


* Re: [PATCH] t8005: Nobody writes Russian in shift_jis
  2009-06-19 14:54 ` Brandon Casey
@ 2009-06-21 10:07   ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2009-06-21 10:07 UTC
  To: Brandon Casey; +Cc: git, Alexander Gavrilov, Brandon Casey

Brandon Casey <casey@nrlssc.navy.mil> writes:

> No change on my systems.  I can convert eucJP and SJIS from/to UTF-8, but
> I cannot convert between eucJP and SJIS.

I wonder what's different, but I suspect that having the
lang-support-japanese package on the box is perhaps helping me.

> So tests 2 and 3 still fail for
> me.  Nothing was broken, though.  The fourth test, which converts each of
> the encodings to UTF-8, still passes.  So this patch is fine with me.

Yikes, so it does not really help by itself.  Taken together with
Alexander's comment that he did manage to send Russian in Shift_JIS (I
somehow do not think Alexander used Solaris for that, though; neither have
I any clue if the receiving end grokked that), perhaps the patch is
useless.

Even though I do not think any Russian writes in KOI8-R and converts to
Shift_JIS on purpose, converting eucJP directly to SJIS is something
Japanese people who are on UNIX do quite often, or at least used to before
everybody moved to UTF-8.

Perhaps we should instead optionally help platform's iconv(3), when it
cannot convert A to B directly, by pivoting the conversion on UTF-8
(i.e. A -> UTF-8 -> B)?  That would probably help the real world use cases
while fixing the issue with this test script.
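
The pivoting idea above would live in git's C code around iconv(3).
Purely as an illustration of the control flow, and not as git's
implementation, here is a sketch using the iconv command-line tool
instead.  convert_with_pivot is a hypothetical name, and the function
assumes mktemp -d is available.

```shell
#!/bin/sh
# Hypothetical helper: convert stdin from encoding $1 to encoding $2.
# Try the direct conversion first; if the platform's iconv refuses it,
# fall back to A -> UTF-8 -> B.  Output is written only on success, so a
# failed direct attempt never leaks partial data.
convert_with_pivot () {
	from=$1 to=$2
	tmp=$(mktemp -d) || return 1
	cat >"$tmp/in"
	if iconv -f "$from" -t "$to" <"$tmp/in" >"$tmp/out" 2>/dev/null ||
	   {
		# Pivot: both legs must succeed for the result to count.
		iconv -f "$from" -t UTF-8 <"$tmp/in" >"$tmp/mid" 2>/dev/null &&
		iconv -f UTF-8 -t "$to" <"$tmp/mid" >"$tmp/out" 2>/dev/null
	   }
	then
		cat "$tmp/out"
		status=0
	else
		status=1
	fi
	rm -rf "$tmp"
	return $status
}
```

With this, something like "convert_with_pivot eucJP SJIS <file" would work
on a platform that can only go through UTF-8, at the cost of a second
conversion pass.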

