* [PATCH] t8005: Nobody writes Russian in shift_jis
@ 2009-06-19 2:18 Junio C Hamano
2009-06-19 10:25 ` Alexander Gavrilov
2009-06-19 14:54 ` Brandon Casey
0 siblings, 2 replies; 4+ messages in thread
From: Junio C Hamano @ 2009-06-19 2:18 UTC
To: git; +Cc: Alexander Gavrilov, Brandon Casey
The second and third tests of this script expected that Russian strings
are converted between ISO-8859-5 and Shift_JIS in the "blame --porcelain"
format output correctly.
Sure, many platforms may convert between such a combination, but that is
only because one of the base character sets of Shift_JIS, JIS X 0208,
defines codepoints for Russian characters (among others); I do not think
anybody uses Shift_JIS when seriously writing Russian, and it is perfectly
understandable if iconv() libraries on some platforms fail converting
between this combination, as it does not matter in reality.
This patch changes the test to verify that Japanese strings are converted
correctly between EUC-JP and Shift_JIS in the same procedure. The point
of the test is not to verify the platform's iconv() library, but to
see if "git blame" makes correct iconv() library calls when it should.
We could instead use ISO-8859-5 and KOI8-R as the combination, because
they are both meant to represent Russian, in order to make this test
meaningful on more platforms, but we already use Shift_JIS vs EUC-JP
combinations to test other programs in our test suite, so this combination
is safer from a portability point of view. Besides, I neither
read nor write Russian; sorry ;-)
This change allows tests to pass on my (friend's) Solaris 5.11 box.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
* I am Cc'ing Alexander because he originally wrote this test using
cp1251 and shift_jis, and I could be wrong in saying that nobody sane
writes Russian in shift_jis.
To allow a 7-bit mail path to pass this patch through, I tentatively
dropped this in my t/t8005 directory (the file is not tracked):
$ echo '*.txt binary' >t/t8005/.gitattributes
before running format-patch on this commit.
t/t8005-blame-i18n.sh | 26 +++++++++++++-------------
t/t8005/euc-japan.txt | Bin 0 -> 66 bytes
t/t8005/iso8859-5.txt | Bin 74 -> 0 bytes
t/t8005/sjis.txt | Bin 100 -> 56 bytes
t/t8005/utf8.txt | Bin 100 -> 71 bytes
5 files changed, 13 insertions(+), 13 deletions(-)
create mode 100644 t/t8005/euc-japan.txt
delete mode 100644 t/t8005/iso8859-5.txt
diff --git a/t/t8005-blame-i18n.sh b/t/t8005-blame-i18n.sh
index 9cca14d..cb39055 100755
--- a/t/t8005-blame-i18n.sh
+++ b/t/t8005-blame-i18n.sh
@@ -4,7 +4,7 @@ test_description='git blame encoding conversion'
. ./test-lib.sh
. "$TEST_DIRECTORY"/t8005/utf8.txt
-. "$TEST_DIRECTORY"/t8005/iso8859-5.txt
+. "$TEST_DIRECTORY"/t8005/euc-japan.txt
. "$TEST_DIRECTORY"/t8005/sjis.txt
test_expect_success 'setup the repository' '
@@ -13,10 +13,10 @@ test_expect_success 'setup the repository' '
git add file &&
git commit --author "$UTF8_NAME <utf8@localhost>" -m "$UTF8_MSG" &&
- echo "ISO-8859-5 LINE" >> file &&
+ echo "EUC-JAPAN LINE" >> file &&
git add file &&
- git config i18n.commitencoding ISO8859-5 &&
- git commit --author "$ISO8859_5_NAME <iso8859-5@localhost>" -m "$ISO8859_5_MSG" &&
+ git config i18n.commitencoding eucJP &&
+ git commit --author "$EUC_JAPAN_NAME <euc-japan@localhost>" -m "$EUC_JAPAN_MSG" &&
echo "SJIS LINE" >> file &&
git add file &&
@@ -41,17 +41,17 @@ test_expect_success \
'
cat >expected <<EOF
-author $ISO8859_5_NAME
-summary $ISO8859_5_MSG
-author $ISO8859_5_NAME
-summary $ISO8859_5_MSG
-author $ISO8859_5_NAME
-summary $ISO8859_5_MSG
+author $EUC_JAPAN_NAME
+summary $EUC_JAPAN_MSG
+author $EUC_JAPAN_NAME
+summary $EUC_JAPAN_MSG
+author $EUC_JAPAN_NAME
+summary $EUC_JAPAN_MSG
EOF
test_expect_success \
'blame respects i18n.logoutputencoding' '
- git config i18n.logoutputencoding ISO8859-5 &&
+ git config i18n.logoutputencoding eucJP &&
git blame --incremental file | \
egrep "^(author|summary) " > actual &&
test_cmp actual expected
@@ -76,8 +76,8 @@ test_expect_success \
cat >expected <<EOF
author $SJIS_NAME
summary $SJIS_MSG
-author $ISO8859_5_NAME
-summary $ISO8859_5_MSG
+author $EUC_JAPAN_NAME
+summary $EUC_JAPAN_MSG
author $UTF8_NAME
summary $UTF8_MSG
EOF
diff --git a/t/t8005/euc-japan.txt b/t/t8005/euc-japan.txt
new file mode 100644
index 0000000000000000000000000000000000000000..288f040c99f6b61559e3ad964a1247d4b9fd62a3
GIT binary patch
literal 66
zcmZ<_b&mIP3~=;|_jB}hwN=`^`REaaLkG_9QsQ!jOZf)7+bS)+w)D-yJxd=fIk)uK
R(w$3BEIGbp=fcHGTmX#39_;`C
literal 0
HcmV?d00001
diff --git a/t/t8005/iso8859-5.txt b/t/t8005/iso8859-5.txt
deleted file mode 100644
index 2e4b80c8df4da30722561049c46cca778e49cd2f..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 74
zcmeYa_P4MwwTw57_jB}hwN=`2>B3!w{Z}77xOeHsbA^L9uG|B%l(;<M%6x;}ZIupP
Zefa3!rF&Nu9^Sim@#WRKH?Asi0RTCvCqV!J
diff --git a/t/t8005/sjis.txt b/t/t8005/sjis.txt
index 2ccfbad207c6e96b1f4f528031d9e4938d364b92..bbdefeaced4b54f98e5d9a85ddd8e0d7346fe7e3 100644
GIT binary patch
literal 56
zcmWIc@(hmmbM$q!Rq6|xoUAZ$-;78lu3(U;Z?L<qQgdl@Ph)g*L(`e&)aHoh^roXt
J+Z&yfxBxx66{!FK
literal 100
zcmWIc@(hmmbM$q!Rci5UDQYQbsZ(ePXen)JX=!R{018yLbSkt20jUxo7c8X26%5kk
k8|)6$6AV<^3{(tK+R##}0OT|PVPQ)*P@)c~tyGB%00x;U@Bjb+
diff --git a/t/t8005/utf8.txt b/t/t8005/utf8.txt
index f46cfc56d80797740c3ec15e166add052f905fcb..4d00dbea7659ee27fda283e7e45cfb2d5f6ea4d1 100644
GIT binary patch
literal 71
zcmWFyakGf`bM$q!ReHK{<MSyS6rL_w^|HB7i7ON&;~VU5tMs^e+T-RmkDK>AZeH-X
baoywQw#Q97A2)YAZe0GjapvQOCM7Na%AzF~
literal 100
zcmWFyakGf`bM$q!Rk|?a!lnxwF6>pfF#p2Vi%l0BF6;ve?6}yjaADzv9T&D-*as0(
t;tB<6@(p$e>RAL-+IX=EtaRUntqK<#fy{juHeT$!u=T=Tpth|_TmU(XJDmUk
--
1.6.3.2.316.gda4e4
* Re: [PATCH] t8005: Nobody writes Russian in shift_jis
From: Alexander Gavrilov @ 2009-06-19 10:25 UTC
To: Junio C Hamano; +Cc: git, Brandon Casey
On Fri, Jun 19, 2009 at 6:18 AM, Junio C Hamano<gitster@pobox.com> wrote:
> * I am Cc'ing Alexander because he originally wrote this test using
> cp1251 and shift_jis, and I could be wrong in saying that nobody sane
> writes Russian in shift_jis.
Well, certainly not intentionally, but I've managed to send a few
work-related emails in sjis accidentally (resulting in much confusion
for the people on the other side), and thought it is a bit funny :)
I'd guess that almost nobody uses iso8859-5 as well, though. Nobody
that I know, anyway.
Alexander
* Re: [PATCH] t8005: Nobody writes Russian in shift_jis
From: Brandon Casey @ 2009-06-19 14:54 UTC
To: Junio C Hamano; +Cc: git, Alexander Gavrilov, Brandon Casey
Junio C Hamano wrote:
> The second and third tests of this script expected that Russian strings
> are converted between ISO-8859-5 and Shift_JIS in the "blame --porcelain"
> format output correctly.
>
> Sure, many platforms may convert between such a combination, but that is
> only because one of the base character sets of Shift_JIS, JIS X 0208,
> defines codepoints for Russian characters (among others); I do not think
> anybody uses Shift_JIS when seriously writing Russian, and it is perfectly
> understandable if iconv() libraries on some platforms fail converting
> between this combination, as it does not matter in reality.
>
> This patch changes the test to verify that Japanese strings are converted
> correctly between EUC-JP and Shift_JIS in the same procedure. The point
> of the test is not to verify the platform's iconv() library, but to
> see if "git blame" makes correct iconv() library calls when it should.
>
> We could instead use ISO-8859-5 and KOI8-R as the combination, because
> they are both meant to represent Russian, in order to make this test
> meaningful on more platforms, but we already use Shift_JIS vs EUC-JP
> combinations to test other programs in our test suite, so this combination
> is safer from a portability point of view. Besides, I neither
> read nor write Russian; sorry ;-)
>
> This change allows tests to pass on my (friend's) Solaris 5.11 box.
No change on my systems. I can convert eucJP and SJIS from/to UTF-8, but
I cannot convert between eucJP and SJIS. So tests 2 and 3 still fail for
me. Nothing was broken though. The fourth test, which converts each of
the encodings to UTF-8, still passes. So this patch is fine with me.
-brandon
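The situation reported above (each encoding converts to and from UTF-8,
but the direct pair does not) can be probed from the shell. The following
is only a sketch: iconv_can_convert and report_conversion_support are
hypothetical helpers, not part of git's test suite, and the sketch assumes
the platform ships an iconv(1) command that exits non-zero when asked for
a conversion it does not support.

```shell
# Hypothetical helper: succeed only if the platform's iconv(1) accepts
# a conversion from encoding $1 to encoding $2 (empty input is enough
# to make iconv open the conversion).
iconv_can_convert () {
    iconv -f "$1" -t "$2" </dev/null >/dev/null 2>&1
}

# Hypothetical helper: report the direct route and the two legs
# through UTF-8 for a pair of encodings.
report_conversion_support () {
    from=$1 to=$2
    for pair in "$from:$to" "$from:UTF-8" "UTF-8:$to"
    do
        src=${pair%%:*} dst=${pair#*:}
        if iconv_can_convert "$src" "$dst"
        then
            echo "$src -> $dst: ok"
        else
            echo "$src -> $dst: unsupported"
        fi
    done
}
```

Running "report_conversion_support eucJP SJIS" on a given box would show
which of the three routes its iconv accepts; the encoding names themselves
are platform-specific spellings.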
* Re: [PATCH] t8005: Nobody writes Russian in shift_jis
From: Junio C Hamano @ 2009-06-21 10:07 UTC
To: Brandon Casey; +Cc: git, Alexander Gavrilov, Brandon Casey
Brandon Casey <casey@nrlssc.navy.mil> writes:
> No change on my systems. I can convert eucJP and SJIS from/to UTF-8, but
> I cannot convert between eucJP and SJIS.
I wonder what's different, but I suspect that having the
lang-support-japanese package on the box is helping me.
> So tests 2 and 3 still fail for
> me. Nothing was broken though. The fourth test, which converts each of
> the encodings to UTF-8, still passes. So this patch is fine with me.
Yikes, so it does not really help by itself. Taken together with
Alexander's comment that he did manage to send Russian in Shift_JIS (I
somehow do not think Alexander used Solaris for that, though; neither have
I any clue if the receiving end grokked that), perhaps the patch is
useless.
Even though I do not think any Russian writes in KOI8-R and converts to
Shift_JIS on purpose, converting eucJP directly to SJIS is something
Japanese people who are on UNIX do quite often, or at least used to before
everybody moved to UTF-8.
Perhaps we should instead optionally help the platform's iconv(3), when it
cannot convert A to B directly, by pivoting the conversion on UTF-8
(i.e. A -> UTF-8 -> B)? That would probably help the real world use cases
while fixing the issue with this test script.