From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 19 Dec 2018 23:25:28 +0100 (CET)
From: Rafal Luzynski <digitalfreak@lingonborough.com>
To: Marko Myllynen <myllynen@redhat.com>, Egor Kobylkin <egor@kobylkin.com>,
	libc-alpha@sourceware.org, libc-locales@sourceware.org
Message-ID: <300646541.674067.1545258329027@poczta.nazwa.pl>
In-Reply-To: <d5cdbc81-8049-e1fa-56f6-047bd1d7eb28@redhat.com>
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <b82fe65b-b880-a2b5-c97d-2a6aae9c1165@kobylkin.com>
 <837001401.21346.1542406647888@poczta.nazwa.pl>
 <bef63562-09d1-3306-aae9-20002ccf4130@kobylkin.com>
 <5a247161-c498-ed50-ff4a-58f2ecf974f0@redhat.com>
 <1441622134.517912.1543702039942@poczta.nazwa.pl>
 <2f6fc82c-77ba-d331-ae5d-e2373e122a88@kobylkin.com>
 <1361059722.707244.1544231740358@poczta.nazwa.pl>
 <d5cdbc81-8049-e1fa-56f6-047bd1d7eb28@redhat.com>
Subject: Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ
 #2872]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

10.12.2018 22:20 Marko Myllynen <myllynen@redhat.com> wrote:
>
> Hi,
>
> On 08/12/2018 03.15, Rafal Luzynski wrote:
> > [...]
> > Marko, it would be good to hear your opinion about System A vs. System B
> > again.
>
> I think System A is a better option as it should be the same as ISO 9
> and perhaps also produces results in some cases which are more expected
> than with System B (if the Wikipedia ISO 9 article is to be believed).
>
> Wrt BZ #2872 I think it's good to keep it in mind but IMHO we can also
> deviate from it if needed, however with System A + ASCII fallback
> definitions the RFE should be satisfied as well?

That's exactly what I meant (sorry if it was not clear before).
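
To make the System A point concrete: it maps each Cyrillic letter to exactly one Latin letter (with diacritics where needed), so the mapping can be reversed. Below is a minimal Python sketch, assuming a tiny illustrative subset of an ISO 9 style table that covers only the letters of Egor's "схема"/"шема" example, not the full standard. The names ISO9_SUBSET, transliterate() and reverse() are hypothetical.

```python
# Illustrative subset of an ISO 9 (System A) style table. Assumption: only
# the letters needed for the example words, not the full standard.
ISO9_SUBSET = {
    "с": "s",
    "х": "h",
    "ш": "š",
    "е": "e",
    "м": "m",
    "а": "a",
}

# One letter in, one letter out: the inverse table is well defined only if
# no two Cyrillic letters share the same Latin letter.
INVERSE = {latin: cyr for cyr, latin in ISO9_SUBSET.items()}
assert len(INVERSE) == len(ISO9_SUBSET), "table is not one-to-one"


def transliterate(text: str) -> str:
    """Replace each Cyrillic letter with its single Latin counterpart."""
    return "".join(ISO9_SUBSET[ch] for ch in text)


def reverse(text: str) -> str:
    """Undo transliterate(); possible because the table is one-to-one."""
    return "".join(INVERSE[ch] for ch in text)


for word in ("схема", "шема"):
    latin = transliterate(word)
    print(word, "->", latin, "->", reverse(latin))
```

With this table the two words stay distinct ("shema" vs. "šema"). Once š falls back to "sh" for plain ASCII output, they collide again.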

> > [...]  Marko, what is your
> > opinion about possible implementation of SFS 4900 in these cases:
> >
> > * When the destination charset does not contain required Latin diacritic
> >   characters (e.g., it is plain ASCII)?
>
> This would be according to http://jkorpela.fi/iso9.html8 so for example
> instead of ž -> zh and instead of štš -> shtsh.

Agree.
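
The fallback should apply only when the destination charset cannot represent the letter with diacritics. A minimal Python sketch of that rule, assuming a hypothetical fallback table containing just the two substitutions above (ž -> zh, š -> sh). It uses Python codecs to test representability and is not glibc locale syntax. The "?" for letters without a fallback is an arbitrary choice here.

```python
# Hypothetical ASCII fallback for Latin letters with diacritics, taken from
# the examples above (ž -> zh, š -> sh, hence štš -> shtsh).
ASCII_FALLBACK = {
    "ž": "zh",
    "š": "sh",
}


def to_charset(latin: str, encoding: str) -> str:
    """Keep characters the target charset can encode; otherwise fall back."""
    out = []
    for ch in latin:
        try:
            ch.encode(encoding)
            out.append(ch)  # representable as-is in the target charset
        except UnicodeEncodeError:
            # No fallback defined: emit "?" (an arbitrary choice here).
            out.append(ASCII_FALLBACK.get(ch, "?"))
    return "".join(out)


print(to_charset("štš", "ascii"))      # shtsh
print(to_charset("štš", "iso8859_2"))  # štš (Latin-2 contains š)
```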

> > * When the output is ambiguous, that means, when two different Cyrillic
> >   strings produce the same Latin (or ASCII) output?
>
> This is a good point and one I haven't considered but I'm not sure is
> there anything we can do about this (at least without major locale
> system internals work)?

I agree with the suggestion that we can't do much about it.  I mean,
there may be possible solutions (like using additional punctuation
characters), but they don't look natural to me.
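
To show what a punctuation-based approach could look like, and why it reads unnaturally, here is a minimal Python sketch of one hypothetical scheme. It inserts a separator ("-", an arbitrary choice) wherever two adjacent outputs would spell a digraph that some single letter also produces. The table is an illustrative System B style subset. The check only looks at two-letter digraphs across a single boundary, so it is a heuristic, not a complete fix.

```python
# Illustrative System B style subset with ASCII digraphs (assumption).
TABLE = {"с": "s", "х": "h", "ш": "sh", "е": "e", "м": "m", "а": "a"}

# Every two-letter output that a single Cyrillic letter can produce.
DIGRAPHS = {out for out in TABLE.values() if len(out) == 2}


def transliterate(text: str, sep: str = "-") -> str:
    """Insert `sep` where two adjacent outputs would spell a digraph."""
    parts = [TABLE[ch] for ch in text]
    result = parts[0] if parts else ""
    for prev, cur in zip(parts, parts[1:]):
        if prev[-1] + cur[0] in DIGRAPHS:
            result += sep  # "s" + "h" would read as "sh", so separate them
        result += cur
    return result


print(transliterate("схема"))  # s-hema
print(transliterate("шема"))   # shema
```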

> Do you have any rough idea how frequently this
> could happen or is this more a theoretical issue? (Sorry if I've missed
> earlier comments about this, it's been a long thread.)

Yes, Egor has provided this example many times:

"схема" -> "shema" (if "с" -> "s" and "х" -> "h")
"шема"  -> "shema" (if "ш" -> "sh")

I don't think it matters how frequent these cases are.  I think the
question is whether ambiguity is a bug, because if it is, then even one
corner case proves that the solution is wrong.
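
Since one counterexample is enough, ambiguity can be checked mechanically rather than estimated by frequency. Below is a minimal Python sketch, assuming a character-by-character table with just the mappings from the example above. The helper names are hypothetical. find_collisions() reproduces Egor's example on a word list, while ambiguous_splits() inspects the table itself for a letter whose output equals two other outputs joined together.

```python
from collections import defaultdict
from itertools import product

# Illustrative System B style subset with the mappings from the example.
TABLE = {"с": "s", "х": "h", "ш": "sh", "е": "e", "м": "m", "а": "a"}


def transliterate(text: str) -> str:
    """Character-by-character transliteration with TABLE."""
    return "".join(TABLE[ch] for ch in text)


def find_collisions(words):
    """Group words by their transliteration and keep groups of two or more."""
    by_output = defaultdict(list)
    for word in words:
        by_output[transliterate(word)].append(word)
    return {out: group for out, group in by_output.items() if len(group) > 1}


def ambiguous_splits(table):
    """List letters whose output equals two other outputs joined together.

    Each hit is a (one letter, two letters) pair of inputs that produce the
    same string, so a single hit is enough to show the table is ambiguous.
    """
    hits = []
    for letter, out in table.items():
        for a, b in product(table, repeat=2):
            if table[a] + table[b] == out:
                hits.append((letter, a + b))
    return hits


print(find_collisions(["схема", "шема"]))  # {'shema': ['схема', 'шема']}
print(ambiguous_splits(TABLE))            # [('ш', 'сх')]
```

ambiguous_splits() only considers splits into two letters, so an empty result does not prove a table is unambiguous. The general test for unique decodability of such a table is the Sardinas-Patterson algorithm, which is not shown here. A table whose outputs are all distinct single letters, as in System A, is trivially unambiguous.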

> [...]
> Yes, I also think System A AKA ISO 9 would be a better choice but I'll
> leave the final decision for you two (and others who might weigh in).

Egor is a native speaker, so I respect his opinion even though I'm not
fully convinced for technical reasons.  Sadly, nobody else has offered an
opinion that could tip the balance.  I am going to write a separate email
about it.

Regards,

Rafal