Date: Wed, 19 Dec 2018 23:25:28 +0100 (CET)
From: Rafal Luzynski
To: Marko Myllynen, Egor Kobylkin, libc-alpha@sourceware.org, libc-locales@sourceware.org
Message-ID: <300646541.674067.1545258329027@poczta.nazwa.pl>
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <837001401.21346.1542406647888@poczta.nazwa.pl>
 <5a247161-c498-ed50-ff4a-58f2ecf974f0@redhat.com>
 <1441622134.517912.1543702039942@poczta.nazwa.pl>
 <2f6fc82c-77ba-d331-ae5d-e2373e122a88@kobylkin.com>
 <1361059722.707244.1544231740358@poczta.nazwa.pl>
Subject: Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]

10.12.2018 22:20 Marko Myllynen wrote:
>
> Hi,
>
> On 08/12/2018 03.15, Rafal Luzynski wrote:
> > [...]
> > Marko, it would be good to hear your opinion about System A vs. System B
> > again.
>
> I think System A is a better option as it should be the same as ISO 9
> and perhaps also produces results in some cases which are more expected
> than with System B (if the Wikipedia ISO 9 article is to be believed).
>
> Wrt BZ #2872 I think it's good to keep it in mind but IMHO we can also
> deviate from it if needed, however with System A + ASCII fallback
> definitions the RFE should be satisfied as well?

That's exactly what I meant (sorry if it was not clear before).

> > [...] Marko, what is your
> > opinion about possible implementation of SFS 4900 in these cases:
> >
> > * When the destination charset does not contain required Latin diacritic
> > characters (e.g., it is plain ASCII)?
>
> This would be according to http://jkorpela.fi/iso9.html8 so for example
> instead of ž -> zh and instead of štš -> shtsh.

Agree.
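To make the fallback concrete, here is a minimal sketch in Python. It is not the glibc locale syntax and not part of the patch; the table name and the function are hypothetical, and the table holds only the two letters mentioned above.

```python
# Hypothetical ASCII fallback table, limited to the letters discussed above.
ASCII_FALLBACK = {"ž": "zh", "š": "sh"}


def to_ascii(latin: str) -> str:
    """Replace each diacritic letter by its ASCII fallback, keep the rest."""
    return "".join(ASCII_FALLBACK.get(ch, ch) for ch in latin)


print(to_ascii("ž"))
print(to_ascii("štš"))
```

The point is only that the fallback is applied letter by letter after the transliteration to Latin with diacritics.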
> > * When the output is ambiguous, that means, when two different Cyrillic
> > strings produce the same Latin (or ASCII) output?
>
> This is a good point and one I haven't considered but I'm not sure is
> there anything we can do about this (at least without major locale
> system internals work)?

I agree that we can't do much about it. There are possible solutions
(like using more punctuation characters) but they don't look natural
to me.

> Do you have any rough idea how frequently this
> could happen or is this more a theoretical issue? (Sorry if I've missed
> earlier comments about this, it's been a long thread.)

Yes, Egor provided this example many times:

"схема" -> "shema" (if "с" -> "s" and "х" -> "h")
"шема" -> "shema" (if "ш" -> "sh")

I don't think it matters how frequent these cases are. The question is
whether ambiguity is a bug: if it is, then even one corner case proves
that the solution is wrong.

> [...]
> Yes, I also think System A AKA ISO 9 would be a better choice but I'll
> leave the final decision for you two (and others who might weigh in).

Egor is a native speaker so I respect his opinion even if I'm not fully
convinced for technical reasons. Sadly, nobody else has offered an
opinion that could carry weight. I am going to write a separate email
about it.

Regards,

Rafal
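
P.S. Egor's collision example can be reproduced with a small Python sketch. Both tables below are hypothetical and cover only the letters of the two example words; they are not the tables from the patch. The first one uses the digraph "sh", the second one uses "š" as a diacritic-based scheme would.

```python
# Hypothetical per-letter tables, limited to the letters of the example words.
DIGRAPH = {"с": "s", "х": "h", "ш": "sh", "е": "e", "м": "m", "а": "a"}
DIACRITIC = dict(DIGRAPH, **{"ш": "š"})  # same table, but "ш" -> "š"


def translit(word: str, table: dict) -> str:
    """Transliterate letter by letter using the given table."""
    return "".join(table[ch] for ch in word)


words = ["схема", "шема"]
digraph_out = [translit(w, DIGRAPH) for w in words]
diacritic_out = [translit(w, DIACRITIC) for w in words]

print(digraph_out)    # both words give the same output
print(diacritic_out)  # the two outputs differ
```

With the digraph table the two words become indistinguishable; with the diacritic table they stay distinct, but only as long as the destination charset can represent "š".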