Date: Wed, 10 Oct 2018 00:08:56 +0200 (CEST)
From: Rafal Luzynski
To: Marko Myllynen, Egor Kobylkin, Keld Simonsen
Cc: libc-alpha@sourceware.org, libc-locales@sourceware.org, "Dmitry V. Levin", Volodymyr Lisivka, Carlos O'Donell, Max Kutny, danilo@gnome.org
Message-ID: <1984104697.413415.1539122936119@poczta.nazwa.pl>
In-Reply-To: <63fb4fae-a93b-7aff-13df-4452cbc8853f@redhat.com>
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <16e785f3-2e9f-ceb2-698f-dc33c91a5d5e@kobylkin.com> <20181003091949.GA21486@rap.rap.dk> <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com> <1485772360.805333.1538731225156@poczta.nazwa.pl> <19e29568-e710-535f-4f90-98dbcec930ed@kobylkin.com> <1028447684.826961.1539036295224@poczta.nazwa.pl> <63fb4fae-a93b-7aff-13df-4452cbc8853f@redhat.com>
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29

9.10.2018 18:10 Marko Myllynen wrote:
> On 2018-10-09 01:04, Rafal Luzynski wrote:
> > If you refer to other languages than Russian which also use the Cyrillic
> > alphabet but need a different transliteration rules than Russian for
> > the same characters then it is OK for me now. I am afraid that the iconv
> > algorithm does not handle such case. Of course, we should add this missing
> > feature eventually but I do not volunteer to do it now.
>
> Yes, this would be needed for correct transliteration of different
> languages, and this might be quite a bit of work.
> There's also the case of transliteration and character sets, consider
> the transliteration examples from
> https://fi.wikipedia.org/wiki/Siirtokirjoitus:
>
> Russian: Борис Николаевич Ельцин
> Int'l: Boris Nikolaevič Elʹcin
> Finnish: Boris Nikolajevitš Jeltsin
> French: Boris Nikolaïevitch Ieltsine
> Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]

No, I did not mean transcription into Latin according to the rules of
the destination locale. I meant that the transliteration rules may
differ depending on the language of the source text. For example,
consider the Cyrillic string "нъг" (I am not saying that it is actually
used in any existing word, but it still must be handled). By our
transliteration rules it will be transliterated as "n``g". This is fine
for Russian, but if we knew that the source string was Ukrainian it
would be transliterated as "n``h", and if it was Bulgarian it would be
transliterated as "năg". Similarly, if you had to transliterate the
Latin letters "sch" to Cyrillic, you would first have to ask what the
source language was.

Unfortunately, I think that distinguishing the source language is
impossible at the moment, so let's assume that we fall back to Russian
if there is any ambiguity.

Regards,

Rafal
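
To illustrate the source-language dependence described above, here is a minimal sketch in Python. It is not how glibc or iconv works today (as noted, iconv cannot select rules by source language). The names BASE_RULES, OVERRIDES and transliterate are hypothetical, and the tables contain only the three letters from the "нъг" example.

```python
# Hypothetical tables, filled in only with the mappings mentioned in
# this message. The base table is the Russian-style fallback.
BASE_RULES = {"н": "n", "ъ": "``", "г": "g"}

# Per-language overrides applied on top of the fallback (assumed
# language codes: "uk" for Ukrainian, "bg" for Bulgarian).
OVERRIDES = {
    "uk": {"г": "h"},
    "bg": {"ъ": "ă"},
}


def transliterate(text, source_lang=None):
    """Transliterate text, falling back to the Russian-style rules
    when the source language is unknown or has no overrides."""
    rules = dict(BASE_RULES)
    rules.update(OVERRIDES.get(source_lang, {}))
    # Characters without a rule are passed through unchanged.
    return "".join(rules.get(ch, ch) for ch in text)


print(transliterate("нъг"))        # n``g
print(transliterate("нъг", "uk"))  # n``h
print(transliterate("нъг", "bg"))  # năg
```

Calling transliterate() without a source language corresponds to the fallback proposed above: any ambiguity is resolved in favour of the Russian rules.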