From: Marko Myllynen
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
To: Rafal Luzynski, Egor Kobylkin, Keld Simonsen
Cc: libc-alpha@sourceware.org, libc-locales@sourceware.org,
 "Dmitry V. Levin", Volodymyr Lisivka, Carlos O'Donell, Max Kutny,
 danilo@gnome.org
Date: Wed, 10 Oct 2018 14:21:46 +0300
Message-ID: <87803183-18f2-3641-df81-c766d0736fd9@redhat.com>
In-Reply-To: <1984104697.413415.1539122936119@poczta.nazwa.pl>
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <16e785f3-2e9f-ceb2-698f-dc33c91a5d5e@kobylkin.com>
 <20181003091949.GA21486@rap.rap.dk>
 <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com>
 <1485772360.805333.1538731225156@poczta.nazwa.pl>
 <19e29568-e710-535f-4f90-98dbcec930ed@kobylkin.com>
 <1028447684.826961.1539036295224@poczta.nazwa.pl>
 <63fb4fae-a93b-7aff-13df-4452cbc8853f@redhat.com>
 <1984104697.413415.1539122936119@poczta.nazwa.pl>

Hi,

On 2018-10-10 01:08, Rafal Luzynski wrote:
> 9.10.2018 18:10 Marko Myllynen wrote:
>> On 2018-10-09 01:04, Rafal Luzynski wrote:
>>> If you refer to other languages than Russian which also use the Cyrillic
>>> alphabet but need a different transliteration rules than Russian for
>>> the same characters then it is OK for me now. I am afraid that the iconv
>>> algorithm does not handle such case. Of course, we should add this missing
>>> feature eventually but I do not volunteer to do it now.
>>
>> Yes, this would be needed for correct transliteration of different
>> languages, and this might be quite a bit of work.
>> There's also the case
>> of transliteration and character sets, consider the transliteration
>> examples from https://fi.wikipedia.org/wiki/Siirtokirjoitus:
>>
>> Russian: Борис Николаевич Ельцин
>> Int'l: Boris Nikolaevič Elʹcin
>> Finnish: Boris Nikolajevitš Jeltsin
>> French: Boris Nikolaïevitch Ieltsine
>> Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
>
> No, I did not mean the transcription using the rules of the destination
> locale using Latin but that the rules of transliteration may be different
> depending on the language of the source text.

Yes, I mentioned this case in my earlier email:

https://sourceware.org/ml/libc-alpha/2018-10/msg00083.html

> this Cyrillic string: "нъг" (I'm not telling that it is actually used
> in any existing word but still must be handled). By our transliteration
> rules it will be transliterated as "n``g". But this is fine for Russian;
> if we knew that the source string is Ukrainian it would be transliterated
> as "n``h"; if it was Bulgarian it would be transliterated as "năg".

And according to SFS 4900, in fi_FI for this string we would see for
Russian ng, for Ukrainian nh, and for Bulgarian năg.

> Unfortunately, I think that distinction of the source language is impossible
> at the moment so let's assume that we fall back to Russian if there is
> any ambiguity.

Yeah, it's not optimal but probably the most decent compromise for now.

Thanks,

-- 
Marko Myllynen
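
P.S. To make the "нъг" example concrete, here is a minimal sketch of
source-language-dependent transliteration with a Russian fallback. It is
only an illustration: the table layout, the `translit` function and the
language codes are hypothetical and have nothing to do with how iconv or
the glibc locale files work today. The three mappings are just the ones
quoted above.

```python
# Hypothetical tables for illustration only; this is not glibc's
# locale data format and not how iconv //TRANSLIT is implemented.

# Default rules (Russian), limited to the characters in the example.
RUSSIAN = {"н": "n", "ъ": "``", "г": "g"}

# Per-language overrides applied on top of the Russian defaults.
OVERRIDES = {
    "uk": {"г": "h"},   # Ukrainian: г is transliterated as h
    "bg": {"ъ": "ă"},   # Bulgarian: ъ is transliterated as ă
}


def translit(text, source_lang=None):
    """Transliterate text, falling back to the Russian rules when the
    source language is unknown or has no override for a character."""
    table = dict(RUSSIAN)
    table.update(OVERRIDES.get(source_lang, {}))
    # Characters without any rule are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    for lang in (None, "uk", "bg"):
        print(lang, translit("нъг", lang))
```

With no source language given this yields the Russian result, which is
the fallback discussed above; the hard part that the sketch skips is
knowing the source language in the first place.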