Subject: Re: [PING^6][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]
From: Egor Kobylkin
To: Carlos O'Donell, Marko Myllynen, libc-alpha@sourceware.org, libc-locales@sourceware.org, Carlos O'Donell, Siddhesh Poyarekar, Rafal Luzynski
Cc: Mike Fabian
Date: Tue, 16 Apr 2019 20:41:42 +0200
Message-ID: <7d272d07-87ca-12fd-0f1b-00cbd93ea43d@kobylkin.com>

On 16.04.19 19:58, Carlos O'Donell wrote:
> On 4/16/19 1:06 PM, Egor Kobylkin wrote:
>> Just FYI, this what I was testing: ./testrun.sh /usr/bin/iconv -f
>> UTF-8 -t ASCII//TRANSLIT <<<
>> "ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуу́фхцчшщъыьэюяёђѓєѕіїјљњћќўџѪѫѲѳѴѵҌҍ
>> ҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’"
>>
>> And this is the expected result ("" added by myself):
>> "YODJG`YEZ`IYIJL`N`TSHK`U`DHABVGDEZHZIJKLMNOPRSTUU?FXCZCHSHSHHA`Y``E`YUYAabvgdezhzijklmnoprstuu?fxczchshshh``y``e`yuyayodjg`yez`iyijl`n`tshk`u`dhO`o`FHfhYHyhE`e`
>> G`g`GHghGHghZH`zh`K`k`K`k`N`n`NGngP`p`O`o`C`C`T`t`UuH`h`TCZtczSH`sh`CH`ch`CH`ch`iZH`zh`CH`ch`A`a`A`a`E`e`A`a`ZH`zh`Z`z`Z`z`I`i`O`o`O`o`U`u`U`u`CH`ch`Y`y`'"
>>
>
> Thanks.
>
> I was using CyrTranslit (python translater) to review other work done in
> this area,
> but it wasn't very fruitful.
>
> $ python3
> Python 3.7.3 (default, Mar 27 2019, 13:36:35)
> [GCC 9.0.1 20190227 (Red Hat 9.0.1-0.8)] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import cyrtranslit
> >>> cyrtranslit.supported()
> dict_keys(['sr', 'me', 'mk', 'ru'])
> >>> cyrtranslit.to_latin("ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуу́фхцчшщъыьэюяёђѓєѕіїјљњћќўџѪѫѲѳѴѵҌҍҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’")
> >>>
> 'ЁĐЃЄЅІЇJLjNjĆЌЎDžABVGDEŽZIЙKLMNOPRSTUÚFHCČŠЩЪЫЬЭЮЯabvgdežziйklmnoprstuúfhcčšщъыьэюяёđѓєѕіїjljnjćќўdžѪѫѲѳѴѵҌҍҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’'
>
> >>>
>
> "ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуу́фхцчшщъыьэюяёђѓєѕіїјљњћќўџѪѫѲѳѴѵҌҍҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’"
>
> 'ЁĐЃЄЅІЇJLjNjĆЌЎDžABVGDEŽZIЙKLMNOPRSTUÚFHCČŠЩЪЫЬЭЮЯabvgdežziйklmnoprstuúfhcčšщъыьэюяёđѓєѕіїjljnjćќўdžѪѫѲѳѴѵҌҍҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’'
>
>
> Which doesn't give a good transliteration.

I guess that is because it uses the first key from your list, 'sr', which
stands for Serbian, and Serbian doesn't have the characters that were left
untransliterated ("Щ", for example).

> But the table is better:
> https://github.com/opendatakosovo/cyrillic-transliteration/blob/master/cyrtranslit/mapping.py#L138-L155
>
>
> Ё -> YO.
>
> Which is a good cross-check for me.

Yet the closest one in that codebase should be this one:
https://github.com/opendatakosovo/cyrillic-transliteration/blob/master/cyrtranslit/mapping.py#L88

This is exactly why we had 12 iterations of this patch: we wanted the table
to cover the most complete yet workable standard. What we reference in the
bug memo is the actual accepted standard, merged with the extended standard
that covers additional obsolete Cyrillic letters.

Bests,
Egor Kobylkin
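
P.S. To make the table-lookup idea concrete, here is a minimal Python sketch.
It is only an illustration under stated assumptions: the TRANSLIT dictionary
is a small subset read off the expected iconv result quoted above, the names
TRANSLIT and translit_ascii are made up for this example, and it is not how
glibc's locale data or cyrtranslit actually implement transliteration. The
"?" fallback for unmapped characters is an assumption modelled on the "?"
that shows up for the combining accent in the expected result.

```python
# Hypothetical subset of a Cyrillic -> ASCII table, taken from the
# expected iconv output quoted in this thread (not the full patch table).
TRANSLIT = {
    "Ё": "YO", "Ж": "ZH", "Х": "X", "Ц": "CZ", "Ч": "CH",
    "Ш": "SH", "Щ": "SHH", "Ю": "YU", "Я": "YA",
    "ё": "yo", "ж": "zh", "х": "x", "ц": "cz", "ч": "ch",
    "ш": "sh", "щ": "shh", "ю": "yu", "я": "ya",
}


def translit_ascii(text, table=TRANSLIT, fallback="?"):
    """Map each character through the table; keep ASCII as-is.

    Any non-ASCII character missing from the table is replaced by
    `fallback` (an assumption for this sketch).
    """
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])      # one input char may yield several ASCII chars
        elif ord(ch) < 128:
            out.append(ch)             # already ASCII, pass through unchanged
        else:
            out.append(fallback)       # unmapped character
    return "".join(out)


if __name__ == "__main__":
    print(translit_ascii("Ёж, Щ!"))
```

The point of the sketch is that one Cyrillic character can expand to several
ASCII characters ("Щ" to "SHH"), so the result has to be built piecewise
rather than with a one-to-one character replacement.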