From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
To: Rafal Luzynski, Marko Myllynen
Cc: Keld Simonsen, libc-alpha@sourceware.org, libc-locales@sourceware.org,
 "Dmitry V. Levin", Volodymyr Lisivka, Carlos O'Donell, Max Kutny,
 danilo@gnome.org
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <16e785f3-2e9f-ceb2-698f-dc33c91a5d5e@kobylkin.com>
 <20181003091949.GA21486@rap.rap.dk>
 <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com>
 <1485772360.805333.1538731225156@poczta.nazwa.pl>
 <19e29568-e710-535f-4f90-98dbcec930ed@kobylkin.com>
 <1028447684.826961.1539036295224@poczta.nazwa.pl>
From: Egor Kobylkin
Date: Tue, 9 Oct 2018 00:52:00 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <1028447684.826961.1539036295224@poczta.nazwa.pl>
Content-Type: multipart/mixed; boundary="------------BCB4DAF45DDA2708571892B4"

This is a multi-part message in MIME format.
--------------BCB4DAF45DDA2708571892B4
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Hi Rafal,

> But, while at this, is there anything that stops us from adding
> transliteration rules for additional Cyrillic characters not used in
> Russian but used in other languages?

Just to make sure we are not talking at cross purposes: since your last
email on this topic I have, at Marko's suggestion, already implemented
ISO 9 transliteration for all the characters there are. This should
cover most, if not all, Slavic Cyrillic. You seem to have noticed and
replied to that email of Marko's just as I was writing mine.

Please also check the spreadsheet version I have just uploaded:
https://sourceware.org/bugzilla/attachment.cgi?id=11298
I am currently absorbing Marko's further suggestions and corrections to
that one and will get back for more discussion once I am done there.

I am reading your suggestions and taking them to heart, be sure of that.

Two professional translators independently pointed out the difference
between transliteration and transcription to me. Transliteration is
normative (letter for letter), while transcription is phonetic: each
letter is rendered by whatever combination of Latin letters in the
target language sounds like it to a native speaker.

While transliteration should be easy to cover for all those languages
via ISO 9, transcription is inherently language specific. The problem is
that we are (mis)using the transcription as transliteration to ASCII
because the ASCII set of characters does not allow for proper
transcription.

Another problem is that, to be really useful, the ASCII transliteration
should work outside of the source locale (i.e. not only ru_RU but en_US,
de_DE, en_DE, es_ES etc., or even just the C locale). In fact, for
myself I would be committed to doing all the work needed to cover at
least C, en_US, ru_RU and de_DE, in that order. ru_RU is a "courtesy":
I am not really using it, but I hope more contributors for locales may
come because of it and fix my bugs :-).

> The problem is that we don't have a separate maintainer for each
> locale, we have only 2 maintainers for about 200 locales and we must
> represent them all.

It was not clear to me that the glibc team cannot fall back on the
individual locale maintainers to make the decision. But then it may make
the decision making even easier. If you have a list of requirements
(maybe implicit until now), could you please send them my way? We can
certainly also just keep this thread going and have all issues ironed
out.

Anyway, hopefully with ISO 9 as the first column in translit_cyrillic we
now cover the issue of the completeness of the transliteration. What we
need to figure out is the transcription/transliteration to ASCII, i.e.
the second column. Are we sharing the same view on this?

Speaking of decision making: maybe I can get an officially certified
court translator to answer our questions. Would you care to put together
a list of questions you would like answered in order to make a decision
on the table and its inclusion into various locales?

Hope this helps,
Egor

On 09.10.2018 00:04, Rafal Luzynski wrote:
> 5.10.2018 12:36 Egor Kobylkin wrote:
>> [...] I see three options:
>> 1. those locale maintainers that are fine with using the ISO
>>    9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru)
>>    include it in their locales.
>>    https://sourceware.org/bugzilla/attachment.cgi?id=11289
>> 2. those that want to have a differing table can create their own
>>    variety based on the spreadsheet I have prepared
>>    https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include
>>    it in this patch.
>> 3. those that want to omit a cyrillic transliteration altogether for
>>    now state so and just carry over the bug #2872 from the year 2006.
>>
>> Does this make sense to you?
>
> The problem is that we don't have a separate maintainer for each
> locale, we have only 2 maintainers for about 200 locales and we must
> represent them all. Sometimes a locale may happen to be our own
> native locale or that of someone on this list, or it may be a locale
> which we accidentally can speak as a foreign language, or we may have
> friends who can speak it. Or it may be totally unknown and we still
> must somehow handle it.
>
> I think that these transliteration rules should be included in
> multiple locales on an "opt-in" basis rather than "opt-out". I mean,
> we should not include them in all locales unless someone explicitly
> provides different rules. Instead, I think we should add them
> (maybe with modification) only to those locales where we have a good
> reason to think they will work.
>
> Particularly, I think that those rules will not be helpful at all
> for the languages which use neither the Latin nor the Cyrillic
> alphabet.
>
>> [...] The fact that the patch is reflecting the Russian variety of
>> ISO 9:1995/GOST_7.79_System_B is because a) ISO
>> 9:1995/GOST_7.79_System_B is available and can be helpful to a
>> majority of cyrillic users b) I have access to it including via
>> being proficient in Russian.
>
> I took a look at these standards, and while at first I doubted they
> could be correct for the English language, now I understand they were
> created for Russian users. Therefore I think it is pretty correct to
> include them in the Russian locale data. Will it be OK if we say that
> it is only for the Russian language? Will it be satisfying for you
> and/or your users?
>
>> It is offered to all the respective locale maintainers as a
>> stopgap solution. Stopgap in the sense that it is better to have
>> some transliteration than not to have any at all and carry over the
>> bug from 2006. That it may be a somewhat officially correct
>> transliteration for ru_RU is a bonus. In that sense I would dub the
>> discussion on the correctness for other languages "offtopic". Let
>> me know if this is not OK.
>
> If you refer to languages other than Russian which also use the
> Cyrillic alphabet but need different transliteration rules than
> Russian for the same characters, then it is OK for me now. I am
> afraid that the iconv algorithm does not handle such a case. Of
> course, we should add this missing feature eventually but I do not
> volunteer to do it now.
>
>> [...] P.S. specifically as to how to address languages other than Ru
>> included in GOST_7.79_System_B: we can take the first option left
>> to right from that table (Ru,By,Uk,Bg,Mk). Then it will technically
>> work for all those locales/languages but with errors where Ru
>> supersedes their own variants.
>
> Makes sense, as long as we cannot select the source language now.
>
> But, while at this, is there anything that stops us from adding
> transliteration rules for additional Cyrillic characters not used in
> Russian but used in other languages?
>
> Regards,
>
> Rafal
>

--------------BCB4DAF45DDA2708571892B4
Content-Type: message/rfc822; name="Attached Message"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="Attached Message"

Reply-To: Marko Myllynen
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
To: Egor Kobylkin, Rafal Luzynski, Keld Simonsen
Cc: libc-alpha@sourceware.org, libc-locales@sourceware.org,
 "Dmitry V. Levin", Volodymyr Lisivka, Carlos O'Donell, Max Kutny,
 danilo@gnome.org
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <16e785f3-2e9f-ceb2-698f-dc33c91a5d5e@kobylkin.com>
 <20181003091949.GA21486@rap.rap.dk>
 <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com>
 <1485772360.805333.1538731225156@poczta.nazwa.pl>
 <69e26cab-810e-824b-3b16-b75ac44d8b0c@redhat.com>
From: Marko Myllynen
Organization: Red Hat
Date: Mon, 8 Oct 2018 15:40:53 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit

Hi,

Thanks for the update. I have a few mostly cosmetic comments below;
hopefully we'll hear from others whether they agree with this direction.

- Please add the standard glibc locale header (see the existing
  translit_* files for reference)

- Consider wrapping the header lines at or around column 70-72

- Consider describing which characters, character ranges, or blocks
  are supported (perhaps also describe why some of those are not
  included, see e.g.
  https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)

- Please remove trailing whitespace and spaces after ;

- No duplicates:

  % CYRILLIC SMALL LETTER IE
   ;

  should become:

  % CYRILLIC SMALL LETTER IE

- There are a few issues with the definitions:

  % CYRILLIC CAPITAL LETTER U
   ;
  % CYRILLIC UNDEFINED
   ; ""
  % CYRILLIC SMALL LETTER U
   ;
  % CYRILLIC UNDEFINED
   ; ""

I wonder whether it would be possible to automate the generation of this
file so that issues like the above could be avoided? But perhaps that
could be the next step once this initial patch lands.
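
The cosmetic checks listed above (trailing whitespace, spaces after ;,
duplicate alternatives) could be automated along the following lines.
This is only a rough sketch and not part of the patch: the function name
lint_translit is made up for illustration, and the entry syntax it
assumes (a code point token followed by ;-separated alternatives, with %
starting a comment) is an assumption about how the translit_* files are
laid out.

```python
import re


def lint_translit(lines):
    """Return (line_number, message) pairs for the cosmetic issues above.

    Assumed entry syntax (hypothetical, for illustration only):
        <Uxxxx> alternative;alternative
    with '%' starting a comment line.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text != text.rstrip():
            problems.append((number, "trailing whitespace"))
        body = text.strip()
        if not body or body.startswith("%"):
            continue  # blank line or comment, nothing else to check
        if re.search(r";\s", body):
            problems.append((number, "space after ';'"))
        parts = body.split(None, 1)
        if len(parts) == 2:
            alternatives = [alt.strip() for alt in parts[1].split(";")]
            if len(alternatives) != len(set(alternatives)):
                problems.append((number, "duplicate alternative"))
    return problems


if __name__ == "__main__":
    # Made-up sample entries, not taken from the submitted file.
    sample = [
        "% CYRILLIC SMALL LETTER IE\n",
        "<U0435> <U0065>;<U0065>\n",
        '<U0428> <U0160>; "<U0053><U0068>"\n',
        '<U0448> <U0161>;"<U0073><U0068>" \n',
    ]
    for number, message in lint_translit(sample):
        print(f"line {number}: {message}")
```

A check like this could run before each resubmission; generating the
file from the spreadsheet would still be the more thorough fix.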

Thanks,

On 2018-10-05 23:47, Egor Kobylkin wrote:
> After some kind help from Marko in the offline discussion I realized
> that the multi/single character approach I originally took was
> against the iconv(1) logic anyway. So there is no harm in dropping it
> and adopting Marko's suggestion instead. I will do so and will
> resubmit the patch with ISO 9:1995/GOST 7.79 System A + fallback to
> GOST 7.79 System B (for ASCII).
>
> However, this doesn't resolve the issue of the ASCII part being
> different for various locales. Again, I am asking the locale
> maintainers to let me know if they want to 1) adopt the one I am
> supplying, 2) write their own or 3) ignore the patch altogether. Your
> feedback is appreciated!
>
> This is the relevant part that helped:
>> The first part (ISO-8859-15 or ASCII) defines the target encoding for
>> iconv(1). //TRANSLIT is described in the iconv(1) man page as:
>>
>>     If the string //TRANSLIT is appended to to-encoding, characters
>>     being converted are transliterated when needed and possible. This
>>     means that when a character cannot be represented in the target
>>     character set, it can be approximated through one or several
>>     similar looking characters. Characters that are outside of the
>>     target character set and cannot be transliterated are replaced
>>     with a question mark (?) in the output.
>>
>> So in the above examples, iconv(1) encounters the character U+0428,
>> which is not part of either of the target encodings, and since
>> //TRANSLIT is specified, iconv(1) tries transliteration according to
>> the rules defined above; in the case of ASCII, U+0160 is not part of
>> the target encoding, so the next alternative is used.
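
The fallback order described in the quoted text can be illustrated with
a small sketch. It is a toy model of the selection rule only, not
glibc's iconv implementation: the RULES table and the translit function
are hypothetical, and the two entries simply follow the System A /
System B example for U+0428 given in this thread.

```python
# Hypothetical rule table: each source character maps to an ordered list
# of alternatives, most preferred first (System A form, then System B).
RULES = {
    "\u0428": ["\u0160", "Sh"],  # CYRILLIC CAPITAL LETTER SHA
    "\u0448": ["\u0161", "sh"],  # CYRILLIC SMALL LETTER SHA
}


def can_encode(text, encoding):
    """Return True if text is representable in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def translit(text, encoding):
    """Pick, per character, the first alternative the target can hold."""
    out = []
    for ch in text:
        if can_encode(ch, encoding):
            out.append(ch)  # representable as-is, no rule needed
            continue
        for alternative in RULES.get(ch, []):
            if can_encode(alternative, encoding):
                out.append(alternative)
                break
        else:
            out.append("?")  # no usable rule: question mark, as in //TRANSLIT
    return "".join(out)


if __name__ == "__main__":
    print(translit("\u0428", "iso8859-15"))  # System A form fits the target
    print(translit("\u0428", "ascii"))       # falls back to System B form
```

With ISO-8859-15 as the target the first alternative is representable
and is used; with ASCII it is not, so the second alternative is taken.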
>
> Bests,
> Egor Kobylkin
>
> On 05.10.2018 14:21, Marko Myllynen wrote:
>> Hi,
>>
>> The scheme I proposed would also be ASCII compatible; consider this
>> example:
>>
>> % CYRILLIC CAPITAL LETTER SHA
>> "";""
>>
>> "printf \\u0428\\n | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT | iconv
>> -f ISO-8859-15 -t UTF-8" would produce Š as per System A and "printf
>> \\u0428\\n | iconv -f UTF-8 -t ASCII//TRANSLIT" would produce Sh as
>> per System B.
>>
>> Thanks,
>>
>> On 2018-10-05 15:00, Egor Kobylkin wrote:
>>> Hi Marko,
>>>
>>> I have chosen System B because it is ASCII compatible. System
>>> A is not ASCII compatible (diacritics in target).
>>>
>>> https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A
>>>
>>> "GOST 7.79 contains two transliteration tables.
>>>
>>> System A: one Cyrillic character to one Latin character, some with
>>> diacritics – identical to ISO 9:1995
>>>
>>> System B: one Cyrillic character to one or many Latin characters
>>> without diacritics"
>>>
>>> Hope this helps,
>>> Egor
>>>
>>> On 05.10.2018 13:54, Marko Myllynen wrote:
>>>> Hi,
>>>>
>>>> Would it make sense to first use ISO 9:1995/GOST 7.79 System A if
>>>> possible and if not, then fall back to GOST 7.79 System B?
>>>>
>>>> Implementation-wise, the current translit_* files have a few
>>>> examples where a non-ASCII transliteration is tried first before an
>>>> ASCII fallback. These examples are from translit_neutral:
>>>>
>>>> % NARROW NO-BREAK SPACE
>>>>  ;
>>>> % REVERSED TRIPLE PRIME
>>>> "";""
>>>>
>>>> Thanks,
>>>>
>>>> On 2018-10-05 13:29, Egor Kobylkin wrote:
>>>>> Keld, Marko, Rafal, other locale maintainers,
>>>>>
>>>>> this is all written with a minimal viable fix for this bug, asap,
>>>>> in mind. I want to avoid wasting maintainers' time by getting
>>>>> into fundamental discussions here (although for perfectly good
>>>>> reasons).
>>>>>
>>>>> I see three options:
>>>>> 1. those locale maintainers that are fine with using the ISO
>>>>>    9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru)
>>>>>    include it in their locales (see attached screenshot of the
>>>>>    table).
>>>>> 2. those that want to have a differing table can create their own
>>>>>    variety based on the spreadsheet I have prepared
>>>>>    https://sourceware.org/bugzilla/attachment.cgi?id=8590 and
>>>>>    include it in this patch.
>>>>> 3. those that want to omit a cyrillic transliteration altogether
>>>>>    for now state so and just carry over the bug #2872 from the
>>>>>    year 2006.
>>>>>
>>>>> Does this make sense to you?
>>>>>
>>>>> Just to be super clear on this: the patch is a stopgap _ASCII_
>>>>> transliteration table. ASCII being the AMERICAN Standard Code for
>>>>> Information Interchange, it is obviously orthogonal to any
>>>>> transliteration rule of other countries. As such it is not
>>>>> explicitly targeting the transliteration standards of any country.
>>>>>
>>>>> The fact that the patch is reflecting the Russian variety of ISO
>>>>> 9:1995/GOST_7.79_System_B is because a) ISO
>>>>> 9:1995/GOST_7.79_System_B is available and can be helpful to a
>>>>> majority of cyrillic users b) I have access to it including
>>>>> via being proficient in Russian.
>>>>>
>>>>> It is offered to all the respective locale maintainers as a
>>>>> stopgap solution. Stopgap in the sense that it is better to
>>>>> have some transliteration than not to have any at all and
>>>>> carry over the bug from 2006. That it may be a somewhat
>>>>> officially correct transliteration for ru_RU is a bonus. In
>>>>> that sense I would dub the discussion on the correctness for
>>>>> other languages "offtopic". Let me know if this is not OK.
>>>>>
>>>>> You are all correctly mentioning the deficiencies of this
>>>>> approach. However, I couldn't find a better straightforward
>>>>> approach as of yet. Happy to hear from you as to how this
>>>>> could be handled.
>>>>>
>>>>> There is a danger of being caught in the web of
>>>>> language/country differences. I propose just pruning the
>>>>> locales that are not comfortable including this current table.
>>>>> We can address possible solutions in the second wave of
>>>>> patching.
>>>>>
>>>>> I am wary of getting into discussions on specific country
>>>>> variants just because of the sheer complexity of this topic.
>>>>> It is probably better addressed by the respective maintainers of
>>>>> their locales. I do not see a "one fits all" solution being
>>>>> possible in this first wave.
>>>>>
>>>>> I would like to have this "three options plan of action"
>>>>> vetted first and then we could go into the specific details.
>>>>> (Like, for instance, what characters should be included in
>>>>> the table, and in which transliteration form.)
>>>>>
>>>>> I am looking forward to your reply,
>>>>> Egor Kobylkin
>>>>>
>>>>> P.S. specifically as to how to address languages other than Ru
>>>>> included in GOST_7.79_System_B: we can take the first option
>>>>> left to right from that table (Ru,By,Uk,Bg,Mk). Then it will
>>>>> technically work for all those locales/languages but with
>>>>> errors where Ru supersedes their own variants.
>>>>>
>>>>>
>>>>> On 05.10.2018 11:20, Rafal Luzynski wrote:
>>>>>> 3.10.2018 11:32 Egor Kobylkin wrote:
>>>>>>>
>>>>>>> On 03.10.2018 11:19, Keld Simonsen wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Please note that transliteration of Cyrillic to Latin
>>>>>>>> is not universal. There are different schemes for, for
>>>>>>>> example, German, English and Danish, and there is also an
>>>>>>>> ISO standard for it.
>>>>>>>
>>>>>>> Thanks for your feedback, Keld!
>>>>>>>
>>>>>>> Could the locale maintainers that wouldn't like to include
>>>>>>> this patch explicitly state so here?
>>>>>>
>>>>>> I think it is about me so I must reply. I am sorry about
>>>>>> that and the sole reason is my lack of time. I'm just a
>>>>>> volunteer here, that means it's not my regular job to work
>>>>>> on locale data nor anything in glibc nor in any other open
>>>>>> source project. I do these things only in my free time,
>>>>>> which I don't have much of. Of course you will see my
>>>>>> contributions here and there but they are either trivial or
>>>>>> take me months to complete. Your patches are on my radar but
>>>>>> I can't give any ETA for them. Of course, there are other
>>>>>> people around here and they are all welcome to come and
>>>>>> join.
>>>>>>
>>>>>>> That is:
>>>>>>> - In the case that there is a different preferred
>>>>>>>   cyrillic transliteration table for any specific locale
>>>>>>>   their maintainers may want to point me to it so I can
>>>>>>>   supply a separate table/patch.
>>>>>>> - Or they could state explicitly that for some reason they
>>>>>>>   would like to exclude their locale from the patch for a
>>>>>>>   default cyrillic transliteration altogether.
>>>>>>
>>>>>> As Keld wrote, there are probably separate rules for every
>>>>>> language so I don't think you should treat your rules as
>>>>>> universal and include them in every locale. At first sight,
>>>>>> it seems to me they work only for English (as a destination
>>>>>> locale). Also, although it is called "transliteration from
>>>>>> Cyrillic" it seems that it covers only the Russian alphabet.
>>>>>> What about other languages which use the Cyrillic alphabet but
>>>>>> add their own diacritic characters? Think about Belarusian,
>>>>>> Ukrainian, Serbian, Chechen, Chuvash, Mari, Ossetian, Yakut,
>>>>>> Tatar, and more. What about languages which use the Cyrillic
>>>>>> alphabet but transliterate their respective letters in a
>>>>>> different way than Russian? For example, Russian "Ъ" is (I
>>>>>> think) usually skipped in transliteration, I think you
>>>>>> propose "``", but when transliterating from Bulgarian they
>>>>>> usually transliterate this as "ă".
>>>>>>
>>>>>> A few remarks:
>>>>>>
>>>>>> * I think you transliterate "щ" as "shh", wouldn't "shch" be
>>>>>>   better?
>>>>>> * You transliterate "ц" as "cz", wouldn't "ts" be better?
>>>>>>   By the way, in the Polish language "cz" is a correct
>>>>>>   transliteration of "ч".
>>>>>> * You transliterate "й" as "j", this is fine in many languages
>>>>>>   but wouldn't "y" be better in English?
>>>>>> * In the case of "е": how will you know if it is correct to
>>>>>>   transliterate it to "e" or "ie" or "je" or "ye"?
>>>>>>
>>>>>> These remarks are obviously incomplete, your patch deserves
>>>>>> much more attention in review.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Rafal

--
Marko Myllynen

--------------BCB4DAF45DDA2708571892B4--