From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-96293-e=80x24.org@sourceware.org>
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN: AS31976 209.132.180.0/23
X-Spam-Status: No, score=-3.5 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED,
	DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,
	SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no
	version=3.4.1
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id 813121F97E
	for <e@80x24.org>; Mon,  8 Oct 2018 22:52:30 +0000 (UTC)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id
	:list-unsubscribe:list-subscribe:list-archive:list-post
	:list-help:sender:subject:to:cc:references:from:message-id:date
	:mime-version:in-reply-to:content-type; q=dns; s=default; b=YRvF
	zZ+S5thvg9dcCSIh7W+J8/VdFDkwqL/l0NmxpUs6W7JaqVQqiO6osEzszuDvuRV0
	Y8GeKXmVAp+HIZn+x7KFHY7MvKDF9zlDZXglWqDFFlkaVeSPH0UT0+kJgYMCdReb
	ufkCMmh4Mil6HFnJMNZyZESRjet4i3z+tOwIteM=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id
	:list-unsubscribe:list-subscribe:list-archive:list-post
	:list-help:sender:subject:to:cc:references:from:message-id:date
	:mime-version:in-reply-to:content-type; s=default; bh=djDh7CFH1d
	qjZRGAOJ11HWLzTPM=; b=f+p7PjjI0Brh8q681IP3JVAwNuPVZWYC5yZLia6ZP8
	lQhvaeEgGEhLeavtytVfUcUMtHV7qsJhVTC6bQ1Rw96nYQVfnPuI5GpJv1dmaLKo
	xemHrLIC0hnbjm2e1NXRNh28BqBaqMEx2jArJ2HRCjQe+yOZ6RLPpJN5gmfkWY3/
	w=
Received: (qmail 68757 invoked by alias); 8 Oct 2018 22:52:25 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-e=80x24.org@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 68719 invoked by uid 89); 8 Oct 2018 22:52:25 -0000
Authentication-Results: sourceware.org; auth=none
X-HELO: mout.kundenserver.de
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ
 #2872] re-submission for 2.29
To: Rafal Luzynski <digitalfreak@lingonborough.com>,
 Marko Myllynen <myllynen@redhat.com>
Cc: Keld Simonsen <keld@keldix.com>, libc-alpha@sourceware.org,
 libc-locales@sourceware.org, "Dmitry V. Levin" <ldv@altlinux.org>,
 Volodymyr Lisivka <vlisivka@gmail.com>, Carlos O'Donell <carlos@redhat.com>,
 Max Kutny <mkutny@gmail.com>, danilo@gnome.org
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <16e785f3-2e9f-ceb2-698f-dc33c91a5d5e@kobylkin.com>
 <ac4c9b3e-aeae-30de-23ef-24d8f53d7bc4@kobylkin.com>
 <20181003091949.GA21486@rap.rap.dk>
 <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com>
 <1485772360.805333.1538731225156@poczta.nazwa.pl>
 <19e29568-e710-535f-4f90-98dbcec930ed@kobylkin.com>
 <1028447684.826961.1539036295224@poczta.nazwa.pl>
From: Egor Kobylkin <egor@kobylkin.com>
Openpgp: preference=signencrypt
Message-ID: <bfbf5abf-445b-4392-4f41-8c95700b8f11@kobylkin.com>
Date: Tue, 9 Oct 2018 00:52:00 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <1028447684.826961.1539036295224@poczta.nazwa.pl>
Content-Type: multipart/mixed;
 boundary="------------BCB4DAF45DDA2708571892B4"

This is a multi-part message in MIME format.
--------------BCB4DAF45DDA2708571892B4
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Hi Rafal,

> But, while at this, is there anything that stops us from adding
> transliteration rules for additional Cyrillic characters not used in
> Russian but used in other languages?

Just to make sure we are not talking at cross purposes: since your last
email on this topic I have, at Marko's suggestion, already implemented
ISO 9 transliteration for all the characters it covers. This should
cover most if not all Slavic Cyrillic. You seem to have noticed and
replied to Marko's email just as I was writing mine.

Please also check the spreadsheet version I have just uploaded:
https://sourceware.org/bugzilla/attachment.cgi?id=11298

I am currently absorbing Marko's further suggestions and corrections to
that one and will get back for more discussion once that is done. I am
reading your suggestions and taking them to heart, be sure of that.

Two professional translators independently pointed out to me the
difference between transliteration and transcription. Transliteration is
normative (letter for letter), while transcription is phonetic: each
letter becomes whatever combination of Latin letters in the target
language sounds like it to a native speaker. While transliteration should
be easy to cover for all those languages via ISO 9, transcription is
inherently language specific. The problem is that we are (mis)using
transcription as transliteration to ASCII, because the ASCII character
set does not allow for proper transcription. Another problem is that, to
be really useful, the ASCII transliteration should work outside of the
source locale (i.e. not only in ru_RU but also in en_US, de_DE, en_DE,
es_ES etc., or even just the C locale).

For my part, I am committed to doing all the work needed to cover at
least C, en_US, ru_RU and de_DE, in that order. ru_RU is a "courtesy": I
am not really using it myself, but I hope more locale contributors may
come because of it and fix my bugs :-).


> The problem is that we don't have a separate maintainer for each
> locale, we have only 2 maintainers for about 200 locales and we must
> represent them all.

It was not clear to me that the glibc team cannot fall back on
individual locale maintainers to make the decision. But then that may
make the decision making even easier. If you have a list of requirements
(maybe implicit until now), could you please send it my way? We can
certainly also just keep this thread going until all issues are ironed
out.

Anyway, hopefully with ISO 9 as the first column in translit_cyrillic
we have now covered the completeness of the transliteration. What we
still need to figure out is transcription/transliteration to ASCII, the
second column.
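
To illustrate what I mean by the two columns, here is a minimal sketch
in Python (not glibc code) of the fallback described in the iconv(1)
text quoted further down: each rule lists the ISO 9 form first and the
ASCII form second, and the first alternative that fits the target
encoding wins. The RULES table, the function names and the three sample
entries are assumptions made for illustration only; the real rules
would live in translit_cyrillic.

```python
# Hypothetical two-column rules: (ISO 9 / System A form, ASCII / System B form).
# Only three sample entries; this is an illustration, not the proposed table.
RULES = {
    "\u0428": ("\u0160", "Sh"),  # CYRILLIC CAPITAL LETTER SHA
    "\u0448": ("\u0161", "sh"),  # CYRILLIC SMALL LETTER SHA
    "\u0435": ("e",),            # CYRILLIC SMALL LETTER IE
}


def encodable(text, encoding):
    """Return True if every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def translit(text, encoding):
    """Mimic the //TRANSLIT behaviour described in the thread.

    Characters that already fit the target encoding are kept. Otherwise the
    alternatives are tried left to right, and "?" is used if none fits.
    """
    out = []
    for ch in text:
        if encodable(ch, encoding):
            out.append(ch)
            continue
        for candidate in RULES.get(ch, ()):
            if encodable(candidate, encoding):
                out.append(candidate)
                break
        else:
            out.append("?")
    return "".join(out)
```

With these sample rules, translit("\u0428", "iso8859-15") picks the
first column and translit("\u0428", "ascii") falls back to the second,
which matches Marko's SHA example further down.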

Are we sharing the same view on this?

Speaking of decision making: maybe I can get an officially certified
court translator to answer our questions. Would you care to put together
a list of questions you would like answered in order to make a decision
on the table and its inclusion into various locales?

Hope this helps,
Egor


On 09.10.2018 00:04, Rafal Luzynski wrote:
> 5.10.2018 12:36 Egor Kobylkin <egor@kobylkin.com> wrote:
>> [...] I see three options:
>> 1. those locale maintainers that are fine with using ISO
>> 9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru)
>> include it in their locales.
>> https://sourceware.org/bugzilla/attachment.cgi?id=11289
>> 2. those that want to have a differing table can create their own
>> variety based on the spreadsheet I have prepared
>> https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include
>> it in this patch.
>> 3. those that want to omit a cyrillic transliteration altogether
>> for now state so and just carry over the bug #2872 from the year
>> 2006.
>> 
>> Does this make sense to you?
> 
> The problem is that we don't have a separate maintainer for each
> locale, we have only 2 maintainers for about 200 locales and we must
> represent them all.  Sometimes a locale may happen to be our own
> native locale or of someone in this list, or it may be a locale which
> we accidentally can speak as a foreign language, or we may have
> friends who can speak it. Or it may be totally unknown and we still
> must somehow handle it.
> 
> I think that these transliteration rules should be included in
> multiple locales on an "opt-in" basis rather than "opt-out".  I mean,
> we should not include them in all locales unless someone explicitly
> provides different rules.  Instead, I think we should add them
> (maybe with modifications) only to those locales where we have a good
> reason to think they will work.
> 
> Particularly, I think that those rules will not be helpful at all
> for the languages which use neither Latin nor Cyrillic alphabet.
> 
>> [...] The fact that the patch is reflecting Russian variety of ISO 
>> 9:1995/GOST_7.79_System_B is because a) ISO
>> 9:1995/GOST_7.79_System_B is available and can be helpful to a
>> majority of cyrillic users b) I have access to it including via
>> being proficient in Russian.
> 
> I took a look at these standards and, while at first I doubted they
> were correct for the English language, I now understand they were
> created for Russian users.  Therefore I think it is quite correct to
> include them in the Russian locale data.  Would it be OK if we say
> that it is only for the Russian language?  Would that be satisfying
> for you and/or your users?
> 
>> It is offered to all the respective locale maintainers as a
>> stopgap solution. Stopgap in the sense that it is better to have
>> some transliteration than not to have any at all and carry over the
>> bug from 2006. That it may be a somewhat officially correct
>> transliteration for ru_RU is a bonus. In that sense I would dub the
>> discussion on the correctness for other languages "offtopic". Let
>> me know if this is not OK.
> 
> If you refer to languages other than Russian which also use the
> Cyrillic alphabet but need different transliteration rules than
> Russian for the same characters, then it is OK for me now.  I am
> afraid that the iconv algorithm does not handle such a case.  Of
> course, we should add this missing feature eventually but I do not
> volunteer to do it now.
> 
>> [...] P.S. specifically as to how to address languages other than Ru
>> included in GOST_7.79_System_B: we can take the first option left
>> to right from that table (Ru,By,Uk,Bg,Mk). Then it will technically
>> work for all those locales/languages but with errors where Ru
>> supersedes their own variants.
> 
> Makes sense, as long as we cannot select the source language now.
> 
> But, while at this, is there anything that stops us from adding
> transliteration rules for additional Cyrillic characters not used in
> Russian but used in other languages?
> 
> Regards,
> 
> Rafal
> 


--------------BCB4DAF45DDA2708571892B4
Content-Type: message/rfc822;
 name="Attached Message"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment;
 filename="Attached Message"

Return-Path: <myllynen@redhat.com>
Received: from mail-wm1-f66.google.com ([209.85.128.66]) by mx.kundenserver.de
 (mxeue010 [212.227.15.41]) with ESMTPS (Nemesis) id 1Mw89c-1ft4SE162u-00s7Ym
 for <egor@kobylkin.com>; Mon, 08 Oct 2018 14:40:58 +0200
Received: by mail-wm1-f66.google.com with SMTP id r63-v6so8008883wma.4
        for <egor@kobylkin.com>; Mon, 08 Oct 2018 05:40:58 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:reply-to:subject:to:cc:references:from
         :organization:message-id:date:user-agent:mime-version:in-reply-to
         :content-language:content-transfer-encoding;
        bh=v/Pkq9G8E6W3f0+VtVuY9tDm9a0k3xHQZav/N7rcX9k=;
        b=InnKl+8OKd1u8fRYQynZukg/dK1ktS7DnKrPxIC08ud2FxuvoCizsD5DzFsCBwLy9e
         Odxw2/K09ZT4w0hIYLsEWAonvBUkoTfShm+Xf/VacOu2OSOTparrqjYKj+PS90jKCiao
         WJnx1P4BNJ+i5P+GzBNj1nIR0rbjevU58KCZ8t6XoGBWFdoFobHTi/I9WXyvaH5JTUar
         +Gs4lvfWTFqTiHruG0l8TY72wtRNegZsEl0eTUDGhR7Z6zHXTgZVpwTXckzC2HClcSKg
         bqLSevyovvpM+x6FDySiFeoPcSqjwq7clOJeGUZJDg3ZqAKg9LIPaf//P2Lu9nuzoWi0
         Fn4A==
X-Gm-Message-State: ABuFfoiKW12wAMOtFRUaiCLlYiP+rkMGzU9PpMjEHdI+me49BalETjQJ
	ILo+vMjXHl0S0MOasPnhU3ZxmQ==
X-Google-Smtp-Source: ACcGV60dH+vo+Hg+FLxhsSN7Pa/PZvVDQwErIuioWqX88XJGs21tWur9UAg0LCQon2P/ty/Y9bjNew==
X-Received: by 2002:a1c:930c:: with SMTP id v12-v6mr16048194wmd.9.1539002457592;
        Mon, 08 Oct 2018 05:40:57 -0700 (PDT)
Received: from [192.168.1.101] (87-93-44-255.bb.dnainternet.fi. [87.93.44.255])
        by smtp.gmail.com with ESMTPSA id c13-v6sm21609464wrm.50.2018.10.08.05.40.54
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Mon, 08 Oct 2018 05:40:56 -0700 (PDT)
Reply-To: Marko Myllynen <myllynen@redhat.com>
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ
 #2872] re-submission for 2.29
To: Egor Kobylkin <egor@kobylkin.com>,
 Rafal Luzynski <digitalfreak@lingonborough.com>,
 Keld Simonsen <keld@keldix.com>
Cc: libc-alpha@sourceware.org, libc-locales@sourceware.org,
 "Dmitry V. Levin" <ldv@altlinux.org>, Volodymyr Lisivka
 <vlisivka@gmail.com>, Carlos O'Donell <carlos@redhat.com>,
 Max Kutny <mkutny@gmail.com>, danilo@gnome.org
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <16e785f3-2e9f-ceb2-698f-dc33c91a5d5e@kobylkin.com>
 <ac4c9b3e-aeae-30de-23ef-24d8f53d7bc4@kobylkin.com>
 <20181003091949.GA21486@rap.rap.dk>
 <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com>
 <1485772360.805333.1538731225156@poczta.nazwa.pl>
 <deacdf31-d0bb-a92d-1de3-934d6b4cb158@kobylkin.com>
 <bda2ca60-18f1-3b19-91e5-c9ad144bc834@redhat.com>
 <bb4e1ba5-5fa5-2986-2573-7d27be226124@kobylkin.com>
 <69e26cab-810e-824b-3b16-b75ac44d8b0c@redhat.com>
 <b8f02fe9-f911-487f-b50b-9b0c43191cb6@kobylkin.com>
From: Marko Myllynen <myllynen@redhat.com>
Organization: Red Hat
Message-ID: <f51992ad-008b-03a4-8880-4c12edced53b@redhat.com>
Date: Mon, 8 Oct 2018 15:40:53 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <b8f02fe9-f911-487f-b50b-9b0c43191cb6@kobylkin.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Envelope-To: <egor@kobylkin.com>

Hi,

Thanks for the update. I have a few mostly cosmetic comments below;
hopefully we'll hear from others whether they agree with this direction.

- Please add the standard glibc locale header (see the existing
translit_* files for reference)
- Consider wrapping the header lines at or around column 70-72
- Consider describing which characters, character ranges, or blocks are
supported (perhaps also describe why some of those are not included, see
e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
- Please remove trailing whitespace and any spaces after ";"
- No duplicates:

% CYRILLIC SMALL LETTER IE
<U0435> <U0065>; <U0065>

should become:

% CYRILLIC SMALL LETTER IE
<U0435> <U0065>

- There are a few issues with the definitions:

% CYRILLIC CAPITAL LETTER U
<U0423> <U0055>; <U0055>
% CYRILLIC UNDEFINED
<U0423><U0423> <U00DA>; "<U0055><U0060>"

% CYRILLIC SMALL LETTER U
<U0443> <U0075>; <U0075>
% CYRILLIC UNDEFINED
<U0443><U0443> <U00FA>; "<U0075><U0060>"

I wonder whether it would be possible to automate generation of this
file so that issues like the above could be avoided? But perhaps that
could be the next step once this initial patch lands.
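
A minimal sketch of what such generation might look like, in Python. It
assumes a hypothetical input of (name, source character, list of
alternatives) and only shows the formatting and the removal of duplicate
alternatives; the helper names are made up and this is not an existing
glibc tool.

```python
def ucode(text):
    """Render each character as a <UXXXX> symbol as used in locale sources."""
    return "".join(f"<U{ord(ch):04X}>" for ch in text)


def format_rule(name, source, alternatives):
    """Format one transliteration rule, dropping repeated alternatives.

    name         -- Unicode character name, written as a % comment line
    source       -- the Cyrillic character being transliterated
    alternatives -- replacement strings in order of preference
    """
    unique = []
    for alt in alternatives:
        if alt not in unique:  # keep the first occurrence only
            unique.append(alt)

    rendered = []
    for alt in unique:
        code = ucode(alt)
        # Multi-character replacements are written as quoted strings.
        rendered.append(f'"{code}"' if len(alt) > 1 else code)

    return f"% {name}\n{ucode(source)} " + ";".join(rendered)


if __name__ == "__main__":
    print(format_rule("CYRILLIC SMALL LETTER IE", "\u0435", ["e", "e"]))
    print(format_rule("CYRILLIC CAPITAL LETTER SHA", "\u0428", ["\u0160", "Sh"]))
```

For the SMALL LETTER IE case this prints the deduplicated form shown
above, since the repeated <U0065> alternative is dropped before
formatting.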

Thanks,

On 2018-10-05 23:47, Egor Kobylkin wrote:
> After some kind help from Marko in the offline discussion
> I realized the multi/single character approach I originally took was
> against the  of the iconv(1) logic anyway. So there is no harm in
> dropping it and adopting Marko's suggestion instead. I will do so and
> will resubmit the patch with ISO 9:1995/GOST 7.79 System A + fallback to
> GOST 7.79 System B (for ASCII).
> 
> However, this doesn't resolve the issue of the ASCII part being
> different for various locales. Again, I am asking the locale
> maintainers to let me know if they want to 1) adopt the one I am
> supplying, 2) write their own or 3) ignore the patch altogether. Your
> feedback is appreciated!
> 
> This is the relevant part that helped:
>> The first part (ISO-8859-15 or ASCII) defines the target encoding for
>> iconv(1). //TRANSLIT is described in the iconv(1) man page as:
>>
>> If the string //TRANSLIT is appended to to-encoding, characters
>> being converted are transliterated when needed and possible. This
>> means that when a character cannot be represented in the target
>> character set, it can be approximated through one or several
>> similar looking characters. Characters that are outside of the
>> target character set and cannot be transliterated are replaced
>> with a question mark (?) in the output.
>>
>> So in the above examples, iconv(1) encounters the character U+0428,
>> which is not part of either target encoding. Since //TRANSLIT is
>> specified, iconv(1) tries transliteration according to the rules
>> defined above; in the case of ASCII, U+0160 is not part of the
>> target encoding, so the next alternative is used.
> 
> Bests,
> Egor Kobylkin
> 
> On 05.10.2018 14:21, Marko Myllynen wrote:
>> Hi,
>>
>> The scheme I proposed would also be ASCII compatible; consider this 
>> example:
>>
>> % CYRILLIC CAPITAL LETTER SHA
>> <U0428> "<U0160>";"<U0053><U0068>"
>>
>> "printf \\u0428\\n | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT | iconv 
>> -f ISO-8859-15 -t UTF-8" would produce Š as per System A and "printf
>>  \\u0428\\n | iconv -f UTF-8 -t ASCII//TRANSLIT" would produce Sh as 
>> per System B.
>>
>> Thanks,
>>
>> On 2018-10-05 15:00, Egor Kobylkin wrote:
>>> Hi Marko,
>>>
>>> I have chosen System B because it is ASCII compatible. System
>>> A is not ASCII compatible (diacritics in target).
>>>
>>> https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A
>>>
>>> "GOST 7.79 contains two transliteration tables.
>>>
>>> System A: one Cyrillic character to one Latin character, some with
>>> diacritics – identical to ISO 9:1995
>>>
>>> System B: one Cyrillic character to one or many Latin characters
>>> without diacritics"
>>>
>>> Hope this helps,
>>> Egor
>>>
>>> On 05.10.2018 13:54, Marko Myllynen wrote:
>>>> Hi,
>>>>
>>>> Would it make sense to first use ISO 9:1995/GOST 7.79 System A if
>>>> possible and if not, then fall back to GOST 7.79 System B?
>>>>
>>>> Implementation-wise, the current translit_* files have a few
>>>> examples where a non-ASCII transliteration is tried first before
>>>> an ASCII fallback. These examples are from translit_neutral:
>>>>
>>>> % NARROW NO-BREAK SPACE
>>>> <U202F> <U00A0>;<U0020>
>>>> % REVERSED TRIPLE PRIME
>>>> <U2037> "<U2035><U2035><U2035>";"<U0060><U0060><U0060>"
>>>>
>>>> Thanks,
>>>>
>>>> On 2018-10-05 13:29, Egor Kobylkin wrote:
>>>>> Keld, Marko, Rafal, other locale maintainers,
>>>>>
>>>>> all of this is written with a minimal viable fix for this bug
>>>>> in mind, asap. I want to avoid wasting maintainers' time by
>>>>> getting into fundamental discussions here (although they arise
>>>>> for perfectly good reasons).
>>>>>
>>>>> I see three options:
>>>>> 1. those locale maintainers that are fine with using ISO
>>>>> 9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru)
>>>>> include it in their locales (see attached screenshot of the
>>>>> table).
>>>>> 2. those that want to have a differing table can create their
>>>>> own variety based on the spreadsheet I have prepared
>>>>> https://sourceware.org/bugzilla/attachment.cgi?id=8590 and
>>>>> include it in this patch.
>>>>> 3. those that want to omit a cyrillic transliteration
>>>>> altogether for now state so and just carry over the bug #2872
>>>>> from the year 2006.
>>>>>
>>>>> Does this make sense to you?
>>>>>
>>>>> Just to be super clear on this: the patch is a stopgap _ASCII_
>>>>>  transliteration table. ASCII being AMERICAN Standard Code for
>>>>>  Information Interchange, that is obviously orthogonal to any 
>>>>> transliteration rule of other countries. As such it is not 
>>>>> explicitly targeting transliteration standards of any country.
>>>>>
>>>>> The fact that the patch is reflecting Russian variety of ISO 
>>>>> 9:1995/GOST_7.79_System_B is because a) ISO 
>>>>> 9:1995/GOST_7.79_System_B is available and can be helpful to a 
>>>>> majority of cyrillic users b) I have access to it including
>>>>> via being proficient in Russian.
>>>>>
>>>>> It is offered to all the respective locale maintainers as a 
>>>>> stopgap solution. Stopgap in the sense that it is better to 
>>>>> have some transliteration than not to have any at all and
>>>>> carry over the bug from 2006. That it may be a somewhat
>>>>> officially correct transliteration for ru_RU is a bonus. In
>>>>> that sense I would dub the discussion on the correctness for
>>>>> other languages "offtopic". Let me know if this is not OK.
>>>>>
>>>>> You are all correctly pointing out the deficiencies of this
>>>>> approach. However, I couldn't find a better straightforward
>>>>> approach as of yet. Happy to hear from you on how this could be
>>>>> handled.
>>>>>
>>>>> There is a danger of being caught in the web of 
>>>>> language/country differences. I propose just pruning the 
>>>>> locales that are not comfortable including this current table. 
>>>>> We can address possible solutions in the second wave of 
>>>>> patching.
>>>>>
>>>>> I am wary of getting into discussions on specific country
>>>>> variants just because of the sheer complexity of this topic.
>>>>> It is probably better addressed by the respective maintainers of
>>>>> their locales. I do not see a "one size fits all" solution as
>>>>> possible in this first wave.
>>>>>
>>>>> I would like to have this "three options plan of action"
>>>>> vetted first and then we could go into the specific details
>>>>> (like, for instance, what characters should be included in
>>>>> the table, and in which transliteration form).
>>>>>
>>>>> I am looking forward to your reply,
>>>>> Egor Kobylkin
>>>>>
>>>>> P.S. specifically as to how to address languages other than Ru
>>>>> included in GOST_7.79_System_B: we can take the first option 
>>>>> left to right from that table (Ru,By,Uk,Bg,Mk). Then it will 
>>>>> technically work for all those locales/languages but with 
>>>>> errors where Ru supersedes their own variants.
>>>>>
>>>>>
>>>>> On 05.10.2018 11:20, Rafal Luzynski wrote:
>>>>>> 3.10.2018 11:32 Egor Kobylkin <egor@kobylkin.com> wrote:
>>>>>>>
>>>>>>> On 03.10.2018 11:19, Keld Simonsen wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Please note that transliteration of Cyrillic to Latin
>>>>>>>> is not universal. There are different schemes for German,
>>>>>>>> English and Danish, for example, and there is also an
>>>>>>>> ISO standard for it.
>>>>>>>
>>>>>>> Thanks for your feedback, Keld!
>>>>>>>
>>>>>>> Could the locale maintainers that wouldn't like to include 
>>>>>>> this patch explicitly state so here?
>>>>>>
>>>>>> I think it is about me so I must reply.  I am sorry about 
>>>>>> that and the sole reason is my lack of time.  I'm just a 
>>>>>> volunteer here, that means it's not my regular job to work
>>>>>> on locale data nor anything in glibc nor in any other open 
>>>>>> source project.  I do these things only in my free time
>>>>>> which I don't have much.  Of course you will see my
>>>>>> contributions here and there but they are either trivial or
>>>>>> take me months to complete.  Your patches are on my radar but
>>>>>> I can't tell any ETA for them.  Of course, there are other
>>>>>> people around here and they are all welcome to come and
>>>>>> join.
>>>>>>
>>>>>>> That is:
>>>>>>> - In the case that there is a different preferred
>>>>>>> cyrillic transliteration table for any specific locale
>>>>>>> their maintainers may want to point me to it so I can
>>>>>>> supply a separate table/patch.
>>>>>>> - Or they could state explicitly that for some reason they
>>>>>>> would like to exclude their locale from the patch for a
>>>>>>> default cyrillic transliteration altogether.
>>>>>>
>>>>>> As Keld wrote, there are probably separate rules for every 
>>>>>> language so I don't think you should treat your rules as 
>>>>>> universal and include them in every locale.  At first sight, 
>>>>>> it seems to me they work only for English (as a destination 
>>>>>> locale).  Also, although it is called "transliteration from 
>>>>>> Cyrillic" it seems that it covers only Russian alphabet. What
>>>>>> about other languages which use Cyrillic alphabet but add
>>>>>> their own diacritic characters?  Think about Belarusian, 
>>>>>> Ukrainian, Serbian, Chechen, Chuvash, Mari, Ossetian, Yakut, 
>>>>>> Tatar, and more.  What about languages which use Cyrillic 
>>>>>> alphabet but transliterate their respective letters in a 
>>>>>> different way than Russian?  For example, Russian "Ъ" is (I 
>>>>>> think) usually skipped in transliteration, I think you 
>>>>>> propose "``", but when transliterating from Bulgarian they 
>>>>>> usually transliterate this as "ă".
>>>>>>
>>>>>> Few remarks:
>>>>>>
>>>>>> * I think you transliterate "щ" as "shh", wouldn't "shch" be
>>>>>> better?
>>>>>> * You transliterate "ц" as "cz", wouldn't "ts" be better?  By
>>>>>> the way, in Polish language "cz" is a correct transliteration
>>>>>> of "ч".
>>>>>> * You transliterate "й" as "j", this is fine in many languages
>>>>>> but wouldn't "y" be better in English?
>>>>>> * In case of "е": how will you know if it is correct to
>>>>>> transliterate it to "e" or "ie" or "je" or "ye"?
>>>>>>
>>>>>> These remarks are obviously incomplete, your patch deserves 
>>>>>> much more attention to review.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Rafal
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 


-- 
Marko Myllynen

--------------BCB4DAF45DDA2708571892B4--