removed a png image attachment

Keld, Marko, Rafal, other locale maintainers,

I wrote all of this with a minimal viable fix for this bug in mind, to
land as soon as possible. I want to avoid taking up maintainers' time
with fundamental discussions here (however well justified they are).

I see three options:
1. Those locale maintainers who are fine with using the ISO
9:1995/GOST_7.79_System_B Cyrillic transliteration table (Ru) include it
in their locales. https://sourceware.org/bugzilla/attachment.cgi?id=11289
2. Those who want a different table can create their own
variant based on the spreadsheet I have prepared
https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include it in
this patch.
3. Those who want to omit a Cyrillic transliteration altogether for now
say so and just carry over bug #2872 from 2006.

Does this make sense to you?

Just to be super clear on this: the patch is a stopgap _ASCII_
transliteration table. ASCII is the AMERICAN Standard Code for
Information Interchange, so it is obviously orthogonal to any
transliteration rules of other countries. As such, the patch does not
explicitly target the transliteration standards of any country.
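
To illustrate the ASCII-only constraint, here is a minimal Python
sketch (not part of the patch). The helper name non_ascii_entries and
the candidate dict are hypothetical; the sample values are the ones
mentioned in the quoted review below ("shh", "cz", "j", and the
Bulgarian-style "ă"). It only shows why a form like "ă" cannot appear
in an ASCII fallback table.

```python
def non_ascii_entries(table):
    """Return the entries whose replacement text is not pure ASCII."""
    return {src: dst for src, dst in table.items() if not dst.isascii()}


# Hypothetical candidate entries, taken from the forms discussed in this thread.
candidate = {
    "щ": "shh",
    "ц": "cz",
    "й": "j",
    "ъ": "ă",  # common Bulgarian-style form; not representable in ASCII
}

if __name__ == "__main__":
    # Lists every entry that would have to be replaced in an ASCII table.
    print(non_ascii_entries(candidate))
```

Any entry reported by such a check would need an ASCII substitute
before it could go into this stopgap table.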

The patch reflects the Russian variant of ISO
9:1995/GOST_7.79_System_B because a) ISO 9:1995/GOST_7.79_System_B is
available and can be helpful to a majority of Cyrillic users, and b) I
have access to it, including through being proficient in Russian.

It is offered to all the respective locale maintainers as a stopgap
solution. Stopgap in the sense that it is better to have some
transliteration than not to have any at all and carry over the bug from
2006. That it may be a somewhat officially correct transliteration for
ru_RU is a bonus. In that sense I would dub the discussion on the
correctness for other languages "offtopic". Let me know if this is not OK.

You are all correctly pointing out the deficiencies of this approach.
However, I have not found a better straightforward approach yet.
Happy to hear from you on how this could be handled.

There is a danger of being caught in the web of language/country
differences. I propose just pruning the locales that are not comfortable
including this current table. We can address possible solutions in the
second wave of patching.

I am wary of getting into discussions on specific country variants just
because of the sheer complexity of this topic. It is probably better
addressed by the respective maintainers of each locale. I do not see a
"one size fits all" solution as possible in this first wave.

I would like to have this "three options plan of action" vetted first,
and then we can go into the specific details (for instance, which
characters should be included in the table, and in which
transliteration form).

I am looking forward to your reply,
Egor Kobylkin

P.S. Specifically, as to how to address languages other than Ru included in
GOST_7.79_System_B: we can take the first option, left to right, from that
table (Ru,By,Uk,Bg,Mk). Then it will technically work for all those
locales/languages, but with errors where Ru supersedes their own variants.
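
The "first option left to right" rule could be sketched roughly as
follows in Python. This is only an illustration under stated
assumptions: SAMPLE_ROWS holds a few illustrative values written from
memory of the System B table, not copied from the attached
spreadsheet, and the names LANG_ORDER, first_option, build_table and
transliterate are hypothetical.

```python
# Column order as in the table: Ru, By, Uk, Bg, Mk.
LANG_ORDER = ("Ru", "By", "Uk", "Bg", "Mk")

# Illustrative rows only (assumed values; verify against the spreadsheet).
# A missing key means the letter has no entry for that language.
SAMPLE_ROWS = {
    "щ": {"Ru": "shh", "Uk": "shh", "Bg": "sht"},
    "ъ": {"Ru": "``", "Bg": "a`"},
    "ў": {"By": "u`"},
    "ї": {"Uk": "yi"},
}


def first_option(row, order=LANG_ORDER):
    """Pick the first available transliteration, scanning columns left to right."""
    for lang in order:
        value = row.get(lang)
        if value:
            return value
    return None


def build_table(rows, order=LANG_ORDER):
    """Collapse the per-language rows into one flat transliteration table."""
    return {ch: first_option(row, order) for ch, row in rows.items()}


def transliterate(text, table):
    """Replace each character found in the table; leave all others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    table = build_table(SAMPLE_ROWS)
    print(table)
    print(transliterate("щъ", table))
```

With this ordering the Bulgarian value for "ъ" loses to the Russian
one, which is exactly the kind of error mentioned above. A locale that
prefers its own column first could pass a different order, for example
build_table(SAMPLE_ROWS, order=("Bg", "Ru")).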


On 05.10.2018 11:20, Rafal Luzynski wrote:
> 3.10.2018 11:32 Egor Kobylkin <egor@kobylkin.com> wrote:
>>
>> On 03.10.2018 11:19, Keld Simonsen wrote:
>>> Hi
>>>
>>> Please note that transliteration of Cyrillic to Latin is not universal.
>>> There are different schemes for, for example, German, English and Danish, and
>>> there is also an ISO standard for it.
>>
>> Thanks for your feedback, Keld!
>>
>> Could the locale maintainers that wouldn't like to include this patch
>> explicitly state so here?
> 
> I think it is about me so I must reply.  I am sorry about that and the sole
> reason is my lack of time.  I'm just a volunteer here, that means it's not
> my regular job to work on locale data nor anything in glibc nor in any other
> open source project.  I do these things only in my free time which I don't
> have much.  Of course you will see my contributions here and there but they
> are either trivial or take me months to complete.  Your patches are on my
> radar but I can't tell any ETA for them.  Of course, there are other people
> around here and they are all welcome to come and join.
> 
>> That is:
>> - In the case that there is a different preferred cyrillic
>> transliteration table for any specific locale their maintainers may want
>> to point me to it so I can supply a separate table/patch.
>> - Or they could state explicitly that for some reason they would like to
>> exclude their locale from the patch for a default cyrillic
>> transliteration altogether.
> 
> As Keld wrote, there are probably separate rules for every language so
> I don't think you should treat your rules as universal and include them
> in every locale.  At first sight, it seems to me they work only for English
> (as a destination locale).  Also, although it is called "transliteration
> from Cyrillic" it seems that it covers only Russian alphabet.  What about
> other languages which use Cyrillic alphabet but add their own diacritic
> characters?  Think about Belarusian, Ukrainian, Serbian, Chechen, Chuvash,
> Mari, Ossetian, Yakut, Tatar, and more.  What about languages which use
> Cyrillic alphabet but transliterate their respective letters in a different
> way than Russian?  For example, Russian "Ъ" is (I think) usually skipped
> in transliteration, I think you propose "``", but when transliterating from
> Bulgarian they usually transliterate this as "ă".
> 
> Few remarks:
> 
> * I think you transliterate "щ" as "shh", wouldn't "shch" be better?
> * You transliterate "ц" as "cz", wouldn't "ts" be better?  By the way,
>   in Polish language "cz" is a correct transliteration of "ч".
> * You transliterate "й" as "j", this is fine in many languages but wouldn't
>   "y" be better in English?
> * In case of "е": how will you know if it is correct to transliterate it
>   to "e" or "ie" or "je" or "ye"?
> 
> These remarks are obviously incomplete, your patch deserves much more
> attention to review.
> 
> Best regards,
> 
> Rafal
>