On 01.12.18 23:07, Rafal Luzynski wrote:
> 
> Also, the difference between uconv and iconv is that we can provide
> multiple transliterations for any source character but we can't group
> them into standards so we can't tell iconv to use this or another
> system.  It will just choose the best fitting the current output
> character set and the only thing we can choose is the locale.
> 
> This makes me think: should we add a locale like ru_RU@SystemA or
> ru_RU@SystemB?

Wouldn't that require creating three versions of every locale that
includes the translit_cyrillic file? I.e. en_US plus en_US@SystemA,
en_US@SystemB, etc.?

This in turn would make two of them optional (as Cyrillic fonts are at
the moment). The highest value is in the default locale being able to
transliterate, isn't it? So putting the transliteration into optional
locales kind of defeats the purpose.

An example from my experience as a user: a networked device or host
often has en_US as the default (only?) locale, with no viable way to
change it or to install Cyrillic fonts. This is the most dire situation,
and the one where ASCII transliteration helps most. Having en_US@SystemA
or en_US@SystemB theoretically available but not compiled by the
distributor wouldn't help here, would it?

So the only useful scenario here would be to ship your locales with the
transliteration already included by default in en_US. This way the
distributor won't have to take any extra action to include
transliteration as en_US@SystemA or en_US@SystemB.

From my (admittedly limited) point of view it is better to get System B
in first, then see whether any code needs to change to accommodate the
System A/System B distinction. Again, System B is _transcription_ to
ASCII and System A is _transliteration_ to Latin, with different use cases.

Your comparison of uconv vs. iconv is insightful!
Similar to your checks, this is what I was using to see whether any
locale fails to transliterate any Cyrillic letter:

echo "ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуу́фхцчшщъыьэюяёђѓєѕіїјљњћќўџѪѫѲѳѴѵҌҍ
ҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’" |
LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8 \
iconv -f UTF-8 -t ASCII//TRANSLIT

should give (can be asserted with bash string comparison):

AaOoUussYODJG`YeZ`IYiJL`N`TSHK`U`DhABVGDEZHZIJKLMNOPRSTUUFHCCHSHSHHA`Y`E`YUYAabvgdezhzijklmnoprstuufhcchshshh``y`e`yuyayodjg`yez`iyijl`n`tshk`u`dhO`o`FhfhYhyhE`e`
G`g`GHghGHghZH`zh`K`k`K`k`N`n`NGngP`p`O`o`C`C`T`t`UuH`h`TCZtczSH`SH`CH`ch`CH`ch`iZH`zh`CH`ch`A`a`A`a`E`e`A`a`ZH`zh`Z`z`Z`z`I`i`O`o`O`o`U`u`U`u`CH`ch`Y`y`'
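
The bash string comparison mentioned above could look roughly like the
sketch below. The helper name assert_same is made up for illustration
and is not part of glibc or iconv; $workdir and $locale are assumed to
be set as in the command above, and $input and $expected stand for the
full test string and the expected output.

```shell
#!/usr/bin/env bash
# Sketch only: assert_same is a hypothetical helper, not a glibc/iconv tool.
# It compares the transliterated output with the expected string literally.
assert_same() {
    local expected=$1 actual=$2
    # Quoting the right-hand side of == disables pattern matching, so
    # characters such as '*' or '?' in the expected string are literal.
    if [[ "$actual" == "$expected" ]]; then
        echo "PASS"
    else
        echo "FAIL"
        echo "  expected: $expected"
        echo "  actual:   $actual"
        return 1
    fi
}

# Usage (assumes $input, $expected, $workdir and $locale are already set):
#   actual=$(echo "$input" |
#       LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8 \
#       iconv -f UTF-8 -t ASCII//TRANSLIT)
#   assert_same "$expected" "$actual"
```

Since the expected output contains backticks and an apostrophe, it is
probably easiest to read it from a file or a quoted here-document rather
than to put it in a double-quoted string. Note also that $(...) strips
trailing newlines from the captured output.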

And I am attaching another file that has the Unicode code points next to
the letters for easier identification of failures (like "U0401-Ё
U0402-Ђ U0403-Ѓ" etc.). I hope it will be helpful in creating the tests.

Best regards,
Egor Kobylkin