On 01.12.18 23:07, Rafal Luzynski wrote:
>
> Also, the difference between uconv and iconv is that we can provide
> multiple transliterations for any source character but we can't group
> them into standards so we can't tell iconv to use this or another
> system. It will just choose the best fitting the current output
> character set and the only thing we can choose is the locale.
>
> This makes me think: should we add a locale like ru_RU@SystemA or
> ru_RU@SystemB?

Wouldn't that require creating three versions of every locale that includes the translit_cyrillic file? I.e. en_US plus en_US@SystemA, en_US@SystemB, etc.? This in turn would make two of them optional (as Cyrillic fonts are at the moment).

The highest value is in having the default locale able to transliterate, isn't it? So putting the transliteration into optional locales kind of defeats the purpose.

An example from my experience as a user: a networked device or host often has en_US as the default (only?) locale, with no viable way to change it or install Cyrillic fonts. This is the most dire situation, and the one where ASCII transliteration certainly helps most. Having en_US@SystemA or en_US@SystemB theoretically available but not compiled by the distributor wouldn't help here, would it?

So the only useful scenario would be to ship the locales with the transliteration already included by default in en_US. This way the distributor doesn't have to take action to include transliteration as en_US@SystemA or en_US@SystemB.

From my (however limited) point of view it is better to get System B in first, then see whether any code needs to be changed to accommodate the System A/System B distinction. Again, System B is _transcription_ to ASCII and System A is _transliteration_ to Latin, with different use cases.

It's insightful to see your comparison of uconv vs. iconv!

Similar to your checks, this is what I was using to see whether any locale fails the transliteration for any Cyrillic letter:

    echo "ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУУ́ФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуу́фхцчшщъыьэюяёђѓєѕіїјљњћќўџѪѫѲѳѴѵҌҍ ҐґҒғҔҕҖҗҚқҞҟҢңҤҥҦҧҨҩҪҫҬҭҮүҲҳҴҵҺһҼҽҾҿӀӁӂӋӌӐӑӒӓӖӗӘәӜӝӞӟӠӡӤӥӦӧӨөӰӱӲӳӴӵӸӹ’" |
    LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT

should give (can be asserted with bash string comparison):

    AaOoUussYODJG`YeZ`IYiJL`N`TSHK`U`DhABVGDEZHZIJKLMNOPRSTUUFHCCHSHSHHA`Y`E`YUYAabvgdezhzijklmnoprstuufhcchshshh``y`e`yuyayodjg`yez`iyijl`n`tshk`u`dhO`o`FhfhYhyhE`e` G`g`GHghGHghZH`zh`K`k`K`k`N`n`NGngP`p`O`o`C`C`T`t`UuH`h`TCZtczSH`SH`CH`ch`CH`ch`iZH`zh`CH`ch`A`a`A`a`E`e`A`a`ZH`zh`Z`z`Z`z`I`i`O`o`O`o`U`u`U`u`CH`ch`Y`y`'

I am also attaching another file that has the Unicode code points next to the letters for easier identification of failures (like "U0401-Ё U0402-Ђ U0403-Ѓ" etc.).

Hope it will be helpful in creating the tests.

Best regards,
Egor Kobylkin
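
For the string comparison mentioned above, a minimal sketch could look like the following. The helper name assert_translit is hypothetical and not part of glibc or any existing test suite; it only assumes that the transliterated text arrives on stdin and that the expected string is passed as the first argument. Because the expected output contains backticks, it should be passed in single quotes.

```shell
# Hypothetical helper (not from glibc): compare text read from stdin
# with the expected string given as $1.
# The body runs in a subshell "( ... )" so its variables do not leak.
assert_translit() (
    expected=$1
    # Command substitution strips the trailing newline that iconv echoes back.
    actual=$(cat)
    if [ "$actual" = "$expected" ]; then
        printf 'PASS\n'
    else
        printf 'FAIL\n'
        printf '  expected: %s\n' "$expected"
        printf '  actual:   %s\n' "$actual"
        exit 1    # becomes the return status of the function
    fi
)

# Intended use (assumes $input, $expected, $locale and $workdir are set
# as in the command quoted in this mail):
#   echo "$input" |
#     LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8 \
#     iconv -f UTF-8 -t ASCII//TRANSLIT |
#     assert_translit "$expected"
```

On a mismatch the helper prints both strings and returns a non-zero status, so it can be run in a loop over all compiled locales and the failing ones collected.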