Hi,

I have now implemented all the changes requested for the translit_cyrillic
file, but started hitting what seems like a bug:

- If the line ; is present in translit_cyrillic, the locale compilation
  fails, i.e.

    grep CYRILLIC < $testfile | LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT

  hangs.
- If the line ; is absent from translit_cyrillic, everything works; only
  the transliteration of fails, as expected (? is displayed).
- If translit_cyrillic contains ; as the _only_ line, the transliteration
  of works again (the other characters come out as ?).

Do you have any idea in which direction I should look?

The new translit_cyrillic is attached.
( is % CYRILLIC CAPITAL LETTER HA)

Best regards,
Egor

On 09.10.2018 01:35, Egor Kobylkin wrote:
> On 09.10.2018 00:23, Rafal Luzynski wrote:
>> 8.10.2018 14:40 Marko Myllynen wrote:
>>> Hi,
>>>
>>> Thanks for the update. I have a few, mostly cosmetic, comments below;
>>> hopefully we'll hear from others whether they agree with this direction.
>>>
>
> Yeah, the earlier we get feedback, the more productive we are. I'd be
> happy to get as much feedback on this as early as possible. So everybody
> concerned, please chime in.
>
>>
>>> - No duplicates:
>>>
>>> % CYRILLIC SMALL LETTER IE
>>> ;
>>>
>>> should become:
>>>
>>> % CYRILLIC SMALL LETTER IE
>>>
>>>
>>> - There are a few issues with the definitions:
>>>
>>> % CYRILLIC CAPITAL LETTER U
>>> ;
>>> % CYRILLIC UNDEFINED
>>> ; ""
>>>
>>> % CYRILLIC SMALL LETTER U
>>> ;
>>> % CYRILLIC UNDEFINED
>>> ; ""
>>
>> Are the duplicates here because some Cyrillic letters may have multiple
>> Latin transliterations depending on the context, for example Cyrillic IE
>> must be transliterated sometimes as "e", sometimes as "ie", sometimes
>> as "ye" or "je"? Can we provide rules for groups of characters instead?
> No, the duplicates are just an artifact of my line-generating logic. I
> have fixed (removed) them. As far as I understand, transcription that
> varies between languages/locales cannot be handled in one file at all.
>
>>
>>> I wonder whether it would be possible to automate generation of this
>>> file so that issues like the above could be avoided? But perhaps that
>>> could be the next step once this initial patch lands.
>
> I am generating the content part of translit_cyrillic from a
> LibreOffice spreadsheet. Not sure if you had time to view it by now?
> https://sourceware.org/bugzilla/attachment.cgi?id=11299
>
> Anyway, I have just fixed the issues identified by Marko above in that
> spreadsheet. I will make the changes for the request below and then
> upload the new translit_cyrillic file to the bugzilla.
>
>
>>> - Please add the standard glibc locale header (see the existing
>>> translit_* files for reference)
>>> - Consider wrapping the header lines at or around column 70-72
>>> - Consider describing which characters, character ranges, or blocks are
>>> supported (perhaps also describe why some of those are not included, see
>>> e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
>>> - Please remove trailing whitespace and spaces after ;
>>
>> Thanks for this, Marko. While at it, in the ChangeLog and in the commit
>> message these paths:
>>
>> * locales/aa_DJ: likewise
>>
>> 1. Should be a relative path starting in the root directory of the glibc
>> source, that is: "* localedata/locales/aa_DJ".
>> 2. Should be "Likewise." (starting with an uppercase letter and ending
>> with a dot).
>
> Will do.
>
> Bests,
> Egor
>