From: Marko Myllynen <myllynen@redhat.com>
To: Egor Kobylkin <egor@kobylkin.com>,
libc-alpha@sourceware.org, libc-locales@sourceware.org,
Carlos O'Donell <carlos@redhat.com>,
Siddhesh Poyarekar <siddhesh@gotplt.org>,
Rafal Luzynski <digitalfreak@lingonborough.com>
Cc: Mike Fabian <mfabian@redhat.com>
Subject: [PING^4][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]
Date: Thu, 28 Mar 2019 18:20:01 +0200 [thread overview]
Message-ID: <978ccde6-2f56-afd3-b13a-5d624cec1697@redhat.com> (raw)
In-Reply-To: <7cdd817a-4a47-201a-8eeb-87db324104b3@kobylkin.com>
Ping?
On 19/03/2019 12.39, Egor Kobylkin wrote:
> Changelog v12:
> * Adjusted to the new comment style suddenly appearing in the target
> file locale/C-translit.h.in (the original file changed on the master
> branch from /* style to # style since v11)
> * Fixed a typo for <U04BB> CYRILLIC SMALL LETTER SHHA to be mapped to
> "sh`" instead of erroneous "SH`" in v11
>
> Changelog v11:
> * Re-targeted the patch against locale/C-translit.h.in as the proper
> file for the ASCII translit table.
> * Correspondingly the patch now only contains the additional
> Cyrillic-ASCII strings in the format of locale/C-translit.h.in table.
> The 'include "translit_cyrillic";""' directives are not necessary in the
> locale files and they are now all left intact.
> * Also the file translit_cyrillic is not longer needed and is omitted.
> * Edited below email, commit message.
>
> Changelog v10:
> * Removed ISO 9.1995 GOST 7.79-2000 System A (transliteration to Latin
> with diacritics) as conflicting with System B within glibc mechanics and
> not solving BZ #2872
> * Edited below email, commit message, comment in translit_cyrillic to
> reflect System A removal
> * Removed <U0423><U0301> and <U0443><U0301> (Cyrillic U with acute,
> using composition) as composing is not covered by current glibc
> conversion mechanics
>
> Changelog v9:
> * Fixed formatting (trailing spaces etc.)
> * Put commit summary in the patch file, now it is generated completely
> by git format-patch
>
> Changelog v8:
> * Re-added missing translit_cyrillic in patch v7 (due to missing "git
> add" in the script).
>
> Changelog v7:
> * Generated against git://sourceware.org/git/glibc.git master with git
> format-patch.
> * The 'include "translit_cyrillic";""' now immediately follows last
> 'include "translit_XXX";""' string (was inserted just before
> translit_end previously.)
> * Only the locales already having 'include .*translit.*;""' are patched
> (see the list for manual exclusions below, full list of included locales
> at the end of the email in the commit section.)
> * Excluded az_AZ completely to avoid circular reference from tr_TR via
> “copy "tr_TR"”.
>
> Changelog v6:
> * Locales removed from the patch: C and sd_PK.
> * Added locales: az_AZ and ky_KG.
> * Consistently transliterate single uppercase Cyrillic letters
> to sequences of all uppercase Latin letters in all languages (whenever
> a Cyrillic letter is transliterated to more than one Latin letter),
> for example "Ї" is now transliterated as "YI" rather than "Yi".
>
> Dear locale maintainers,
>
> fix the glibc bug 2872 "Transliteration Cyrillic -> ASCII fails"
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=2872 [1]
>
> add the Cyrillic transliteration rows to locale/C-translit.h.in.
>
> The patch is attached.
>
>
> Current bug effect:
>
> The glibc wiki explicitly lists this use case as the test example and
> currently it fails on Cyrillic texts [1] [8] [9]:
>
> iconv -f UTF-8 -t ASCII//TRANSLIT < translit-test-input.txt |grep CYRILLIC
>
> CYRILLIC ????? ??? ???? ?????? ??????????? ?????, ?? ????? ?? ???.
>
> - it produces a string of question marks and spaces.
>
> This is what it should produce and it does so after the patch applied:
>
> CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe
> chayu.
>
>
> The root problem and the fix:
>
> The root problem is the missing transliteration table that I am
> supplying here.
>
>
> COMMIT MESSAGE:
> This translit_cyrillic table enables conversion (e.g. with iconv) from a
> UTF-8 encoded text based on Cyrillic alphabet to a ASCII//TRANSLIT text.
>
> Example: iconv -f UTF-8 -t ASCII//TRANSLIT will produce ASCII
> compatible transcription.
>
> While a UTF-encoded Cyrillic text requires Cyrillic fonts the result of
> a transliteration/transcription has only Latin/ASCII codes but still can
> be read by a native speaker. Among other things it is useful for
> processing the Cyrillic texts and filenames by programs or on systems
> that are not specifically prepared to work with Cyrillic, don't have
> corresponding fonts installed or can't handle UTF-8.
>
> The patch content (mapping) is based on ISO 9.1995 standard [10] and its
> derivative GOST 7.79-2000 System B official source (Federal Agency on
> Technical Regulating and Metrology Of Russian Federation [2]).
> Technically an independent but mostly identical source [3] was used and
> prepared in a spreadsheet [6].
>
> The transliteration of Cyrillic to ASCII according to GOST 7.79-2000
> System B represents what is actually called transcription (preserving
> phonemes), while System A is the transliteration (preserving graphemes).
> There is no meaningful way to preserve graphemes converting Cyrillic to
> ASCII and thus the System B is chosen [11]. To be super clear the System
> A has nothing to do with this bug regardless it being a transliteration.
>
> Those interested in implementing System A for transliteration of
> Cyrillic to Latin with Diacritic as a new feature are welcome to use the
> spreadsheet in [6] as a starting point.
>
> Links:
>
> [1] This bug entry https://sourceware.org/bugzilla/show_bug.cgi?id=2872
> [2] GOST 7.79-2000 official source
> http://protect.gost.ru/document.aspx?control=7&id=130715 (is only
> available in low quality gif format)
> [3] http://transliteration.ru/gost-7-79-2000/ and
> http://www.yfermer.ru/specifications/285821.html
> [4] Wikipedia article on Cyrillic transliteration with Latin alphabet
> https://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9
>
> [5] http://man7.org/linux/man-pages/man5/locale.5.html
> [6] Spreadsheet for generating translit_cyrillic
> https://sourceware.org/bugzilla/attachment.cgi?bugid=2872&action=viewall&hide_obsolete=1
>
> [8] https://sourceware.org/glibc/wiki/Locales#Testing_Locales
> [9] translit-test-input.txt
> https://sourceware.org/bugzilla/attachment.cgi?id=11304
> [10] https://en.wikipedia.org/wiki/ISO_9#GOST_7.79_System_B
> [11]
> https://scriptsource.org/cms/scripts/page.php?item_id=entry_detail&uid=gslmka8xq3
>
>
> Best regards,
> Egor Kobylkin
>
>
>
>
--
Marko Myllynen
next prev parent reply other threads:[~2019-03-28 16:20 UTC|newest]
Thread overview: 111+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
[not found] ` <20180412224352.GB2911@altlinux.org>
2018-07-17 19:34 ` SUBJECT: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] Egor Kobylkin
2018-07-17 19:40 ` Carlos O'Donell
2018-07-17 19:50 ` Egor Kobylkin
2018-07-17 19:59 ` Carlos O'Donell
2018-08-06 19:00 ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29 Egor Kobylkin
2018-10-03 8:26 ` Egor Kobylkin
2018-10-03 9:19 ` Keld Simonsen
2018-10-03 9:32 ` Egor Kobylkin
2018-10-05 8:43 ` Marko Myllynen
2018-10-05 9:20 ` Rafal Luzynski
2018-10-05 10:36 ` Egor Kobylkin
2018-10-08 22:04 ` Rafal Luzynski
2018-10-08 22:52 ` Egor Kobylkin
2018-10-09 21:43 ` Rafal Luzynski
2018-10-08 23:20 ` Zack Weinberg
2018-10-09 15:26 ` Carlos O'Donell
2018-10-09 21:51 ` Rafal Luzynski
2018-10-09 16:10 ` Marko Myllynen
2018-10-09 16:22 ` Egor Kobylkin
2018-10-09 16:49 ` Marko Myllynen
2018-10-09 22:08 ` Rafal Luzynski
2018-10-10 11:21 ` Marko Myllynen
2018-10-11 10:10 ` Marko Myllynen
[not found] ` <deacdf31-d0bb-a92d-1de3-934d6b4cb158@kobylkin.com>
2018-10-05 11:54 ` Marko Myllynen
2018-10-05 12:00 ` Egor Kobylkin
2018-10-05 12:21 ` Marko Myllynen
2018-10-05 20:47 ` Egor Kobylkin
2018-10-08 12:40 ` Marko Myllynen
2018-10-08 22:23 ` Rafal Luzynski
2018-10-08 23:35 ` Egor Kobylkin
2018-10-09 13:18 ` Egor Kobylkin
2018-10-09 18:34 ` Egor Kobylkin
2018-10-09 22:17 ` Rafal Luzynski
2018-10-09 22:40 ` Egor Kobylkin
2018-10-09 22:42 ` Egor Kobylkin
2018-10-10 11:22 ` Marko Myllynen
2018-10-10 12:19 ` Egor Kobylkin
2018-10-10 12:34 ` Marko Myllynen
2018-10-10 22:29 ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v2 Egor Kobylkin
2018-10-11 9:59 ` Marko Myllynen
2018-10-11 11:04 ` Rafal Luzynski
2018-10-11 13:10 ` Marko Myllynen
2018-10-11 13:50 ` Volodymyr Lisivka
2018-10-11 14:59 ` Egor Kobylkin
2018-10-11 21:30 ` Egor Kobylkin
2018-10-11 15:05 ` Egor Kobylkin
2018-10-11 15:44 ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v3 Egor Kobylkin
2018-10-11 21:33 ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v4 Egor Kobylkin
2018-10-12 14:05 ` [PATCH v5] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] Egor Kobylkin
2018-10-13 0:59 ` Rafal Luzynski
2018-10-13 16:58 ` Egor Kobylkin
2018-10-15 11:04 ` Marko Myllynen
2018-10-15 11:54 ` Egor Kobylkin
2018-10-23 23:08 ` Rafal Luzynski
2018-10-17 14:16 ` [PATCH v6] " Egor Kobylkin
2018-11-01 22:51 ` [PATCH v7] " Egor Kobylkin
2018-11-02 0:00 ` [PATCH v8] " Egor Kobylkin
2018-11-02 22:22 ` Rafal Luzynski
2018-11-02 23:27 ` Egor Kobylkin
2018-11-14 21:25 ` [PATCH v9] " Egor Kobylkin
2018-11-16 22:17 ` Rafal Luzynski
2018-11-17 18:34 ` Egor Kobylkin
2018-11-19 7:13 ` Marko Myllynen
2018-11-19 9:21 ` Egor Kobylkin
2018-11-19 19:35 ` Marko Myllynen
2018-12-01 22:07 ` Rafal Luzynski
2018-12-01 22:53 ` Egor Kobylkin
2018-12-03 22:19 ` Egor Kobylkin
2018-12-08 1:15 ` Rafal Luzynski
2018-12-10 21:20 ` Marko Myllynen
2018-12-19 22:25 ` Rafal Luzynski
2018-12-19 22:48 ` Egor Kobylkin
2018-12-19 23:50 ` Rafal Luzynski
2018-11-19 11:10 ` [PATCH v10] " Egor Kobylkin
2018-12-07 23:35 ` Rafal Luzynski
2018-12-08 21:51 ` Egor Kobylkin
2018-12-19 22:41 ` Rafal Luzynski
2018-12-19 23:02 ` Egor Kobylkin
2018-12-20 0:05 ` Rafal Luzynski
2018-12-08 22:28 ` [PATCH v11] Locales: Cyrillic -> ASCII transliteration " Egor Kobylkin
2018-12-19 23:16 ` Egor Kobylkin
2018-12-26 10:07 ` Siddhesh Poyarekar
2018-12-26 12:13 ` Egor Kobylkin
2018-12-27 1:30 ` Siddhesh Poyarekar
2018-12-27 11:28 ` Rafal Luzynski
2019-01-02 18:38 ` [PATCH v12] " Egor Kobylkin
2019-01-05 14:35 ` Rafal Luzynski
2019-01-05 21:12 ` Egor Kobylkin
2019-01-07 20:37 ` Marko Myllynen
2019-01-09 0:46 ` Egor Kobylkin
2019-01-09 20:03 ` Marko Myllynen
2019-02-04 7:14 ` [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] ping for 2.30 Egor Kobylkin
2019-02-14 16:48 ` Marko Myllynen
2019-03-04 22:11 ` Egor Kobylkin
2019-03-11 13:59 ` PING " Egor Kobylkin
2019-03-14 19:48 ` Egor Kobylkin
2019-04-19 22:24 ` Rafal Luzynski
[not found] ` <5ELixS9SQ0DW4mlvswp96ASpLobBabU9KQ6zOTH-Udrb34mABhcqiPERpBZfPWZ9F77s8XNmiLIAq9UWu0AjLFFdjOz_FZVU5_xF-SiQkrw=@kobylkin.com>
2019-04-27 2:51 ` Siddhesh Poyarekar
2019-04-27 7:34 ` Diego (Egor) Kobylkin
2019-04-09 1:04 ` [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] Carlos O'Donell
2019-03-19 10:39 ` ping " Egor Kobylkin
2019-03-28 16:20 ` Marko Myllynen [this message]
2019-04-04 19:44 ` [PING^5][PATCH " Egor Kobylkin
2019-04-06 1:36 ` Siddhesh Poyarekar
2019-04-16 7:15 ` [PING^6][PATCH " Marko Myllynen
2019-04-16 13:17 ` Carlos O'Donell
2019-04-16 17:06 ` Egor Kobylkin
2019-04-16 17:58 ` Carlos O'Donell
2019-04-16 18:41 ` Egor Kobylkin
2019-04-16 19:06 ` Carlos O'Donell
2019-05-10 12:19 ` Marko Myllynen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/libc/involved.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=978ccde6-2f56-afd3-b13a-5d624cec1697@redhat.com \
--to=myllynen@redhat.com \
--cc=carlos@redhat.com \
--cc=digitalfreak@lingonborough.com \
--cc=egor@kobylkin.com \
--cc=libc-alpha@sourceware.org \
--cc=libc-locales@sourceware.org \
--cc=mfabian@redhat.com \
--cc=siddhesh@gotplt.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).