Subject: Re: [PATCH v10] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
To: libc-alpha@sourceware.org, libc-locales@sourceware.org, "Dmitry V. Levin", Marko Myllynen, mfabian@redhat.com
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <676c37bd-ba92-a7ed-019e-94974143233f@kobylkin.com> <1718190635.706992.1544225756803@poczta.nazwa.pl> <749726562.674232.1545259279320@poczta.nazwa.pl>
From: Egor Kobylkin
Date: Thu, 20 Dec 2018 00:02:21 +0100
In-Reply-To: <749726562.674232.1545259279320@poczta.nazwa.pl>

On 19.12.18 23:41, Rafal Luzynski wrote:
> 8.12.2018 22:51 Egor Kobylkin wrote:
>>
>> Rafal, Dmitry, Marko, Mike
>>
>> On 08.12.18 00:35, Rafal Luzynski wrote:
>>> 19.11.2018 12:10 Egor Kobylkin wrote:
>>>>
>>>> Changelog v10:
>>>> * Removed ISO 9:1995 GOST 7.79-2000 System A
>>>> (transliteration to Latin with diacritics) as conflicting with
>>>> System B within glibc mechanics and not solving BZ #2872
>>>
>>> I'm in favor of implementing System A and dropping System B instead.
>>
>> The BZ #2872 bug name is explicitly "Transliteration Cyrillic -> ASCII
>> fails". The ISO 9 System A does not map to ASCII so it is not a solution
>> to BZ #2872 at all.
>
> I did not mean implementing System A and nothing more. I meant implementing
> System A and a fallback for ASCII which can be similar to System B but
> we wouldn't be able to call it "System B" because it would differ in
> a few cases.

Just for the record, I have no objection on my side to that (using A as
a basis for ASCII as well). But I'm no longer sure that inserting a
translit table into every locale is the right solution for the ASCII
problem.
Especially since distributions may not include any locale but C.

>
>> I was scratching my head as to how we can avoid the explosion of the
>> scope for this patch. And then it occurred to me that it was wrong to
>> target all the present locales for the ASCII translit. This seems to be
>> the root cause of these prolonged A vs. B discussions. The proper target
>> for my table is actually the C locale translit file
>> (locale/C-translit.h.in). I will submit a proper patch shortly.
>
> I saw your patch v11 and now I must say I'm sorry for making noise because
> it was me who said that I didn't mind adding Cyrillic -> ASCII
> transliteration to the C locale. I said so before taking a look at the
> current contents of the transliteration in the C locale. When I looked at
> it I realized that it does not support any national characters, even from
> modified Latin alphabets (like those used in most western European
> languages). It only contains mathematical, physical, commercial,
> diacritical etc. characters. So I'm no longer sure it should support
> Cyrillic -> ASCII. But maybe again I'm wrong, maybe it should support it
> but nobody has implemented it yet.

Actually there are quite a few letters already transliterated in
locale/C-translit.h.in. (Note the CAPCAP transliteration style for the
capitals, i.e. LATIN CAPITAL LETTER AE is mapped to AE, not to Ae.)
"\x00c6" "AE" /* LATIN CAPITAL LETTER AE */ "\x00d7" "x" /* MULTIPLICATION SIGN */ "\x00df" "ss" /* LATIN SMALL LETTER SHARP S */ "\x00e6" "ae" /* LATIN SMALL LETTER AE */ "\x0132" "IJ" /* LATIN CAPITAL LIGATURE IJ */ "\x0133" "ij" /* LATIN SMALL LIGATURE IJ */ "\x0149" "'n" /* LATIN SMALL LETTER N PRECEDED BY APOSTROPHE */ "\x0152" "OE" /* LATIN CAPITAL LIGATURE OE */ "\x0153" "oe" /* LATIN SMALL LIGATURE OE */ "\x017f" "s" /* LATIN SMALL LETTER LONG S */ "\x01c7" "LJ" /* LATIN CAPITAL LETTER LJ */ "\x01c8" "Lj" /* LATIN CAPITAL LETTER L WITH SMALL LETTER J */ "\x01c9" "lj" /* LATIN SMALL LETTER LJ */ "\x01ca" "NJ" /* LATIN CAPITAL LETTER NJ */ "\x01cb" "Nj" /* LATIN CAPITAL LETTER N WITH SMALL LETTER J */ "\x01cc" "nj" /* LATIN SMALL LETTER NJ */ "\x01f1" "DZ" /* LATIN CAPITAL LETTER DZ */ "\x01f2" "Dz" /* LATIN CAPITAL LETTER D WITH SMALL LETTER Z */ "\x01f3" "dz" /* LATIN SMALL LETTER DZ */ >> My focus is super sharp on helping with Cyrillic -> ASCII translit >> availability for a default installation with glibc. > > I understand your aim and I agree to support ASCII. Our disagreements are: > > * whether to support conversion Cyrillic -> extended Latin as well, no contest on my side > * which standard to implement, no contest on my side > * what to do if the standard is ambiguous or if some details cannot be > implemented for technical reasons. no contest on my side either I just think we may work around all those decisions with a smaller pure ASCII patch first (more useful too if covers C locale).