Subject: Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
From: Marko Myllynen
To: Rafal Luzynski, Egor Kobylkin, libc-alpha@sourceware.org, libc-locales@sourceware.org
Cc: Mike Fabian, Carlos O'Donell
Date: Mon, 10 Dec 2018 23:20:33 +0200

Hi,

On 08/12/2018 03.15, Rafal Luzynski wrote:
> 17.11.2018 19:34 Egor Kobylkin wrote:
>>
>> The SH/Sh can be decided on either way - seems like an easy change any
>> way.
>
> I'm in favor of "Sh" because it will work fine for titlecased words
> (where only the first letter is uppercase) but I'm aware it would be
> a problem for uppercased words. Unfortunately, I think we are unable
> to satisfy both cases.

I think I'm in favor of "Sh" as well. Although not perfect, I'd assume
it's probably going to be correct in more cases than "SH".

>> System A was added on Marko's request (so setting him on TO:) I am
>> neutral on keeping it or dropping it, just to be clear.
>
> I think I didn't see this Marko's request but I'm in favor of keeping
> System A, too.
>
> Marko, it would be good to hear your opinion about System A vs. System B
> again.

I think System A is a better option, as it should be the same as ISO 9
and perhaps also produces results that are, in some cases, closer to
what people expect than those of System B (if the Wikipedia ISO 9
article is to be believed). Wrt BZ #2872, I think it's good to keep it
in mind, but IMHO we can also deviate from it if needed. However, with
System A + ASCII fallback definitions the RFE should be satisfied as
well?

> 19.11.2018 20:35 Marko Myllynen wrote:
>> [...]
>> In any case once your patch lands I'm going to submit a follow-up patch
>> for fi_FI to make it compliant with the applicable national standard
>> (SFS 4900) which defines how to do Cyrillic transliteration /
>> transcription in the context of Finnish.
>
> I totally agree. As far as I can see, SFS 4900 is more similar to
> System A (ISO 9) rather than System B, that is, it transliterates to Latin
> characters with diacritics rather than plain ASCII. Marko, what is your
> opinion about possible implementation of SFS 4900 in these cases:
>
> * When the destination charset does not contain required Latin diacritic
>   characters (e.g., it is plain ASCII)?

This would be according to http://jkorpela.fi/iso9.html8 so, for
example, zh instead of ž and shtsh instead of štš.

> * When the output is ambiguous, that means, when two different Cyrillic
>   strings produce the same Latin (or ASCII) output?

This is a good point and one I hadn't considered, but I'm not sure
there is anything we can do about it (at least without major work on
the locale system internals). Do you have any rough idea how frequently
this could happen, or is this more of a theoretical issue? (Sorry if
I've missed earlier comments about this, it's been a long thread.)

>> The same with having both System A and System B. Initially I went along
>> with the suggestion to include the system A but it is clear now that it
>> doesn’t make fixing [BZ #2872] more straightforward. So I’d also propose
>> to set it aside for the moment and use the v10 without the system A.
>> That is the whole reason I have submitted it, to be superclear on that.
>
> OK, I think that now I understand your reason to drop System A better.
> But still I'd like to rethink implementing System A somehow and drop
> (or rather: implement only partially) System B.

Yes, I also think System A AKA ISO 9 would be a better choice but I'll
leave the final decision for you two (and others who might weigh in).

Thanks,

-- 
Marko Myllynen
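
The two questions discussed in this message (falling back to ASCII when the target charset lacks the diacritics, and different Cyrillic strings collapsing to the same output) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tables are a tiny hand-picked subset (using the ž -> zh and štš -> shtsh fallbacks mentioned above), and the names `CYRILLIC_TO_LATIN`, `ASCII_FALLBACK`, `transliterate` and `find_collisions` are hypothetical. It is not the glibc locale mechanism and not the complete ISO 9 or SFS 4900 tables.

```python
# Illustrative subset only: NOT the glibc locale data and NOT the full
# ISO 9 / SFS 4900 tables. All names here are hypothetical.

Z_CARON = "\u017e"  # ž
S_CARON = "\u0161"  # š

# Step 1: Cyrillic -> Latin with diacritics (SFS 4900-style "щ" as in the thread).
CYRILLIC_TO_LATIN = {
    "ж": Z_CARON,
    "ш": S_CARON,
    "щ": S_CARON + "t" + S_CARON,
    "с": "s",
    "х": "h",
    "т": "t",
    "з": "z",
}

# Step 2: fallback used only when the target charset is plain ASCII.
ASCII_FALLBACK = {Z_CARON: "zh", S_CARON: "sh"}


def transliterate(text, ascii_only=False):
    """Map Cyrillic to Latin; optionally replace diacritics with ASCII digraphs."""
    latin = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)
    if not ascii_only:
        return latin
    return "".join(ASCII_FALLBACK.get(ch, ch) for ch in latin)


def find_collisions(words, ascii_only=False):
    """Group input words by output and keep only outputs produced more than once."""
    by_output = {}
    for word in words:
        by_output.setdefault(transliterate(word, ascii_only), []).append(word)
    return {out: srcs for out, srcs in by_output.items() if len(srcs) > 1}
```

With these assumed tables, "ш" and "сх" stay distinct while diacritics are available but both become "sh" after the ASCII fallback, and "щ" and "штш" already coincide before any fallback. Running `find_collisions` over a real word list would be one rough way to estimate how often the ambiguity occurs in practice.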