Authentication-Results: sourceware.org; auth=none
X-HELO: mail-wr1-f67.google.com
Reply-To: Marko Myllynen
Subject: Re: [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] ping for 2.30
To: Egor Kobylkin, libc-alpha@sourceware.org, libc-locales@sourceware.org, Carlos O'Donell
Cc: Rafal Luzynski, Siddhesh Poyarekar, Mike Fabian
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <2124833400.35614.1546698902753@poczta.nazwa.pl>
 <908ed415-cfe4-804c-f421-4351ef062edc@kobylkin.com>
 <6d076299-babd-406a-b1fe-87778f54bf36@kobylkin.com>
 <41aff10b-9cf1-638c-4fbc-8c4f4122f2e9@kobylkin.com>
From: Marko Myllynen
Message-ID:
Date: Thu, 14 Feb 2019 18:48:51 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0
MIME-Version: 1.0
In-Reply-To: <41aff10b-9cf1-638c-4fbc-8c4f4122f2e9@kobylkin.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Hi Carlos, Mike, Rafal,

It seems clear that you are all currently too busy to look at this, but
would you have an estimate of when you might be able to review it so
that we could consider merging?

FWIW, I chatted with Egor off-list and we're on the same page wrt the
following; hopefully this gives you a bit of a jump start on the subject
when you have time to dig deeper:

1) The built-in C locale doesn't read or use any translit_* files, can't
have any fallback mechanisms, and only supports ASCII, so using GOST
7.79 System B in locale/C-translit.h.in (as per patch v12) would seem to
be the appropriate way to implement Cyrillic transliteration for the
built-in C locale (it adds some 8KB to the binary).

2) Other locales read and use translit_* files, and with them fallbacks
and non-ASCII output are possible, so it would seem preferable to first
try ISO 9 / GOST 7.79 System A and only if that fails use GOST 7.79
System B (in which case the end result should match that of the built-in
C locale).
For this the translit_cyrillic file should be added (as per patch v9 +
changes mentioned in patches v10 and v12).

3) Individual locale files can then be updated to use translit_cyrillic
as appropriate (see patch v9), and language- or nation-specific
conventions (e.g., SFS 4900 for fi_FI) can be applied on a per-locale
basis.

Thanks,

On 04/02/2019 09.14, Egor Kobylkin wrote:
> Carlos,
> are you comfortable to pick this up again this month?
>
> I would really love to have a reliable action plan to get this committed
> for 2.30. Maybe cut out a subset that is undisputed and commit only that
> first. It looks kinda like an eternal moving target otherwise.
>
> for you reference:
> https://sourceware.org/ml/libc-alpha/2019-01/msg00036.html
> https://sourceware.org/ml/libc-alpha/2019-01/msg00040.html
>
> Bests,
> Egor Kobylkin
>
> On 09.01.19 21:03, Marko Myllynen wrote:
>> Hi,
>>
>> On 09/01/2019 02.46, Egor Kobylkin wrote:
>>> On 07.01.19 21:37, Marko Myllynen wrote:
>>>> On 05/01/2019 23.12, Egor Kobylkin wrote:
>>>>>
>>>>> Good catch! Should we maybe split this into two patches, one for C and
>>>>> the other for "country" locales? They have different codes and
>>>>> functionality so it looks like it would be easier to keep focus.
>>>>
>>>> That would probably make sense, the standard C/POSIX locale won't
>>>> support System A so it also narrows down solution alternatives with it.
>>>>
>>>>> "Country" locales in localedata/locales/ can then have the exact same
>>>>> translit table included or they can have any other flavor - I don't see
>>>>> a problem here.
>>>>
>>>> Indeed, and since those files are not limited to ASCII, perhaps we could
>>>> now reconsider the v9 approach for them, i.e., prefer System A if
>>>> possible, otherwise use System B / ASCII (just need to make sure that
>>>> the ASCII fall-back for them will match the built-in C ASCII rule)?
>>>
>>> Happy to hear the split seems to be a clear cut one.
>>> How about I rename the "[PATCH v12]...[BZ #2872]" to "[PATCH v1]...
>>> C/POSIX [BZ #2872]" and the "[PATCH v9]" gets its own bug-report
>>> (number) and title for clarity in communication?
>>
>> I'm not sure is a new BZ really needed for such an addition, perhaps a
>> NEWS entry might be more appropriate (with the full details explained in
>> the commit messages of course) but I'll leave this to others to decide.
>>
>>> This way it would probably be easier to have the decision making process
>>> tied up for both patches (separately). We may want to get the v12 POSIX
>>> out of the door in 2.30 then and can take all the time we need to set up
>>> the rules for "Countries" locales as you need them to be.
>>
>> Perhaps Rafal or Carlos have better suggestions but I would think we
>> could have a patch series where the patch 1/3 adds the C/POSIX locale
>> part (that would be what you posted as v12), then patch 2/3 adds
>> translit_cyrillic (based on your v9 so supports ISO 9.1995 / GOST 7.79
>> System A and GOST 7.79 System B as a fall-back (which would match the
>> C/POSIX rules)), and finally the patch 3/3 updates locales to use
>> translit_cyrillic as appropriate. But as said, Rafal or Carlos may have
>> alternative suggestions so it might be best to wait for their feedback
>> before doing anything yet (it's unfortunate you've had to do so many
>> iterations around this already but I think we've all learned something
>> during the process and the end result will be more correct than any of
>> the earlier versions).
>>
>> Thanks,
>> -- Marko Myllynen
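
For illustration, the lookup order described in points 1) and 2) of the
top-level message can be sketched roughly as below. This is a minimal
Python sketch, not glibc code: glibc itself takes its rules from
locale/C-translit.h.in and the translit_* locale files. The two tables
hold only three letters as examples, and the names translit,
translit_char, can_encode and the use_system_a flag are hypothetical,
chosen for this sketch only.

```python
# Illustrative three-letter subset; the real tables cover far more letters.
SYSTEM_A = {"ж": "ž", "ч": "č", "ш": "š"}     # ISO 9 / GOST 7.79 System A
SYSTEM_B = {"ж": "zh", "ч": "ch", "ш": "sh"}  # GOST 7.79 System B (ASCII only)


def can_encode(text, charset):
    """Return True if text is representable in the target charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def translit_char(ch, charset, use_system_a=True):
    """Transliterate one character for the given target charset."""
    if can_encode(ch, charset):
        return ch  # already representable, nothing to do
    if use_system_a:
        candidate = SYSTEM_A.get(ch)
        if candidate is not None and can_encode(candidate, charset):
            return candidate  # System A fits the target charset
    # Fall back to System B; "?" marks letters missing from this toy table.
    return SYSTEM_B.get(ch, "?")


def translit(text, charset, use_system_a=True):
    """Transliterate a whole string, character by character."""
    return "".join(translit_char(ch, charset, use_system_a) for ch in text)


print(translit("жчш", "iso8859-2"))                  # žčš (System A fits)
print(translit("жчш", "ascii"))                      # zhchsh (fallback)
print(translit("жчш", "ascii", use_system_a=False))  # zhchsh (C-locale-like)
```

Setting use_system_a=False stands in for the built-in C locale, which
only ever applies System B. With an ASCII target both paths give the same
result, which is the property the fallback is meant to keep.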