Subject: Re: [PATCH v4 2/4] Update UTF-8 charmap processing.
From: Carlos O'Donell
To: Florian Weimer
Cc: libc-alpha@sourceware.org
Date: Thu, 29 Apr 2021 17:02:04 -0400
Organization: Red Hat
In-Reply-To: <87mtthfakh.fsf@oldenburg.str.redhat.com>
References: <20210428130033.3196848-1-carlos@redhat.com> <20210428130033.3196848-3-carlos@redhat.com> <87mtthfakh.fsf@oldenburg.str.redhat.com>

On 4/29/21 10:07 AM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>>  def convert_to_hex(code_point):
>>      '''Converts a code point to a hexadecimal UTF-8 representation
>> -    like /x**/x**/x**.'''
>> -    # Getting UTF8 of Unicode characters.
>> -    # In Python3, .encode('UTF-8') does not work for
>> -    # surrogates. Therefore, we use this conversion table
>> -    surrogates = {
>> -        0xD800: '/xed/xa0/x80',
>> -        0xDB7F: '/xed/xad/xbf',
>> -        0xDB80: '/xed/xae/x80',
>> -        0xDBFF: '/xed/xaf/xbf',
>> -        0xDC00: '/xed/xb0/x80',
>> -        0xDFFF: '/xed/xbf/xbf',
>> -    }
>> -    if code_point in surrogates:
>> -        return surrogates[code_point]
>> -    return ''.join([
>> -        '/x{:02x}'.format(c) for c in chr(code_point).encode('UTF-8')
>> -    ])
>> +    ready for use in a locale character map specification e.g.
>> +    /xc2/xaf for MACRON.
>> +
>> +    '''
>> +    cp_locale = ''
>> +    cp_bytes = chr(code_point).encode('UTF-8', 'surrogatepass')
>> +    for byte in cp_bytes:
>> +        cp_locale += ''.join('/x{:02x}'.format(byte))
>> +    return cp_locale
>
> I think you should keep the list comprehension. That ''.join() is
> unnecessary.

Like this?
    return ''.join(['/x{:02x}'.format(c)
                    for c in chr(code_point).encode('UTF-8', 'surrogatepass')])

(Tested; it works and produces the same results.)

--
Cheers,
Carlos.
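The claim that the list-comprehension version produces the same results can be checked with a small standalone script. This is a sketch, not part of the glibc patch: `convert_to_hex_loop` is a hypothetical name for the loop version quoted from v4, and `OLD_SURROGATES` is a hypothetical name for the hand-written table the patch removes. The script also confirms that the 'surrogatepass' error handler reproduces every entry of that table, which is why the table can be dropped.

```python
# Sketch only: compares the two convert_to_hex variants from this thread.
# convert_to_hex_loop and OLD_SURROGATES are hypothetical names for
# illustration; they do not exist in the glibc sources.


def convert_to_hex_loop(code_point):
    '''Loop version from the v4 patch (kept verbatim for comparison).'''
    cp_locale = ''
    cp_bytes = chr(code_point).encode('UTF-8', 'surrogatepass')
    for byte in cp_bytes:
        # ''.join() over a string just rebuilds the same string.
        cp_locale += ''.join('/x{:02x}'.format(byte))
    return cp_locale


def convert_to_hex(code_point):
    '''List-comprehension version proposed in the reply.'''
    # 'surrogatepass' lets lone surrogates (U+D800..U+DFFF) encode as
    # 3-byte sequences instead of raising UnicodeEncodeError.
    return ''.join(['/x{:02x}'.format(c)
                    for c in chr(code_point).encode('UTF-8', 'surrogatepass')])


# The surrogate conversion table removed by the patch.
OLD_SURROGATES = {
    0xD800: '/xed/xa0/x80',
    0xDB7F: '/xed/xad/xbf',
    0xDB80: '/xed/xae/x80',
    0xDBFF: '/xed/xaf/xbf',
    0xDC00: '/xed/xb0/x80',
    0xDFFF: '/xed/xbf/xbf',
}

if __name__ == '__main__':
    # Both variants agree on a sample of ASCII, BMP, surrogate and
    # supplementary-plane code points.
    for cp in list(OLD_SURROGATES) + [0x41, 0xAF, 0x20AC, 0x1F600]:
        assert convert_to_hex(cp) == convert_to_hex_loop(cp)
    # 'surrogatepass' reproduces the old hand-written table exactly.
    for cp, expected in OLD_SURROGATES.items():
        assert convert_to_hex(cp) == expected
    print(convert_to_hex(0xAF))  # /xc2/xaf (MACRON)
```

Running it prints the MACRON example from the new docstring; any mismatch between the two variants, or with the old table, would raise an AssertionError.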