From: Florian Weimer via Libc-alpha
Reply-To: Florian Weimer
To: Carlos O'Donell
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v4 2/4] Update UTF-8 charmap processing.
Date: Thu, 29 Apr 2021 16:07:58 +0200
Message-ID: <87mtthfakh.fsf@oldenburg.str.redhat.com>
In-Reply-To: <20210428130033.3196848-3-carlos@redhat.com> (Carlos O'Donell's message of "Wed, 28 Apr 2021 09:00:31 -0400")
References: <20210428130033.3196848-1-carlos@redhat.com> <20210428130033.3196848-3-carlos@redhat.com>
List-Id: Libc-alpha mailing list

* Carlos O'Donell:

>  def convert_to_hex(code_point):
>      '''Converts a code point to a hexadecimal UTF-8 representation
> -    like /x**/x**/x**.'''
> -    # Getting UTF8 of Unicode characters.
> -    # In Python3, .encode('UTF-8') does not work for
> -    # surrogates. Therefore, we use this conversion table
> -    surrogates = {
> -        0xD800: '/xed/xa0/x80',
> -        0xDB7F: '/xed/xad/xbf',
> -        0xDB80: '/xed/xae/x80',
> -        0xDBFF: '/xed/xaf/xbf',
> -        0xDC00: '/xed/xb0/x80',
> -        0xDFFF: '/xed/xbf/xbf',
> -    }
> -    if code_point in surrogates:
> -        return surrogates[code_point]
> -    return ''.join([
> -        '/x{:02x}'.format(c) for c in chr(code_point).encode('UTF-8')
> -    ])
> +    ready for use in a locale character map specification e.g.
> +    /xc2/xaf for MACRON.
> +
> +    '''
> +    cp_locale = ''
> +    cp_bytes = chr(code_point).encode('UTF-8', 'surrogatepass')
> +    for byte in cp_bytes:
> +        cp_locale += ''.join('/x{:02x}'.format(byte))
> +    return cp_locale

I think you should keep the list comprehension.  That ''.join() is
unnecessary.

Thanks,
Florian
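
For illustration, a minimal sketch of what keeping the list comprehension could look like, assuming the 'surrogatepass' error handler from the patch is retained. This is one possible shape, not the code that was committed. The ''.join() in the loop above receives a single string, so it only reassembles that string's characters and has no effect; the comprehension needs exactly one join over all bytes.

```python
def convert_to_hex(code_point):
    '''Converts a code point to a hexadecimal UTF-8 representation
    ready for use in a locale character map specification e.g.
    /xc2/xaf for MACRON.

    '''
    # 'surrogatepass' lets lone surrogates (U+D800..U+DFFF) be encoded,
    # which is what makes the old hand-written lookup table redundant.
    cp_bytes = chr(code_point).encode('UTF-8', 'surrogatepass')
    # Iterating over a bytes object yields ints, one per encoded byte.
    return ''.join(['/x{:02x}'.format(byte) for byte in cp_bytes])
```

With this version, convert_to_hex(0xAF) gives '/xc2/xaf', and the surrogate code points from the removed table (for example 0xD800) still map to the same strings as before.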