From: Beat Bolli
To: git@vger.kernel.org
Cc: Beat Bolli, Torsten Bögershausen
Subject: [PATCH v2 3/3] unicode_width.h: fix the double_width[] table
Date: Sat, 3 Dec 2016 11:53:12 +0100
Message-Id: <1480762392-28731-3-git-send-email-dev+git@drbeat.li>
X-Mailer: git-send-email 2.7.2
In-Reply-To: <1480762392-28731-1-git-send-email-dev+git@drbeat.li>
References: <1480713995-16157-1-git-send-email-dev+git@drbeat.li> <1480762392-28731-1-git-send-email-dev+git@drbeat.li>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function bisearch() in utf8.c
does a pure binary search in double_width[]. It does not account for
the 17 plane offset entries that unicode/uniset/uniset prepends to the
table. Those entries break the sorted order a binary search depends on,
so leaving them in the table may cause wrong results.

Filter out the plane offsets in update_unicode.sh and regenerate the
table.

Cc: Torsten Bögershausen
Signed-off-by: Beat Bolli
---
Diff to v1:
- add Torsten's Cc:

 unicode_width.h   | 17 -----------------
 update_unicode.sh |  2 +-
 2 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/unicode_width.h b/unicode_width.h
index 73b5fd6..02207be 100644
--- a/unicode_width.h
+++ b/unicode_width.h
@@ -297,23 +297,6 @@ static const struct interval zero_width[] = {
 { 0xE0100, 0xE01EF }
 };
 static const struct interval double_width[] = {
-{ /* plane */ 0x0, 0x3D },
-{ /* plane */ 0x3D, 0x68 },
-{ /* plane */ 0x68, 0x69 },
-{ /* plane */ 0x69, 0x6A },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
 { 0x1100, 0x115F },
 { 0x231A, 0x231B },
 { 0x2329, 0x232A },
diff --git a/update_unicode.sh b/update_unicode.sh
index 3c84270..4c1ec8d 100755
--- a/update_unicode.sh
+++ b/update_unicode.sh
@@ -30,7 +30,7 @@ fi &&
 grep -v plane)
 };
 static const struct interval double_width[] = {
- $(uniset/uniset --32 eaw:F,W)
+ $(uniset/uniset --32 eaw:F,W | grep -v plane)
 };
 EOF
 )
-- 
2.7.2
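To illustrate why the plane entries matter, here is a minimal sketch in C. The names bisearch_sketch(), is_sorted_disjoint(), ARRAY_LEN, with_planes and filtered are hypothetical, written for this example and modeled on the description above rather than copied from utf8.c. Both tables are abbreviated: they keep only the 17 plane entries and the first three double-width ranges shown in the diff.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Inclusive code point range, shaped like git's struct interval. */
struct interval {
	unsigned int first;
	unsigned int last;
};

/* Abbreviated table as generated before the fix: plane entries first. */
static const struct interval with_planes[] = {
	{ 0x0, 0x3D }, { 0x3D, 0x68 }, { 0x68, 0x69 }, { 0x69, 0x6A },
	{ 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 },
	{ 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 },
	{ 0x0, 0x0 }, { 0x0, 0x0 }, { 0x0, 0x0 },
	{ 0x1100, 0x115F }, { 0x231A, 0x231B }, { 0x2329, 0x232A },
};

/* Abbreviated table after filtering with "grep -v plane". */
static const struct interval filtered[] = {
	{ 0x1100, 0x115F }, { 0x231A, 0x231B }, { 0x2329, 0x232A },
};

/*
 * Hypothetical pure binary search; max is the index of the last entry.
 * Correct only if the intervals are sorted and do not overlap.
 */
static int bisearch_sketch(unsigned int ucs, const struct interval *table, int max)
{
	int min = 0;

	assert(max >= 0); /* the table must not be empty */
	/* Quick reject: below the first range or above the last one. */
	if (ucs < table[0].first || ucs > table[max].last)
		return 0;
	while (max >= min) {
		int mid = min + (max - min) / 2;
		if (ucs > table[mid].last)
			min = mid + 1;
		else if (ucs < table[mid].first)
			max = mid - 1;
		else
			return 1; /* ucs lies inside table[mid] */
	}
	return 0;
}

/* Checks the invariant the search relies on: ascending, disjoint ranges. */
static int is_sorted_disjoint(const struct interval *table, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].first > table[i].last)
			return 0;
		if (i > 0 && table[i].first <= table[i - 1].last)
			return 0;
	}
	return 1;
}
```

With the filtered table the invariant holds and U+0000 is rejected by the quick range check. With the plane entries left in, is_sorted_disjoint() returns 0 (the first two entries overlap at 0x3D), the quick check can no longer reject anything below U+1100 because table[0].first is 0, and bisearch_sketch(0x0, ...) lands on a { 0x0, 0x0 } entry and returns 1.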