git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v2 0/6] unicode_width.h: update the width tables to Unicode 9.0
@ 2016-12-13 23:31 Beat Bolli
  2016-12-13 23:31 ` [PATCH v2 1/6] update_unicode.sh: move it into contrib/update-unicode Beat Bolli
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Beat Bolli @ 2016-12-13 23:31 UTC (permalink / raw)
  To: git

This is v2 of my Unicode 9.0 series. After a short discussion [1], we
decided to move the generator script into contrib. The series now does
that first and then updates the script in its new location.

Diff to v1:
- complete commit reordering
- fix nits in the commit messages

.gitignore                               |   1 -
contrib/update-unicode/.gitignore        |   3 ++
contrib/update-unicode/README            |  20 +++++++++++
contrib/update-unicode/update_unicode.sh |  33 ++++++++++++++++++
unicode_width.h                          | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
update_unicode.sh                        |  40 ----------------------
6 files changed, 163 insertions(+), 65 deletions(-)

[1] http://public-inbox.org/git/xmqqr35dm203.fsf@gitster.mtv.corp.google.com/

* [PATCH v2 1/6] update_unicode.sh: move it into contrib/update-unicode
From: Beat Bolli @ 2016-12-13 23:31 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

As it's used only by a tiny minority of the Git developer population,
this script does not belong in the main Git source directory.

Move it into contrib/ and adjust the paths to account for the new
location.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 .gitignore                               |  1 -
 contrib/update-unicode/.gitignore        |  3 +++
 contrib/update-unicode/README            | 20 ++++++++++++++++
 contrib/update-unicode/update_unicode.sh | 38 ++++++++++++++++++++++++++++++
 update_unicode.sh                        | 40 --------------------------------
 5 files changed, 61 insertions(+), 41 deletions(-)
 create mode 100644 contrib/update-unicode/.gitignore
 create mode 100644 contrib/update-unicode/README
 create mode 100755 contrib/update-unicode/update_unicode.sh
 delete mode 100755 update_unicode.sh

diff --git a/.gitignore b/.gitignore
index f96e50e..5555ae0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -204,7 +204,6 @@
 /config.mak.autogen
 /config.mak.append
 /configure
-/unicode
 /tags
 /TAGS
 /cscope*
diff --git a/contrib/update-unicode/.gitignore b/contrib/update-unicode/.gitignore
new file mode 100644
index 0000000..b0ebc6a
--- /dev/null
+++ b/contrib/update-unicode/.gitignore
@@ -0,0 +1,3 @@
+uniset/
+UnicodeData.txt
+EastAsianWidth.txt
diff --git a/contrib/update-unicode/README b/contrib/update-unicode/README
new file mode 100644
index 0000000..b9e2fc8
--- /dev/null
+++ b/contrib/update-unicode/README
@@ -0,0 +1,20 @@
+TL;DR: Run update_unicode.sh after the publication of a new Unicode
+standard and commit the resulting unicode_width.h file.
+
+The long version
+================
+
+The Git source code ships the file unicode_width.h, which contains
+tables of zero-width and double-width Unicode code points.
+These tables are generated using update_unicode.sh in this directory.
+update_unicode.sh itself uses a third-party tool, uniset, to query two
+Unicode data files for the interesting code points.
+
+On first run, update_unicode.sh clones uniset from GitHub and builds it.
+This requires a current-ish version of autoconf (2.69 works as of
+December 2016).
+
+On each run, update_unicode.sh checks whether more recent Unicode data
+files are available from the Unicode consortium, and rebuilds the header
+unicode_width.h with the new data. The new header can then be
+committed.
diff --git a/contrib/update-unicode/update_unicode.sh b/contrib/update-unicode/update_unicode.sh
new file mode 100755
index 0000000..7b90126
--- /dev/null
+++ b/contrib/update-unicode/update_unicode.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+#See http://www.unicode.org/reports/tr44/
+#
+#Me Enclosing_Mark  an enclosing combining mark
+#Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
+#Cf Format          a format control character
+#
+cd "$(dirname "$0")"
+UNICODEWIDTH_H=$(git rev-parse --show-toplevel)/unicode_width.h
+(
+	if ! test -f UnicodeData.txt; then
+		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
+	fi &&
+	if ! test -f EastAsianWidth.txt; then
+		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
+	fi &&
+	if ! test -d uniset; then
+		git clone https://github.com/depp/uniset.git
+	fi &&
+	(
+		cd uniset &&
+		if ! test -x uniset; then
+			autoreconf -i &&
+			./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
+		fi &&
+		make
+	) &&
+	UNICODE_DIR=. && export UNICODE_DIR &&
+	cat >$UNICODEWIDTH_H <<-EOF
+	static const struct interval zero_width[] = {
+		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
+		  grep -v plane)
+	};
+	static const struct interval double_width[] = {
+		$(uniset/uniset --32 eaw:F,W)
+	};
+	EOF
+)
diff --git a/update_unicode.sh b/update_unicode.sh
deleted file mode 100755
index 27af77c..0000000
--- a/update_unicode.sh
+++ /dev/null
@@ -1,40 +0,0 @@
-#!/bin/sh
-#See http://www.unicode.org/reports/tr44/
-#
-#Me Enclosing_Mark  an enclosing combining mark
-#Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
-#Cf Format          a format control character
-#
-UNICODEWIDTH_H=../unicode_width.h
-if ! test -d unicode; then
-	mkdir unicode
-fi &&
-( cd unicode &&
-	if ! test -f UnicodeData.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-	fi &&
-	if ! test -f EastAsianWidth.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-	fi &&
-	if ! test -d uniset; then
-		git clone https://github.com/depp/uniset.git
-	fi &&
-	(
-		cd uniset &&
-		if ! test -x uniset; then
-			autoreconf -i &&
-			./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
-		fi &&
-		make
-	) &&
-	UNICODE_DIR=. && export UNICODE_DIR &&
-	cat >$UNICODEWIDTH_H <<-EOF
-	static const struct interval zero_width[] = {
-		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
-		  grep -v plane)
-	};
-	static const struct interval double_width[] = {
-		$(uniset/uniset --32 eaw:F,W)
-	};
-	EOF
-)
-- 
2.7.2

* [PATCH v2 2/6] update_unicode.sh: remove an unnecessary subshell level
From: Beat Bolli @ 2016-12-13 23:31 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

After the move into contrib/update-unicode, we no longer create the
unicode directory to have a clean working folder. Instead, the directory
of the script is used. This means that the subshell can be removed.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 contrib/update-unicode/update_unicode.sh | 53 ++++++++++++++++----------------
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/contrib/update-unicode/update_unicode.sh b/contrib/update-unicode/update_unicode.sh
index 7b90126..ff664ec 100755
--- a/contrib/update-unicode/update_unicode.sh
+++ b/contrib/update-unicode/update_unicode.sh
@@ -7,32 +7,31 @@
 #
 cd "$(dirname "$0")"
 UNICODEWIDTH_H=$(git rev-parse --show-toplevel)/unicode_width.h
+
+if ! test -f UnicodeData.txt; then
+	wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
+fi &&
+if ! test -f EastAsianWidth.txt; then
+	wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
+fi &&
+if ! test -d uniset; then
+	git clone https://github.com/depp/uniset.git
+fi &&
 (
-	if ! test -f UnicodeData.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
+	cd uniset &&
+	if ! test -x uniset; then
+		autoreconf -i &&
+		./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
 	fi &&
-	if ! test -f EastAsianWidth.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-	fi &&
-	if ! test -d uniset; then
-		git clone https://github.com/depp/uniset.git
-	fi &&
-	(
-		cd uniset &&
-		if ! test -x uniset; then
-			autoreconf -i &&
-			./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
-		fi &&
-		make
-	) &&
-	UNICODE_DIR=. && export UNICODE_DIR &&
-	cat >$UNICODEWIDTH_H <<-EOF
-	static const struct interval zero_width[] = {
-		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
-		  grep -v plane)
-	};
-	static const struct interval double_width[] = {
-		$(uniset/uniset --32 eaw:F,W)
-	};
-	EOF
-)
+	make
+) &&
+UNICODE_DIR=. && export UNICODE_DIR &&
+cat >$UNICODEWIDTH_H <<-EOF
+static const struct interval zero_width[] = {
+	$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
+	  grep -v plane)
+};
+static const struct interval double_width[] = {
+	$(uniset/uniset --32 eaw:F,W)
+};
+EOF
-- 
2.7.2

* [PATCH v2 3/6] update_unicode.sh: pin the uniset repo to a known good commit
From: Beat Bolli @ 2016-12-13 23:31 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

The uniset upstream has added more commits that, for example, change the
hexadecimal output in '--32' mode to decimal. Let's pin the repo to a
commit that still outputs the width tables in the format we want.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 contrib/update-unicode/update_unicode.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/update-unicode/update_unicode.sh b/contrib/update-unicode/update_unicode.sh
index ff664ec..9f1bf31 100755
--- a/contrib/update-unicode/update_unicode.sh
+++ b/contrib/update-unicode/update_unicode.sh
@@ -15,7 +15,8 @@ if ! test -f EastAsianWidth.txt; then
 	wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
 fi &&
 if ! test -d uniset; then
-	git clone https://github.com/depp/uniset.git
+	git clone https://github.com/depp/uniset.git &&
+	( cd uniset && git checkout 4b186196dd )
 fi &&
 (
 	cd uniset &&
-- 
2.7.2

* [PATCH v2 4/6] update-unicode.sh: automatically download newer definition files
From: Beat Bolli @ 2016-12-13 23:31 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

Checking just for the Unicode data files' existence is not sufficient;
we should also download them if a newer version exists on the Unicode
consortium's servers. Option -N of wget does this nicely for us.

Reviewed-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 contrib/update-unicode/update_unicode.sh | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/contrib/update-unicode/update_unicode.sh b/contrib/update-unicode/update_unicode.sh
index 9f1bf31..56871a1 100755
--- a/contrib/update-unicode/update_unicode.sh
+++ b/contrib/update-unicode/update_unicode.sh
@@ -8,12 +8,8 @@
 cd "$(dirname "$0")"
 UNICODEWIDTH_H=$(git rev-parse --show-toplevel)/unicode_width.h
 
-if ! test -f UnicodeData.txt; then
-	wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-fi &&
-if ! test -f EastAsianWidth.txt; then
-	wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-fi &&
+wget -N http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt \
+	http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
 if ! test -d uniset; then
 	git clone https://github.com/depp/uniset.git &&
 	( cd uniset && git checkout 4b186196dd )
-- 
2.7.2

* [PATCH v2 5/6] update_unicode.sh: remove the plane filter
From: Beat Bolli @ 2016-12-13 23:31 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

The uniset upstream has accepted my patches that eliminate the Unicode
plane offsets from the output in '--32' mode.

Remove the corresponding filter in update_unicode.sh.

This also fixes the issue that the plane offsets were not removed from
the second uniset call.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 contrib/update-unicode/update_unicode.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/contrib/update-unicode/update_unicode.sh b/contrib/update-unicode/update_unicode.sh
index 56871a1..e05db92 100755
--- a/contrib/update-unicode/update_unicode.sh
+++ b/contrib/update-unicode/update_unicode.sh
@@ -25,8 +25,7 @@ fi &&
 UNICODE_DIR=. && export UNICODE_DIR &&
 cat >$UNICODEWIDTH_H <<-EOF
 static const struct interval zero_width[] = {
-	$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
-	  grep -v plane)
+	$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD)
 };
 static const struct interval double_width[] = {
 	$(uniset/uniset --32 eaw:F,W)
-- 
2.7.2
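
The stray `/* plane */` rows that patch 6/6 removes from double_width are
more than cosmetic if the table is consumed by a binary search, which needs
rows in strictly ascending order. The C sketch below is one hedged way to
check that property. `intervals_are_sorted` and both excerpt arrays are
hypothetical names introduced here, and the `first`/`last` field names of
`struct interval` are an assumption, since the series shows only the
initializers.

```c
#include <stddef.h>

/* Assumed layout: an inclusive code point range, as in the table rows. */
struct interval {
	unsigned int first;
	unsigned int last;
};

/*
 * Hypothetical helper, not part of Git: return 1 if every row is a valid
 * range and the rows are strictly ascending and non-overlapping, else 0.
 */
static int intervals_are_sorted(const struct interval *table, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].first > table[i].last)
			return 0;
		if (i > 0 && table[i].first <= table[i - 1].last)
			return 0;
	}
	return 1;
}

/* Excerpt of the old double_width table, including two plane rows. */
static const struct interval with_plane_rows[] = {
	{ /* plane */ 0x0, 0x1C },
	{ /* plane */ 0x1C, 0x21 },
	{ 0x1100, 0x115F },
	{ 0x2329, 0x232A }
};

/* Excerpt of the regenerated double_width table. */
static const struct interval regenerated[] = {
	{ 0x1100, 0x115F },
	{ 0x231A, 0x231B },
	{ 0x2329, 0x232A }
};
```

On these two excerpts the check accepts `regenerated` and rejects
`with_plane_rows`, because the second plane row starts at 0x1C, exactly
where the first one ends.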

* [PATCH v2 6/6] unicode_width.h: update the width tables to Unicode 9.0
From: Beat Bolli @ 2016-12-13 23:31 UTC (permalink / raw)
  To: git; +Cc: Beat Bolli

Rerunning update_unicode.sh, which we fixed in the previous commits,
produces these new tables.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 unicode_width.h | 131 +++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 107 insertions(+), 24 deletions(-)

diff --git a/unicode_width.h b/unicode_width.h
index 47cdd23..02207be 100644
--- a/unicode_width.h
+++ b/unicode_width.h
@@ -25,7 +25,7 @@ static const struct interval zero_width[] = {
 { 0x0825, 0x0827 },
 { 0x0829, 0x082D },
 { 0x0859, 0x085B },
-{ 0x08E4, 0x0902 },
+{ 0x08D4, 0x0902 },
 { 0x093A, 0x093A },
 { 0x093C, 0x093C },
 { 0x0941, 0x0948 },
@@ -120,6 +120,7 @@ static const struct interval zero_width[] = {
 { 0x17C9, 0x17D3 },
 { 0x17DD, 0x17DD },
 { 0x180B, 0x180E },
+{ 0x1885, 0x1886 },
 { 0x18A9, 0x18A9 },
 { 0x1920, 0x1922 },
 { 0x1927, 0x1928 },
@@ -158,7 +159,7 @@ static const struct interval zero_width[] = {
 { 0x1CF4, 0x1CF4 },
 { 0x1CF8, 0x1CF9 },
 { 0x1DC0, 0x1DF5 },
-{ 0x1DFC, 0x1DFF },
+{ 0x1DFB, 0x1DFF },
 { 0x200B, 0x200F },
 { 0x202A, 0x202E },
 { 0x2060, 0x2064 },
@@ -171,13 +172,13 @@ static const struct interval zero_width[] = {
 { 0x3099, 0x309A },
 { 0xA66F, 0xA672 },
 { 0xA674, 0xA67D },
-{ 0xA69F, 0xA69F },
+{ 0xA69E, 0xA69F },
 { 0xA6F0, 0xA6F1 },
 { 0xA802, 0xA802 },
 { 0xA806, 0xA806 },
 { 0xA80B, 0xA80B },
 { 0xA825, 0xA826 },
-{ 0xA8C4, 0xA8C4 },
+{ 0xA8C4, 0xA8C5 },
 { 0xA8E0, 0xA8F1 },
 { 0xA926, 0xA92D },
 { 0xA947, 0xA951 },
@@ -204,7 +205,7 @@ static const struct interval zero_width[] = {
 { 0xABED, 0xABED },
 { 0xFB1E, 0xFB1E },
 { 0xFE00, 0xFE0F },
-{ 0xFE20, 0xFE2D },
+{ 0xFE20, 0xFE2F },
 { 0xFEFF, 0xFEFF },
 { 0xFFF9, 0xFFFB },
 { 0x101FD, 0x101FD },
@@ -228,16 +229,21 @@ static const struct interval zero_width[] = {
 { 0x11173, 0x11173 },
 { 0x11180, 0x11181 },
 { 0x111B6, 0x111BE },
+{ 0x111CA, 0x111CC },
 { 0x1122F, 0x11231 },
 { 0x11234, 0x11234 },
 { 0x11236, 0x11237 },
+{ 0x1123E, 0x1123E },
 { 0x112DF, 0x112DF },
 { 0x112E3, 0x112EA },
-{ 0x11301, 0x11301 },
+{ 0x11300, 0x11301 },
 { 0x1133C, 0x1133C },
 { 0x11340, 0x11340 },
 { 0x11366, 0x1136C },
 { 0x11370, 0x11374 },
+{ 0x11438, 0x1143F },
+{ 0x11442, 0x11444 },
+{ 0x11446, 0x11446 },
 { 0x114B3, 0x114B8 },
 { 0x114BA, 0x114BA },
 { 0x114BF, 0x114C0 },
@@ -245,6 +251,7 @@ static const struct interval zero_width[] = {
 { 0x115B2, 0x115B5 },
 { 0x115BC, 0x115BD },
 { 0x115BF, 0x115C0 },
+{ 0x115DC, 0x115DD },
 { 0x11633, 0x1163A },
 { 0x1163D, 0x1163D },
 { 0x1163F, 0x11640 },
@@ -252,6 +259,16 @@ static const struct interval zero_width[] = {
 { 0x116AD, 0x116AD },
 { 0x116B0, 0x116B5 },
 { 0x116B7, 0x116B7 },
+{ 0x1171D, 0x1171F },
+{ 0x11722, 0x11725 },
+{ 0x11727, 0x1172B },
+{ 0x11C30, 0x11C36 },
+{ 0x11C38, 0x11C3D },
+{ 0x11C3F, 0x11C3F },
+{ 0x11C92, 0x11CA7 },
+{ 0x11CAA, 0x11CB0 },
+{ 0x11CB2, 0x11CB3 },
+{ 0x11CB5, 0x11CB6 },
 { 0x16AF0, 0x16AF4 },
 { 0x16B30, 0x16B36 },
 { 0x16F8F, 0x16F92 },
@@ -262,31 +279,59 @@ static const struct interval zero_width[] = {
 { 0x1D185, 0x1D18B },
 { 0x1D1AA, 0x1D1AD },
 { 0x1D242, 0x1D244 },
+{ 0x1DA00, 0x1DA36 },
+{ 0x1DA3B, 0x1DA6C },
+{ 0x1DA75, 0x1DA75 },
+{ 0x1DA84, 0x1DA84 },
+{ 0x1DA9B, 0x1DA9F },
+{ 0x1DAA1, 0x1DAAF },
+{ 0x1E000, 0x1E006 },
+{ 0x1E008, 0x1E018 },
+{ 0x1E01B, 0x1E021 },
+{ 0x1E023, 0x1E024 },
+{ 0x1E026, 0x1E02A },
 { 0x1E8D0, 0x1E8D6 },
+{ 0x1E944, 0x1E94A },
 { 0xE0001, 0xE0001 },
 { 0xE0020, 0xE007F },
 { 0xE0100, 0xE01EF }
 };
 static const struct interval double_width[] = {
-{ /* plane */ 0x0, 0x1C },
-{ /* plane */ 0x1C, 0x21 },
-{ /* plane */ 0x21, 0x22 },
-{ /* plane */ 0x22, 0x23 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
-{ /* plane */ 0x0, 0x0 },
 { 0x1100, 0x115F },
+{ 0x231A, 0x231B },
 { 0x2329, 0x232A },
+{ 0x23E9, 0x23EC },
+{ 0x23F0, 0x23F0 },
+{ 0x23F3, 0x23F3 },
+{ 0x25FD, 0x25FE },
+{ 0x2614, 0x2615 },
+{ 0x2648, 0x2653 },
+{ 0x267F, 0x267F },
+{ 0x2693, 0x2693 },
+{ 0x26A1, 0x26A1 },
+{ 0x26AA, 0x26AB },
+{ 0x26BD, 0x26BE },
+{ 0x26C4, 0x26C5 },
+{ 0x26CE, 0x26CE },
+{ 0x26D4, 0x26D4 },
+{ 0x26EA, 0x26EA },
+{ 0x26F2, 0x26F3 },
+{ 0x26F5, 0x26F5 },
+{ 0x26FA, 0x26FA },
+{ 0x26FD, 0x26FD },
+{ 0x2705, 0x2705 },
+{ 0x270A, 0x270B },
+{ 0x2728, 0x2728 },
+{ 0x274C, 0x274C },
+{ 0x274E, 0x274E },
+{ 0x2753, 0x2755 },
+{ 0x2757, 0x2757 },
+{ 0x2795, 0x2797 },
+{ 0x27B0, 0x27B0 },
+{ 0x27BF, 0x27BF },
+{ 0x2B1B, 0x2B1C },
+{ 0x2B50, 0x2B50 },
+{ 0x2B55, 0x2B55 },
 { 0x2E80, 0x2E99 },
 { 0x2E9B, 0x2EF3 },
 { 0x2F00, 0x2FD5 },
@@ -313,11 +358,49 @@ static const struct interval double_width[] = {
 { 0xFE68, 0xFE6B },
 { 0xFF01, 0xFF60 },
 { 0xFFE0, 0xFFE6 },
+{ 0x16FE0, 0x16FE0 },
+{ 0x17000, 0x187EC },
+{ 0x18800, 0x18AF2 },
 { 0x1B000, 0x1B001 },
+{ 0x1F004, 0x1F004 },
+{ 0x1F0CF, 0x1F0CF },
+{ 0x1F18E, 0x1F18E },
+{ 0x1F191, 0x1F19A },
 { 0x1F200, 0x1F202 },
-{ 0x1F210, 0x1F23A },
+{ 0x1F210, 0x1F23B },
 { 0x1F240, 0x1F248 },
 { 0x1F250, 0x1F251 },
+{ 0x1F300, 0x1F320 },
+{ 0x1F32D, 0x1F335 },
+{ 0x1F337, 0x1F37C },
+{ 0x1F37E, 0x1F393 },
+{ 0x1F3A0, 0x1F3CA },
+{ 0x1F3CF, 0x1F3D3 },
+{ 0x1F3E0, 0x1F3F0 },
+{ 0x1F3F4, 0x1F3F4 },
+{ 0x1F3F8, 0x1F43E },
+{ 0x1F440, 0x1F440 },
+{ 0x1F442, 0x1F4FC },
+{ 0x1F4FF, 0x1F53D },
+{ 0x1F54B, 0x1F54E },
+{ 0x1F550, 0x1F567 },
+{ 0x1F57A, 0x1F57A },
+{ 0x1F595, 0x1F596 },
+{ 0x1F5A4, 0x1F5A4 },
+{ 0x1F5FB, 0x1F64F },
+{ 0x1F680, 0x1F6C5 },
+{ 0x1F6CC, 0x1F6CC },
+{ 0x1F6D0, 0x1F6D2 },
+{ 0x1F6EB, 0x1F6EC },
+{ 0x1F6F4, 0x1F6F6 },
+{ 0x1F910, 0x1F91E },
+{ 0x1F920, 0x1F927 },
+{ 0x1F930, 0x1F930 },
+{ 0x1F933, 0x1F93E },
+{ 0x1F940, 0x1F94B },
+{ 0x1F950, 0x1F95E },
+{ 0x1F980, 0x1F991 },
+{ 0x1F9C0, 0x1F9C0 },
 { 0x20000, 0x2FFFD },
 { 0x30000, 0x3FFFD }
 };
-- 
2.7.2
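
For readers wondering how such a table is typically used: a sorted array of
inclusive ranges can be searched by bisection. The following C sketch
illustrates that on a five-row excerpt of the new double_width table.
`in_intervals` and `double_width_excerpt` are hypothetical names, the field
names of `struct interval` are assumed, and this is not claimed to be Git's
own lookup code.

```c
#include <stddef.h>

/* Assumed layout: an inclusive code point range, as in the table rows. */
struct interval {
	unsigned int first;
	unsigned int last;
};

/* Five rows copied from the regenerated table; not the full table. */
static const struct interval double_width_excerpt[] = {
	{ 0x1100, 0x115F },
	{ 0x231A, 0x231B },
	{ 0x2329, 0x232A },
	{ 0x1F300, 0x1F320 },
	{ 0x20000, 0x2FFFD }
};

/*
 * Hypothetical helper: return 1 if ch falls inside any row of a table
 * sorted in ascending order, else 0.
 */
static int in_intervals(unsigned int ch, const struct interval *table,
			size_t n)
{
	size_t lo = 0, hi = n;	/* search the half-open range [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ch < table[mid].first)
			hi = mid;
		else if (ch > table[mid].last)
			lo = mid + 1;
		else
			return 1;
	}
	return 0;
}
```

With this excerpt, the lookup reports a hit for U+231A, one of the rows
added in this update, and a miss for U+0041 (LATIN CAPITAL LETTER A).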

* Re: [PATCH v2 0/6] unicode_width.h: update the width tables to Unicode 9.0
From: Junio C Hamano @ 2016-12-14  1:14 UTC (permalink / raw)
  To: Beat Bolli; +Cc: git

Thanks. Very much appreciated.

* Re: [PATCH v2 4/6] update-unicode.sh: automatically download newer definition files
From: Beat Bolli @ 2016-12-14 17:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

On 14.12.16 00:31, Beat Bolli wrote:

> [PATCH v2 4/6] update-unicode.sh: automatically download newer definition files

Dang! And again I'm not capable of putting an underline instead of the
dash...

Junio, would you please reword the subject to

Re: [PATCH v2 4/6] update_unicode.sh: automatically download newer
definition files

Thanks,
Beat



* Re: [PATCH v2 4/6] update-unicode.sh: automatically download newer definition files
From: Junio C Hamano @ 2016-12-14 17:50 UTC (permalink / raw)
  To: Beat Bolli; +Cc: Git List

Beat Bolli <dev+git@drbeat.li> writes:

> On 14.12.16 00:31, Beat Bolli wrote:
>
>> [PATCH v2 4/6] update-unicode.sh: automatically download newer definition files
>
> Dang! And again I'm not capable of putting an underline instead of the
> dash...
>
> Junio, would you please reword the subject to
>
> Re: [PATCH v2 4/6] update_unicode.sh: automatically download newer
> definition files

Will do.

This is an indication that the script is probably named against
people's expectation.  We may want to rename it after the dust
settles.

Thanks.

* Re: [PATCH v2 3/6] update_unicode.sh: pin the uniset repo to a known good commit
From: Dennis Kaarsemaker @ 2016-12-15  9:47 UTC (permalink / raw)
  To: Beat Bolli, git

On Wed, 2016-12-14 at 00:31 +0100, Beat Bolli wrote:
> +       ( cd uniset && git checkout 4b186196dd )

Micronit, but this is perhaps better written as

git -C uniset checkout 4b186196dd

to avoid the subshell and cd.

D.

* Re: [PATCH v2 3/6] update_unicode.sh: pin the uniset repo to a known good commit
From: Junio C Hamano @ 2016-12-15 17:50 UTC (permalink / raw)
  To: Dennis Kaarsemaker; +Cc: Beat Bolli, git

Dennis Kaarsemaker <dennis@kaarsemaker.net> writes:

> On Wed, 2016-12-14 at 00:31 +0100, Beat Bolli wrote:
>> +       ( cd uniset && git checkout 4b186196dd )
>
> Micronit, but this is perhaps better written as
>
> git -C uniset checkout 4b186196dd
>
> to avoid the subshell and cd.
>
> D.

In the context of this script, I would say that is not even a
micronit.  It is "you could do this if you wanted to".
