From: Beat Bolli <dev+git@drbeat.li>
To: git@vger.kernel.org
Cc: Beat Bolli <dev+git@drbeat.li>
Subject: [PATCH v2 1/6] update_unicode.sh: move it into contrib/update-unicode
Date: Wed, 14 Dec 2016 00:31:39 +0100
Message-ID: <1481671904-1143-2-git-send-email-dev+git@drbeat.li>
In-Reply-To: <1481671904-1143-1-git-send-email-dev+git@drbeat.li>
As it's used only by a tiny minority of the Git developer population,
this script does not belong in the main Git source directory.
Move it into contrib/ and adjust the paths to account for the new
location.
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
---
 .gitignore                               |  1 -
 contrib/update-unicode/.gitignore        |  3 +++
 contrib/update-unicode/README            | 20 ++++++++++++++++
 contrib/update-unicode/update_unicode.sh | 38 ++++++++++++++++++++++++++++++
 update_unicode.sh                        | 40 --------------------------------
 5 files changed, 61 insertions(+), 41 deletions(-)
 create mode 100644 contrib/update-unicode/.gitignore
 create mode 100644 contrib/update-unicode/README
 create mode 100755 contrib/update-unicode/update_unicode.sh
 delete mode 100755 update_unicode.sh
diff --git a/.gitignore b/.gitignore
index f96e50e..5555ae0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -204,7 +204,6 @@
 /config.mak.autogen
 /config.mak.append
 /configure
-/unicode
 /tags
 /TAGS
 /cscope*
diff --git a/contrib/update-unicode/.gitignore b/contrib/update-unicode/.gitignore
new file mode 100644
index 0000000..b0ebc6a
--- /dev/null
+++ b/contrib/update-unicode/.gitignore
@@ -0,0 +1,3 @@
+uniset/
+UnicodeData.txt
+EastAsianWidth.txt
diff --git a/contrib/update-unicode/README b/contrib/update-unicode/README
new file mode 100644
index 0000000..b9e2fc8
--- /dev/null
+++ b/contrib/update-unicode/README
@@ -0,0 +1,20 @@
+TL;DR: Run update_unicode.sh after the publication of a new Unicode
+standard and commit the resulting unicode_width.h file.
+
+The long version
+================
+
+The Git source code ships the file unicode_width.h, which contains
+tables of zero-width and double-width Unicode code points.
+These tables are generated using update_unicode.sh in this directory.
+update_unicode.sh itself uses a third-party tool, uniset, to query two
+Unicode data files for the interesting code points.
+
+On first run, update_unicode.sh clones uniset from GitHub and builds it.
+This requires a current-ish version of autoconf (2.69 works as of
+December 2016).
+
+On each run, update_unicode.sh checks whether more recent Unicode data
+files are available from the Unicode Consortium, and rebuilds the header
+unicode_width.h with the new data. The new header can then be
+committed.
diff --git a/contrib/update-unicode/update_unicode.sh b/contrib/update-unicode/update_unicode.sh
new file mode 100755
index 0000000..7b90126
--- /dev/null
+++ b/contrib/update-unicode/update_unicode.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+#See http://www.unicode.org/reports/tr44/
+#
+#Me Enclosing_Mark an enclosing combining mark
+#Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
+#Cf Format a format control character
+#
+cd "$(dirname "$0")"
+UNICODEWIDTH_H=$(git rev-parse --show-toplevel)/unicode_width.h
+(
+	if ! test -f UnicodeData.txt; then
+		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
+	fi &&
+	if ! test -f EastAsianWidth.txt; then
+		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
+	fi &&
+	if ! test -d uniset; then
+		git clone https://github.com/depp/uniset.git
+	fi &&
+	(
+		cd uniset &&
+		if ! test -x uniset; then
+			autoreconf -i &&
+			./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
+		fi &&
+		make
+	) &&
+	UNICODE_DIR=. && export UNICODE_DIR &&
+	cat >$UNICODEWIDTH_H <<-EOF
+	static const struct interval zero_width[] = {
+		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
+		  grep -v plane)
+	};
+	static const struct interval double_width[] = {
+		$(uniset/uniset --32 eaw:F,W)
+	};
+	EOF
+)
diff --git a/update_unicode.sh b/update_unicode.sh
deleted file mode 100755
index 27af77c..0000000
--- a/update_unicode.sh
+++ /dev/null
@@ -1,40 +0,0 @@
-#!/bin/sh
-#See http://www.unicode.org/reports/tr44/
-#
-#Me Enclosing_Mark an enclosing combining mark
-#Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
-#Cf Format a format control character
-#
-UNICODEWIDTH_H=../unicode_width.h
-if ! test -d unicode; then
-	mkdir unicode
-fi &&
-( cd unicode &&
-	if ! test -f UnicodeData.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-	fi &&
-	if ! test -f EastAsianWidth.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-	fi &&
-	if ! test -d uniset; then
-		git clone https://github.com/depp/uniset.git
-	fi &&
-	(
-		cd uniset &&
-		if ! test -x uniset; then
-			autoreconf -i &&
-			./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
-		fi &&
-		make
-	) &&
-	UNICODE_DIR=. && export UNICODE_DIR &&
-	cat >$UNICODEWIDTH_H <<-EOF
-	static const struct interval zero_width[] = {
-		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
-		  grep -v plane)
-	};
-	static const struct interval double_width[] = {
-		$(uniset/uniset --32 eaw:F,W)
-	};
-	EOF
-)
--
2.7.2
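
The here-document at the end of update_unicode.sh wraps uniset's output
in two C array definitions. The following is a minimal sketch of that
wrapping step, not code from the patch: emit_table and fake_uniset are
hypothetical names, and the two ranges are sample values standing in for
real uniset output.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: wrap the output of a
# generator command in a C array definition, similar to what the
# here-document in update_unicode.sh does for zero_width and
# double_width.
emit_table () {
	name=$1
	shift
	printf 'static const struct interval %s[] = {\n' "$name" &&
	"$@" &&
	printf '};\n'
}

# Stand-in for "uniset/uniset --32 ..."; the ranges below are sample
# values for illustration only, not real uniset output.
fake_uniset () {
	printf '{ 0x0300, 0x036F },\n{ 0x0483, 0x0489 }\n'
}

emit_table zero_width fake_uniset
```

One difference from the here-document form: a failing command inside
$(...) in a here-document does not change the exit status of cat, whereas
this helper returns non-zero when the generator fails. Whether that
matters for the script is a judgment call.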
Thread overview: 12+ messages
2016-12-13 23:31 [PATCH v2 0/6] unicode_width.h: update the width tables to Unicode 9.0 Beat Bolli
2016-12-13 23:31 ` Beat Bolli [this message]
2016-12-13 23:31 ` [PATCH v2 2/6] update_unicode.sh: remove an unnecessary subshell level Beat Bolli
2016-12-13 23:31 ` [PATCH v2 3/6] update_unicode.sh: pin the uniset repo to a known good commit Beat Bolli
2016-12-15 9:47 ` Dennis Kaarsemaker
2016-12-15 17:50 ` Junio C Hamano
2016-12-13 23:31 ` [PATCH v2 4/6] update-unicode.sh: automatically download newer definition files Beat Bolli
2016-12-14 17:40 ` Beat Bolli
2016-12-14 17:50 ` Junio C Hamano
2016-12-13 23:31 ` [PATCH v2 5/6] update_unicode.sh: remove the plane filter Beat Bolli
2016-12-13 23:31 ` [PATCH v2 6/6] unicode_width.h: update the width tables to Unicode 9.0 Beat Bolli
2016-12-14 1:14 ` [PATCH v2 0/6] " Junio C Hamano