git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Philip Oakley <philipoakley@iee.email>
To: GitList <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	NSENGIYUMVA WILBERFORCE <nsengiyumvawilberforce@gmail.com>,
	self <philipoakley@iee.email>
Subject: [PATCH v5 5/5] doc: pretty-formats note wide char limitations, and add tests
Date: Thu, 19 Jan 2023 18:18:27 +0000	[thread overview]
Message-ID: <20230119181827.1319-6-philipoakley@iee.email> (raw)
In-Reply-To: <20230119181827.1319-1-philipoakley@iee.email>

The previous commits added clarifications to the column alignment
placeholders, note that the spaces are optional around the parameters.

Also, a proposed extension [1] to allow hard truncation (without
ellipsis '..') highlighted that the existing code does not play well
with wide characters, such as Asian fonts and emojis.

For example, N wide characters take 2N columns so won't fit an odd number
column width, causing misalignment somewhere.

Further analysis also showed that decomposed characters, e.g. separate
`a` + `umlaut` Unicode code-points may also be mis-counted, in some cases
leaving multiple loose `umlauts` all combined together.

Add some notes about these limitations, and add basic tests to demonstrate
them.

The chosen solution for the tests is to substitute any wide character
that overlaps a splitting boundary for the unicode vertical ellipsis
code point as a rare but 'obvious' substitution.

An alternative could be the substitution with a single dot '.' which
matches regular expression usage, and our two dot ellipsis, and further
in scenarios where the bulk of the text is wide characters, would be
obvious. In mainly 'ascii' scenarios a singleton emoji being substituted
by a dot could be confusing.

It is enough that the tests fail cleanly. The final choice for the
substitute character can be deferred.

[1]
https://lore.kernel.org/git/20221030185614.3842-1-philipoakley@iee.email/

Signed-off-by: Philip Oakley <philipoakley@iee.email>
---
 Documentation/pretty-formats.txt |  5 +++++
 t/t4205-log-pretty-formats.sh    | 27 +++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index e51f1e54e1..3b71334459 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -157,6 +157,11 @@ The placeholders are:
 				  only works correctly with N >= 2.
 				  Note 2: spaces around the N and M (see below)
 				  values are optional.
+				  Note 3: Emojis and other wide characters
+				  will take two display columns, which may
+				  over-run column boundaries.
+				  Note 4: decomposed character combining marks
+				  may be misplaced at padding boundaries.
 '%<|( <M> )':: make the next placeholder take at least until Mth
 	     display column, padding spaces on the right if necessary.
 	     Use negative M values for column positions measured
diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index 0404491d6e..2cba0e0c56 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -1018,4 +1018,31 @@ test_expect_success '%(describe:abbrev=...) vs git describe --abbrev=...' '
 	test_cmp expect actual
 '
 
+# pretty-formats note wide char limitations, and add tests
+test_expect_failure 'wide and decomposed characters column counting' '
+
+# from t/lib-unicode-nfc-nfd.sh hex values converted to octal
+	utf8_nfc=$(printf "\303\251") && # e acute combined.
+	utf8_nfd=$(printf "\145\314\201") && # e with a combining acute (i.e. decomposed)
+	utf8_emoji=$(printf "\360\237\221\250") &&
+
+# replacement character when requesting a wide char fits in a single display colum.
+# "half wide" alternative could be a plain ASCII dot `.`
+	utf8_vert_ell=$(printf "\342\213\256") &&
+
+# use ${xxx} here!
+	nfc10="${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}${utf8_nfc}" &&
+	nfd10="${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}${utf8_nfd}" &&
+	emoji5="${utf8_emoji}${utf8_emoji}${utf8_emoji}${utf8_emoji}${utf8_emoji}" &&
+# emoji5 uses 10 display columns
+
+	test_commit "abcdefghij" &&
+	test_commit --no-tag "${nfc10}" &&
+	test_commit --no-tag "${nfd10}" &&
+	test_commit --no-tag "${emoji5}" &&
+	printf "${utf8_emoji}..${utf8_emoji}${utf8_vert_ell}\n${utf8_nfd}..${utf8_nfd}${utf8_nfd}\n${utf8_nfc}..${utf8_nfc}${utf8_nfc}\na..ij\n" >expected &&
+	git log --format="%<(5,mtrunc)%s" -4 >actual &&
+	test_cmp expected actual
+'
+
 test_done
-- 
2.39.1.windows.1


      parent reply	other threads:[~2023-01-19 18:20 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-30 18:56 [PATCH 0/1] extend the truncating pretty formats Philip Oakley
2022-10-30 18:56 ` [PATCH 1/1] pretty-formats: add hard truncation, without ellipsis, options Philip Oakley
2022-10-30 19:23   ` Taylor Blau
2022-10-30 22:01     ` Philip Oakley
2022-10-30 23:42       ` Taylor Blau
2022-10-30 21:41 ` [PATCH 0/1] extend the truncating pretty formats Philip Oakley
2022-11-01 22:57 ` [PATCH v2 " Philip Oakley
2022-11-01 22:57   ` [PATCH v2 1/1] pretty-formats: add hard truncation, without ellipsis, options Philip Oakley
2022-11-01 23:05     ` Philip Oakley
2022-11-02  0:45       ` Taylor Blau
2022-11-02 12:08         ` [PATCH v3] " Philip Oakley
2022-11-12 14:36           ` [PATCH v4] " Philip Oakley
2022-11-21  0:34             ` Junio C Hamano
2022-11-21 18:10               ` Philip Oakley
2022-11-22  0:57                 ` Junio C Hamano
2022-11-23 14:26                   ` Philip Oakley
2022-11-25  7:11                     ` Junio C Hamano
2022-11-26 14:32                       ` Philip Oakley
2022-11-26 22:44                         ` Philip Oakley
2022-11-26 23:19                         ` Junio C Hamano
2022-11-28 13:39                           ` Philip Oakley
2022-11-29  0:18                             ` Junio C Hamano
2022-12-07  0:24                           ` Philip Oakley
2022-12-07  0:54                             ` Junio C Hamano
2023-01-19 18:18             ` [PATCH v5 0/5] Pretty formats: Clarify column alignment Philip Oakley
2023-01-19 18:18               ` [PATCH v5 1/5] doc: pretty-formats: separate parameters from placeholders Philip Oakley
2023-01-19 18:18               ` [PATCH v5 2/5] doc: pretty-formats: delineate `%<|(` parameter values Philip Oakley
2023-01-19 18:18               ` [PATCH v5 3/5] doc: pretty-formats document negative column alignments Philip Oakley
2023-01-19 18:18               ` [PATCH v5 4/5] doc: pretty-formats describe use of ellipsis in truncation Philip Oakley
2023-01-19 18:18               ` Philip Oakley [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230119181827.1319-6-philipoakley@iee.email \
    --to=philipoakley@iee.email \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=nsengiyumvawilberforce@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).