From: tboegi@web.de
To: git@vger.kernel.org, alexander.s.m@gmail.com, Johannes.Schindelin@gmx.de
Cc: "Torsten Bögershausen" <tboegi@web.de>
Subject: [PATCH v4 2/2] diff.c: More changes and tests around utf8_strwidth()
Date: Sat, 3 Sep 2022 07:39:34 +0200 [thread overview]
Message-ID: <20220903053934.15629-1-tboegi@web.de> (raw)
In-Reply-To: <CA+VDVVVmi99i6ZY64tg8RkVXDc5gOzQP_SH12zhDKRkUnhWFgw@mail.gmail.com>
From: Torsten Bögershausen <tboegi@web.de>
When unicode filenames (encoded in UTF-8) are used, the visible width
on the screen is not the same as strlen(filename).
For example, `git log --stat` may produce an output like this:
[snip the header]
Arger.txt | 1 +
Ärger.txt | 1 +
2 files changed, 2 insertions(+)
The last commit uses utf8_strwidth() instead of strlen() in diff.c
and it is time to test the change.
And now we detect another problem that is fixed here: code like this
strbuf_addf(&out, "%-*s", len, name);
(or using the underlying snprintf() function) does not align the
buffer to a minimum of len measured in screen-width, but uses the
memory count.
One could be tempted to wish that snprintf() was UTF-8 aware.
That doesn't seem to be the case anywhere (tested on Linux and Mac),
probably snprintf() uses the "bytes in memory"/strlen() approach to be
compatible with older versions and this will never change.
The basic idea is to change code in diff.c like this
strbuf_addf(&out, "%-*s", len, name);
into something like this:
int padding = len - utf8_strwidth(name);
if (padding < 0)
padding = 0;
strbuf_addf(&out, " %s%*s", name, padding, "");
The real change is slighty bigger, as it, as well, integrates two calls
of strbuf_addf() into one.
Tests:
Two things need to be tested:
- The calculation of the maximum width
- The calculation of padding
The name "textfile" is changed into "tëxtfilë", both have a width of 8.
If strlen() was used, to get the maximum width, the shorter "binfile" would
have been mis-aligned:
binfile | [snip]
tëxtfilë | [snip]
If only "binfile" would be renamed into "binfilë":
binfilë | [snip]
textfile | [snip]
In order to verify that the width is calculated correctly everywhere,
"binfile" is renamed into "binfilë", giving 1 bytes more in strlen()
"tëxtfile" is renamed into "tëxtfilë", 2 byte more in strlen().
The updated t4012-diff-binary.sh checks the correct aligment:
binfilë | [snip]
tëxtfilë | [snip]
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
diff.c | 23 ++++++++++++++---------
t/t4012-diff-binary.sh | 14 +++++++-------
2 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/diff.c b/diff.c
index b5df464de5..35b9da90fe 100644
--- a/diff.c
+++ b/diff.c
@@ -2734,7 +2734,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
char *name = file->print_name;
uintmax_t added = file->added;
uintmax_t deleted = file->deleted;
- int name_len;
+ int name_len, padding;
if (!file->is_interesting && (added + deleted == 0))
continue;
@@ -2753,10 +2753,14 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
if (slash)
name = slash;
}
+ padding = len - utf8_strwidth(name);
+ if (padding < 0)
+ padding = 0;
if (file->is_binary) {
- strbuf_addf(&out, " %s%-*s |", prefix, len, name);
- strbuf_addf(&out, " %*s", number_width, "Bin");
+ strbuf_addf(&out, " %s%s%*s | %*s",
+ prefix, name, padding, "",
+ number_width, "Bin");
if (!added && !deleted) {
strbuf_addch(&out, '\n');
emit_diff_symbol(options, DIFF_SYMBOL_STATS_LINE,
@@ -2776,8 +2780,9 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
continue;
}
else if (file->is_unmerged) {
- strbuf_addf(&out, " %s%-*s |", prefix, len, name);
- strbuf_addstr(&out, " Unmerged\n");
+ strbuf_addf(&out, " %s%s%*s | %*s",
+ prefix, name, padding, "",
+ number_width, "Unmerged");
emit_diff_symbol(options, DIFF_SYMBOL_STATS_LINE,
out.buf, out.len, 0);
strbuf_reset(&out);
@@ -2803,10 +2808,10 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
add = total - del;
}
}
- strbuf_addf(&out, " %s%-*s |", prefix, len, name);
- strbuf_addf(&out, " %*"PRIuMAX"%s",
- number_width, added + deleted,
- added + deleted ? " " : "");
+ strbuf_addf(&out, " %s%s%*s | %*"PRIuMAX"%s",
+ prefix, name, padding, "",
+ number_width, added + deleted,
+ added + deleted ? " " : "");
show_graph(&out, '+', add, add_c, reset);
show_graph(&out, '-', del, del_c, reset);
strbuf_addch(&out, '\n');
diff --git a/t/t4012-diff-binary.sh b/t/t4012-diff-binary.sh
index c509143c81..c64d9d2f40 100755
--- a/t/t4012-diff-binary.sh
+++ b/t/t4012-diff-binary.sh
@@ -113,20 +113,20 @@ test_expect_success 'diff --no-index with binary creation' '
'
cat >expect <<EOF
- binfile | Bin 0 -> 1026 bytes
- textfile | 10000 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ binfilë | Bin 0 -> 1026 bytes
+ tëxtfilë | 10000 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
EOF
test_expect_success 'diff --stat with binary files and big change count' '
- printf "\01\00%1024d" 1 >binfile &&
- git add binfile &&
+ printf "\01\00%1024d" 1 >binfilë &&
+ git add binfilë &&
i=0 &&
while test $i -lt 10000; do
echo $i &&
i=$(($i + 1)) || return 1
- done >textfile &&
- git add textfile &&
- git diff --cached --stat binfile textfile >output &&
+ done >tëxtfilë &&
+ git add tëxtfilë &&
+ git -c core.quotepath=false diff --cached --stat binfilë tëxtfilë >output &&
grep " | " output >actual &&
test_cmp expect actual
'
--
2.34.0
next prev parent reply other threads:[~2022-09-03 5:40 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-09 13:11 [BUG] Unicode filenames handling in `git log --stat` Alexander Meshcheryakov
2022-08-09 18:20 ` Calvin Wan
2022-08-09 19:03 ` Alexander Meshcheryakov
2022-08-09 21:36 ` Calvin Wan
2022-08-10 5:55 ` Junio C Hamano
2022-08-10 8:40 ` Torsten Bögershausen
2022-08-10 8:56 ` Alexander Meshcheryakov
2022-08-10 9:51 ` Torsten Bögershausen
2022-08-10 11:41 ` Torsten Bögershausen
2022-08-10 15:53 ` Junio C Hamano
2022-08-10 17:35 ` Torsten Bögershausen
2022-08-14 13:35 ` [PATCH/RFC 1/1] diff.c: When appropriate, use utf8_strwidth() tboegi
2022-08-14 23:12 ` Junio C Hamano
2022-08-15 6:34 ` Torsten Bögershausen
2022-08-18 21:00 ` Junio C Hamano
2022-08-27 8:50 ` [PATCH v2 " tboegi
2022-08-27 8:54 ` Torsten Bögershausen
2022-08-27 9:50 ` Eric Sunshine
2022-08-29 12:04 ` Johannes Schindelin
2022-08-29 17:54 ` Torsten Bögershausen
2022-08-29 18:37 ` Junio C Hamano
2022-09-02 9:47 ` Johannes Schindelin
2022-09-02 4:21 ` [PATCH v3 1/2] diff.c: When appropriate, use utf8_strwidth(), part1 tboegi
2022-09-02 9:39 ` Johannes Schindelin
2022-09-02 4:21 ` [PATCH v3 2/2] diff.c: More changes and tests around utf8_strwidth() tboegi
2022-09-02 10:12 ` Johannes Schindelin
2022-09-03 5:39 ` [PATCH v4 1/2] diff.c: When appropriate, use utf8_strwidth(), part1 tboegi
2022-09-05 20:46 ` Junio C Hamano
2022-09-07 4:30 ` Torsten Bögershausen
2022-09-07 18:31 ` Junio C Hamano
2022-09-03 5:39 ` tboegi [this message]
2022-09-05 10:13 ` [PATCH v4 2/2] diff.c: More changes and tests around utf8_strwidth() Johannes Schindelin
2022-09-14 15:13 ` [PATCH v5 1/1] diff.c: When appropriate, use utf8_strwidth() tboegi
2022-09-14 16:40 ` Junio C Hamano
2022-09-26 18:43 ` Torsten Bögershausen
2022-10-10 21:58 ` Junio C Hamano
2022-10-20 15:46 ` Torsten Bögershausen
2022-10-20 17:43 ` Junio C Hamano
2022-10-21 15:19 ` Torsten Bögershausen
2022-10-21 21:59 ` Junio C Hamano
2022-10-23 20:02 ` Torsten Bögershausen
2022-09-15 2:57 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220903053934.15629-1-tboegi@web.de \
--to=tboegi@web.de \
--cc=Johannes.Schindelin@gmx.de \
--cc=alexander.s.m@gmail.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).