From: "Torsten Bögershausen" <tboegi@web.de>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, alexander.s.m@gmail.com
Subject: Re: [PATCH v2 1/1] diff.c: When appropriate, use utf8_strwidth()
Date: Mon, 29 Aug 2022 19:54:25 +0200 [thread overview]
Message-ID: <20220829175425.cmbwtqpxrq4ppnnk@tb-raspi4> (raw)
In-Reply-To: <0q921n79-sr17-2794-83r0-r59rnqq03pp2@tzk.qr>
On Mon, Aug 29, 2022 at 02:04:42PM +0200, Johannes Schindelin wrote:
> Hi Torsten,
> >
> > The choosen solution is to split code in diff.c like this
> >
> > strbuf_addf(&out, "%-*s", len, name);
> >
> > into something like this:
> >
> > size_t num_padding_spaces = 0;
> > // [snip]
> > if (len > utf8_strwidth(name))
> > num_padding_spaces = len - utf8_strwidth(name);
> > strbuf_addf(&out, "%s", name);
> > if (num_padding_spaces)
> > strbuf_addchars(&out, ' ', num_padding_spaces);
>
> ... this sounds like it would benefit from beinv refactored into a
> separate function, e.g. `strbuf_add_padded(buf, utf8string)`, both for
> readability as well as for self-documentation.
Yes, but:
All (tm) strbuf() functions use an unsigned size_t, and are not
tolerant against passing 0 as "do nothing".
A nicer solution (for this patch) could be a change like this:
Instead of
void strbuf_addchars(struct strbuf *sb, int c, size_t n)
{
strbuf_grow(sb, n);
memset(sb->buf + sb->len, c, n);
strbuf_setlen(sb, sb->len + n);
}
We would find:
void strbuf_addchars(struct strbuf *sb, int c, ssize_t n)
{
if (n <= 0)
return;
strbuf_grow(sb, (size_t)n);
memset(sb->buf + sb->len, c, (size_t)n);
strbuf_setlen(sb, sb->len + (size_t)n);
}
I couldn't convince myself to do so.
Since it is mainly diff.c that needs this adjustment/padding of strings,
I coulnd't convince myself to write another function in strbuf.c
>
> Also, it is unclear to me why we have to evaluate `utf8_strwidth()`
> _twice_ and why we do not assign the result to a variable called `width`
> and then have a conditional like
>
> if (width < len) /* pad to `len` columns */
> strbuf_addchars(&out, ' ' , len - width);
>
> instead. That would sound more logical to me.
This is caused by the logic in diff.c:
/*
* Find the longest filename and max number of changes
*/
for (i = 0; (i < count) && (i < data->nr); i++) {
struct diffstat_file *file = data->files[i];
[snip]
len = utf8_strwidth(file->print_name);
if (max_width < len)
max_width = len;
// and later
/*
* From here name_width is the width of the name area,
* and graph_width is the width of the graph area.
* max_change is used to scale graph properly.
*/
for (i = 0; i < count; i++) {
/*
* "scale" the filename
*/
// TB: Which means either shortening it with ...
// Or padding it, if needed, and here we need
// another
name_len = utf8_strwidth(name);
>
> Besides, since the simple change from `strlen()` to `utf8_strwidth()` is
> so different from changing `strbuf_addf(...)`, I would prefer to see them
> split into two patches.
Hm, that is a possiblity. Seems to ease the burden for reviewers.
>
> >
> > Tests:
> > Two things need to be tested:
> > - The calculation of the maximum width
> > - The calculation of num_padding_spaces
> >
> > The name "textfile" is changed into "textfilë", both have a width of 8.
> > If strlen() was used, to get the maximum width, the shorter "binfile" would
> > have been mis-aligned:
> > binfile | [snip]
> > textfilë | [snip]
> >
> > If only "binfile" would be renamed into "binfilë":
> > binfilë | [snip]
> > textfile | [snip]
> >
> > In order to verify that the width is calculated correctly everywhere,
> > "binfile" is renamed into "binfïlë", giving 2 bytes more in strlen()
> > "textfile" is renamed into "textfilë", 1 byte more in strlen(),
> > and the updated t4012-diff-binary.sh checks the correct aligment:
> > binfïlë | [snip]
> > textfilë | [snip]
>
> I wonder whether you can change only _one_ name and still verify the
> correctness. When you make two changes at the same time, it is always
> possible for one change to "cancel out" the other one, and therefore it is
> harder to reason about the correctness of your patch.
Nee, I have a hard time to see how a +/- 1 can "cancel out" a +/- 2.
But I may improve the commit message, to make that more clear.
>
> Better keep it simple and change only one instance (personally,
> I would have changed two letters in the longer one).
That is certainly doable.
>
> >
> > Reported-by: Alexander Meshcheryakov <alexander.s.m@gmail.com>
> > Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> > ---
> > diff.c | 37 +++++++++++++++++++++++--------------
> > t/t4012-diff-binary.sh | 14 +++++++-------
> > 2 files changed, 30 insertions(+), 21 deletions(-)
> >
> > diff --git a/diff.c b/diff.c
> > index 974626a621..cf38e1dc88 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -2591,7 +2591,7 @@ void print_stat_summary(FILE *fp, int files,
> > static void show_stats(struct diffstat_t *data, struct diff_options *options)
> > {
> > int i, len, add, del, adds = 0, dels = 0;
> > - uintmax_t max_change = 0, max_len = 0;
> > + uintmax_t max_change = 0, max_width = 0;
>
> Why rename `max_len`, but not `len`? I would have expected (and agreed
> with seeing) `len` to be renamed to `width`, too.
That is a valid point.
There is, however, already a variable called "width".
And renaming this one into a new one as well ?
>
> > int total_files = data->nr, count;
> > int width, name_width, graph_width, number_width = 0, bin_width = 0;
> > const char *reset, *add_c, *del_c;
> > @@ -2620,9 +2620,9 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
> > continue;
> > }
> > fill_print_name(file);
> > - len = strlen(file->print_name);
> > - if (max_len < len)
> > - max_len = len;
> > + len = utf8_strwidth(file->print_name);
> > + if (max_width < len)
> > + max_width = len;
> >
> > if (file->is_unmerged) {
> > /* "Unmerged" is 8 characters */
> > @@ -2646,7 +2646,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
> >
> > /*
> > * We have width = stat_width or term_columns() columns total.
> > - * We want a maximum of min(max_len, stat_name_width) for the name part.
> > + * We want a maximum of min(max_width, stat_name_width) for the name part.
> > * We want a maximum of min(max_change, stat_graph_width) for the +- part.
> > * We also need 1 for " " and 4 + decimal_width(max_change)
> > * for " | NNNN " and one the empty column at the end, altogether
> > @@ -2701,8 +2701,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
> > graph_width = options->stat_graph_width;
> >
> > name_width = (options->stat_name_width > 0 &&
> > - options->stat_name_width < max_len) ?
> > - options->stat_name_width : max_len;
> > + options->stat_name_width < max_width) ?
> > + options->stat_name_width : max_width;
>
> It is a bit sad that the diff lines regarding the renamed variable drown
> out the actual change (`strlen()` -> `utf8_strwidth()`). But the end
> result is nicer.
>
> Thank you for working on this!
> Dscho
Thanks so much for the review - let's see if I can make a better patch
the next days (better say weeks)
next prev parent reply other threads:[~2022-08-29 17:54 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-09 13:11 [BUG] Unicode filenames handling in `git log --stat` Alexander Meshcheryakov
2022-08-09 18:20 ` Calvin Wan
2022-08-09 19:03 ` Alexander Meshcheryakov
2022-08-09 21:36 ` Calvin Wan
2022-08-10 5:55 ` Junio C Hamano
2022-08-10 8:40 ` Torsten Bögershausen
2022-08-10 8:56 ` Alexander Meshcheryakov
2022-08-10 9:51 ` Torsten Bögershausen
2022-08-10 11:41 ` Torsten Bögershausen
2022-08-10 15:53 ` Junio C Hamano
2022-08-10 17:35 ` Torsten Bögershausen
2022-08-14 13:35 ` [PATCH/RFC 1/1] diff.c: When appropriate, use utf8_strwidth() tboegi
2022-08-14 23:12 ` Junio C Hamano
2022-08-15 6:34 ` Torsten Bögershausen
2022-08-18 21:00 ` Junio C Hamano
2022-08-27 8:50 ` [PATCH v2 " tboegi
2022-08-27 8:54 ` Torsten Bögershausen
2022-08-27 9:50 ` Eric Sunshine
2022-08-29 12:04 ` Johannes Schindelin
2022-08-29 17:54 ` Torsten Bögershausen [this message]
2022-08-29 18:37 ` Junio C Hamano
2022-09-02 9:47 ` Johannes Schindelin
2022-09-02 4:21 ` [PATCH v3 1/2] diff.c: When appropriate, use utf8_strwidth(), part1 tboegi
2022-09-02 9:39 ` Johannes Schindelin
2022-09-02 4:21 ` [PATCH v3 2/2] diff.c: More changes and tests around utf8_strwidth() tboegi
2022-09-02 10:12 ` Johannes Schindelin
2022-09-03 5:39 ` [PATCH v4 1/2] diff.c: When appropriate, use utf8_strwidth(), part1 tboegi
2022-09-05 20:46 ` Junio C Hamano
2022-09-07 4:30 ` Torsten Bögershausen
2022-09-07 18:31 ` Junio C Hamano
2022-09-03 5:39 ` [PATCH v4 2/2] diff.c: More changes and tests around utf8_strwidth() tboegi
2022-09-05 10:13 ` Johannes Schindelin
2022-09-14 15:13 ` [PATCH v5 1/1] diff.c: When appropriate, use utf8_strwidth() tboegi
2022-09-14 16:40 ` Junio C Hamano
2022-09-26 18:43 ` Torsten Bögershausen
2022-10-10 21:58 ` Junio C Hamano
2022-10-20 15:46 ` Torsten Bögershausen
2022-10-20 17:43 ` Junio C Hamano
2022-10-21 15:19 ` Torsten Bögershausen
2022-10-21 21:59 ` Junio C Hamano
2022-10-23 20:02 ` Torsten Bögershausen
2022-09-15 2:57 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220829175425.cmbwtqpxrq4ppnnk@tb-raspi4 \
--to=tboegi@web.de \
--cc=Johannes.Schindelin@gmx.de \
--cc=alexander.s.m@gmail.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).