git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Philip Oakley <philipoakley@iee.email>
To: Junio C Hamano <gitster@pobox.com>
Cc: GitList <git@vger.kernel.org>, Taylor Blau <me@ttaylorr.com>,
	NSENGIYUMVA WILBERFORCE <nsengiyumvawilberforce@gmail.com>
Subject: Re: [PATCH v4] pretty-formats: add hard truncation, without ellipsis, options
Date: Wed, 7 Dec 2022 00:24:34 +0000	[thread overview]
Message-ID: <093e1dca-b9d4-f1f2-0845-ad6711622cf5@iee.email> (raw)
In-Reply-To: <xmqq35a5cnhq.fsf@gitster.g>

Apologies for the delays.
On 26/11/2022 23:19, Junio C Hamano wrote:
> Philip Oakley <philipoakley@iee.email> writes:
>
>>>  in that they may do "[][].." or "[][][]" when told to
>>> "trunc" fill a string with four or more double-width letters into a
>>> 5 display space.  But the point is at least for these with ellipsis
>>> it is fairly clear what the desired behaviour is.
>> That "is fairly clear" is probably the problem. In retrospect it's not
>> clear in the docs that the "%<(N" format is (would appear to be) about
>> defining the display width, in terminal character columns, that the
>> selected parameter is to be displayed within.
>>
>> The code already pads the displayed parameter with spaces as required if
>> the parameter is shorter than the display width - the else condition in
>> pretty.c L1750
>>
>>>   For "trunc" in
>>> the above example, I think the right thing for it to do would be to
>>> do "[][].", i.e. consume exactly 5 display columns, and avoid
>>> exceeding the given space by not giving two dots but just one.
>> The existing choice is padding "[][]" with a single space to reach 5
>> display chars.
>> For the 6-char "[][][]" truncation it is "[][..", i.e. 3 chars from
>> "[][][]", then the two ".." dots of the ellipsis.
> Here, I realize that I did not explain the scenario well.  The
> message you are responding to was meant to be a clarification of my
> earlier message and it should have done a better job but apparently
> I failed.  Sorry, and let me try again.
>
> The single example I meant to use to illustrate the scenario I worry
> about is this.  There is a string, in which there are four (or more)
> letters, each of which occupies two display columns.

I hadn't appreciated that utf8 has 'wide' characters that are
effectively double width, i.e. use 2 display columns, such as emojis.
I had been well aware of the multi-byte nature of utf8, and vaguely
aware of the potential for combined characters.

>   And '[]' in my
> earlier messages stood for a SINGLE such letter (I just wanted to
> stick to ASCII, instead of using East Asian script, for
> illustration).  So "[][.." is not possible (you are chomping the
> second such letter in half).
>
> I could use East Asian 一二三四 (there are four letters, denoting
> one, two, three, and four, each occupying two display spaces when
> typeset in a fixed width font),

These 4 characters came through ok.
>  but to make it easier to see in
> ASCII only text, let's pretend "[1]", "[2]", "[3]", "[4]" are such
> letters.  You cannot chomp them in the middle (and please pretend
> each of them occupy two, not three, display spaces).
>
> When the given display space is 6 columns, we can fit 2 such letters
> plus ".." in the space.  If the original string were [1][2][3][4],
> it is clear trunk and ltrunk can do "[1][2].." 

> (remember [n] stands
> for a single letter whose width is 2 columns, so that takes 6
> columns) 

Aside: The man pages aren't that clear about the distinction between
display columns and characters, both for these width based formats
(allow this placeholder parameter a width of <N> columns), and the
terminal's column position formats (start this parameter at the
on-screen column <N>) .


> and "..[3][4]", respectively.  It also is clear that Trunk
> and Ltrunk can do "[1][2][3]" and "[2][3][4]", respectively.  We
> truncate the given string so that we fill the alloted display
> columns fully.

While this example is clear, it's not clear what should be done if we
have mixed width strings, e.g. with emojis, as the boundaries in random
text will also be randomly placed.
>
> If the given display space is 5 columns, the desirable behaviour for
> trunk and ltrunk is still clear.  Instead of consuming two dots, we
> could use a single dot as the filler.  As I said, I suspect that the
> implementation of trunk and ltrunc does this correctly, though.

The current implementation is buggy in the case of combining character
code points. We forget to add the (zero space) combining code points
once we have the base character when it is the character before the
split (where the double dot ".." is inserted). I.e. the zero space
characters are added after the ".." double dot.

This can cause problems with the existing code for `mtrunc` where the
'lost' combing code points then become attached to the preceding "two
dots".
>
> My worry is it is not clear what Trunk and Ltrunk should do in that
> case.  There is no way to fit a substring of [1][2][3][4] into 5
> columns without any filler.

I'm not sure that anyone has fully solved that issue of what to do when
a wide character overlaps the end of an available display space, such as
terminal word-wrap when resizing the window (in my mintty terminal
window it displays a 'space' then starts the wide character on a new line).

There's also a 'bug' reported for one of the microsoft terminals that
the user wants to position the cursor at a column that is the middle of
a wide emoji character and wants half of it over written!

For our case (no wordwrap) I'd be minded to to mark the end column with
a single width character to show that some (wide) thing should be here,
but we've had to cut it off, such as the vertical ellipsis. At least
it's explainable.

I'll at least work on the doc clarification regarding the column width,
column position and wide char (2-col) issue, and hopefully a few failing
tests for the combing code point and the wide char fitment issue.

--
Philip

  parent reply	other threads:[~2022-12-07  0:25 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-30 18:56 [PATCH 0/1] extend the truncating pretty formats Philip Oakley
2022-10-30 18:56 ` [PATCH 1/1] pretty-formats: add hard truncation, without ellipsis, options Philip Oakley
2022-10-30 19:23   ` Taylor Blau
2022-10-30 22:01     ` Philip Oakley
2022-10-30 23:42       ` Taylor Blau
2022-10-30 21:41 ` [PATCH 0/1] extend the truncating pretty formats Philip Oakley
2022-11-01 22:57 ` [PATCH v2 " Philip Oakley
2022-11-01 22:57   ` [PATCH v2 1/1] pretty-formats: add hard truncation, without ellipsis, options Philip Oakley
2022-11-01 23:05     ` Philip Oakley
2022-11-02  0:45       ` Taylor Blau
2022-11-02 12:08         ` [PATCH v3] " Philip Oakley
2022-11-12 14:36           ` [PATCH v4] " Philip Oakley
2022-11-21  0:34             ` Junio C Hamano
2022-11-21 18:10               ` Philip Oakley
2022-11-22  0:57                 ` Junio C Hamano
2022-11-23 14:26                   ` Philip Oakley
2022-11-25  7:11                     ` Junio C Hamano
2022-11-26 14:32                       ` Philip Oakley
2022-11-26 22:44                         ` Philip Oakley
2022-11-26 23:19                         ` Junio C Hamano
2022-11-28 13:39                           ` Philip Oakley
2022-11-29  0:18                             ` Junio C Hamano
2022-12-07  0:24                           ` Philip Oakley [this message]
2022-12-07  0:54                             ` Junio C Hamano
2023-01-19 18:18             ` [PATCH v5 0/5] Pretty formats: Clarify column alignment Philip Oakley
2023-01-19 18:18               ` [PATCH v5 1/5] doc: pretty-formats: separate parameters from placeholders Philip Oakley
2023-01-19 18:18               ` [PATCH v5 2/5] doc: pretty-formats: delineate `%<|(` parameter values Philip Oakley
2023-01-19 18:18               ` [PATCH v5 3/5] doc: pretty-formats document negative column alignments Philip Oakley
2023-01-19 18:18               ` [PATCH v5 4/5] doc: pretty-formats describe use of ellipsis in truncation Philip Oakley
2023-01-19 18:18               ` [PATCH v5 5/5] doc: pretty-formats note wide char limitations, and add tests Philip Oakley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=093e1dca-b9d4-f1f2-0845-ad6711622cf5@iee.email \
    --to=philipoakley@iee.email \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=nsengiyumvawilberforce@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).