git@vger.kernel.org mailing list mirror (one of many)
From: Philip Oakley <philipoakley@iee.email>
To: Junio C Hamano <gitster@pobox.com>
Cc: GitList <git@vger.kernel.org>, Taylor Blau <me@ttaylorr.com>,
	NSENGIYUMVA WILBERFORCE <nsengiyumvawilberforce@gmail.com>
Subject: Re: [PATCH v4] pretty-formats: add hard truncation, without ellipsis, options
Date: Mon, 28 Nov 2022 13:39:52 +0000
Message-ID: <b7b84dde-a723-0773-279f-c04c7f35cb7f@iee.email>
In-Reply-To: <xmqq35a5cnhq.fsf@gitster.g>

On 26/11/2022 23:19, Junio C Hamano wrote:
> Philip Oakley <philipoakley@iee.email> writes:
>
>>>  in that they may do "[][].." or "[][][]" when told to
>>> "trunc" fill a string with four or more double-width letters into a
>>> 5 display space.  But the point is at least for these with ellipsis
>>> it is fairly clear what the desired behaviour is.
>> That "is fairly clear" is probably the problem. In retrospect it's not
>> clear in the docs that the "%<(N" format is (would appear to be) about
>> defining the display width, in terminal character columns, that the
>> selected parameter is to be displayed within.
>>
>> The code already pads the displayed parameter with spaces as required if
>> the parameter is shorter than the display width - the else condition in
>> pretty.c L1750
>>
>>>   For "trunc" in
>>> the above example, I think the right thing for it to do would be to
>>> do "[][].", i.e. consume exactly 5 display columns, and avoid
>>> exceeding the given space by not giving two dots but just one.
>> The existing choice is padding "[][]" with a single space to reach 5
>> display chars.
>> For the 6-char "[][][]" truncation it is "[][..", i.e. 3 chars from
>> "[][][]", then the two ".." dots of the ellipsis.
> Here, I realize that I did not explain the scenario well.  The
> message you are responding to was meant to be a clarification of my
> earlier message and it should have done a better job but apparently
> I failed.  Sorry, and let me try again.
>
> The single example I meant to use to illustrate the scenario I worry
> about is this.  There is a string, in which there are four (or more)
> letters, each of which occupies two display columns.  And '[]' in my
> earlier messages stood for a SINGLE such letter (I just wanted to
> stick to ASCII, instead of using East Asian script, for
> illustration).  So "[][.." is not possible (you are chomping the
> second such letter in half).
>
> I could use East Asian 一二三四 (there are four letters, denoting
> one, two, three, and four, each occupying two display spaces when
> typeset in a fixed width font),

Thanks for that clarification. I'd been thinking it was about the
distinction between C chars (bytes), as in ASCII, and multi-byte
characters (code points), e.g. European letters with umlauts.

I hadn't really picked up on the distinction between wide and narrow
'glyphs' (if that's the right term to use).
 
I see that the code does properly count the widths of narrow and wide
code points as 1 and 2 columns of the display, but then doesn't
explicitly try any adjustment for the wide code point problem you noted.
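
To make the 1-versus-2 column counting concrete, here is a minimal sketch
in Python. It is not the code in pretty.c; display_width() is a
hypothetical helper, and it assumes that only code points whose East
Asian Width property is Wide or Fullwidth take two columns.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal columns for `text`.

    Assumption: Wide ("W") and Fullwidth ("F") code points take 2 columns
    and everything else takes 1. Combining marks and control characters
    are not treated specially in this sketch.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


print(display_width("abcd"))      # four narrow code points
print(display_width("一二三四"))  # four wide code points
```

The point is only that the column count and the code point count diverge
as soon as a wide code point appears, which is where the truncation
arithmetic can go off by one.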
>  but to make it easier to see in
> ASCII only text, let's pretend "[1]", "[2]", "[3]", "[4]" are such
> letters.  You cannot chomp them in the middle (and please pretend
> each of them occupy two, not three, display spaces).
>
> When the given display space is 6 columns, we can fit 2 such letters
> plus ".." in the space.  If the original string were [1][2][3][4],
> it is clear trunc and ltrunc can do "[1][2].." (remember [n] stands
> for a single letter whose width is 2 columns, so that takes 6
> columns) and "..[3][4]", respectively.  It also is clear that Trunc
> and Ltrunc can do "[1][2][3]" and "[2][3][4]", respectively.  We
> truncate the given string so that we fill the allotted display
> columns fully.
>
> If the given display space is 5 columns, the desirable behaviour for
> trunc and ltrunc is still clear.  Instead of consuming two dots, we
> could use a single dot as the filler.  As I said, I suspect that the
> implementation of trunc and ltrunc does this correctly, though.

I believe there is a possible solution: if we detect a column over-run,
we can go back and replace the two-column double dot with a narrow
U+2026 Horizontal Ellipsis to regain the needed column.
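
A rough Python sketch of that idea follows. It is not git's
implementation: trunc_with_fallback() and its helpers are hypothetical
names, the width rule (Wide/Fullwidth = 2 columns, everything else = 1)
is an assumption, and so is the over-run model, in which a wide code
point straddling the cut position is kept whole.

```python
import unicodedata


def col_width(ch: str) -> int:
    # Assumption: Wide/Fullwidth code points take 2 columns, all others 1.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(text: str) -> int:
    return sum(col_width(ch) for ch in text)


def trunc_with_fallback(text: str, columns: int) -> str:
    """Right-truncate `text` to exactly `columns` columns (columns >= 2)."""
    total = display_width(text)
    if total <= columns:
        # Short strings are padded with spaces, as in the existing behaviour.
        return text + " " * (columns - total)
    kept, used = [], 0
    # Keep code points until at least columns - 2 columns are covered.
    # A wide code point straddling that boundary over-runs by one column.
    for ch in text:
        if used >= columns - 2:
            break
        kept.append(ch)
        used += col_width(ch)
    # Exact fit: use the usual two dots. Over-run by one: use the
    # one-column U+2026 instead, which regains the lost column.
    filler = ".." if used == columns - 2 else "\u2026"
    return "".join(kept) + filler


print(trunc_with_fallback("一二三四", 6))
print(trunc_with_fallback("一二三四", 5))
```

With the four wide letters from the example, 6 columns gives two letters
plus "..", and 5 columns gives two letters plus the single-column
ellipsis, so both results fill the requested space exactly.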
>
> My worry is it is not clear what Trunc and Ltrunc should do in that
> case.  There is no way to fit a substring of [1][2][3][4] into 5
> columns without any filler.
For this case, where the final code point over-runs, my solution would
be to use the Vertical Ellipsis U+22EE "⋮" to overwrite that final
character. (The Unicode Replacement Character "�" could be used instead,
but that's ugly.)
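
As an illustration only, the hard-truncation case could be sketched in
Python as below. hard_trunc() is a hypothetical name rather than
anything in pretty.c, and the sketch assumes the same width rule as
before (Wide/Fullwidth = 2 columns, everything else = 1) and that
U+22EE occupies a single column on the terminal in use.

```python
import unicodedata


def col_width(ch: str) -> int:
    # Assumption: Wide/Fullwidth code points take 2 columns, all others 1.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(text: str) -> int:
    return sum(col_width(ch) for ch in text)


def hard_trunc(text: str, columns: int) -> str:
    """Right-truncate `text` to exactly `columns` columns, with no ".."."""
    total = display_width(text)
    if total <= columns:
        return text + " " * (columns - total)
    kept, used = [], 0
    # Keep code points until the requested columns are covered.
    for ch in text:
        if used >= columns:
            break
        kept.append(ch)
        used += col_width(ch)
    if used > columns:
        # The final wide code point over-ran by one column: overwrite it
        # with the (assumed one-column) vertical ellipsis U+22EE.
        kept[-1] = "\u22ee"
    return "".join(kept)


print(hard_trunc("一二三四", 6))
print(hard_trunc("一二三四", 5))
```

At 6 columns three wide letters fit exactly and nothing is substituted;
at 5 columns the third letter is replaced by the marker so the output
still fills 5 columns.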

I suspect the code would need some close reading to ensure that the
column counting and replacement correctly cope with the 'off by one'
wide-width case inside strbuf_utf8_replace().

That is, given the same off-by-one position and replacement length, it
must get back to the same point so that either the double dot or the
final code point is replaced in an idempotent manner.

The logic feels sound, as long as there are no three-wide 'crocodile'
code points. Either we counted the right number of columns, or we
over-ran by one, so we go back and substitute with a one-for-two
replacement.
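
The "at most one column" claim can be checked by brute force. The
Python sketch below works on lists of per-code-point widths only, under
an assumed greedy fill that keeps taking code points until the target
is reached; overrun() and worst_overrun() are hypothetical helpers.

```python
from itertools import product


def overrun(widths, columns):
    """Columns by which a greedy fill exceeds `columns` (0 if it does not)."""
    used = 0
    for w in widths:
        if used >= columns:
            break
        used += w
    return max(0, used - columns)


def worst_overrun(max_width, max_len=6, max_columns=6):
    """Largest over-run over all short width sequences and small targets."""
    return max(
        overrun(ws, n)
        for length in range(1, max_len + 1)
        for ws in product(range(1, max_width + 1), repeat=length)
        for n in range(1, max_columns + 1)
    )


print(worst_overrun(2))  # code points are 1 or 2 columns wide
print(worst_overrun(3))  # a hypothetical three-wide code point exists
```

With widths limited to 1 and 2 the worst over-run is a single column, so
a one-for-two substitution always suffices; allowing a width of 3 makes
a two-column over-run possible, which is the case excluded above.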

Philip

For watchers, https://github.com/microsoft/terminal/issues/4345 shows
some of the issues in the general case.

