git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Kristoffer Haugsbakk" <code@khaugsbakk.name>
Cc: "Manlio Perillo" <manlio.perillo@gmail.com>,  git@vger.kernel.org
Subject: Re: Clarify the meaning of "character" in the documentation
Date: Tue, 05 Mar 2024 07:32:38 -0800	[thread overview]
Message-ID: <xmqqsf1466eh.fsf@gitster.g> (raw)
In-Reply-To: <5e10f1e5-b87f-43cd-ac1e-d7c01b7dad21@app.fastmail.com> (Kristoffer Haugsbakk's message of "Tue, 05 Mar 2024 10:00:06 +0100")

"Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:

>> As an example, with
>> git config --add core.commentChar •  // Bullet (U+2022)
>> git does not complain, but it is rejected later.
>
> I think this is more about `git config --add` not doing any
> validation. It just sets things. You can do `git config --add
> core.commentChar 'ffd'` and get the same effect.

This is not wrong per-se, but it merely explains why "config" takes
it without complaining (the command just does not know anything
about what each variable means and what the valid range of values
are).  core.commentChar is limited to "a byte" so in the context of
everything else (like commit log message in the editor) being UTF-8,
it means ASCII would only work there.

As you said, we should document core.commentChar as limited to an
ASCII character, at least as a short term solution.

I personally do not see a reason, however, why we need to be limited
to a single byte, though.  If a patch cleanly implements to allow us
to use any one-or-more-byte sequence as core.commentChar, I do not
offhand see a good reason to reject it---it would be fully backward
compatible and allows you to use a UTF-8 charcter outside ASCII, as
well as "//" and the like.

> Alphanumeric characters (a-z and A-Z and 0-9) are ASCII. And one ASCII
> char is represented using one byte in UTF-8. This already looks precise
> to me.

Correct.

> I’ve never run into a case where git-diff(1) does not handle UTF-8. I
> don’t even know if it really needs to “handle” it per se as opposed to
> just treating it as opaque bytes. Maybe it matters for things like
> whitespace and word-boundaries, I don’t know.

The core part of "diff" is very much line oriented, and after
chopping your random sequence of bytes at each LF that appears in
it, the code is pretty oblivious to the character boundary, except
for a few cases.  "-w" needs to know what the whitespace characters
are (it knows only the limited basic set like SP HT and probably
VT), "-i" needs to know that "A" and "a" are equivalent (I think it
only knows the ASCII, but I may be misremembering).  Outside the
core part of "diff", there are frills that need to know about
character boundaries, like chopping the function header comment
placed on a hunk header "@@ -1682,7 +1682,7 @@" to a reasonable
length, --color-words/--word-diff that first separates lines into
multi-character tokens and align matching sequences in them, etc.


  reply	other threads:[~2024-03-05 15:33 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05  8:43 Clarify the meaning of "character" in the documentation Manlio Perillo
2024-03-05  9:00 ` Kristoffer Haugsbakk
2024-03-05 15:32   ` Junio C Hamano [this message]
2024-03-05 15:42     ` Dragan Simic
2024-03-05 16:38       ` Junio C Hamano
2024-03-05 17:28         ` Dragan Simic
2024-03-06  8:08         ` [messy PATCH] multi-byte core.commentChar Jeff King
2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
2024-03-07  9:15             ` [PATCH 01/15] strbuf: simplify comment-handling in add_lines() helper Jeff King
2024-03-07  9:16             ` [PATCH 02/15] strbuf: avoid static variables in strbuf_add_commented_lines() Jeff King
2024-03-07  9:18             ` [PATCH 03/15] commit: refactor base-case of adjust_comment_line_char() Jeff King
2024-03-07  9:19             ` [PATCH 04/15] strbuf: avoid shadowing global comment_line_char name Jeff King
2024-03-07  9:20             ` [PATCH 05/15] environment: store comment_line_char as a string Jeff King
2024-03-07  9:21             ` [PATCH 06/15] strbuf: accept a comment string for strbuf_stripspace() Jeff King
2024-03-07  9:53               ` Jeff King
2024-03-07  9:22             ` [PATCH 07/15] strbuf: accept a comment string for strbuf_commented_addf() Jeff King
2024-03-07  9:23             ` [PATCH 08/15] strbuf: accept a comment string for strbuf_add_commented_lines() Jeff King
2024-03-07  9:23             ` [PATCH 09/15] prefer comment_line_str to comment_line_char for printing Jeff King
2024-03-07  9:24             ` [PATCH 10/15] find multi-byte comment chars in NUL-terminated strings Jeff King
2024-03-07  9:26             ` [PATCH 11/15] find multi-byte comment chars in unterminated buffers Jeff King
2024-03-07 11:08               ` Jeff King
2024-03-07 19:41                 ` René Scharfe
2024-03-07 19:47                   ` René Scharfe
2024-03-07 19:42               ` René Scharfe
2024-03-08 10:17                 ` Phillip Wood
2024-03-08 15:58                   ` Junio C Hamano
2024-03-08 16:20                     ` Phillip Wood
2024-03-12  8:19                       ` Jeff King
2024-03-12 14:36                         ` phillip.wood123
2024-03-13  6:23                           ` Jeff King
2024-03-12  8:05                 ` Jeff King
2024-03-14 19:37                   ` René Scharfe
2024-03-07  9:27             ` [PATCH 12/15] sequencer: handle multi-byte comment characters when writing todo list Jeff King
2024-03-08 10:20               ` Phillip Wood
2024-03-12  8:21                 ` Jeff King
2024-03-07  9:28             ` [PATCH 13/15] wt-status: drop custom comment-char stringification Jeff King
2024-03-07  9:30             ` [PATCH 14/15] environment: drop comment_line_char compatibility macro Jeff King
2024-03-07  9:34             ` [PATCH 15/15] config: allow multi-byte core.commentChar Jeff King
2024-03-08 11:07             ` [PATCH 0/15] " Phillip Wood
2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
2024-03-12  9:17               ` [PATCH v2 01/16] config: forbid newline as core.commentChar Jeff King
2024-03-12  9:17               ` [PATCH v2 02/16] strbuf: simplify comment-handling in add_lines() helper Jeff King
2024-03-12  9:17               ` [PATCH v2 03/16] strbuf: avoid static variables in strbuf_add_commented_lines() Jeff King
2024-03-12  9:17               ` [PATCH v2 04/16] commit: refactor base-case of adjust_comment_line_char() Jeff King
2024-03-12  9:17               ` [PATCH v2 05/16] strbuf: avoid shadowing global comment_line_char name Jeff King
2024-03-12  9:17               ` [PATCH v2 06/16] environment: store comment_line_char as a string Jeff King
2024-03-12  9:17               ` [PATCH v2 07/16] strbuf: accept a comment string for strbuf_stripspace() Jeff King
2024-03-12  9:17               ` [PATCH v2 08/16] strbuf: accept a comment string for strbuf_commented_addf() Jeff King
2024-03-12  9:17               ` [PATCH v2 09/16] strbuf: accept a comment string for strbuf_add_commented_lines() Jeff King
2024-03-12  9:17               ` [PATCH v2 10/16] prefer comment_line_str to comment_line_char for printing Jeff King
2024-03-12  9:17               ` [PATCH v2 11/16] find multi-byte comment chars in NUL-terminated strings Jeff King
2024-03-12  9:17               ` [PATCH v2 12/16] find multi-byte comment chars in unterminated buffers Jeff King
2024-03-12  9:17               ` [PATCH v2 13/16] sequencer: handle multi-byte comment characters when writing todo list Jeff King
2024-03-12  9:17               ` [PATCH v2 14/16] wt-status: drop custom comment-char stringification Jeff King
2024-03-12  9:17               ` [PATCH v2 15/16] environment: drop comment_line_char compatibility macro Jeff King
2024-03-12  9:17               ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Jeff King
2024-03-13 18:23                 ` Kristoffer Haugsbakk
2024-03-13 18:39                   ` Junio C Hamano
2024-03-15  5:59                   ` Jeff King
2024-03-15  7:16                     ` Kristoffer Haugsbakk
2024-03-15  8:10                       ` Jeff King
2024-03-15 13:30                         ` Kristoffer Haugsbakk
2024-03-15 15:40                         ` Junio C Hamano
2024-03-16  5:50                           ` Jeff King
2024-03-26 22:10                         ` Junio C Hamano
2024-03-26 22:12                           ` Kristoffer Haugsbakk
2024-03-27  7:46                           ` Jeff King
2024-03-27  8:19                             ` [PATCH 17/16] config: add core.commentString Jeff King
2024-03-27 12:45                               ` Chris Torek
2024-03-27 16:13                               ` Junio C Hamano
2024-03-28  9:47                                 ` Jeff King
2024-03-27 14:53                             ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Junio C Hamano
2024-03-12 14:40               ` [PATCH v2 0/16] " phillip.wood123
2024-03-12 20:30                 ` Junio C Hamano
2024-03-05 16:58       ` Clarify the meaning of "character" in the documentation Kristoffer Haugsbakk
2024-03-05 17:20         ` Dragan Simic
2024-03-05 17:37           ` Kristoffer Haugsbakk
2024-03-05 21:19             ` Dragan Simic
2024-03-05 16:51     ` Kristoffer Haugsbakk
2024-03-05 17:37       ` Junio C Hamano
2024-03-05 17:49         ` Kristoffer Haugsbakk
2024-03-05 22:48   ` brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqsf1466eh.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=code@khaugsbakk.name \
    --cc=git@vger.kernel.org \
    --cc=manlio.perillo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).