[PATCH] pretty-options.txt: describe supported encoding

git@vger.kernel.org mailing list mirror (one of many)

* [PATCH] pretty-options.txt: describe supported encoding
@ 2021-08-26 21:34 Christopher Yeleighton via GitGitGadget
  2021-08-27 10:46 ` Bagas Sanjaya
  0 siblings, 1 reply; 7+ messages in thread
From: Christopher Yeleighton via GitGitGadget @ 2021-08-26 21:34 UTC (permalink / raw)
  To: git; +Cc: Christopher Yeleighton, Christopher Yeleighton

From: Christopher Yeleighton <ne01026@shark.2a.pl>

Please fix the manual for git log.  It should say what encoding is recognised
(namely if supported by iconv(1), except that POSIX character maps of
iconv(1p) are not supported), and that an unrecognised encoding is ignored.

Signed-off-by:  <ne01026@shark.2a.pl>
---
    log: describe supported encoding

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1079%2Fyecril71pl%2Fpatch-1-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1079/yecril71pl/patch-1-v1
Pull-Request: https://github.com/git/git/pull/1079

 Documentation/pretty-options.txt | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt
index 27ddaf84a19..4f8376d681b 100644
--- a/Documentation/pretty-options.txt
+++ b/Documentation/pretty-options.txt
@@ -36,9 +36,13 @@ people using 80-column terminals.
 	The commit objects record the encoding used for the log message
 	in their encoding header; this option can be used to tell the
 	command to re-code the commit log message in the encoding
-	preferred by the user.  For non plumbing commands this
-	defaults to UTF-8. Note that if an object claims to be encoded
-	in `X` and we are outputting in `X`, we will output the object
+	preferred by the user.
+	The encoding must be a system encoding supported by iconv(1),
+	otherwise this option will be ignored.
+	POSIX character maps used by iconv(1p) are not supported.
+	For non-plumbing commands this defaults to UTF-8.
+	Note that if an object claims to be encoded in `X`
+	and we are outputting in `X`, we shall output the object
 	verbatim; this means that invalid sequences in the original
 	commit may be copied to the output.
 

base-commit: c4203212e360b25a1c69467b5a8437d45a373cac
-- 
gitgitgadget


* Re: [PATCH] pretty-options.txt: describe supported encoding
  2021-08-26 21:34 [PATCH] pretty-options.txt: describe supported encoding Christopher Yeleighton via GitGitGadget
@ 2021-08-27 10:46 ` Bagas Sanjaya
  2021-08-27 11:47   ` Krzysztof Żelechowski
  2021-08-27 11:51   ` [PATCH v2] " Krzysztof Żelechowski
  0 siblings, 2 replies; 7+ messages in thread
From: Bagas Sanjaya @ 2021-08-27 10:46 UTC (permalink / raw)
  To: Christopher Yeleighton via GitGitGadget, git
  Cc: Christopher Yeleighton, Christopher Yeleighton

On 27/08/21 04.34, Christopher Yeleighton via GitGitGadget wrote:
> From: Christopher Yeleighton <ne01026@shark.2a.pl>
> 
> Please fix the manual for git log.  It should say what encoding is recognised
> (namely if supported by iconv(1), except that POSIX character maps of
> iconv(1p) are not supported), and that an unrecognised encoding is ignored.
> 
> Signed-off-by:  <ne01026@shark.2a.pl>
> ---

The commit message should be:
"git log recognizes only system encodings supported by iconv(1), but not 
POSIX character maps used by iconv(1p). Document it.".

>   	The commit objects record the encoding used for the log message
>   	in their encoding header; this option can be used to tell the
>   	command to re-code the commit log message in the encoding
> -	preferred by the user.  For non plumbing commands this
> -	defaults to UTF-8. Note that if an object claims to be encoded
> -	in `X` and we are outputting in `X`, we will output the object
> +	preferred by the user.
> +	The encoding must be a system encoding supported by iconv(1),
> +	otherwise this option will be ignored.
> +	POSIX character maps used by iconv(1p) are not supported.
> +	For non-plumbing commands this defaults to UTF-8.
> +	Note that if an object claims to be encoded in `X`
> +	and we are outputting in `X`, we shall output the object
>   	verbatim; this means that invalid sequences in the original
>   	commit may be copied to the output.
>   

I think POSIX character maps and encodings are the same thing. If not,
how do they differ? iconv(1p) [1] does not define the former.

[1]: https://man7.org/linux/man-pages/man1/iconv.1p.html

-- 
An old man doll... just what I always wanted! - Clara


* Re: [PATCH] pretty-options.txt: describe supported encoding
  2021-08-27 10:46 ` Bagas Sanjaya
@ 2021-08-27 11:47   ` Krzysztof Żelechowski
  2021-08-27 11:51   ` [PATCH v2] " Krzysztof Żelechowski
  1 sibling, 0 replies; 7+ messages in thread
From: Krzysztof Żelechowski @ 2021-08-27 11:47 UTC (permalink / raw)
  To: Christopher Yeleighton via GitGitGadget, git, Bagas Sanjaya
  Cc: Christopher Yeleighton

On Friday, 27 August 2021 12:46:22 CEST, Bagas Sanjaya wrote:
> I think POSIX character maps and encodings are the same thing. If not,
> how do they differ? iconv(1p) [1] does not define the former.
> 
> [1]: https://man7.org/linux/man-pages/man1/iconv.1p.html

System encoding providers are code, POSIX character maps are data.

Chris





* [PATCH v2] pretty-options.txt: describe supported encoding
  2021-08-27 10:46 ` Bagas Sanjaya
  2021-08-27 11:47   ` Krzysztof Żelechowski
@ 2021-08-27 11:51   ` Krzysztof Żelechowski
  2021-08-27 17:03     ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Krzysztof Żelechowski @ 2021-08-27 11:51 UTC (permalink / raw)
  To: Christopher Yeleighton via GitGitGadget, git, Bagas Sanjaya
  Cc: Christopher Yeleighton

git log recognises only system encodings supported by iconv(1), but not 
POSIX character maps used by iconv(1p). Document it.

Signed-off-by:  <ne01026@shark.2a.pl>

diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt
index 27ddaf84a19..4f8376d681b 100644
--- a/Documentation/pretty-options.txt
+++ b/Documentation/pretty-options.txt
@@ -36,9 +36,13 @@ people using 80-column terminals.
        The commit objects record the encoding used for the log message
        in their encoding header; this option can be used to tell the
        command to re-code the commit log message in the encoding
-       preferred by the user.  For non plumbing commands this
-       defaults to UTF-8. Note that if an object claims to be encoded
-       in `X` and we are outputting in `X`, we will output the object
+       preferred by the user.
+       The encoding must be a system encoding supported by iconv(1),
+       otherwise this option will be ignored.
+       POSIX character maps used by iconv(1p) are not supported.
+       For non-plumbing commands this defaults to UTF-8.
+       Note that if an object claims to be encoded in `X`
+       and we are outputting in `X`, we shall output the object
        verbatim; this means that invalid sequences in the original
        commit may be copied to the output.
 




* Re: [PATCH v2] pretty-options.txt: describe supported encoding
  2021-08-27 11:51   ` [PATCH v2] " Krzysztof Żelechowski
@ 2021-08-27 17:03     ` Junio C Hamano
  2021-08-27 18:03       ` Jeff King
  2021-08-27 23:20       ` Krzysztof Żelechowski
  0 siblings, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2021-08-27 17:03 UTC (permalink / raw)
  To: Krzysztof Żelechowski
  Cc: Christopher Yeleighton via GitGitGadget, git, Bagas Sanjaya,
	Christopher Yeleighton

Krzysztof Żelechowski <giecrilj@stegny.2a.pl> writes:

> git log recognises only system encodings supported by iconv(1), but not 
> POSIX character maps used by iconv(1p). Document it.
>
> Signed-off-by:  <ne01026@shark.2a.pl>

The "Human Readable Name <email@add.re.ss>" on this line must match
the one on the "From: " line that records the author of the patch.

If you are forwarding somebody else's patch (with or without
improvement), we also need your sign off.

> diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt
> index 27ddaf84a19..4f8376d681b 100644
> --- a/Documentation/pretty-options.txt
> +++ b/Documentation/pretty-options.txt
> @@ -36,9 +36,13 @@ people using 80-column terminals.
>         The commit objects record the encoding used for the log message
>         in their encoding header; this option can be used to tell the
>         command to re-code the commit log message in the encoding
> -       preferred by the user.  For non plumbing commands this
> -       defaults to UTF-8. Note that if an object claims to be encoded
> -       in `X` and we are outputting in `X`, we will output the object
> +       preferred by the user.

> +       The encoding must be a system encoding supported by iconv(1),
> +       otherwise this option will be ignored.
> +       POSIX character maps used by iconv(1p) are not supported.

This paragraph is a bit hard to grok.

I think it is saying that the "-f frommap -t tomap" form in [*1*],
which can use an arbitrary character set description file, is not
supported, but the "-f fromcode -t tocode" form, which is also what
iconv_open() takes [*2*], is supported.  Am I reading it correctly?

Is there an easier-to-read way to explain the distinction to our
average reader?

What I am getting at is this.  Imagine average users who need to see
their commits recoded to iso-8859-2.  They see "git log" has
"--encoding=<encoding>" option, read the above paragraph and wonder
if they are on the supported side or unsupported side of the above
paragraph.  I want to make it easy for them to stop wondering.

For that purpose, "iconv(1) vs iconv(1p)" would not help them very
much, especially considering that not all Git users are UNIX users
(they probably do not even know what (1) and (1p) mean).
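
To make the distinction concrete, here is a minimal Python sketch of
recoding by encoding name, the analogue of the "-f fromcode -t tocode"
form. It is not Git's implementation: Python's codec registry stands in
for the system iconv, and the function name recode_message is
hypothetical. It mirrors what the proposed text describes: an
unrecognised name leaves the message untouched, and a message already
in the target encoding is passed through verbatim.

```python
import codecs


def recode_message(raw: bytes, commit_encoding: str, output_encoding: str) -> bytes:
    """Recode a commit message by encoding *name* (hypothetical helper).

    Assumption: Python's codec registry stands in for the system iconv;
    the set of names it knows is not the same as iconv's.
    """
    try:
        src = codecs.lookup(commit_encoding)
        dst = codecs.lookup(output_encoding)
    except LookupError:
        # Unrecognised encoding name: behave as if the option was ignored.
        return raw
    if src.name == dst.name:
        # Same encoding on both sides: copy the bytes verbatim, so any
        # invalid sequences in the original reach the output unchanged.
        return raw
    # Different encodings: decode, then encode. This raises on input that
    # is invalid in the source encoding; the sketch does not handle that.
    return raw.decode(src.name).encode(dst.name)


message = "Zażółć gęślą jaźń".encode("utf-8")
latin2 = recode_message(message, "UTF-8", "ISO-8859-2")
```

Under this sketch a user asking for iso-8859-2 only needs the name to
be one the conversion library knows; no character map file is involved.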

> +       For non-plumbing commands this defaults to UTF-8.

I think I can guess why the patch wants to change "non plumbing" to
"non-plumbing" (I do not strongly care either way, so I'd take the
patch without complaint about that particular change).  It would
have been nicer to mention this change in the proposed commit log
message, though, but that is minor.

> +       Note that if an object claims to be encoded in `X`
> +       and we are outputting in `X`, we shall output the object
>         verbatim; this means that invalid sequences in the original
>         commit may be copied to the output.

I probably wouldn't have noticed this if a new manual page used
"shall" consistently, but since the original deliberately used
"will" and the patch changes it to "shall", I have to ask: why?

I think our end-user facing manual pages tend to avoid the latter.
We do use "shall" in the RFC2119/BCP14 sense on the technical side
of our documentation where we give requirements to the third-party
implementations so that they can interoperate with us, but this is
not such a description.

Thanks.

[References]

*1* https://pubs.opengroup.org/onlinepubs/9699919799/utilities/iconv.html 
*2* https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv_open.html


* Re: [PATCH v2] pretty-options.txt: describe supported encoding
  2021-08-27 17:03     ` Junio C Hamano
@ 2021-08-27 18:03       ` Jeff King
  2021-08-27 23:20       ` Krzysztof Żelechowski
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff King @ 2021-08-27 18:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Krzysztof Żelechowski,
	Christopher Yeleighton via GitGitGadget, git, Bagas Sanjaya,
	Christopher Yeleighton

On Fri, Aug 27, 2021 at 10:03:56AM -0700, Junio C Hamano wrote:

> > +       The encoding must be a system encoding supported by iconv(1),
> > +       otherwise this option will be ignored.
> > +       POSIX character maps used by iconv(1p) are not supported.
> 
> This paragraph is a bit hard to grok.
> 
> I think it is saying that the "-f frommap -t tomap" form in [*1*],
> which can use an arbitrary character set description file, is not
> supported, but the "-f fromcode -t tocode" form, which is also what
> iconv_open() takes [*2*], is supported.  Am I reading it correctly?
> 
> Is there an easier-to-read way to explain the distinction to our
> average reader?
> 
> What I am getting at is this.  Imagine average users who need to see
> their commits recoded to iso-8859-2.  They see "git log" has
> "--encoding=<encoding>" option, read the above paragraph and wonder
> if they are on the supported side or unsupported side of the above
> paragraph.  I want to make it easy for them to stop wondering.
> 
> For that purpose, "iconv(1) vs iconv(1p)" would not help them very
> much, especially considering that not all Git users are UNIX users
> (they probably do not even know what (1) and (1p) mean).

I likewise found the mention of character maps confusing. If we were to
refer to anything, it would be iconv(3) or iconv_open(3). But really,
all of the discussion that led to this patch seemed to be about the
distinction between "character set conversion" (or "character encoding",
or "codeset conversion", all terms used by the POSIX pages) and the
syntactic encoding of HTML.

Is there any version of iconv that would convert "<" to "&lt;"?

I guess that _conceptually_ one could think of that as a multi-byte
character conversion, but it seems to me that it is generally considered
a layer above (after all, the original "<" and characters in the HTML
entity have to be in some character encoding; generally ASCII, but I
think you could have UTF-16 HTML, too).
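
A small Python sketch of that layering, using only the standard library
(the helper name show_layers is made up for illustration): escaping "<"
to "&lt;" rewrites characters, while choosing ASCII or UTF-16 only
changes how those same characters are stored as bytes.

```python
import html


def show_layers(text: str) -> dict:
    """Apply HTML escaping first, then two different character encodings."""
    # Syntactic layer: "<" becomes the four characters "&lt;".
    escaped = html.escape(text)
    return {
        "escaped": escaped,
        # Character-encoding layer: same characters, different bytes.
        "ascii": escaped.encode("ascii"),
        "utf16": escaped.encode("utf-16-le"),
    }


layers = show_layers("a < b")
```

Both byte strings decode back to the same escaped text, which is why
the entity substitution is usually treated as sitting above the
character encoding rather than being one.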

What I'm getting at is that maybe we just need to use a less generic
word than "encoding". Perhaps just s/encoding/character &/ or something?
And maybe add something like:

  Conversions are done using the system iconv(3) function. The set of
  available encodings will depend on your system.

You _can_ use "iconv -l" to get such a list on many systems, but it is
not even necessarily the same list.
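
As a rough illustration of "depends on your system", the following
Python sketch probes which of a few candidate names a conversion
library accepts. It is only an analogy: Python's codec registry is used
as a stand-in for iconv(3), the names it knows differ from what
iconv_open() accepts, and available_encodings is a hypothetical helper.

```python
import codecs


def available_encodings(candidates):
    """Return the candidate names that the codec registry recognises.

    Assumption: codecs.lookup() plays the role that iconv_open() would
    play for Git; the two do not share a list of names.
    """
    found = []
    for name in candidates:
        try:
            codecs.lookup(name)
        except LookupError:
            # Unknown to this library on this system; skip it.
            continue
        found.append(name)
    return found


known = available_encodings(["UTF-8", "ISO-8859-2", "no-such-charmap"])
```

The point of probing by name rather than reading a published list is
the same one made above: a listing tool and the library actually doing
the conversion need not agree.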

I also wonder if other mentions of encoding would want to use the same
term (e.g., gitattributes working-tree-encoding), and of course
i18n.commitEncoding (though peeking at the latter, it seems to already
say "Character encoding", so maybe that is sufficient).

-Peff


* Re: [PATCH v2] pretty-options.txt: describe supported encoding
  2021-08-27 17:03     ` Junio C Hamano
  2021-08-27 18:03       ` Jeff King
@ 2021-08-27 23:20       ` Krzysztof Żelechowski
  1 sibling, 0 replies; 7+ messages in thread
From: Krzysztof Żelechowski @ 2021-08-27 23:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christopher Yeleighton via GitGitGadget, git, Bagas Sanjaya,
	Christopher Yeleighton

On Friday, 27 August 2021 19:03:56 CEST, Junio C Hamano wrote:
> > +       The encoding must be a system encoding supported by iconv(1),
> > +       otherwise this option will be ignored.
> > +       POSIX character maps used by iconv(1p) are not supported.
> 
> This paragraph is a bit hard to grok.
> 
> I think it is saying that the "-f frommap -t tomap" form in [*1*],
> which can use an arbitrary character set description file, is not
> supported, but the "-f fromcode -t tocode" form, which is also what
> iconv_open() takes [*2*], is supported.  Am I reading it correctly?

Yes

> 
> Is there an easier-to-read way to explain the distinction to our
> average reader?

It is not our job to explain what POSIX character maps are.  The takeaway is 
they are unsupported; if you do not know what they are, why should you bother?

> 
> What I am getting at is this.  Imagine average users who need to see
> their commits recoded to iso-8859-2.  They see "git log" has
> "--encoding=<encoding>" option, read the above paragraph and wonder
> if they are on the supported side or unsupported side of the above
> paragraph.  I want to make it easy for them to stop wondering.
> 
> For that purpose, "iconv(1) vs iconv(1p)" would not help them very
> much, especially considering that not all Git users are UNIX users
> (they probably do not even know what (1) and (1p) mean).

I am sorry, as a UNIX user I have no idea what iconv, being part of the GNU C 
library, means and how it works on a non-UNIX system that does not contain 
one.  If you know, could you enlighten us please?

> I think our end-user facing manual pages tend to avoid the latter.
> We do use "shall" in the RFC2119/BCP14 sense on the technical side
> of our documentation where we give requirements to the third-party
> implementations so that they can interoperate with us, but this is
> not such a description.
> 
> Thanks.

I shall revert it after we have come to an agreement about the POSIX stuff.

BR,
Chris





