git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Krzysztof Żelechowski" <giecrilj@stegny.2a.pl>,
	"Christopher Yeleighton via GitGitGadget"
	<gitgitgadget@gmail.com>,
	git@vger.kernel.org, "Bagas Sanjaya" <bagasdotme@gmail.com>,
	"Christopher Yeleighton" <ne01026@shark.2a.pl>
Subject: Re: [PATCH v2] pretty-options.txt: describe supported encoding
Date: Fri, 27 Aug 2021 14:03:39 -0400
Message-ID: <YSko++W+QHiDX81X@coredump.intra.peff.net>
In-Reply-To: <xmqq5yvqbz0j.fsf@gitster.g>

On Fri, Aug 27, 2021 at 10:03:56AM -0700, Junio C Hamano wrote:

> > +       The encoding must be a system encoding supported by iconv(1),
> > +       otherwise this option will be ignored.
> > +       POSIX character maps used by iconv(1p) are not supported.
> 
> This paragraph is a bit hard to grok.
> 
> I think it is saying that the "-f frommap -t tomap" form in [*1*]
> that can use arbitrary character set description file is not
> supported, but "-f fromcode -t tocode" form, which also is what
> iconv_open() takes [*2*], is supported.  Am I reading it correctly?
> 
> Is there an easier-to-read way to explain the distinction to our
> average reader?
> 
> What I am getting at is this.  Imagine average users who need to see
> their commits recoded to iso-8859-2.  They see "git log" has
> "--encoding=<encoding>" option, read the above paragraph and wonder
> if they are on the supported side or unsupported side of the above
> paragraph.  I want to make it easy for them to stop wondering.
> 
> For that purpose, "iconv(1) vs iconv(1p)" would not help them very
> much, especially considering that not all Git users are UNIX users
> (they probably do not even know what (1) and (1p) means).

I likewise found the mention of character maps confusing. If we were to
refer to anything, it would be iconv(3) or iconv_open(3). But really,
all of the discussion that led to this patch seemed to be about the
distinction between "character set conversion" (or "character encoding",
or "codeset conversion", all terms used by the POSIX pages) and the
syntactic encoding of HTML.

Is there any version of iconv that would convert "<" to "&lt;"?

I guess that _conceptually_ one could think of that as a multi-byte
character conversion, but it seems to me that it is generally considered
a layer above (after all, the original "<" and characters in the HTML
entity have to be in some character encoding; generally ASCII, but I
think you could have UTF-16 HTML, too).
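
To illustrate the layering, here is a rough Python sketch. It uses
Python's built-in codecs as a stand-in for iconv (an assumption for
illustration only; Git itself calls iconv(3)), and the helper names
recode() and to_html() are made up. Character set conversion changes how
each character is stored as bytes but leaves "<" as "<"; turning it into
"&lt;" is a separate step that operates on characters, whatever encoding
they end up in.

```python
import html


def recode(data: bytes, from_code: str, to_code: str) -> bytes:
    """Character set conversion: same characters, different bytes.

    Hypothetical helper; Python's codecs stand in for iconv here.
    """
    return data.decode(from_code).encode(to_code)


def to_html(text: str, charset: str) -> bytes:
    """Syntactic HTML escaping first, then a character encoding on top."""
    return html.escape(text).encode(charset)


if __name__ == "__main__":
    utf8 = "a < ż".encode("utf-8")
    latin2 = recode(utf8, "utf-8", "iso-8859-2")
    # The "<" survives recoding untouched; only the byte layout changes.
    print(latin2.decode("iso-8859-2"))
    # The escaped form can itself be stored in any encoding, e.g. UTF-16.
    print(to_html("a < ż", "utf-16-le").decode("utf-16-le"))
```

The point of the split into two functions is that the escaping step never
needs to know the target encoding, and the recoding step never looks at
HTML syntax.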

What I'm getting at is that maybe we just need to use a less generic
word than "encoding". Perhaps just s/encoding/character &/ or something?
And maybe add something like:

  Conversions are done using the system iconv(3) function. The set of
  available encodings will depend on your system.

You _can_ use "iconv -l" to get such a list on many systems, but that is
not necessarily the same list that the iconv(3) library Git was built
against supports.
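
If somebody wants to check what the library itself accepts, one option is
to ask iconv_open(3) directly. Below is a minimal Python sketch of that
idea using ctypes. It assumes a glibc-style system where the C library
exports iconv_open() and iconv_close() (other platforms may need a
separate libiconv), and the function name iconv_supports() is made up
for this example. It also only probes the library loaded into the Python
process, which may or may not be the one Git was built against.

```python
import ctypes

# Assumption: the C library of the running process exports iconv_open(3)
# and iconv_close(3), as glibc does.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.iconv_open.restype = ctypes.c_void_p
_libc.iconv_open.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
_libc.iconv_close.restype = ctypes.c_int
_libc.iconv_close.argtypes = [ctypes.c_void_p]

# iconv_open() signals failure by returning (iconv_t)-1.
_ICONV_FAILED = ctypes.c_void_p(-1).value


def iconv_supports(to_code: str, from_code: str = "UTF-8") -> bool:
    """Return True if iconv_open() accepts the from_code -> to_code pair."""
    cd = _libc.iconv_open(to_code.encode("ascii"), from_code.encode("ascii"))
    if cd == _ICONV_FAILED:
        return False
    # Release the conversion descriptor; we only wanted to probe.
    _libc.iconv_close(cd)
    return True


if __name__ == "__main__":
    for name in ("UTF-8", "ISO-8859-2", "NO-SUCH-ENCODING-XYZ"):
        print(name, iconv_supports(name))
```

Note the argument order: iconv_open() takes the target encoding first and
the source encoding second.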

I also wonder if other mentions of encoding would want to use the same
term (e.g., gitattributes working-tree-encoding), and of course
i18n.commitEncoding (though peeking at the latter, it seems to already
say "Character encoding", so maybe that is sufficient).

-Peff

