From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Krzysztof Żelechowski" <giecrilj@stegny.2a.pl>,
	git@vger.kernel.org,
	"Hamza Mahfooz" <someguy@effective-light.com>
Subject: Re: *Really* noisy encoding warnings post-v2.33.0
Date: Fri, 8 Oct 2021 22:36:02 -0400
Message-ID: <YWEAEjIN0HVHbIpg@coredump.intra.peff.net>
In-Reply-To: <87ily7m1mv.fsf@evledraar.gmail.com>

On Sat, Oct 09, 2021 at 02:58:10AM +0200, Ævar Arnfjörð Bjarmason wrote:

> I ran into this while testing the grep coloring patch[1] (but it's
> unrelated). Before this commit e.g.:
> 
>     LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100|wc -l
>     28333
> 
> So ~3k lines for my last 100 commits, but then:
> 
>     $ LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100 2>&1|grep -c ^warning
>     299
> 
> At first I thought it was spewing warnings for every failed re-encoded
> line in some cases, because I get hundreds at a time sometimes, but it's
> because stderr and stdout I/O buffering is different (a common
> case). Adding a "fflush(stderr)" "fixes" that.

I don't think the buffering is the issue. By default stderr flushes on
lines, and we flush commits after showing them. If you take away "-P"
(or look at the combined 2>&1 output in order), you'll see that they are
grouped.

Now one thing you might notice is that there may be multiple warnings
between output commits. But that's because we really are re-encoding
each of those intermediate commits to do your --author grep. And if that
re-encoding fails, we may well be producing the wrong output, because
the matching won't be correct (in your case, presumably the correct
output should be _nothing_, because Æ is not an ascii character).
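
To make the "Æ is not an ascii character" point concrete, here is a minimal
standalone sketch using POSIX iconv(3). It is not git's code: the helper name
converts_cleanly() and the fixed-size buffer are assumptions made for
illustration, and it only asks whether a short string converts without loss.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git. Returns 1 if `in` (encoded as
 * `from`) converts to `to` with no errors and no lossy substitutions,
 * 0 if it does not, and -1 if the conversion is unavailable.
 * Inputs whose converted form exceeds the small buffer are also
 * reported as 0, which is good enough for a sketch.
 */
int converts_cleanly(const char *from, const char *to, const char *in)
{
	char buf[256];
	char *src, *dst = buf;
	size_t srcleft, dstleft = sizeof(buf);
	size_t ret;
	iconv_t cd;

	assert(from && to && in);
	src = (char *)in;	/* iconv() takes a non-const pointer */
	srcleft = strlen(in);

	cd = iconv_open(to, from);
	if (cd == (iconv_t)-1)
		return -1;

	/*
	 * iconv() returns (size_t)-1 on error, or the number of
	 * characters converted non-reversibly; only 0 means "clean".
	 */
	ret = iconv(cd, &src, &srcleft, &dst, &dstleft);
	iconv_close(cd);
	return ret == 0;
}
```

With this, converts_cleanly("UTF-8", "ASCII", "Junio") succeeds, while the same
call on a UTF-8 encoded "Ævar" does not, which is the failure the warning
reports.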

I do think the current warning is particularly bad there, because it
doesn't even mention the commit oid. So something like:

diff --git a/pretty.c b/pretty.c
index 708b618cfe..ddf501632d 100644
--- a/pretty.c
+++ b/pretty.c
@@ -673,7 +673,8 @@ const char *repo_logmsg_reencode(struct repository *r,
 	 * case we just return the commit message verbatim.
 	 */
 	if (!out) {
-		warning("unable to reencode commit to '%s'", output_encoding);
+		warning("unable to reencode commit %s to '%s'",
+			oid_to_hex(&commit->object.oid), output_encoding);
 		return msg;
 	}
 	return out;

means you get output like:

  $ git -c i18n.commitEncoding=ascii log --format='%h %s' --author=Ævar -100
  warning: unable to reencode commit c90cfc225baaf64af311f7e2953267e4de636205 to 'ascii'
  warning: unable to reencode commit 1d1d731d30cbcd5f3a6a5cbac1fe218e4d4db72b to 'ascii'
  warning: unable to reencode commit 66237bcf60df357f188551e1ea4db90f94c519ae to 'ascii'
  warning: unable to reencode commit 100c2da2d3a330366588143d720f09a88926972a to 'ascii'
  warning: unable to reencode commit 59580685bee17de3efff614df7f508133d1e4a7a to 'ascii'
  59580685be config.h: remove unused git_config_get_untracked_cache() declaration
  warning: unable to reencode commit 067e73c8aee9aeb05eac939205274cd2ad8b7cae to 'ascii'
  067e73c8ae log-tree.h: remove unused function declarations
  [...etc...]

If that were coupled with, say, an advise() call to explain that output
and matching might be inaccurate (and show that _once_), that might
make it more clear what's going on.
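
The "warn per commit, explain once" idea could look roughly like the
standalone sketch below. It deliberately does not use git's real warning() or
advise() functions; report_reencode_failure(), the static flag, and the hint
wording are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag; git's real advice machinery works differently. */
static int reencode_hint_shown;

/*
 * Report one failed re-encoding on `out`. Returns the number of lines
 * written: 2 the first time (warning plus hint), 1 afterwards.
 */
int report_reencode_failure(FILE *out, const char *oid_hex,
			    const char *encoding)
{
	assert(out && oid_hex && encoding);

	/* Every failure names the commit, as in the patch above. */
	fprintf(out, "warning: unable to reencode commit %s to '%s'\n",
		oid_hex, encoding);
	if (reencode_hint_shown)
		return 1;

	/* The explanation is shown only for the first failure. */
	reencode_hint_shown = 1;
	fprintf(out, "hint: output and matching might be inaccurate\n");
	return 2;
}
```

The per-commit line keeps each failure traceable to an oid, while the
explanation costs only one extra line per run.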

Now I am sympathetic to the concern about flooding the user with too many messages, and
maybe reducing this to a single instance of "some commit messages could
not be re-encoded; output and matching might be inaccurate" is the right
thing. But in a sense, it's also working as designed: what you asked for
is producing wrong output over and over, and Git is saying so.
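
For comparison, the single-message alternative amounts to counting failures
and reporting once at the end. This is only a sketch under assumed names
(struct reencode_stats and both functions are hypothetical); where such a
summary would actually be emitted in git is left open.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bookkeeping for one log traversal. */
struct reencode_stats {
	unsigned long failed;
};

/* Called instead of warning on each failed re-encoding. */
void note_reencode_failure(struct reencode_stats *stats)
{
	assert(stats);
	stats->failed++;
}

/* Returns 1 if a summary was printed, 0 if there was nothing to report. */
int summarize_reencode_failures(const struct reencode_stats *stats,
				FILE *out)
{
	assert(stats && out);
	if (!stats->failed)
		return 0;
	fprintf(out, "warning: some commit messages could not be re-encoded;"
		" output and matching might be inaccurate\n");
	return 1;
}
```

The trade-off is visible in the signature: the summary no longer knows which
commits failed, so it cannot name an oid the way the per-commit warning does.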

I'm not even sure what you're trying to do with that command. It could
never output a single correct commit, because you've asked to match only
commits that will be shown in the wrong encoding.

> But anyway, I think we've got a lot of users who say *do* want to
> reencode something from say UTF-8 to latin1, but then might have the
> occasional non-latin1 representable data. The old behavior of silently
> falling back is going to be much better for those users, or maybe show
> one warning at the end or something, if you feel it really needs to be
> kept.

If there are real-world cases where the quantity of errors is really
getting in the way, I'm open to the idea of having a single error
message. And personally, I don't really have any experience working with
broken encodings (all my commits are in utf8, and that's what I use as
output). It just seems weird to me that 'git log --encoding=foo' would
quietly ignore the option entirely (i.e., the old behavior, which did
lead to a confused user and a post to the list).

-Peff
