From: CH <ch-and-git.vger.kernel.org@ch.pkts.ca>
To: git@vger.kernel.org
Subject: Feature request: better error messages when UTF-8 bites
Date: Wed, 27 Jul 2022 13:21:43 -0700
Message-ID: <f5a49da29fd0e5577083f1006d394158@ch.pkts.ca>
Hi;
Just found an annoyance in `git log` (and likely elsewhere) that may
warrant a change:
Somehow, when copying and pasting a commit ID from a website to the
command line, a UTF-8 Byte Order Mark (BOM)
[https://en.wikipedia.org/wiki/Byte_order_mark] was appended to one of
the commit IDs. BOMs are invisible, as are many other Unicode code
points. The upshot was that Git didn't like it, and complained
bitterly:
> $ strace -etrace=execve -s 200 git diff
> 038179704f0066aa815d5429221cf381ff4ef289
> 47346a462d8ba40b9a8b073e351c362522c46aa6
>
> execve("/usr/bin/git", ["git", "diff",
> "038179704f0066aa815d5429221cf381ff4ef289\357\273\277",
> "47346a462d8ba40b9a8b073e351c362522c46aa6"], 0x7fffec3c4bb0 /* 80 vars
> */) = 0
>
> fatal: ambiguous argument '038179704f0066aa815d5429221cf381ff4ef289':
> unknown revision or path not in the working tree.
> Use '--' to separate paths from revisions, like this:
> 'git <command> [<revision>...] -- [<file>...]'
> +++ exited with 128 +++
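For anyone wanting to confirm what those three trailing bytes are: the
octal escapes \357\273\277 in the execve line are the UTF-8 encoding of
U+FEFF. A minimal sketch in Python (the variable names are made up for
illustration; this is not anything Git does):

```python
import codecs

# The first argument exactly as strace reported it, octal escapes included.
arg = b"038179704f0066aa815d5429221cf381ff4ef289\357\273\277"

# codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf".
has_trailing_bom = arg.endswith(codecs.BOM_UTF8)

# Drop the BOM to recover the 40-character commit ID.
clean = arg[: -len(codecs.BOM_UTF8)] if has_trailing_bom else arg

print(has_trailing_bom)        # True
print(len(arg), len(clean))    # 43 40
```

The terminal shows the 43-byte and 40-byte strings identically, which is
why the error message above looks like it names a perfectly good commit.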
Feature request:
================
When printing "fatal: ambiguous argument '......': ....", perhaps
escape the ambiguous argument (URL-style or otherwise) in the error
message, or maybe add a sentence noting that non-ASCII bytes were
found.
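To make the request concrete, here is a rough sketch of the kind of
escaping meant, written in Python purely for illustration. Git itself is
C, and the helper names (escape_for_message, has_non_ascii) and the
"hint:" wording are assumptions of this sketch, not existing Git code or
output. It works on raw bytes and never tries to decode or validate
anything:

```python
def escape_for_message(arg: bytes) -> str:
    # Keep printable ASCII as-is; show every other byte (and the
    # backslash itself, to stay unambiguous) as a three-digit octal escape.
    out = []
    for b in arg:
        if 0x20 <= b <= 0x7E and b != 0x5C:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


def has_non_ascii(arg: bytes) -> bool:
    # True if any byte has a value above \177.
    return any(b > 0x7F for b in arg)


arg = b"038179704f0066aa815d5429221cf381ff4ef289\357\273\277"
print("fatal: ambiguous argument '%s'" % escape_for_message(arg))
if has_non_ascii(arg):
    # Hypothetical extra sentence; the wording is made up.
    print("hint: the argument contains non-ASCII bytes")
```

With the argument from the strace output, the first line would end in
\357\273\277 inside the quotes, so the stray bytes become visible
without Git having to know or care what encoding they are in.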
This is sort of a difficult corner-case, in that it is perfectly legal
to have non-ASCII UTF-8 characters in a branch or tag name (see
git-check-ref-format for the allowed characters), so someone could
indeed create a branch named
"038179704f0066aa815d5429221cf381ff4ef289\357\273\277" if they were a
tortured soul bent on overthrowing polite society. Rejecting input
because it has bytes with values above \177 is therefore not a solution.
Similarly, scanning the input for invisible Unicode characters (or even
invalid UTF-8 sequences) is leaning too far the other way: Git should
not be validating character encodings. It should stay encoding-neutral,
as the alternative leads to madness, driving developers into becoming
tortured souls bent on rigidly enforcing polite society. We have enough
of those already.
It's unclear as to whether violent overthrow or rigid enforcement is the
lesser of two evils, but let's not perform the experiment to find out.
:-)
Cheers!
--
CH (ch-and-git.vger.kernel.org@ch.pkts.ca)