git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Toon Claes <toon@iotcl.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Phillip Wood <phillip.wood123@gmail.com>,
	Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH v5 1/1] cat-file: quote-format name in error when using -z
Date: Fri, 12 May 2023 10:54:20 +0200	[thread overview]
Message-ID: <87h6sh6f81.fsf@iotcl.com> (raw)
In-Reply-To: <xmqqpm782be6.fsf@gitster.g>


Junio C Hamano <gitster@pobox.com> writes:

> Toon Claes <toon@iotcl.com> writes:
>
>> Since it's supported to have NUL-delimited input, introduced in
>> db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
>> 2022-07-22), it's possible to pass paths that contain newlines.
>
> It has been a while since I saw this patch the last time, and it did
> not immediately click to me how "pass paths" relates to passing
> object names, which is what "--batch-check" takes.  Perhaps
>
>     "cat-file --batch-check" may be fed object names of blobs and
>     trees as "<commit>:<path>", and with its "-z" option, it is
>     possible to feed <commit> or <path> with newlines in it.
>
> or something?

Good suggestion.

>> This
>> works great when the object is found, but when it's not, the input path
>> is returned in the error message. Because this can contain newlines, the
>> error message might get spread over multiple lines, making it harder to
>> machine-parse this error message.
>
> Good description.  I may suggest
>
>     "the input path is returned" -> "the input is shown verbatim"
>
> because <path> is not the only thing that can contain LF.  E.g.
>
>     $ git show -s 'HEAD^{/introduced in
>     > db9d67}'
>
> can find the commit resulting from this patch, so
>
>     $ printf "%s\0" 'HEAD^{/introduced in
>     > db9d67}:builtin/cat-file.cc' |
>     > git cat-file --batch-check -z

Thanks for that thorough explanation, makes a lot of sense.

>
> would be an input record with newline in it, that has no newline in
> the path.
>
>> With this change, the input is quote-formatted in the error message, if
>> needed. This ensures the error message is always on a single line and
>> makes parsing the error more straightforward.
>
> Drop "With this change, ..." and give a command to the codebase to
> c-quote the object name in the output, e.g.
>
>     C-quote the object name from the input in the error message as
>     needed, to ensure that the error message is on a single line and
>     ...

Sure, I'll update the commit message accordingly.

> The other side of the coin, however, is that an existing project
> that is sane enough not to have a path with LF in it, but is not
> sane enough to avoid a path with double-quote in it, would now stop
> being able to parse the error message for a missing path.

Ah interesting, this is not a case I did consider before.

> It is a regression, and we may argue that it is not a large enough
> regression to block progress given by this patch, but if it is not
> common enough to have funny characters in the paths then we wouldn't
> be seeing this patch in the first place.  So I would prefer to see
> that we at least admit that we are deliberately making this change
> with a known regression.  Perhaps add something like
>
>     Note that if a project already parses the error message
>     correctly because it does not have paths with newlines, this is
>     a breaking change if it has paths with characters that need
>     c-quoting (like double quotes and backslashes) that are not
>     newlines.  We consider that this is a small enough price to pay
>     to allow newlines in paths because ...
>
> and fill the "because ..." part is sensible.  I am not filling the
> "because" part, simply because I do not offhand see any good excuse
> to rob Peter to pay Paul in this case.

I intended to finalize this patch sooner, but other priorities popped
up. The -z flag was added in v2.38 and it would have been nice to have
the patch in v2.40, this would reduce the number of Peters affected. Now
we're a couple months and yet another release further between
introduction of the flag and making the regression, so I feel your
sentiment.

I previous conversations[1] we've been talking about making the output
NUL-terminated as well. We agreed on the cquote fix because the -z flag
was still fresh, but maybe at this time we need to revisit this.

Ideally the output should be NUL-terminated if -z is used. This was also
suggested[2] when the flag was introduced. Obviously we cannot change
this now, because it would break behavior for *everyone* using -z, not
only when funny names are used. So if we want to go this route, we
should only do so with another flag (e.g. `--null-output`) or a config
option.

But I was looking at the git-config(1) documentation:

> core.quotePath::
> 	Commands that output paths (e.g. 'ls-files', 'diff'), will
> 	quote "unusual" characters in the pathname by enclosing the
> 	pathname in double-quotes and escaping those characters with
> 	backslashes in the same way C escapes control characters (e.g.
> 	`\t` for TAB, `\n` for LF, `\\` for backslash) or bytes with
> 	values larger than 0x80 (e.g. octal `\302\265` for "micro" in
> 	UTF-8).  If this variable is set to false, bytes higher than
> 	0x80 are not considered "unusual" any more. Double-quotes,
> 	backslash and control characters are always escaped regardless
> 	of the setting of this variable.  A simple space character is
> 	not considered "unusual".  Many commands can output pathnames
> 	completely verbatim using the `-z` option. The default value
> 	is true.

If you read this, the changes of this patch fully contradict this. Also
documentation on other commands (e.g. git-check-ignore(1)) using `-z`
will mention the verbatim output. So at the moment I'm leaning toward
another solution. I'm looping in Taylor as he added the flag initially,
so he might have some insights how they are using it and how they expect
it to work.

-- Toon

[1]: https://lore.kernel.org/git/xmqq5yekyvrh.fsf@gitster.g/
[2]: https://lore.kernel.org/git/66b71194-ad0e-18d0-e43b-71e5c47ba111@gmail.com/

  reply	other threads:[~2023-05-12 10:24 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-09 15:00 [PATCH 0/1] cat-file: quote-format name in error when using -z Toon Claes
2022-12-09 15:00 ` [PATCH 1/1] " Toon Claes
2022-12-09 19:33   ` Phillip Wood
2022-12-09 23:58     ` Junio C Hamano
2022-12-11 16:30       ` Phillip Wood
2022-12-12  0:11         ` Junio C Hamano
2022-12-12 11:34           ` Toon Claes
2022-12-12 22:09             ` Junio C Hamano
2022-12-13 15:06           ` Phillip Wood
2022-12-14  8:29             ` Junio C Hamano
2022-12-20  5:31     ` Toon Claes
2022-12-20 10:18       ` Phillip Wood
2022-12-21 12:42         ` Toon Claes
2023-01-05  6:24 ` [PATCH v2 0/1] " Toon Claes
2023-01-05  6:24   ` [PATCH v2 1/1] " Toon Claes
2023-01-16 19:07   ` [PATCH v3 0/1] " Toon Claes
2023-01-16 19:07     ` [PATCH v3 1/1] " Toon Claes
2023-01-17 15:24       ` Phillip Wood
2023-03-03 19:17     ` [PATCH v4 0/2] " Toon Claes
2023-03-03 19:17       ` [PATCH v4 1/2] cat-file: extract printing batch error message into function Toon Claes
2023-03-03 20:26         ` Junio C Hamano
2023-03-03 23:14           ` Junio C Hamano
2023-05-10 19:01             ` [PATCH v5 0/1] cat-file: quote-format name in error when using -z Toon Claes
2023-05-10 19:01               ` [PATCH v5 1/1] " Toon Claes
2023-05-10 20:13                 ` Junio C Hamano
2023-05-12  8:54                   ` Toon Claes [this message]
2023-05-12 16:57                     ` Junio C Hamano
2023-05-15  8:47                       ` Phillip Wood
2023-05-15 17:20                         ` Junio C Hamano
2023-06-02 13:29                           ` Phillip Wood
2023-03-03 19:17       ` [PATCH v4 2/2] " Toon Claes
2023-03-03 20:14       ` [PATCH v4 0/2] " Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h6sh6f81.fsf@iotcl.com \
    --to=toon@iotcl.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=phillip.wood123@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).