git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, Phillip Wood <phillip.wood123@gmail.com>
Subject: Re: [PATCH v5 1/1] cat-file: quote-format name in error when using -z
Date: Wed, 10 May 2023 13:13:05 -0700	[thread overview]
Message-ID: <xmqqpm782be6.fsf@gitster.g> (raw)
In-Reply-To: <20230510190116.795641-2-toon@iotcl.com> (Toon Claes's message of "Wed, 10 May 2023 21:01:16 +0200")

Toon Claes <toon@iotcl.com> writes:

> Since it's supported to have NUL-delimited input, introduced in
> db9d67f2e9 (builtin/cat-file.c: support NUL-delimited input with `-z`,
> 2022-07-22), it's possible to pass paths that contain newlines.

It has been a while since I saw this patch the last time, and it did
not immediately click to me how "pass paths" relates to passing
object names, which is what "--batch-check" takes.  Perhaps

    "cat-file --batch-check" may be fed object names of blobs and
    trees as "<commit>:<path>", and with its "-z" option, it is
    possible to feed <commit> or <path> with newlines in it.

or something?

> This
> works great when the object is found, but when it's not, the input path
> is returned in the error message. Because this can contain newlines, the
> error message might get spread over multiple lines, making it harder to
> machine-parse this error message.

Good description.  I may suggest

    "the input path is returned" -> "the input is shown verbatim"

because <path> is not the only thing that can contain LF.  E.g.

    $ git show -s 'HEAD^{/introduced in
    > db9d67}'

can find the commit resulting from this patch, so

    $ printf "%s\0" 'HEAD^{/introduced in
    > db9d67}:builtin/cat-file.cc' |
    > git cat-file --batch-check -z 

would be an input record with newline in it, that has no newline in
the path.

> With this change, the input is quote-formatted in the error message, if
> needed. This ensures the error message is always on a single line and
> makes parsing the error more straightforward.

Drop "With this change, ..." and give a command to the codebase to
c-quote the object name in the output, e.g.

    C-quote the object name from the input in the error message as
    needed, to ensure that the error message is on a single line and
    ...

The other side of the coin, however, is that an existing project
that is sane enough not to have a path with LF in it, but is not
sane enough to avoid a path with double-quote in it, would now stop
being able to parse the error message for a missing path.

It is a regression, and we may argue that it is not a large enough
regression to block progress given by this patch, but if it is not
common enough to have funny characters in the paths then we wouldn't
be seeing this patch in the first place.  So I would prefer to see
that we at least admit that we are deliberately making this change
with a known regression.  Perhaps add something like

    Note that if a project already parses the error message
    correctly because it does not have paths with newlines, this is
    a breaking change if it has paths with characters that need
    c-quoting (like double quotes and backslashes) that are not
    newlines.  We consider that this is a small enough price to pay
    to allow newlines in paths because ...

and fill the "because ..." part is sensible.  I am not filling the
"because" part, simply because I do not offhand see any good excuse
to rob Peter to pay Paul in this case.

> @@ -470,8 +471,17 @@ static void batch_object_write(const char *obj_name,
>  						       &data->oid, &data->info,
>  						       OBJECT_INFO_LOOKUP_REPLACE);
>  		if (ret < 0) {
> +			struct strbuf quoted = STRBUF_INIT;
> +
> +			if (opt->nul_terminated &&
> +			    obj_name) {
> +				quote_c_style(obj_name, &quoted, NULL, 0);
> +				obj_name = quoted.buf;
> +			}
> +
>  			printf("%s missing\n",
>  			       obj_name ? obj_name : oid_to_hex(&data->oid));
> +			strbuf_release(&quoted);
>  			fflush(stdout);
>  			return;
>  		}

Interesting.

When running "--batch-all-objects", we do not have obj_name, and we
do not have anything to "quote", but the fallback label is the full
hexadecimal object name, and we do not have to worry about line
breaks.

OK.

> @@ -518,6 +528,13 @@ static void batch_one_object(const char *obj_name,
>  	result = get_oid_with_context(the_repository, obj_name,
>  				      flags, &data->oid, &ctx);
>  	if (result != FOUND) {
> +		struct strbuf quoted = STRBUF_INIT;
> +
> +		if (opt->nul_terminated) {
> +			quote_c_style(obj_name, &quoted, NULL, 0);
> +			obj_name = quoted.buf;
> +		}
> +
>  		switch (result) {
>  		case MISSING_OBJECT:
>  			printf("%s missing\n", obj_name);
> @@ -542,6 +559,8 @@ static void batch_one_object(const char *obj_name,
>  			       result);
>  			break;
>  		}
> +
> +		strbuf_release(&quoted);
>  		fflush(stdout);
>  		return;
>  	}

This is the side that runs without "--batch-all-objects" and always
has non-null obj_name, so it looks a bit different from the other
hunk and is more straight-forward.  Looking good.

Thanks.

  reply	other threads:[~2023-05-10 20:13 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-09 15:00 [PATCH 0/1] cat-file: quote-format name in error when using -z Toon Claes
2022-12-09 15:00 ` [PATCH 1/1] " Toon Claes
2022-12-09 19:33   ` Phillip Wood
2022-12-09 23:58     ` Junio C Hamano
2022-12-11 16:30       ` Phillip Wood
2022-12-12  0:11         ` Junio C Hamano
2022-12-12 11:34           ` Toon Claes
2022-12-12 22:09             ` Junio C Hamano
2022-12-13 15:06           ` Phillip Wood
2022-12-14  8:29             ` Junio C Hamano
2022-12-20  5:31     ` Toon Claes
2022-12-20 10:18       ` Phillip Wood
2022-12-21 12:42         ` Toon Claes
2023-01-05  6:24 ` [PATCH v2 0/1] " Toon Claes
2023-01-05  6:24   ` [PATCH v2 1/1] " Toon Claes
2023-01-16 19:07   ` [PATCH v3 0/1] " Toon Claes
2023-01-16 19:07     ` [PATCH v3 1/1] " Toon Claes
2023-01-17 15:24       ` Phillip Wood
2023-03-03 19:17     ` [PATCH v4 0/2] " Toon Claes
2023-03-03 19:17       ` [PATCH v4 1/2] cat-file: extract printing batch error message into function Toon Claes
2023-03-03 20:26         ` Junio C Hamano
2023-03-03 23:14           ` Junio C Hamano
2023-05-10 19:01             ` [PATCH v5 0/1] cat-file: quote-format name in error when using -z Toon Claes
2023-05-10 19:01               ` [PATCH v5 1/1] " Toon Claes
2023-05-10 20:13                 ` Junio C Hamano [this message]
2023-05-12  8:54                   ` Toon Claes
2023-05-12 16:57                     ` Junio C Hamano
2023-05-15  8:47                       ` Phillip Wood
2023-05-15 17:20                         ` Junio C Hamano
2023-06-02 13:29                           ` Phillip Wood
2023-03-03 19:17       ` [PATCH v4 2/2] " Toon Claes
2023-03-03 20:14       ` [PATCH v4 0/2] " Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqpm782be6.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=phillip.wood123@gmail.com \
    --cc=toon@iotcl.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).