git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, gitster@pobox.com, John Cai <johncai86@gmail.com>
Subject: Re: [PATCH 2/2] builtin/cat-file.c: support NUL-delimited input with `-z`
Date: Sat, 23 Jul 2022 07:17:48 +0200	[thread overview]
Message-ID: <220723.86leskqtej.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <ed1583223f63cfde99829069f14af62e4f0f2a82.1658532524.git.me@ttaylorr.com>


On Fri, Jul 22 2022, Taylor Blau wrote:

[I think John Cai would appreciate being CC'd on this]

> It's tempting to think that we could use `strbuf_getwholeline()` and
> specify either `\n` or `\0` as the terminating character. But for input
> on platforms that include a CR character preceeding the LF, this
> wouldn't quite be the same, since `strbuf_getline(...)` will trim any
> trailing CR, while `strbuf_getwholeline(&buf, stdin, '\n')` will not.

I commend the effort to maintain bug-for-bug compatibility, but do we
care about this distinction, or is it just an accident that we support
\r\n here in the first place?

This doesn't apply to the rest of cat-file directly, but the origin of
the recent --batch-command mode cdoe was to lift the same-ish code from
builtin/update-ref.c, whose \n or \0 mode does exactly that:

	while (!strbuf_getwholeline(&input, stdin, line_termination)) {

I.e. it doesn't support \r\n, just \n or \0.

Isn't that fine? I may be missing something, but why isn't it OK even on
platforms that use \r\n for their normal *.txt endings to only use \n or
\0 for this bit of protocol?

For the command mode at least this passes our tests:
	
	diff --git a/builtin/cat-file.c b/builtin/cat-file.c
	index f42782e955f..8646059472d 100644
	--- a/builtin/cat-file.c
	+++ b/builtin/cat-file.c
	@@ -614,12 +614,16 @@ static void batch_objects_command(struct batch_options *opt,
	 	struct queued_cmd *queued_cmd = NULL;
	 	size_t alloc = 0, nr = 0;
	 
	-	while (!strbuf_getline(&input, stdin)) {
	+	while (!strbuf_getwholeline(&input, stdin, '\n')) {
	 		int i;
	 		const struct parse_cmd *cmd = NULL;
	 		const char *p = NULL, *cmd_end;
	 		struct queued_cmd call = {0};
	 
	+		if (input.len > 0 && input.buf[input.len - 1] == '\n')
	+			--input.len;
	+		input.buf[input.len] = '\0';
	+
	 		if (!input.len)
	 			die(_("empty command in input"));
	 		if (isspace(*input.buf))

So maybe we should just do something like that instead? I.e. declare
that a mistake.

As for the rest of cat-file 05d5667fec9 (git-cat-file: Add --batch-check
option, 2008-04-23) documents that it's LF, not CR LF, ditto
git-cat-file.txt.

So isn't this just an accident in of us having used the strbuf_getline()
function to mean "\n", but actually it also does "\r\n".

Which is a really unfortunately named function b.t.w., since it sneaks
this bit of Windows portability into places that may not want it in the
first place.

>  strlen () {
>      echo_without_newline "$1" | wc -c | sed -e 's/^ *//'
>  }
> @@ -398,6 +403,12 @@ test_expect_success '--batch with multiple sha1s gives correct format' '
>  	test "$(maybe_remove_timestamp "$batch_output" 1)" = "$(maybe_remove_timestamp "$(echo_without_newline "$batch_input" | git cat-file --batch)" 1)"
>  '
>  
> +test_expect_success '--batch, -z with multiple sha1s gives correct format' '
> +	echo_without_newline_nul "$batch_input" >in &&
> +	test "$(maybe_remove_timestamp "$batch_output" 1)" = \
> +	"$(maybe_remove_timestamp "$(git cat-file --batch -z <in)" 1)"

This...

> +'
> +
>  batch_check_input="$hello_sha1
>  $tree_sha1
>  $commit_sha1
> @@ -418,6 +429,24 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
>      "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
>  '
>  
> +test_expect_success "--batch-check, -z with multiple sha1s gives correct format" '
> +    echo_without_newline_nul "$batch_check_input" >in &&
> +    test "$batch_check_output" = "$(git cat-file --batch-check -z <in)"

This....

> +'
> +
> +test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
> +	touch -- "newline${LF}embedded" &&
> +	git add -- "newline${LF}embedded" &&
> +	git commit -m "file with newline embedded" &&
> +	test_tick &&
> +
> +	printf "HEAD:newline${LF}embedded" >in &&
> +	git cat-file --batch-check -z <in >actual &&
> +
> +	echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&

..and this hides git's exit code, better to pipe to a file, use test_cmp
etc. etc.

  parent reply	other threads:[~2022-07-23  5:39 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-22 23:28 [PATCH 0/2] cat-file: support NUL-delimited input with `-z` Taylor Blau
2022-07-22 23:29 ` [PATCH 1/2] t1006: extract --batch-command inputs to variables Taylor Blau
2022-07-22 23:29 ` [PATCH 2/2] builtin/cat-file.c: support NUL-delimited input with `-z` Taylor Blau
2022-07-22 23:41   ` Chris Torek
2022-07-25 23:39     ` Taylor Blau
2022-07-23  5:17   ` Ævar Arnfjörð Bjarmason [this message]
2022-07-23 17:45     ` Junio C Hamano
2022-07-25 23:44       ` Taylor Blau
2022-07-27 14:10         ` Junio C Hamano
2022-07-23  5:35   ` Junio C Hamano
2022-07-25 23:50     ` Taylor Blau
2022-07-27 14:20       ` Junio C Hamano
2022-07-31 15:50   ` Phillip Wood
2022-08-11 11:52   ` Ævar Arnfjörð Bjarmason
2022-07-23  4:44 ` [PATCH 0/2] cat-file: " Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220723.86leskqtej.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johncai86@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).