git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
       [not found] <20130801201842.GA16809@kitenet.net>
@ 2013-08-02  6:40 ` Jonathan Nieder
  2013-08-02 10:54   ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Jonathan Nieder @ 2013-08-02  6:40 UTC (permalink / raw)
  To: Joey Hess; +Cc: Jeff King, Junio C Hamano, git

Hi Joey,

Joey Hess wrote[1]:

> Commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac caused a reversion in
> git-cat-file --batch. 
>
> With an older version:
>
> joey@gnu:~/tmp/rrr>git cat-file --batch
> :file name
> e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 blob 0
>
> With the new version:
>
> joey@wren:~/tmp/r>git cat-file --batch
> :file name
> :file missing
>
> This has broken git-annex's support for operating on files/directories
> containing whitespace. I cannot see a way to query such a filename using
> the new interface.

Oh dear.  Luckily you caught this before the final 1.8.4 release.  I
wonder if we should just revert c334b87b (cat-file: split --batch
input lines on whitespace, 2013-07-11) for now.

Thanks,
Jonathan

[1] http://bugs.debian.org/718517

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
  2013-08-02  6:40 ` [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces Jonathan Nieder
@ 2013-08-02 10:54   ` Jeff King
  2013-08-02 11:59     ` Jeff King
  2013-08-02 16:32     ` Junio C Hamano
  0 siblings, 2 replies; 11+ messages in thread
From: Jeff King @ 2013-08-02 10:54 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Joey Hess, Junio C Hamano, git

On Thu, Aug 01, 2013 at 11:40:03PM -0700, Jonathan Nieder wrote:

> > Commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac caused a reversion in
> > git-cat-file --batch.
> >
> > With an older version:
> >
> > joey@gnu:~/tmp/rrr>git cat-file --batch
> > :file name
> > e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 blob 0
> >
> > With the new version:
> >
> > joey@wren:~/tmp/r>git cat-file --batch
> > :file name
> > :file missing
> [...]
> Oh dear.  Luckily you caught this before the final 1.8.4 release.  I
> wonder if we should just revert c334b87b (cat-file: split --batch
> input lines on whitespace, 2013-07-11) for now.

Ugh. Yeah, the incorrect assumption from the commit message of c334b87b
is "Object names cannot contain spaces...". Refs cannot, but filename
specifiers after a colon can.

We need to revert that commit before the release. It can either be
replaced with:

  1. A "--split" (or similar) option to use the behavior only when
     desired.

  2. Enabling splitting only when %(rest) is used in the output format.

And I suppose it is too late in the cycle for either of those to go into
v1.8.4. That's a shame, but I think losing that particular patch does
not affect the rest of the series, so we are OK to ship without it.

Thanks Joey for a timely bug report.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
  2013-08-02 10:54   ` Jeff King
@ 2013-08-02 11:59     ` Jeff King
  2013-08-02 15:27       ` Joey Hess
  2013-08-02 16:41       ` Junio C Hamano
  2013-08-02 16:32     ` Junio C Hamano
  1 sibling, 2 replies; 11+ messages in thread
From: Jeff King @ 2013-08-02 11:59 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Joey Hess, Junio C Hamano, git

On Fri, Aug 02, 2013 at 03:54:02AM -0700, Jeff King wrote:

> We need to revert that commit before the release. It can either be
> replaced with:
> 
>   1. A "--split" (or similar) option to use the behavior only when
>      desired.
> 
>   2. Enabling splitting only when %(rest) is used in the output format.

Of the two, I think the latter is more sensible; the former is
unnecessarily placing the burden on the user to match "--split" with
their use of "%(rest)". The second is pointless without the first.

A patch to implement (2) is below.

By the way, Joey, I am not sure how safe "git cat-file --batch-check" is
for arbitrary filenames. In particular, I don't know how it would react
to a filename with an embedded newline (and I do not think it will undo
quoting). Certainly that does not excuse this regression; even if what
you are doing is not 100% reliable, it is good enough in sane situations
and we should not be breaking it. But you may want to double-check the
behavior of your scripts in such a case, and we may need to add a "-z"
to support it reliably.

The "rev-list --objects" output may contain such paths, of course, but
they will be quoted, and "%(rest)" does not care (it is not trying to
interpret the paths, but will reliably relay the quoted bits to the
output).

-- >8 --
Subject: [PATCH] cat-file: only split on whitespace when %(rest) is used

Commit c334b87 recently taught `cat-file --batch-check` to
split input lines on whitespace, and stash everything after
the first token into the %(rest) output format element. That
commit claims:

   Object names cannot contain spaces, so any input with
   spaces would have resulted in a "missing" line.

But that is not correct. Refs, object sha1s, and various
peeling suffixes cannot contain spaces, but some object
names can. In particular:

  1. Tree paths like "[<tree>]:path with whitespace"

  2. Reflog specifications like "@{2 days ago}"

  3. Commit searches like "rev^{/grep me}" or ":/grep me"

To remain backwards compatible, we cannot split on
whitespace by default. This patch teaches cat-file to only
do the splitting when "%(rest)" is used by the output
format. Since that element did not exist at all until
c334b87, old scripts cannot be affected.

The existence of object names with spaces does mean that you
cannot reliably do:

  echo ":path with space and other data" |
  git cat-file --batch-check="%(objectname) %(rest)"

as it would split the path and feed only ":path" to
get_sha1. But that command is nonsensical. If you wanted to
see "and other data" in "%(rest)", git cannot possibly know
where the filename ends and the "rest" begins.

It might be more robust to have something like "-z" to
separate the input elements. But this patch is still a
reasonable step before having that.  It makes the easy cases
easy; people who do not care about %(rest) do not have to
consider it, and the %(rest) code handles the spaces and
newlines of "rev-list --objects" correctly.

Hard cases remain hard but possible (if you might get
whitespace in your input, you do not get to use %(rest) and
must split and join the output yourself using more flexible
tools). And most importantly, it does not preclude us from
having different splitting rules later if a "-z" (or
similar) option is added.  So we can make the hard
cases easier later, if we choose to.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-cat-file.txt | 16 ++++++++--------
 builtin/cat-file.c             | 31 +++++++++++++++++++++----------
 t/t1006-cat-file.sh            |  8 ++++++++
 3 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 3ddec0b..21cffe2 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -86,12 +86,9 @@ If `--batch` or `--batch-check` is given, `cat-file` will read objects
 ------------
 
 If `--batch` or `--batch-check` is given, `cat-file` will read objects
-from stdin, one per line, and print information about them.
-
-Each line is split at the first whitespace boundary. All characters
-before that whitespace are considered as a whole object name, and are
-parsed as if given to linkgit:git-rev-parse[1]. Characters after that
-whitespace can be accessed using the `%(rest)` atom (see below).
+from stdin, one per line, and print information about them. By default,
+the whole line is considered as an object, as if it were fed to
+linkgit:git-rev-parse[1].
 
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
@@ -113,8 +110,11 @@ newline. The available atoms are:
 	note about on-disk sizes in the `CAVEATS` section below.
 
 `rest`::
-	The text (if any) found after the first run of whitespace on the
-	input line (i.e., the "rest" of the line).
+	If this atom is used in the output string, input lines are split
+	at the first whitespace boundary. All characters before that
+	whitespace are considered to be the object name; characters
+	after that first run of whitespace (i.e., the "rest" of the
+	line) are output in place of the `%(rest)` atom.
 
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 163ce6c..07b4818 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -128,6 +128,13 @@ struct expand_data {
 	int mark_query;
 
 	/*
+	 * Whether to split the input on whitespace before feeding it to
+	 * get_sha1; this is decided during the mark_query phase based on
+	 * whether we have a %(rest) token in our format.
+	 */
+	int split_on_whitespace;
+
+	/*
 	 * After a mark_query run, this object_info is set up to be
 	 * passed to sha1_object_info_extended. It will point to the data
 	 * elements above, so you can retrieve the response from there.
@@ -165,7 +172,9 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 		else
 			strbuf_addf(sb, "%lu", data->disk_size);
 	} else if (is_atom("rest", atom, len)) {
-		if (!data->mark_query && data->rest)
+		if (data->mark_query)
+			data->split_on_whitespace = 1;
+		else if (data->rest)
 			strbuf_addstr(sb, data->rest);
 	} else
 		die("unknown format element: %.*s", len, atom);
@@ -280,16 +289,18 @@ static int batch_objects(struct batch_options *opt)
 		char *p;
 		int error;
 
-		/*
-		 * Split at first whitespace, tying off the beginning of the
-		 * string and saving the remainder (or NULL) in data.rest.
-		 */
-		p = strpbrk(buf.buf, " \t");
-		if (p) {
-			while (*p && strchr(" \t", *p))
-				*p++ = '\0';
+		if (data.split_on_whitespace) {
+			/*
+			 * Split at first whitespace, tying off the beginning of the
+			 * string and saving the remainder (or NULL) in data.rest.
+			 */
+			p = strpbrk(buf.buf, " \t");
+			if (p) {
+				while (*p && strchr(" \t", *p))
+					*p++ = '\0';
+			}
+			data.rest = p;
 		}
-		data.rest = p;
 
 		error = batch_one_object(buf.buf, opt, &data);
 		if (error)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index d499d02..a420742 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -98,6 +98,14 @@ run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success '--batch-check without %(rest) considers whole line' '
+	echo "$hello_sha1 blob $hello_size" >expect &&
+	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
+	test_when_finished "git update-index --remove \"white space\"" &&
+	echo ":white space" | git cat-file --batch-check >actual &&
+	test_cmp expect actual
+'
+
 tree_sha1=$(git write-tree)
 tree_size=33
 tree_pretty_content="100644 blob $hello_sha1	hello"
-- 
1.8.4.rc0.3.g042a762

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
  2013-08-02 11:59     ` Jeff King
@ 2013-08-02 15:27       ` Joey Hess
  2013-08-02 16:14         ` Brandon Casey
  2013-08-02 16:41       ` Junio C Hamano
  1 sibling, 1 reply; 11+ messages in thread
From: Joey Hess @ 2013-08-02 15:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 771 bytes --]

Jeff King wrote:
> By the way, Joey, I am not sure how safe "git cat-file --batch-check" is
> for arbitrary filenames. In particular, I don't know how it would react
> to a filename with an embedded newline (and I do not think it will undo
> quoting). Certainly that does not excuse this regression; even if what
> you are doing is not 100% reliable, it is good enough in sane situations
> and we should not be breaking it. But you may want to double-check the
> behavior of your scripts in such a case, and we may need to add a "-z"
> to support it reliably.

Yes, I would prefer to have a -z mode. I think my code otherwise handles
newlines.

Thanks for the quick fix. I agree that only enabling the behavior with
%{rest} makes sense.

-- 
see shy jo

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
  2013-08-02 15:27       ` Joey Hess
@ 2013-08-02 16:14         ` Brandon Casey
  0 siblings, 0 replies; 11+ messages in thread
From: Brandon Casey @ 2013-08-02 16:14 UTC (permalink / raw)
  To: Joey Hess; +Cc: Jeff King, Jonathan Nieder, Junio C Hamano, git@vger.kernel.org

On Fri, Aug 2, 2013 at 8:27 AM, Joey Hess <joey@kitenet.net> wrote:
> Jeff King wrote:
>> By the way, Joey, I am not sure how safe "git cat-file --batch-check" is
>> for arbitrary filenames. In particular, I don't know how it would react
>> to a filename with an embedded newline (and I do not think it will undo
>> quoting). Certainly that does not excuse this regression; even if what
>> you are doing is not 100% reliable, it is good enough in sane situations
>> and we should not be breaking it. But you may want to double-check the
>> behavior of your scripts in such a case, and we may need to add a "-z"
>> to support it reliably.
>
> Yes, I would prefer to have a -z mode. I think my code otherwise handles
> newlines.
>
> Thanks for the quick fix. I agree that only enabling the behavior with
> %{rest} makes sense.
>
> --
> see shy jo

/methinks we've identified a gap in our test coverage.  Care to add a
test that covers the functionality that git-annex depends on?

-Brandon

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
  2013-08-02 10:54   ` Jeff King
  2013-08-02 11:59     ` Jeff King
@ 2013-08-02 16:32     ` Junio C Hamano
  1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2013-08-02 16:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Joey Hess, git

Jeff King <peff@peff.net> writes:

> We need to revert that commit before the release. It can either be
> replaced with:
>
>   1. A "--split" (or similar) option to use the behavior only when
>      desired.
>
>   2. Enabling splitting only when %(rest) is used in the output format.
>
> And I suppose it is too late in the cycle for either of those to go into
> v1.8.4. That's a shame, but I think losing that particular patch does
> not affect the rest of the series, so we are OK to ship without it.
>
> Thanks Joey for a timely bug report.

Thanks.  Will do this to jk/cat-file-batch-optim topic and merge it
to 'master' for now.

-- >8 --
Subject: [PATCH] Revert "cat-file: split --batch input lines on whitespace"

This reverts commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac; the
update assumed that people only used the command to read from
"rev-list --objects" output, whose lines begin with a 40-hex object
name followed by a whitespace, but it turns out that scripts feed
random extended SHA-1 expressions (e.g. "HEAD:$pathname") in which
a whitespace has to be kept.
---
 Documentation/git-cat-file.txt | 10 ++--------
 builtin/cat-file.c             | 20 +-------------------
 t/t1006-cat-file.sh            |  7 -------
 3 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 3ddec0b..10fbc6a 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -88,10 +88,8 @@ BATCH OUTPUT
 If `--batch` or `--batch-check` is given, `cat-file` will read objects
 from stdin, one per line, and print information about them.
 
-Each line is split at the first whitespace boundary. All characters
-before that whitespace are considered as a whole object name, and are
-parsed as if given to linkgit:git-rev-parse[1]. Characters after that
-whitespace can be accessed using the `%(rest)` atom (see below).
+Each line is considered as a whole object name, and is parsed as if
+given to linkgit:git-rev-parse[1].
 
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
@@ -112,10 +110,6 @@ newline. The available atoms are:
 	The size, in bytes, that the object takes up on disk. See the
 	note about on-disk sizes in the `CAVEATS` section below.
 
-`rest`::
-	The text (if any) found after the first run of whitespace on the
-	input line (i.e., the "rest" of the line).
-
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 163ce6c..4253460 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -119,7 +119,6 @@ struct expand_data {
 	enum object_type type;
 	unsigned long size;
 	unsigned long disk_size;
-	const char *rest;
 
 	/*
 	 * If mark_query is true, we do not expand anything, but rather
@@ -164,9 +163,6 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 			data->info.disk_sizep = &data->disk_size;
 		else
 			strbuf_addf(sb, "%lu", data->disk_size);
-	} else if (is_atom("rest", atom, len)) {
-		if (!data->mark_query && data->rest)
-			strbuf_addstr(sb, data->rest);
 	} else
 		die("unknown format element: %.*s", len, atom);
 }
@@ -277,21 +273,7 @@ static int batch_objects(struct batch_options *opt)
 	warn_on_object_refname_ambiguity = 0;
 
 	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
-		char *p;
-		int error;
-
-		/*
-		 * Split at first whitespace, tying off the beginning of the
-		 * string and saving the remainder (or NULL) in data.rest.
-		 */
-		p = strpbrk(buf.buf, " \t");
-		if (p) {
-			while (*p && strchr(" \t", *p))
-				*p++ = '\0';
-		}
-		data.rest = p;
-
-		error = batch_one_object(buf.buf, opt, &data);
+		int error = batch_one_object(buf.buf, opt, &data);
 		if (error)
 			return error;
 	}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index d499d02..4e911fb 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -78,13 +78,6 @@ $content"
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
 	test_cmp expect actual
     '
-
-    test_expect_success '--batch-check with %(rest)' '
-	echo "$type this is some extra content" >expect &&
-	echo "$sha1    this is some extra content" |
-		git cat-file --batch-check="%(objecttype) %(rest)" >actual &&
-	test_cmp expect actual
-    '
 }
 
 hello_content="Hello World"
-- 
1.8.4-rc1-125-g7a0ec02

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
  2013-08-02 11:59     ` Jeff King
  2013-08-02 15:27       ` Joey Hess
@ 2013-08-02 16:41       ` Junio C Hamano
  2013-08-02 17:28         ` Jeff King
  1 sibling, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2013-08-02 16:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Joey Hess, git

Jeff King <peff@peff.net> writes:

> On Fri, Aug 02, 2013 at 03:54:02AM -0700, Jeff King wrote:
>
>> We need to revert that commit before the release. It can either be
>> replaced with:
>> 
>>   1. A "--split" (or similar) option to use the behavior only when
>>      desired.
>> 
>>   2. Enabling splitting only when %(rest) is used in the output format.
>
> Of the two, I think the latter is more sensible; the former is
> unnecessarily placing the burden on the user to match "--split" with
> their use of "%(rest)". The second is pointless without the first.
>
> A patch to implement (2) is below.

As I'd queue this on top of the revert, I had to wrangle it a bit to
make it relative, i.e. "this resurrects what the other reverted
patch did but in a weaker/safer form".

This will be kept outside this cycle.  Thanks for a quick fix.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
  2013-08-02 16:41       ` Junio C Hamano
@ 2013-08-02 17:28         ` Jeff King
  2013-08-02 18:30           ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2013-08-02 17:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Joey Hess, git

On Fri, Aug 02, 2013 at 09:41:52AM -0700, Junio C Hamano wrote:

> > Of the two, I think the latter is more sensible; the former is
> > unnecessarily placing the burden on the user to match "--split" with
> > their use of "%(rest)". The second is pointless without the first.
> >
> > A patch to implement (2) is below.
> 
> As I'd queue this on top of the revert, I had to wrangle it a bit to
> make it relative, i.e. "this resurrects what the other reverted
> patch did but in a weaker/safer form".

Yeah, sorry. After doing the patch I had the thought that maybe the
least invasive thing would be the fix rather than the straight revert
(we are counting on my assertion that just reverting out part of the
series will be OK; I'm pretty sure that is the case, but it is not
risk-free, either).

I didn't see the result of your wrangling in pu, but I will keep an eye
out to double-check it (unless you did not finish, in which case I am
happy to do the wrangling myself).

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
  2013-08-02 17:28         ` Jeff King
@ 2013-08-02 18:30           ` Junio C Hamano
  2013-08-02 20:05             ` Jonathan Nieder
  2013-08-03  7:18             ` Jeff King
  0 siblings, 2 replies; 11+ messages in thread
From: Junio C Hamano @ 2013-08-02 18:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Joey Hess, git

Jeff King <peff@peff.net> writes:

> On Fri, Aug 02, 2013 at 09:41:52AM -0700, Junio C Hamano wrote:
>
>> > Of the two, I think the latter is more sensible; the former is
>> > unnecessarily placing the burden on the user to match "--split" with
>> > their use of "%(rest)". The second is pointless without the first.
>> >
>> > A patch to implement (2) is below.
>> 
>> As I'd queue this on top of the revert, I had to wrangle it a bit to
>> make it relative, i.e. "this resurrects what the other reverted
>> patch did but in a weaker/safer form".
>
> Yeah, sorry. After doing the patch I had the thought that maybe the
> least invasive thing would be the fix rather than the straight revert
> (we are counting on my assertion that just reverting out part of the
> series will be OK; I'm pretty sure that is the case, but it is not
> risk-free, either).
>
> I didn't see the result of your wrangling in pu, but I will keep an eye
> out to double-check it (unless you did not finish, in which case I am
> happy to do the wrangling myself).

Here is what is on top of the revert that has been pushed out on
'pu'.

Thanks.

-- >8 --
From: Jeff King <peff@peff.net>
Date: Fri, 2 Aug 2013 04:59:07 -0700
Subject: [PATCH] cat-file: only split on whitespace when %(rest) is used

Commit c334b87b (cat-file: split --batch input lines on whitespace,
2013-07-11) taught `cat-file --batch-check` to split input lines on
the first whitespace, and stash everything after the first token
into the %(rest) output format element.  It claimed:

   Object names cannot contain spaces, so any input with
   spaces would have resulted in a "missing" line.

But that is not correct.  Refs, object sha1s, and various peeling
suffixes cannot contain spaces, but some object names can. In
particular:

  1. Tree paths like "[<tree>]:path with whitespace"

  2. Reflog specifications like "@{2 days ago}"

  3. Commit searches like "rev^{/grep me}" or ":/grep me"

To remain backwards compatible, we cannot split on whitespace by
default, hence we will ship 1.8.4 with the commit reverted.

Resurrect its attempt but in a weaker form; only do the splitting
when "%(rest)" is used in the output format. Since that element did
not exist at all before c334b87, old scripts cannot be affected.

The existence of object names with spaces does mean that you
cannot reliably do:

  echo ":path with space and other data" |
  git cat-file --batch-check="%(objectname) %(rest)"

as it would split the path and feed only ":path" to get_sha1. But
that command is nonsensical. If you wanted to see "and other data"
in "%(rest)", git cannot possibly know where the filename ends and
the "rest" begins.

It might be more robust to have something like "-z" to separate the
input elements. But this patch is still a reasonable step before
having that.  It makes the easy cases easy; people who do not care
about %(rest) do not have to consider it, and the %(rest) code
handles the spaces and newlines of "rev-list --objects" correctly.

Hard cases remain hard but possible (if you might get whitespace in
your input, you do not get to use %(rest) and must split and join
the output yourself using more flexible tools). And most
importantly, it does not preclude us from having different splitting
rules later if a "-z" (or similar) option is added.  So we can make
the hard cases easier later, if we choose to.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-cat-file.txt | 14 ++++++++++----
 builtin/cat-file.c             | 31 ++++++++++++++++++++++++++++++-
 t/t1006-cat-file.sh            | 15 +++++++++++++++
 3 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 10fbc6a..21cffe2 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -86,10 +86,9 @@ BATCH OUTPUT
 ------------
 
 If `--batch` or `--batch-check` is given, `cat-file` will read objects
-from stdin, one per line, and print information about them.
-
-Each line is considered as a whole object name, and is parsed as if
-given to linkgit:git-rev-parse[1].
+from stdin, one per line, and print information about them. By default,
+the whole line is considered as an object, as if it were fed to
+linkgit:git-rev-parse[1].
 
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
@@ -110,6 +109,13 @@ newline. The available atoms are:
 	The size, in bytes, that the object takes up on disk. See the
 	note about on-disk sizes in the `CAVEATS` section below.
 
+`rest`::
+	If this atom is used in the output string, input lines are split
+	at the first whitespace boundary. All characters before that
+	whitespace are considered to be the object name; characters
+	after that first run of whitespace (i.e., the "rest" of the
+	line) are output in place of the `%(rest)` atom.
+
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 4253460..07b4818 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -119,6 +119,7 @@ struct expand_data {
 	enum object_type type;
 	unsigned long size;
 	unsigned long disk_size;
+	const char *rest;
 
 	/*
 	 * If mark_query is true, we do not expand anything, but rather
@@ -127,6 +128,13 @@ struct expand_data {
 	int mark_query;
 
 	/*
+	 * Whether to split the input on whitespace before feeding it to
+	 * get_sha1; this is decided during the mark_query phase based on
+	 * whether we have a %(rest) token in our format.
+	 */
+	int split_on_whitespace;
+
+	/*
 	 * After a mark_query run, this object_info is set up to be
 	 * passed to sha1_object_info_extended. It will point to the data
 	 * elements above, so you can retrieve the response from there.
@@ -163,6 +171,11 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 			data->info.disk_sizep = &data->disk_size;
 		else
 			strbuf_addf(sb, "%lu", data->disk_size);
+	} else if (is_atom("rest", atom, len)) {
+		if (data->mark_query)
+			data->split_on_whitespace = 1;
+		else if (data->rest)
+			strbuf_addstr(sb, data->rest);
 	} else
 		die("unknown format element: %.*s", len, atom);
 }
@@ -273,7 +286,23 @@ static int batch_objects(struct batch_options *opt)
 	warn_on_object_refname_ambiguity = 0;
 
 	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
-		int error = batch_one_object(buf.buf, opt, &data);
+		char *p;
+		int error;
+
+		if (data.split_on_whitespace) {
+			/*
+			 * Split at first whitespace, tying off the beginning of the
+			 * string and saving the remainder (or NULL) in data.rest.
+			 */
+			p = strpbrk(buf.buf, " \t");
+			if (p) {
+				while (*p && strchr(" \t", *p))
+					*p++ = '\0';
+			}
+			data.rest = p;
+		}
+
+		error = batch_one_object(buf.buf, opt, &data);
 		if (error)
 			return error;
 	}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 4e911fb..a420742 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -78,6 +78,13 @@ $content"
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
 	test_cmp expect actual
     '
+
+    test_expect_success '--batch-check with %(rest)' '
+	echo "$type this is some extra content" >expect &&
+	echo "$sha1    this is some extra content" |
+		git cat-file --batch-check="%(objecttype) %(rest)" >actual &&
+	test_cmp expect actual
+    '
 }
 
 hello_content="Hello World"
@@ -91,6 +98,14 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success '--batch-check without %(rest) considers whole line' '
+	echo "$hello_sha1 blob $hello_size" >expect &&
+	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
+	test_when_finished "git update-index --remove \"white space\"" &&
+	echo ":white space" | git cat-file --batch-check >actual &&
+	test_cmp expect actual
+'
+
 tree_sha1=$(git write-tree)
 tree_size=33
 tree_pretty_content="100644 blob $hello_sha1	hello"
-- 
1.8.4-rc1-129-g1f3472b

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
  2013-08-02 18:30           ` Junio C Hamano
@ 2013-08-02 20:05             ` Jonathan Nieder
  2013-08-03  7:18             ` Jeff King
  1 sibling, 0 replies; 11+ messages in thread
From: Jonathan Nieder @ 2013-08-02 20:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Joey Hess, git

Junio C Hamano wrote:

> Here is what is on top of the revert that has been pushed out on
> 'pu'.

For what it's worth,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

[...]
> To remain backwards compatible, we cannot split on whitespace by
> default, hence we will ship 1.8.4 with the commit reverted.
[...]
> It might be more robust to have something like "-z" to separate the
> input elements. But this patch is still a reasonable step before
> having that.  It makes the easy cases easy; people who do not care
> about %(rest) do not have to consider it, and the %(rest) code
> handles the spaces and newlines of "rev-list --objects" correctly.

Another idea for the future might be to start rejecting refnames
starting with a double-quote '"', which would make it safe to treat a
leading quote-mark as the start of a C-style quoted string.  But
currently that would technically be a breaking change, making "-z"
more useful in the meantime.

I think several commands already don't deal well with filenames with
newlines.  I hope POSIX forbids them (with some suitable migration
plan) soonish and even wouldn't mind if git were taught to refuse to
track them.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces
  2013-08-02 18:30           ` Junio C Hamano
  2013-08-02 20:05             ` Jonathan Nieder
@ 2013-08-03  7:18             ` Jeff King
  1 sibling, 0 replies; 11+ messages in thread
From: Jeff King @ 2013-08-03  7:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Joey Hess, git

On Fri, Aug 02, 2013 at 11:30:03AM -0700, Junio C Hamano wrote:

> > I didn't see the result of your wrangling in pu, but I will keep an eye
> > out to double-check it (unless you did not finish, in which case I am
> > happy to do the wrangling myself).
> 
> Here is what is on top of the revert that has been pushed out on
> 'pu'.

Thanks, that looks good to me.

We may want to also squash in the patch below, which puts the pointer
variable in the most-local block and re-wraps the newly indented comment
for line length. Neither introduced by your adaptation, but they became
more obvious to me when seen on top of the revert.

-Peff

---
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 07b4818..41afaa5 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -286,15 +286,15 @@ static int batch_objects(struct batch_options *opt)
 	warn_on_object_refname_ambiguity = 0;
 
 	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
-		char *p;
 		int error;
 
 		if (data.split_on_whitespace) {
 			/*
-			 * Split at first whitespace, tying off the beginning of the
-			 * string and saving the remainder (or NULL) in data.rest.
+			 * Split at first whitespace, tying off the beginning
+			 * of the string and saving the remainder (or NULL) in
+			 * data.rest.
 			 */
-			p = strpbrk(buf.buf, " \t");
+			char *p = strpbrk(buf.buf, " \t");
 			if (p) {
 				while (*p && strchr(" \t", *p))
 					*p++ = '\0';

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-08-03  7:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20130801201842.GA16809@kitenet.net>
2013-08-02  6:40 ` [regression] Re: git-cat-file --batch reversion; cannot query filenames with spaces Jonathan Nieder
2013-08-02 10:54   ` Jeff King
2013-08-02 11:59     ` Jeff King
2013-08-02 15:27       ` Joey Hess
2013-08-02 16:14         ` Brandon Casey
2013-08-02 16:41       ` Junio C Hamano
2013-08-02 17:28         ` Jeff King
2013-08-02 18:30           ` Junio C Hamano
2013-08-02 20:05             ` Jonathan Nieder
2013-08-03  7:18             ` Jeff King
2013-08-02 16:32     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).