git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] rev-list: introduce NUL-delimited output mode
@ 2025-03-10 19:28 Justin Tobler
  2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
                   ` (7 more replies)
  0 siblings, 8 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-10 19:28 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

When walking objects, git-rev-list(1) prints each object entry on a
separate line in the form:

        <oid> LF

Some options, such as `--objects`, may print additional information
about the object on the same line:

        <oid> SP [<path>] LF

In this mode, if the object path contains a newline it is truncated at
the newline.

When the `--missing={print,print-info}` option is provided, information
about any missing objects encountered during the object walk are also
printed in the form:

        ?<oid> [SP <token>=<value>]... LF

where values containing LF or SP are printed in a token specific fashion
so that the resulting encoded value does not contain either of these two
problematic bytes. For example, missing object paths are quoted in the C
style so they contain LF or SP.

To make machine parsing easier, this series introduces a NUL-delimited
output mode for git-rev-list(1) via a `-z` option following a suggestion
from Junio in a previous thread[1]. In this mode, instead of LF, each
object is delimited with two NUL bytes and any object metadata is
separated with a single NUL byte. Examples:

        <oid> NUL NUL
        <oid> [NUL <path>] NUL NUL
        ?<oid> [NUL <token>=<value>]... NUL NUL

In this mode, path and value info are printed as-is without any special
encoding or truncation.

For now this series only adds support for use with the `--objects` and
`--missing` options. Usage of `-z` with other options is rejected, so it
can potentially be added in the future.

One idea I had, but did not implement in this version, was to also use
the `<token>=<value>` format for regular non-missing object info while
in the NUL-delimited mode. I could see this being a bit more flexible
instead of relying strictly on order. Interested if anyone has thoughts
on this. :)

This series is structured as follows:

        - Patches 1 and 2 do some minor preparatory refactors.

        - Patch 3 adds the `-z` option to git-rev-list(1) to print
          objects in a NUL-delimited fashion. Printed object paths with
          the `--objects` option are also handled.

        - Patch 4 teaches the `--missing` option how to print info in a
          NUL-delimited fashion.

Thanks for taking a look,
-Justin

[1]: <xmqq5xlor0la.fsf@gitster.g>

Justin Tobler (4):
  rev-list: inline `show_object_with_name()` in `show_object()`
  rev-list: refactor early option parsing
  rev-list: support delimiting objects with NUL bytes
  rev-list: support NUL-delimited --missing option

 Documentation/rev-list-options.adoc | 26 +++++++++
 builtin/rev-list.c                  | 86 ++++++++++++++++++++++-------
 revision.c                          |  8 ---
 revision.h                          |  2 -
 t/t6000-rev-list-misc.sh            | 34 ++++++++++++
 t/t6022-rev-list-missing.sh         | 30 ++++++++++
 6 files changed, 155 insertions(+), 31 deletions(-)


base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3
-- 
2.49.0.rc2



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()`
  2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler
@ 2025-03-10 19:28 ` Justin Tobler
  2025-03-10 20:51   ` Junio C Hamano
  2025-03-10 19:28 ` [PATCH 2/4] rev-list: refactor early option parsing Justin Tobler
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 60+ messages in thread
From: Justin Tobler @ 2025-03-10 19:28 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

The `show_object_with_name()` function only has a single call site.
Inline call to `show_object_with_name()` in `show_object()` so the
explicit function can be cleaned up and live closer to where it is used.
While at it, factor out the code that prints the OID and newline for
both objects with and without a name. In a subsequent commit,
`show_object()` is modified to support printing object information in a
NUL-delimited format.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/rev-list.c | 13 +++++++++----
 revision.c         |  8 --------
 revision.h         |  2 --
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bb26bee0d4..dcd079c16c 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -357,10 +357,15 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
 		return;
 	}
 
-	if (arg_show_object_names)
-		show_object_with_name(stdout, obj, name);
-	else
-		printf("%s\n", oid_to_hex(&obj->oid));
+	printf("%s", oid_to_hex(&obj->oid));
+
+	if (arg_show_object_names) {
+		putchar(' ');
+		for (const char *p = name; *p && *p != '\n'; p++)
+			putchar(*p);
+	}
+
+	putchar('\n');
 }
 
 static void show_edge(struct commit *commit)
diff --git a/revision.c b/revision.c
index c4390f0938..0eaebe4478 100644
--- a/revision.c
+++ b/revision.c
@@ -59,14 +59,6 @@ implement_shared_commit_slab(revision_sources, char *);
 
 static inline int want_ancestry(const struct rev_info *revs);
 
-void show_object_with_name(FILE *out, struct object *obj, const char *name)
-{
-	fprintf(out, "%s ", oid_to_hex(&obj->oid));
-	for (const char *p = name; *p && *p != '\n'; p++)
-		fputc(*p, out);
-	fputc('\n', out);
-}
-
 static void mark_blob_uninteresting(struct blob *blob)
 {
 	if (!blob)
diff --git a/revision.h b/revision.h
index 71e984c452..21c6a69899 100644
--- a/revision.h
+++ b/revision.h
@@ -489,8 +489,6 @@ void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit);
 void mark_tree_uninteresting(struct repository *r, struct tree *tree);
 void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees);
 
-void show_object_with_name(FILE *, struct object *, const char *);
-
 /**
  * Helpers to check if a reference should be excluded.
  */
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 2/4] rev-list: refactor early option parsing
  2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler
  2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
@ 2025-03-10 19:28 ` Justin Tobler
  2025-03-10 20:54   ` Junio C Hamano
  2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 60+ messages in thread
From: Justin Tobler @ 2025-03-10 19:28 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

Before invoking `setup_revisions()`, the `--missing` and
`--exclude-promisor-objects` options are parsed early. In a subsequent
commit, another option is added that must be parsed early.

Refactor the code to parse both options in a single early pass.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/rev-list.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index dcd079c16c..04d9c893b5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -16,6 +16,7 @@
 #include "object-file.h"
 #include "object-store-ll.h"
 #include "pack-bitmap.h"
+#include "parse-options.h"
 #include "log-tree.h"
 #include "graph.h"
 #include "bisect.h"
@@ -639,19 +640,15 @@ int cmd_rev_list(int argc,
 		if (!strcmp(arg, "--exclude-promisor-objects")) {
 			fetch_if_missing = 0;
 			revs.exclude_promisor_objects = 1;
-			break;
-		}
-	}
-	for (i = 1; i < argc; i++) {
-		const char *arg = argv[i];
-		if (skip_prefix(arg, "--missing=", &arg)) {
-			if (revs.exclude_promisor_objects)
-				die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing");
-			if (parse_missing_action_value(arg))
-				break;
+		} else if (skip_prefix(arg, "--missing=", &arg)) {
+			parse_missing_action_value(arg);
 		}
 	}
 
+	die_for_incompatible_opt2(revs.exclude_promisor_objects,
+				  "--exclude_promisor_objects",
+				  arg_missing_action, "--missing");
+
 	if (arg_missing_action)
 		revs.do_not_die_on_missing_objects = 1;
 
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 3/4] rev-list: support delimiting objects with NUL bytes
  2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler
  2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
  2025-03-10 19:28 ` [PATCH 2/4] rev-list: refactor early option parsing Justin Tobler
@ 2025-03-10 19:28 ` Justin Tobler
  2025-03-10 20:59   ` Junio C Hamano
  2025-03-12  7:50   ` Patrick Steinhardt
  2025-03-10 19:28 ` [PATCH 4/4] rev-list: support NUL-delimited --missing option Justin Tobler
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-10 19:28 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob object on the same line in the form:

        $ git rev-list --objects <rev>
        <tree/blob oid> SP [<path>] LF

Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.

Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info. Each object
line uses two NUL bytes to indicate the end of an object entry and a
single NUL byte to delimit between object information in the following
form:

        $ git rev-list -z --objects <rev>
        <oid> [NUL <path>] NUL NUL

For now, the `--objects` flag is the only option that can be used in
combination with `-z`. In this mode, the object path is not truncated at
newlines. In a subsequent commit, NUL-delimiter support for other
options is added. Other options that do not make sense with be used in
combination with `-z` are rejected.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 18 +++++++++++++
 builtin/rev-list.c                  | 39 +++++++++++++++++++++++++----
 t/t6000-rev-list-misc.sh            | 34 +++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 5 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 785c0786e0..d21016d657 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -361,6 +361,24 @@ ifdef::git-rev-list[]
 --progress=<header>::
 	Show progress reports on stderr as objects are considered. The
 	`<header>` text will be printed with each progress update.
+
+-z::
+	Instead of being newline-delimited, each outputted object is delimited
+	with two NUL bytes in the following form:
++
+-----------------------------------------------------------------------
+<OID> NUL NUL
+-----------------------------------------------------------------------
++
+When the `--objects` option is also present, available object name information
+is printed in the following form without any truncation for object names
+containing newline characters:
++
+-----------------------------------------------------------------------
+<OID> [NUL <object-name>] NUL NUL
+-----------------------------------------------------------------------
++
+This option is only compatible with `--objects`.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 04d9c893b5..86b3ce5806 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -65,6 +65,7 @@ static const char rev_list_usage[] =
 "    --abbrev-commit\n"
 "    --left-right\n"
 "    --count\n"
+"    -z\n"
 "  special purpose:\n"
 "    --bisect\n"
 "    --bisect-vars\n"
@@ -97,10 +98,23 @@ static int arg_show_object_names = 1;
 
 #define DEFAULT_OIDSET_SIZE     (16*1024)
 
+static int nul_delim;
 static int show_disk_usage;
 static off_t total_disk_usage;
 static int human_readable;
 
+static void print_object_term(int nul_delim)
+{
+	char line_sep = '\n';
+
+	if (nul_delim)
+		line_sep = '\0';
+
+	putchar(line_sep);
+	if (nul_delim)
+		putchar(line_sep);
+}
+
 static off_t get_object_disk_usage(struct object *obj)
 {
 	off_t size;
@@ -264,7 +278,7 @@ static void show_commit(struct commit *commit, void *data)
 	if (revs->commit_format == CMIT_FMT_ONELINE)
 		putchar(' ');
 	else if (revs->include_header)
-		putchar('\n');
+		print_object_term(nul_delim);
 
 	if (revs->verbose_header) {
 		struct strbuf buf = STRBUF_INIT;
@@ -361,12 +375,17 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
 	printf("%s", oid_to_hex(&obj->oid));
 
 	if (arg_show_object_names) {
-		putchar(' ');
-		for (const char *p = name; *p && *p != '\n'; p++)
-			putchar(*p);
+		if (nul_delim && *name) {
+			putchar('\0');
+			printf("%s", name);
+		} else if (!nul_delim) {
+			putchar(' ');
+			for (const char *p = name; *p && *p != '\n'; p++)
+				putchar(*p);
+		}
 	}
 
-	putchar('\n');
+	print_object_term(nul_delim);
 }
 
 static void show_edge(struct commit *commit)
@@ -642,6 +661,8 @@ int cmd_rev_list(int argc,
 			revs.exclude_promisor_objects = 1;
 		} else if (skip_prefix(arg, "--missing=", &arg)) {
 			parse_missing_action_value(arg);
+		} else if (!strcmp(arg, "-z")) {
+			nul_delim = 1;
 		}
 	}
 
@@ -757,6 +778,14 @@ int cmd_rev_list(int argc,
 		usage(rev_list_usage);
 
 	}
+
+	if (nul_delim) {
+		if (revs.graph || revs.verbose_header || show_disk_usage ||
+		    info.show_timestamp || info.header_prefix || bisect_list ||
+		    use_bitmap_index || revs.edge_hint || arg_missing_action)
+			die(_("-z option used with unsupported option"));
+	}
+
 	if (revs.commit_format != CMIT_FMT_USERFORMAT)
 		revs.include_header = 1;
 	if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 6289a2e8b0..25c2f2f238 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -182,4 +182,38 @@ test_expect_success 'rev-list --unpacked' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list -z' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD) &&
+	oid2=$(git -C repo rev-parse HEAD~) &&
+
+	printf "%s\0\0%s\0\0" "$oid1" "$oid2" >expect &&
+	git -C repo rev-list -z HEAD >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'rev-list -z --objects' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD:1.t) &&
+	oid2=$(git -C repo rev-parse HEAD:2.t) &&
+	path1=1.t &&
+	path2=2.t &&
+
+	printf "%s\0%s\0\0%s\0%s\0\0" "$oid1" "$path1" "$oid2" "$path2" >expect &&
+	git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual &&
+
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 4/4] rev-list: support NUL-delimited --missing option
  2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler
                   ` (2 preceding siblings ...)
  2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler
@ 2025-03-10 19:28 ` Justin Tobler
  2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-10 19:28 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

The `--missing={print,print-info}` option for git-rev-list(1) prints
missing objects found while performing the revision walk. Add support
for printing missing objects in a NUL-delimited format when the `-z`
option is enabled.

        $ git rev-list -z --missing=print-info <rev>
        <oid> NUL NUL
        ?<oid> [NUL <token>=<value>]... NUL NUL

In this mode, values containing special characters or spaces are printed
as-is without being escaped or quoted.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 10 +++++++++-
 builtin/rev-list.c                  | 27 +++++++++++++++++++-------
 t/t6022-rev-list-missing.sh         | 30 +++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 8 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index d21016d657..48648b7507 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -378,7 +378,15 @@ containing newline characters:
 <OID> [NUL <object-name>] NUL NUL
 -----------------------------------------------------------------------
 +
-This option is only compatible with `--objects`.
+When the `--missing` option is provided, missing objects are printed in the
+following form where value is printed as-is without any token specific
+encoding:
++
+-----------------------------------------------------------------------
+?<OID> [NUL <token>=<value>]... NUL NUL
+-----------------------------------------------------------------------
++
+This option is only compatible with `--objects` and `--missing`.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 86b3ce5806..5bbc4a787e 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -145,25 +145,38 @@ static void print_missing_object(struct missing_objects_map_entry *entry,
 				 int print_missing_info)
 {
 	struct strbuf sb = STRBUF_INIT;
+	char info_sep = ' ';
+
+	if (nul_delim)
+		info_sep = '\0';
+
+	printf("?%s", oid_to_hex(&entry->entry.oid));
 
 	if (!print_missing_info) {
-		printf("?%s\n", oid_to_hex(&entry->entry.oid));
+		print_object_term(nul_delim);
 		return;
 	}
 
 	if (entry->path && *entry->path) {
 		struct strbuf path = STRBUF_INIT;
 
-		strbuf_addstr(&sb, " path=");
-		quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
-		strbuf_addbuf(&sb, &path);
+		strbuf_addf(&sb, "%cpath=", info_sep);
+
+		if (nul_delim) {
+			strbuf_addstr(&sb, entry->path);
+		} else {
+			quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
+			strbuf_addbuf(&sb, &path);
+		}
 
 		strbuf_release(&path);
 	}
 	if (entry->type)
-		strbuf_addf(&sb, " type=%s", type_name(entry->type));
+		strbuf_addf(&sb, "%ctype=%s", info_sep, type_name(entry->type));
+
+	fwrite(sb.buf, sizeof(char), sb.len, stdout);
+	print_object_term(nul_delim);
 
-	printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf);
 	strbuf_release(&sb);
 }
 
@@ -782,7 +795,7 @@ int cmd_rev_list(int argc,
 	if (nul_delim) {
 		if (revs.graph || revs.verbose_header || show_disk_usage ||
 		    info.show_timestamp || info.header_prefix || bisect_list ||
-		    use_bitmap_index || revs.edge_hint || arg_missing_action)
+		    use_bitmap_index || revs.edge_hint)
 			die(_("-z option used with unsupported option"));
 	}
 
diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
index 3e2790d4c8..3ae25e4cfb 100755
--- a/t/t6022-rev-list-missing.sh
+++ b/t/t6022-rev-list-missing.sh
@@ -198,4 +198,34 @@ do
 	'
 done
 
+test_expect_success "-z nul-delimited --missing" '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m first &&
+
+		path="foo bar" &&
+		echo foobar >"$path" &&
+		git add -A &&
+		git commit -m second &&
+
+		oid=$(git rev-parse "HEAD:$path") &&
+		type="$(git cat-file -t $oid)" &&
+
+		obj_path=".git/objects/$(test_oid_to_path $oid)" &&
+
+		git rev-list -z --objects --no-object-names \
+			HEAD ^"$oid" >expect &&
+		printf "?%s\0path=%s\0type=%s\0\0" "$oid" "$path" "$type" >>expect &&
+
+		mv "$obj_path" "$obj_path.hidden" &&
+		git rev-list -z --objects --no-object-names \
+			--missing=print-info HEAD >actual &&
+
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler
                   ` (3 preceding siblings ...)
  2025-03-10 19:28 ` [PATCH 4/4] rev-list: support NUL-delimited --missing option Justin Tobler
@ 2025-03-10 20:37 ` Junio C Hamano
  2025-03-10 21:08   ` Junio C Hamano
  2025-03-11 23:19   ` Justin Tobler
  2025-03-10 22:38 ` D. Ben Knoble
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 60+ messages in thread
From: Junio C Hamano @ 2025-03-10 20:37 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, christian.couder

Justin Tobler <jltobler@gmail.com> writes:

>         ?<oid> [SP <token>=<value>]... LF
>
> where values containing LF or SP are printed in a token specific fashion
> so that the resulting encoded value does not contain either of these two
> problematic bytes. For example, missing object paths are quoted in the C
> style so they contain LF or SP.

"so" -> "when"???

> To make machine parsing easier, this series introduces a NUL-delimited
> output mode for git-rev-list(1) via a `-z` option following a suggestion
> from Junio in a previous thread[1]. In this mode, instead of LF, each
> object is delimited with two NUL bytes and any object metadata is
> separated with a single NUL byte. Examples:
>
>         <oid> NUL NUL
>         <oid> [NUL <path>] NUL NUL

Why do we need double-NUL in the above two cases?

>         ?<oid> [NUL <token>=<value>]... NUL NUL

This one I understand; we could do without double-NUL and take the
lack of "=" in the token after NUL termination as the sign that the
previous record ended, though, to avoid double-NUL while keeping the
format extensible.

As this topic is designing essentially a new and machine parseable
format, we could even unify all three formats into one.  For example,
the format could be like this:

	<oid> NUL [<attr>=<value> NUL]...

where

 (1) A record ends when a new record begins.

 (2) The beginning of a new record is signaled by <oid> that is all
     hexadecimal and does not have any '=' in it.

 (3) The traditional "rev-list --objects" output that gives path in
     addition to the object name uses "path" as the <attr> name,
     i.e. such a record looks like "<oid> NUL path=<path> NUL".

 (4) The traditional "rev-list --missing" output loses the leading
     "?"; it is replaced by "missing" as the <attr> name, i.e. such
     a record may look like "<oid> NUL missing=yes NUL..." together
     with other "<token>=<value> NUL" pairs appended as needed at
     the end.

Hmm?


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()`
  2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
@ 2025-03-10 20:51   ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2025-03-10 20:51 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, christian.couder

Justin Tobler <jltobler@gmail.com> writes:

> The `show_object_with_name()` function only has a single call site.
> Inline call to `show_object_with_name()` in `show_object()` so the
> explicit function can be cleaned up and live closer to where it is used.
> While at it, factor out the code that prints the OID and newline for
> both objects with and without a name. In a subsequent commit,
> `show_object()` is modified to support printing object information in a
> NUL-delimited format.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
>  builtin/rev-list.c | 13 +++++++++----
>  revision.c         |  8 --------
>  revision.h         |  2 --
>  3 files changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index bb26bee0d4..dcd079c16c 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -357,10 +357,15 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
>  		return;
>  	}
>  
> -	if (arg_show_object_names)
> -		show_object_with_name(stdout, obj, name);
> -	else
> -		printf("%s\n", oid_to_hex(&obj->oid));
> +	printf("%s", oid_to_hex(&obj->oid));
> +
> +	if (arg_show_object_names) {
> +		putchar(' ');
> +		for (const char *p = name; *p && *p != '\n'; p++)
> +			putchar(*p);
> +	}
> +
> +	putchar('\n');
>  }

Makes sense.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/4] rev-list: refactor early option parsing
  2025-03-10 19:28 ` [PATCH 2/4] rev-list: refactor early option parsing Justin Tobler
@ 2025-03-10 20:54   ` Junio C Hamano
  2025-03-12 21:39     ` Justin Tobler
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2025-03-10 20:54 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, christian.couder

Justin Tobler <jltobler@gmail.com> writes:

> @@ -639,19 +640,15 @@ int cmd_rev_list(int argc,
>  		if (!strcmp(arg, "--exclude-promisor-objects")) {
>  			fetch_if_missing = 0;
>  			revs.exclude_promisor_objects = 1;
> -			break;
> -		}
> -	}
> -	for (i = 1; i < argc; i++) {
> -		const char *arg = argv[i];
> -		if (skip_prefix(arg, "--missing=", &arg)) {
> -			if (revs.exclude_promisor_objects)
> -				die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing");
> -			if (parse_missing_action_value(arg))
> -				break;
> +		} else if (skip_prefix(arg, "--missing=", &arg)) {
> +			parse_missing_action_value(arg);
>  		}
>  	}

There is a huge NEEDSWORK comment that essentially says that the
above two loops that this patch combines into one is fundamentally
broken.  I suspect that the remaining two patches in this series
would punt and not improve them, but offhand I am not sure if this
change is making it harder to fix them properly easier or harder.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/4] rev-list: support delimiting objects with NUL bytes
  2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler
@ 2025-03-10 20:59   ` Junio C Hamano
  2025-03-12 21:39     ` Justin Tobler
  2025-03-12  7:50   ` Patrick Steinhardt
  1 sibling, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2025-03-10 20:59 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, christian.couder

Justin Tobler <jltobler@gmail.com> writes:

> +-z::
> +	Instead of being newline-delimited, each outputted object is delimited
> +	with two NUL bytes in the following form:
> ++
> +-----------------------------------------------------------------------
> +<OID> NUL NUL
> +-----------------------------------------------------------------------
> ++
> +When the `--objects` option is also present, available object name information
> +is printed in the following form without any truncation for object names
> +containing newline characters:
> ++
> +-----------------------------------------------------------------------
> +<OID> [NUL <object-name>] NUL NUL
> +-----------------------------------------------------------------------
> ++
> +This option is only compatible with `--objects`.
>  endif::git-rev-list[]

As I said, I do not think we strongly want double-NUL and would
prefer to do away with a single NUL if possible.

> +static int nul_delim;
>  static int show_disk_usage;
>  static off_t total_disk_usage;
>  static int human_readable;
>  
> +static void print_object_term(int nul_delim)
> +{
> +	char line_sep = '\n';
> +
> +	if (nul_delim)
> +		line_sep = '\0';
> +
> +	putchar(line_sep);
> +	if (nul_delim)
> +		putchar(line_sep);
> +}

This looks, to put it mildly, strange.  The concept of the line
delimiter byte (which can take any single byte) is wider than having
a NUL as the line delimiter byte.  Why would we even want both?

IOW, wouldn't it make more sense to have line_delim as the global
(or per-invocation parameter to this function) and have
print_object_term() just use it?  If you want to make it behave
differently only when line-delimter is NUL (which I do not
recommend), you can switch on the value of line_delimiter being NUL.
So I do not see a merit in having two separate variables (except for
confusing future readers).



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano
@ 2025-03-10 21:08   ` Junio C Hamano
  2025-03-11 23:24     ` Justin Tobler
  2025-03-11 23:19   ` Justin Tobler
  1 sibling, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2025-03-10 21:08 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, christian.couder

Junio C Hamano <gitster@pobox.com> writes:

> As this topic is designing essentially a new and machine parseable
> format, we could even unify all three formats into one.  For example,
> the format could be like this:
>
> 	<oid> NUL [<attr>=<value> NUL]...
>
> where

(0) "rev-list" that gives only a sequence of "<oid>" for commit-ish,
    as well as "rev-list --boundary", would fall out as a natural
    consequence.  Bog-standard "list of commits" would see a
    sequence of "<oid> NUL", while a boundary object would see
    "<oid> NUL boundary=yes NUL".

>  (1) A record ends when a new record begins.
>
>  (2) The beginning of a new record is signaled by <oid> that is all
>      hexadecimal and does not have any '=' in it.
>
>  (3) The traditional "rev-list --objects" output that gives path in
>      addition to the object name uses "path" as the <attr> name,
>      i.e. such a record looks like "<oid> NUL path=<path> NUL".
>
>  (4) The traditional "rev-list --missing" output loses the leading
>      "?"; it is replaced by "missing" as the <attr> name, i.e. such
>      a record may look like "<oid> NUL missing=yes NUL..." together
>      with other "<token>=<value> NUL" pairs appended as needed at
>      the end.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler
                   ` (4 preceding siblings ...)
  2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano
@ 2025-03-10 22:38 ` D. Ben Knoble
  2025-03-11 22:59   ` Justin Tobler
  2025-03-11 23:57 ` Jeff King
  2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
  7 siblings, 1 reply; 60+ messages in thread
From: D. Ben Knoble @ 2025-03-10 22:38 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, christian.couder, Junio C Hamano

On Mon, Mar 10, 2025 at 3:32 PM Justin Tobler <jltobler@gmail.com> wrote:
>
> When walking objects, git-rev-list(1) prints each object entry on a
> separate line in the form:
>
>         <oid> LF
>
> Some options, such as `--objects`, may print additional information
> about the object on the same line:
>
>         <oid> SP [<path>] LF
>
> In this mode, if the object path contains a newline it is truncated at
> the newline.
>
> When the `--missing={print,print-info}` option is provided, information
> about any missing objects encountered during the object walk are also
> printed in the form:
>
>         ?<oid> [SP <token>=<value>]... LF
>
> where values containing LF or SP are printed in a token specific fashion
> so that the resulting encoded value does not contain either of these two
> problematic bytes. For example, missing object paths are quoted in the C
> style so they contain LF or SP.
>
> To make machine parsing easier, this series introduces a NUL-delimited
> output mode for git-rev-list(1) via a `-z` option following a suggestion
> from Junio in a previous thread[1]. In this mode, instead of LF, each
> object is delimited with two NUL bytes and any object metadata is
> separated with a single NUL byte. Examples:
>
>         <oid> NUL NUL
>         <oid> [NUL <path>] NUL NUL
>         ?<oid> [NUL <token>=<value>]... NUL NUL
>
> In this mode, path and value info are printed as-is without any special
> encoding or truncation.
>
> For now this series only adds support for use with the `--objects` and
> `--missing` options. Usage of `-z` with other options is rejected, so it
> can potentially be added in the future.
>
> One idea I had, but did not implement in this version, was to also use
> the `<token>=<value>` format for regular non-missing object info while
> in the NUL-delimited mode. I could see this being a bit more flexible
> instead of relying strictly on order. Interested if anyone has thoughts
> on this. :)

Without taking a deeper look, I think token=value has the benefit of
being self-describing at the cost of more output bytes (which might
matter over the wire, for example). Generally I like the idea;
sometimes I find it troublesome having to parse prose manuals for the
specifics of output formats like field order, especially when I end up
coding a parser for the format. If the field order doesn’t matter to
the consumer, then perhaps using ordered fields AWK-style is
inappropriately terse?

OTOH, the -z format is for machines, and they don’t need human labels
;) [I think token labels would be a great parser-writing and debugging
aid]

Best,
Ben

>
> This series is structured as follows:
>
>         - Patches 1 and 2 do some minor preparatory refactors.
>
>         - Patch 3 adds the `-z` option to git-rev-list(1) to print
>           objects in a NUL-delimited fashion. Printed object paths with
>           the `--objects` option are also handled.
>
>         - Patch 4 teaches the `--missing` option how to print info in a
>           NUL-delimited fashion.
>
> Thanks for taking a look,
> -Justin
>
> [1]: <xmqq5xlor0la.fsf@gitster.g>
>
> Justin Tobler (4):
>   rev-list: inline `show_object_with_name()` in `show_object()`
>   rev-list: refactor early option parsing
>   rev-list: support delimiting objects with NUL bytes
>   rev-list: support NUL-delimited --missing option
>
>  Documentation/rev-list-options.adoc | 26 +++++++++
>  builtin/rev-list.c                  | 86 ++++++++++++++++++++++-------
>  revision.c                          |  8 ---
>  revision.h                          |  2 -
>  t/t6000-rev-list-misc.sh            | 34 ++++++++++++
>  t/t6022-rev-list-missing.sh         | 30 ++++++++++
>  6 files changed, 155 insertions(+), 31 deletions(-)
>
>
> base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3
> --
> 2.49.0.rc2
>
>


-- 
D. Ben Knoble


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-10 22:38 ` D. Ben Knoble
@ 2025-03-11 22:59   ` Justin Tobler
  0 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-11 22:59 UTC (permalink / raw)
  To: D. Ben Knoble; +Cc: git, ps, christian.couder, Junio C Hamano

On 25/03/10 06:38PM, D. Ben Knoble wrote:
> On Mon, Mar 10, 2025 at 3:32 PM Justin Tobler <jltobler@gmail.com> wrote:
> >
> > When walking objects, git-rev-list(1) prints each object entry on a
> > separate line in the form:
> >
> >         <oid> LF
> >
> > Some options, such as `--objects`, may print additional information
> > about the object on the same line:
> >
> >         <oid> SP [<path>] LF
> >
> > In this mode, if the object path contains a newline it is truncated at
> > the newline.
> >
> > When the `--missing={print,print-info}` option is provided, information
> > about any missing objects encountered during the object walk are also
> > printed in the form:
> >
> >         ?<oid> [SP <token>=<value>]... LF
> >
> > where values containing LF or SP are printed in a token specific fashion
> > so that the resulting encoded value does not contain either of these two
> > problematic bytes. For example, missing object paths are quoted in the C
> > style so they contain LF or SP.
> >
> > To make machine parsing easier, this series introduces a NUL-delimited
> > output mode for git-rev-list(1) via a `-z` option following a suggestion
> > from Junio in a previous thread[1]. In this mode, instead of LF, each
> > object is delimited with two NUL bytes and any object metadata is
> > separated with a single NUL byte. Examples:
> >
> >         <oid> NUL NUL
> >         <oid> [NUL <path>] NUL NUL
> >         ?<oid> [NUL <token>=<value>]... NUL NUL
> >
> > In this mode, path and value info are printed as-is without any special
> > encoding or truncation.
> >
> > For now this series only adds support for use with the `--objects` and
> > `--missing` options. Usage of `-z` with other options is rejected, so it
> > can potentially be added in the future.
> >
> > One idea I had, but did not implement in this version, was to also use
> > the `<token>=<value>` format for regular non-missing object info while
> > in the NUL-delimited mode. I could see this being a bit more flexible
> > instead of relying strictly on order. Interested if anyone has thoughts
> > on this. :)
> 
> Without taking a deeper look, I think token=value has the benefit of
> being self-describing at the cost of more output bytes (which might
> matter over the wire, for example). Generally I like the idea;
> sometimes I find it troublesome having to parse prose manuals for the
> specifics of output formats like field order, especially when I end up
> coding a parser for the format. If the field order doesn’t matter to
> the consumer, then perhaps using ordered fields AWK-style is
> inappropriately terse?
> 
> OTOH, the -z format is for machines, and they don’t need human labels
> ;) [I think token labels would be a great parser-writing and debugging
> aid]

One of the challenges with parsing git-rev-list(1) is all the various
forms it can take based on the options provided. For example:

    $ git rev-list --timestamp --objects --parents <rev>

    timestamp SP <oid> [SP <parent oid>] LF   (commit)
    <oid> SP [<path>] LF                      (tree/blob)

Relying strictly on order can be a bit tricky to parse due to how the
output format can change even line to line. So even for machine parsing,
labels may help simplify things if all object records follow something
along the lines of:

    <oid> NUL [<token>=<value> NUL]...

As you mentioned, this could potentially also be useful for users since
the attributes would be self-describing. This series is currently
focussed on the machine parsing side, but I think support for this mode
in a human-readable format could be added via a separate option in the
future.

-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano
  2025-03-10 21:08   ` Junio C Hamano
@ 2025-03-11 23:19   ` Justin Tobler
  2025-03-11 23:44     ` Junio C Hamano
  1 sibling, 1 reply; 60+ messages in thread
From: Justin Tobler @ 2025-03-11 23:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps, christian.couder

On 25/03/10 01:37PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> > To make machine parsing easier, this series introduces a NUL-delimited
> > output mode for git-rev-list(1) via a `-z` option following a suggestion
> > from Junio in a previous thread[1]. In this mode, instead of LF, each
> > object is delimited with two NUL bytes and any object metadata is
> > separated with a single NUL byte. Examples:
> >
> >         <oid> NUL NUL
> >         <oid> [NUL <path>] NUL NUL
> 
> Why do we need double-NUL in the above two cases?

In the `<oid> [NUL <path>] NUL NUL` case, it would technically be
possible for an object path to match an OID. The use of two NUL bytes
signals when the object record ends.

Without someother mechanism to know when a record starts/stops, even the
`<oid> NUL NUL` case would need the two trailing NUL bytes to avoid
being considered a potential path.

If the output format would not result in any additional object metadata
being appended, we could use a single NUL byte to delimit between
objects in this case, but always using two NUL bytes allowed for a more
consistent format.

> 
> >         ?<oid> [NUL <token>=<value>]... NUL NUL
> 
> This one I understand; we could do without double-NUL and take the
> lack of "=" in the token after NUL termination as the sign that the
> previous record ended, though, to avoid double-NUL while keeping the
> format extensible.
> 
> As this topic is designing essentially a new and machine parseable
> format, we could even unify all three formats into one.  For example,
> the format could be like this:
> 
> 	<oid> NUL [<attr>=<value> NUL]...

I was also considering something similar. This format could allow other
object metadata like `--timestamp` to be supported in the future with a
more flexible format. In the next version I'll implement a unified
format here.

> 
> where
> 
>  (1) A record ends when a new record begins.
> 
>  (2) The beginning of a new record is signaled by <oid> that is all
>      hexadecimal and does not have any '=' in it.

I think this is a good idea. By always appending printed object metadata
in the form `<token>=<value>`, we know that any entry without '=' must
be the start of a new record. This removes the need for the two NUL
bytes to indicate the end of a record.

I'll use only a single NUL byte to delimit in the next version.

> 
>  (3) The traditional "rev-list --objects" output that gives path in
>      addition to the object name uses "path" as the <attr> name,
>      i.e. such a record looks like "<oid> NUL path=<path> NUL".
> 
>  (4) The traditional "rev-list --missing" output loses the leading
>      "?"; it is replaced by "missing" as the <attr> name, i.e. such
>      a record may look like "<oid> NUL missing=yes NUL..." together
>      with other "<token>=<value> NUL" pairs appended as needed at
>      the end.

I think this is good. Instead of prefixing missing OIDs with '?', we can
just append another token/value pair `missing=yes`.

Thanks,
-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-10 21:08   ` Junio C Hamano
@ 2025-03-11 23:24     ` Justin Tobler
  0 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-11 23:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps, christian.couder

On 25/03/10 02:08PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > As this topic is designing essentially a new and machine parseable
> > format, we could even unify all three formats into one.  For example,
> > the format could be like this:
> >
> > 	<oid> NUL [<attr>=<value> NUL]...
> >
> > where
> 
> (0) "rev-list" that gives only a sequence of "<oid>" for commit-ish,
>     as well as "rev-list --boundary", would fall out as a natural
>     consequence.  Bog-standard "list of commits" would see a
>     sequence of "<oid> NUL", while a boundary object would see
>     "<oid> NUL boundary=yes NUL".

I had not considered handling the `--boundary` option. It looks like
boundary objects are printed as part of `show_commit()`, so I can adapt
the handling and do something similar to missing objects:

    $ git rev-list -z --boundary <rev>
    <oid> NUL boundary=yes NUL

This would remain consistent with the unified format.

-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-11 23:19   ` Justin Tobler
@ 2025-03-11 23:44     ` Junio C Hamano
  2025-03-12  7:37       ` Patrick Steinhardt
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2025-03-11 23:44 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, christian.couder

Justin Tobler <jltobler@gmail.com> writes:

>>  (4) The traditional "rev-list --missing" output loses the leading
>>      "?"; it is replaced by "missing" as the <attr> name, i.e. such
>>      a record may look like "<oid> NUL missing=yes NUL..." together
>>      with other "<token>=<value> NUL" pairs appended as needed at
>>      the end.
>
> I think this is good. Instead of prefixing missing OIDs with '?', we can
> just append another token/value pair `missing=yes`.

And we may want to avoid excessive bloat of the output that is not
primarily meant for human consumption, in which case perhaps a
common things like "missing" and "path" can be given a shorter
token, perhaps like "m=yes" and "p=Makefile"?


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler
                   ` (5 preceding siblings ...)
  2025-03-10 22:38 ` D. Ben Knoble
@ 2025-03-11 23:57 ` Jeff King
  2025-03-12  7:42   ` Patrick Steinhardt
  2025-03-12 22:09   ` Justin Tobler
  2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
  7 siblings, 2 replies; 60+ messages in thread
From: Jeff King @ 2025-03-11 23:57 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, christian.couder

On Mon, Mar 10, 2025 at 02:28:25PM -0500, Justin Tobler wrote:

> To make machine parsing easier, this series introduces a NUL-delimited
> output mode for git-rev-list(1) via a `-z` option following a suggestion
> from Junio in a previous thread[1]. In this mode, instead of LF, each
> object is delimited with two NUL bytes and any object metadata is
> separated with a single NUL byte. Examples:
> 
>         <oid> NUL NUL
>         <oid> [NUL <path>] NUL NUL
>         ?<oid> [NUL <token>=<value>]... NUL NUL
> 
> In this mode, path and value info are printed as-is without any special
> encoding or truncation.

I think this is a good direction, but I have two compatibility
questions:

  1. What should "git rev-list -z --stdin" do? In most other programs
     with a "-z" option it affects both input and output. I don't
     particularly care about this case myself, but it will be hard to
     change later. So we probably want to decide now.

  2. I was a little surprised that rev-list already takes a "-z" option,
     but it doesn't seem to do anything. I guess it's probably picked up
     via diff_opt_parse(), though (I think) you can't actually trigger a
     diff via rev-list itself. So even though this is a change in
     behavior, probably nobody would have been using it until now?

     If it is possible to see some effect from "-z" now (I didn't dig
     very far), then it may be better to continue to let the diff
     options parser handle it, and simply pick the result out of
     revs.diffopt.line_termination. As your patch 3 is written, I think
     the diff code probably doesn't see it anymore at all.

-Peff


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-11 23:44     ` Junio C Hamano
@ 2025-03-12  7:37       ` Patrick Steinhardt
  2025-03-12 21:45         ` Justin Tobler
  0 siblings, 1 reply; 60+ messages in thread
From: Patrick Steinhardt @ 2025-03-12  7:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Justin Tobler, git, christian.couder

On Tue, Mar 11, 2025 at 04:44:10PM -0700, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> >>  (4) The traditional "rev-list --missing" output loses the leading
> >>      "?"; it is replaced by "missing" as the <attr> name, i.e. such
> >>      a record may look like "<oid> NUL missing=yes NUL..." together
> >>      with other "<token>=<value> NUL" pairs appended as needed at
> >>      the end.
> >
> > I think this is good. Instead of prefixing missing OIDs with '?', we can
> > just append another token/value pair `missing=yes`.
> 
> And we may want to avoid excessive bloat of the output that is not
> primarily meant for human consumption, in which case perhaps a
> common things like "missing" and "path" can be given a shorter
> token, perhaps like "m=yes" and "p=Makefile"?

I would prefer to keep longer paths here:

  - While the output typically shouldn't be seen by a human, the code
    handling it both on the printing and on the receiving side would.
    And there it helps the reader quite a bit to get additional context
    rather than cryptic single-letter abbreviations that one would have
    to always look up.

  - By keeping with long names we also avoid "letter squatting" when two
    attributes of an object start with the same letter.

  - I doubt that the couple of extra characters is really going to
    matter much given that most of the time will most likely be spent
    reading and decompressing the objects from disk anyway.

Patrick


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-11 23:57 ` Jeff King
@ 2025-03-12  7:42   ` Patrick Steinhardt
  2025-03-12 15:56     ` Junio C Hamano
  2025-03-12 22:09   ` Justin Tobler
  1 sibling, 1 reply; 60+ messages in thread
From: Patrick Steinhardt @ 2025-03-12  7:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Justin Tobler, git, christian.couder

On Tue, Mar 11, 2025 at 07:57:20PM -0400, Jeff King wrote:
> On Mon, Mar 10, 2025 at 02:28:25PM -0500, Justin Tobler wrote:
> 
> > To make machine parsing easier, this series introduces a NUL-delimited
> > output mode for git-rev-list(1) via a `-z` option following a suggestion
> > from Junio in a previous thread[1]. In this mode, instead of LF, each
> > object is delimited with two NUL bytes and any object metadata is
> > separated with a single NUL byte. Examples:
> > 
> >         <oid> NUL NUL
> >         <oid> [NUL <path>] NUL NUL
> >         ?<oid> [NUL <token>=<value>]... NUL NUL
> > 
> > In this mode, path and value info are printed as-is without any special
> > encoding or truncation.
> 
> I think this is a good direction, but I have two compatibility
> questions:
> 
>   1. What should "git rev-list -z --stdin" do? In most other programs
>      with a "-z" option it affects both input and output. I don't
>      particularly care about this case myself, but it will be hard to
>      change later. So we probably want to decide now.

I would lean into the direction of making "-z" change the format both
for stdin and stdout. That's what we do in most cases, and in those
cases where we didn't we came to regret it (git-cat-file(1)).

Patrick


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/4] rev-list: support delimiting objects with NUL bytes
  2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler
  2025-03-10 20:59   ` Junio C Hamano
@ 2025-03-12  7:50   ` Patrick Steinhardt
  2025-03-12 21:41     ` Justin Tobler
  1 sibling, 1 reply; 60+ messages in thread
From: Patrick Steinhardt @ 2025-03-12  7:50 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, christian.couder

On Mon, Mar 10, 2025 at 02:28:28PM -0500, Justin Tobler wrote:
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index 04d9c893b5..86b3ce5806 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -757,6 +778,14 @@ int cmd_rev_list(int argc,
>  		usage(rev_list_usage);
>  
>  	}
> +
> +	if (nul_delim) {
> +		if (revs.graph || revs.verbose_header || show_disk_usage ||
> +		    info.show_timestamp || info.header_prefix || bisect_list ||
> +		    use_bitmap_index || revs.edge_hint || arg_missing_action)
> +			die(_("-z option used with unsupported option"));
> +	}
> +

Not sure whether it's worth it, but do we maybe want to add a comment
here that mentions that this isn't an inherent limitation, but rather
that the initial implementation simply didn't implement compatibility
with these options? This would explicitly keep the door open for any
future improvements in this area.

Patrick


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-12  7:42   ` Patrick Steinhardt
@ 2025-03-12 15:56     ` Junio C Hamano
  2025-03-13  7:46       ` Patrick Steinhardt
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2025-03-12 15:56 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Jeff King, Justin Tobler, git, christian.couder

Patrick Steinhardt <ps@pks.im> writes:

> On Tue, Mar 11, 2025 at 07:57:20PM -0400, Jeff King wrote:
>> On Mon, Mar 10, 2025 at 02:28:25PM -0500, Justin Tobler wrote:
>> 
>> > To make machine parsing easier, this series introduces a NUL-delimited
>> > output mode for git-rev-list(1) via a `-z` option following a suggestion
>> > from Junio in a previous thread[1]. In this mode, instead of LF, each
>> > object is delimited with two NUL bytes and any object metadata is
>> > separated with a single NUL byte. Examples:
>> > 
>> >         <oid> NUL NUL
>> >         <oid> [NUL <path>] NUL NUL
>> >         ?<oid> [NUL <token>=<value>]... NUL NUL
>> > 
>> > In this mode, path and value info are printed as-is without any special
>> > encoding or truncation.
>> 
>> I think this is a good direction, but I have two compatibility
>> questions:
>> 
>>   1. What should "git rev-list -z --stdin" do? In most other programs
>>      with a "-z" option it affects both input and output. I don't
>>      particularly care about this case myself, but it will be hard to
>>      change later. So we probably want to decide now.
>
> I would lean into the direction of making "-z" change the format both
> for stdin and stdout. That's what we do in most cases, and in those
> cases where we didn't we came to regret it (git-cat-file(1)).

I've seen "-Z", in addition to "-z", used to differentiate between
input and output in some commands.  If we are not going to do that,
I agree that making "-z" to affect both input and output is less
surprising than having to remember which side is still text.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 2/4] rev-list: refactor early option parsing
  2025-03-10 20:54   ` Junio C Hamano
@ 2025-03-12 21:39     ` Justin Tobler
  0 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-12 21:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps, christian.couder

On 25/03/10 01:54PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > @@ -639,19 +640,15 @@ int cmd_rev_list(int argc,
> >  		if (!strcmp(arg, "--exclude-promisor-objects")) {
> >  			fetch_if_missing = 0;
> >  			revs.exclude_promisor_objects = 1;
> > -			break;
> > -		}
> > -	}
> > -	for (i = 1; i < argc; i++) {
> > -		const char *arg = argv[i];
> > -		if (skip_prefix(arg, "--missing=", &arg)) {
> > -			if (revs.exclude_promisor_objects)
> > -				die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing");
> > -			if (parse_missing_action_value(arg))
> > -				break;
> > +		} else if (skip_prefix(arg, "--missing=", &arg)) {
> > +			parse_missing_action_value(arg);
> >  		}
> >  	}
> 
> There is a huge NEEDSWORK comment that essentially says that the
> above two loops that this patch combines into one is fundamentally
> broken.  I suspect that the remaining two patches in this series
> would punt and not improve them, but offhand I am not sure if this
> change is making it harder to fix them properly easier or harder.

My understanding here is that `setup_revisions()` does not provide a
mechanism to reject certain individual or combinations of parsed options
that do not make sense for a command. As you mentioned, this patch is
just punting on that issue, but I don't think it is really making the
problem any harder to be resolved in the future.

In this case, the `-z` option is parsed early to avoid is being handled
by the diff options parsing in `setup_revisions()`. From the perspective
of git-rev-list(1), I don't think there is any reason to be parsing diff
options at all. We could introduce an option to disable diff options
parsing in `setup_revisions()`, but now that we also want to modify
stdin argument parsing to be NUL-delimited, we would still have to parse
the -z option early in order to configure `setup_revisions()`
appropriately.

Another way we could potentially resolve these issues would be to
optionally separate the argument/option parsing logic out of
`setup_revisions()` and expose it directly. That could give the caller
more control over how options are handled before fully setting up the
`rev_info` structure. I suspect this would be non-trivial though, so I
think it would preferrable to keep the current approach for now as it
isn't making the problem harder to fix IMO.

-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/4] rev-list: support delimiting objects with NUL bytes
  2025-03-10 20:59   ` Junio C Hamano
@ 2025-03-12 21:39     ` Justin Tobler
  0 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-12 21:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps, christian.couder

On 25/03/10 01:59PM, Junio C Hamano wrote:
> > +static int nul_delim;
> >  static int show_disk_usage;
> >  static off_t total_disk_usage;
> >  static int human_readable;
> >  
> > +static void print_object_term(int nul_delim)
> > +{
> > +	char line_sep = '\n';
> > +
> > +	if (nul_delim)
> > +		line_sep = '\0';
> > +
> > +	putchar(line_sep);
> > +	if (nul_delim)
> > +		putchar(line_sep);
> > +}
> 
> This looks, to put it mildly, strange.  The concept of the line
> delimiter byte (which can take any single byte) is wider than having
> a NUL as the line delimiter byte.  Why would we even want both?

This is a fair point.

There are scerarios where the printed object format varies more than
just the delimiter used. For example, in the normal output mode printed
object paths are truncated if they contain a newline. In the
NUL-delimited mode we want to print the complete path. Similarly,
missing object paths are c-quoted if they contain SP or LF characters.
These should also be printed as-is when in the NUL-delimited mode.

Due to branching behavior, I initially thought it would be easier to
follow if there was separate variable that signaled printing behavior,
but as you mentioned, we could also just use the global to hold the line
terminator being used and check if it is set to NUL when there is
conditional behavior.

> IOW, wouldn't it make more sense to have line_delim as the global
> (or per-invocation parameter to this function) and have
> print_object_term() just use it?  If you want to make it behave
> differently only when line-delimter is NUL (which I do not
> recommend), you can switch on the value of line_delimiter being NUL.
> So I do not see a merit in having two separate variables (except for
> confusing future readers).

In the next version, I'll use a global to hold the configured line
delimiter. I think it also makes sense to have a separate global
variable for the metadata delimiter since in normal mode, it is usually
a SP character. When the -z option is detected, we can set both of these
to a NUL byte.

-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 3/4] rev-list: support delimiting objects with NUL bytes
  2025-03-12  7:50   ` Patrick Steinhardt
@ 2025-03-12 21:41     ` Justin Tobler
  0 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-12 21:41 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, christian.couder

On 25/03/12 08:50AM, Patrick Steinhardt wrote:
> On Mon, Mar 10, 2025 at 02:28:28PM -0500, Justin Tobler wrote:
> > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> > index 04d9c893b5..86b3ce5806 100644
> > --- a/builtin/rev-list.c
> > +++ b/builtin/rev-list.c
> > @@ -757,6 +778,14 @@ int cmd_rev_list(int argc,
> >  		usage(rev_list_usage);
> >  
> >  	}
> > +
> > +	if (nul_delim) {
> > +		if (revs.graph || revs.verbose_header || show_disk_usage ||
> > +		    info.show_timestamp || info.header_prefix || bisect_list ||
> > +		    use_bitmap_index || revs.edge_hint || arg_missing_action)
> > +			die(_("-z option used with unsupported option"));
> > +	}
> > +
> 
> Not sure whether it's worth it, but do we maybe want to add a comment
> here that mentions that this isn't an inherent limitation, but rather
> that the initial implementation simply didn't implement compatibility
> with these options? This would explicitly keep the door open for any
> future improvements in this area.

That's fair. I'll mention this is a comment that way we know support
NUL-delimited mode support can be added for some of the options in the
future.

Thanks
-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-12  7:37       ` Patrick Steinhardt
@ 2025-03-12 21:45         ` Justin Tobler
  0 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-12 21:45 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, git, christian.couder

On 25/03/12 08:37AM, Patrick Steinhardt wrote:
> On Tue, Mar 11, 2025 at 04:44:10PM -0700, Junio C Hamano wrote:
> > Justin Tobler <jltobler@gmail.com> writes:
> > 
> > >>  (4) The traditional "rev-list --missing" output loses the leading
> > >>      "?"; it is replaced by "missing" as the <attr> name, i.e. such
> > >>      a record may look like "<oid> NUL missing=yes NUL..." together
> > >>      with other "<token>=<value> NUL" pairs appended as needed at
> > >>      the end.
> > >
> > > I think this is good. Instead of prefixing missing OIDs with '?', we can
> > > just append another token/value pair `missing=yes`.
> > 
> > And we may want to avoid excessive bloat of the output that is not
> > primarily meant for human consumption, in which case perhaps a
> > common things like "missing" and "path" can be given a shorter
> > token, perhaps like "m=yes" and "p=Makefile"?
> 
> I would prefer to keep longer paths here:
> 
>   - While the output typically shouldn't be seen by a human, the code
>     handling it both on the printing and on the receiving side would.
>     And there it helps the reader quite a bit to get additional context
>     rather than cryptic single-letter abbreviations that one would have
>     to always look up.
> 
>   - By keeping with long names we also avoid "letter squatting" when two
>     attributes of an object start with the same letter.
> 
>   - I doubt that the couple of extra characters is really going to
>     matter much given that most of the time will most likely be spent
>     reading and decompressing the objects from disk anyway.

I also would prefer to keep the longer token names here. I don't think
the extra bytes should be much of a concern and it allows us to avoid
having to conditionally change the printed token name when in the
NUL-delimited mode for missing object metadata.

-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-11 23:57 ` Jeff King
  2025-03-12  7:42   ` Patrick Steinhardt
@ 2025-03-12 22:09   ` Justin Tobler
  2025-03-13  5:33     ` Jeff King
  1 sibling, 1 reply; 60+ messages in thread
From: Justin Tobler @ 2025-03-12 22:09 UTC (permalink / raw)
  To: Jeff King; +Cc: git, ps, christian.couder

On 25/03/11 07:57PM, Jeff King wrote:
> On Mon, Mar 10, 2025 at 02:28:25PM -0500, Justin Tobler wrote:
> 
> > To make machine parsing easier, this series introduces a NUL-delimited
> > output mode for git-rev-list(1) via a `-z` option following a suggestion
> > from Junio in a previous thread[1]. In this mode, instead of LF, each
> > object is delimited with two NUL bytes and any object metadata is
> > separated with a single NUL byte. Examples:
> > 
> >         <oid> NUL NUL
> >         <oid> [NUL <path>] NUL NUL
> >         ?<oid> [NUL <token>=<value>]... NUL NUL
> > 
> > In this mode, path and value info are printed as-is without any special
> > encoding or truncation.
> 
> I think this is a good direction, but I have two compatibility
> questions:
> 
>   1. What should "git rev-list -z --stdin" do? In most other programs
>      with a "-z" option it affects both input and output. I don't
>      particularly care about this case myself, but it will be hard to
>      change later. So we probably want to decide now.

As others suggested in this thread, in the next version I'll make
revision and pathspec parsing on stdin also be NUL-delimited when -z is
used.

>   2. I was a little surprised that rev-list already takes a "-z" option,
>      but it doesn't seem to do anything. I guess it's probably picked up
>      via diff_opt_parse(), though (I think) you can't actually trigger a
>      diff via rev-list itself. So even though this is a change in
>      behavior, probably nobody would have been using it until now?

This is correct. Technically git-rev-list(1) accepts all the common diff
options because it invokes `setup_revisions()`. From my understanding it
isn't possible to trigger diffs so I think parsing diff options is
unnecessary to begin with. As it also isn't a documented option, I
figured -z being an accepted option for git-rev-list(1) is largely an
unintended consequence of it using `setup_revisions()` and should be
fine to use.

>      If it is possible to see some effect from "-z" now (I didn't dig
>      very far), then it may be better to continue to let the diff
>      options parser handle it, and simply pick the result out of
>      revs.diffopt.line_termination. As your patch 3 is written, I think
>      the diff code probably doesn't see it anymore at all.

As currently implemented, the early parsing of -z doesn't effect the
diff options parsing in `setup_revisions()`. The early parsing doesn't
remove the option and thus it continues to be set in the diff options.

Furthermore, revision and pathspec argument parsing is all handled in
`setup_revisions()` so if we want to NUL-delimit arguments parsed on
stdin with -z, we would still need to parse the option early anyway. I
think it should be fine to leave the early -z option parsing as-is.

Thanks,
-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v2 0/6] rev-list: introduce NUL-delimited output mode
  2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler
                   ` (6 preceding siblings ...)
  2025-03-11 23:57 ` Jeff King
@ 2025-03-13  0:17 ` Justin Tobler
  2025-03-13  0:17   ` [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
                     ` (6 more replies)
  7 siblings, 7 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13  0:17 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

When walking objects, git-rev-list(1) prints each object entry on a
separate line in the form:

        <oid> LF

Some options, such as `--objects`, may print additional information
about the object on the same line:

        <oid> SP [<path>] LF

In this mode, if the object path contains a newline it is truncated at
the newline.

The `--boundary` option also modifies output by prefixing boundary
objects with `-`:

        -<oid> LF

When the `--missing={print,print-info}` option is provided, information
about any missing objects encountered during the object walk are also
printed in the form:

        ?<oid> [SP <token>=<value>]... LF

where values containing LF or SP are printed in a token specific fashion
so that the resulting encoded value does not contain either of these two
problematic bytes. For example, missing object paths are quoted in the C
style when they contain LF or SP.

To make machine parsing easier, this series introduces a NUL-delimited
output mode for git-rev-list(1) via a `-z` option. In this mode, the
output format for object records is unified such that each object and
its accompanying metadata is formatted without relying on object
metadata order. This format follows the existing `<token>=<value>` used
by the `--missing` option to represent object metadata in the form:

        <oid> NUL [<token>=<value> NUL]...

        # Examples
        <oid> LF                       -> <oid> NUL
        <oid> SP <path> LF             -> <oid> NUL path=<path> NUL
        -<oid> LF                      -> <oid> NUL boundary=yes NUL
        ?<oid> [SP <token>=<value>]... -> <oid> NUL missing=yes NUL [<token>=<value> NUL]...

Note that token value info is printed as-is without any special encoding
or truncation. Prefixes such as '-' and '?' are dropped in favor using a
token/value pair to signal the same information.

While in this mode, if the `--sdtin` option is used, revision and
pathspec arguments read from stdin are separated with a NUL byte instead
of being newline delimited.

For now this series only adds support for use with the `--objects`,
`--boundary` and `--missing` output options. Usage of `-z` with other
options is rejected, so it can potentially be added in the future.

This series is structured as follows:

        - Patches 1 and 2 do some minor preparatory refactors.

        - Patch 3 modifies stdin argument parsing handled by
          `setup_revisions()` to support NUL-delimited arguments.

        - Patch 4 adds the `-z` option to git-rev-list(1) to print
          objects in a NUL-delimited fashion. Arguments parsed on stdin
          while in the mode are also NUL-delimited.

        - Patch 5 teaches the `--boundary` option how to print info in a
          NUL-delimited fashino using the unified output format.

        - Patch 6 teaches the `--missing` option how to print info in a
          NUL-delimited fashion using the unified output format.

Changes since V1:

        - Use unified output format with `<token>=<value>` pairs for
          all object metadata.

        - Add support for the `--boundary` option in NUL-delimited mode.

        - Add support for NUL-delimited stdin argument parsing in
          NUL-delimited mode.

        - Instead of using two NUL bytes to delimit between object
          records, a single NUL byte is used. Now that object metadata
          is always in the form `<token>=<value>`, we know a new object
          record starts when there is an OID entry which will not
          contain '='.

Thanks for taking a look,
-Justin

Justin Tobler (6):
  rev-list: inline `show_object_with_name()` in `show_object()`
  rev-list: refactor early option parsing
  revision: support NUL-delimited --stdin mode
  rev-list: support delimiting objects with NUL bytes
  rev-list: support NUL-delimited --boundary option
  rev-list: support NUL-delimited --missing option

 Documentation/rev-list-options.adoc | 26 ++++++++
 builtin/rev-list.c                  | 92 ++++++++++++++++++++++-------
 revision.c                          | 27 ++++-----
 revision.h                          |  5 +-
 t/t6000-rev-list-misc.sh            | 51 ++++++++++++++++
 t/t6017-rev-list-stdin.sh           |  9 +++
 t/t6022-rev-list-missing.sh         | 31 ++++++++++
 7 files changed, 200 insertions(+), 41 deletions(-)

Range-diff against v1:
1:  d2eded3ac7 = 1:  d2eded3ac7 rev-list: inline `show_object_with_name()` in `show_object()`
2:  03cd08c859 = 2:  03cd08c859 rev-list: refactor early option parsing
-:  ---------- > 3:  803a49933a revision: support NUL-delimited --stdin mode
3:  41c5cb7737 ! 4:  d3b3c4ef89 rev-list: support delimiting objects with NUL bytes
    @@ Commit message
         newline are also truncated at the newline.
     
         Introduce the `-z` option for git-rev-list(1) which reformats the output
    -    to use NUL-delimiters between objects and associated info. Each object
    -    line uses two NUL bytes to indicate the end of an object entry and a
    -    single NUL byte to delimit between object information in the following
    -    form:
    +    to use NUL-delimiters between objects and associated info in the
    +    following form:
     
                 $ git rev-list -z --objects <rev>
    -            <oid> [NUL <path>] NUL NUL
    +            <oid> NUL [path=<path> NUL]
     
    -    For now, the `--objects` flag is the only option that can be used in
    -    combination with `-z`. In this mode, the object path is not truncated at
    -    newlines. In a subsequent commit, NUL-delimiter support for other
    -    options is added. Other options that do not make sense with be used in
    -    combination with `-z` are rejected.
    +    In this form, the start of each record is signaled by an OID entry that
    +    is all hexidecimal and does not contain any '='. Additional path info
    +    from `--objects` is appended to the record as a token/value pair
    +    `path=<path>` as-is without any truncation.
    +
    +    In this mode, revision and pathspec arguments provided on stdin with the
    +    `--stdin` option are also separated by a NUL byte instead of being
    +    newline delimited.
    +
    +    For now, the `--objects` and `--stdin` flag are the only options that
    +    can be used in combination with `-z`. In a subsequent commit,
    +    NUL-delimited support for other options is added. Other options that do
    +    not make sense with be used in combination with `-z` are rejected.
     
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
    @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
      	`<header>` text will be printed with each progress update.
     +
     +-z::
    -+	Instead of being newline-delimited, each outputted object is delimited
    -+	with two NUL bytes in the following form:
    ++	Instead of being newline-delimited, each outputted object and its
    ++	accompanying metadata is delimited using NUL bytes in the following
    ++	form:
     ++
     +-----------------------------------------------------------------------
    -+<OID> NUL NUL
    ++<OID> NUL [<token>=<value> NUL]...
     +-----------------------------------------------------------------------
     ++
    -+When the `--objects` option is also present, available object name information
    -+is printed in the following form without any truncation for object names
    -+containing newline characters:
    ++Additional object metadata, such as object paths, is printed using the
    ++`<token>=<value>` form. Token values are printed as-is without any
    ++encoding/truncation. An OID entry never contains a '=' character and thus
    ++is used to signal the start of a new object record. Examples:
     ++
     +-----------------------------------------------------------------------
    -+<OID> [NUL <object-name>] NUL NUL
    ++<OID> NUL
    ++<OID> NUL path=<path> NUL
     +-----------------------------------------------------------------------
     ++
    -+This option is only compatible with `--objects`.
    ++This mode is only compatible with the `--objects` output option. Also, revision
    ++and pathspec argument parsing on stdin with the `--stdin` option is NUL byte
    ++delimited instead of using newlines while in this mode.
      endif::git-rev-list[]
      
      History Simplification
    @@ builtin/rev-list.c: static int arg_show_object_names = 1;
      
      #define DEFAULT_OIDSET_SIZE     (16*1024)
      
    -+static int nul_delim;
    ++static char line_term = '\n';
    ++static char info_term = ' ';
    ++
      static int show_disk_usage;
      static off_t total_disk_usage;
      static int human_readable;
    - 
    -+static void print_object_term(int nul_delim)
    -+{
    -+	char line_sep = '\n';
    -+
    -+	if (nul_delim)
    -+		line_sep = '\0';
    -+
    -+	putchar(line_sep);
    -+	if (nul_delim)
    -+		putchar(line_sep);
    -+}
    -+
    - static off_t get_object_disk_usage(struct object *obj)
    - {
    - 	off_t size;
     @@ builtin/rev-list.c: static void show_commit(struct commit *commit, void *data)
      	if (revs->commit_format == CMIT_FMT_ONELINE)
      		putchar(' ');
      	else if (revs->include_header)
     -		putchar('\n');
    -+		print_object_term(nul_delim);
    ++		putchar(line_term);
      
      	if (revs->verbose_header) {
      		struct strbuf buf = STRBUF_INIT;
    @@ builtin/rev-list.c: static void show_object(struct object *obj, const char *name
     -		putchar(' ');
     -		for (const char *p = name; *p && *p != '\n'; p++)
     -			putchar(*p);
    -+		if (nul_delim && *name) {
    -+			putchar('\0');
    -+			printf("%s", name);
    -+		} else if (!nul_delim) {
    -+			putchar(' ');
    ++		if (line_term) {
    ++			putchar(info_term);
     +			for (const char *p = name; *p && *p != '\n'; p++)
     +				putchar(*p);
    ++		} else if (*name) {
    ++			printf("%cpath=%s", info_term, name);
     +		}
      	}
      
     -	putchar('\n');
    -+	print_object_term(nul_delim);
    ++	putchar(line_term);
      }
      
      static void show_edge(struct commit *commit)
    @@ builtin/rev-list.c: int cmd_rev_list(int argc,
      		} else if (skip_prefix(arg, "--missing=", &arg)) {
      			parse_missing_action_value(arg);
     +		} else if (!strcmp(arg, "-z")) {
    -+			nul_delim = 1;
    ++			s_r_opt.nul_delim_stdin = 1;
    ++			line_term = '\0';
    ++			info_term = '\0';
      		}
      	}
      
    @@ builtin/rev-list.c: int cmd_rev_list(int argc,
      
      	}
     +
    -+	if (nul_delim) {
    ++	/*
    ++	 * Reject options currently incompatible with -z. For some options, this
    ++	 * is not an inherent limitation and support may be implemented in the
    ++	 * future.
    ++	 */
    ++	if (!line_term) {
     +		if (revs.graph || revs.verbose_header || show_disk_usage ||
     +		    info.show_timestamp || info.header_prefix || bisect_list ||
    -+		    use_bitmap_index || revs.edge_hint || arg_missing_action)
    ++		    use_bitmap_index || revs.edge_hint || revs.left_right ||
    ++		    revs.cherry_mark || arg_missing_action || revs.boundary)
     +			die(_("-z option used with unsupported option"));
     +	}
     +
    @@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list --unpacked' '
     +	oid1=$(git -C repo rev-parse HEAD) &&
     +	oid2=$(git -C repo rev-parse HEAD~) &&
     +
    -+	printf "%s\0\0%s\0\0" "$oid1" "$oid2" >expect &&
    ++	printf "%s\0%s\0" "$oid1" "$oid2" >expect &&
     +	git -C repo rev-list -z HEAD >actual &&
     +
     +	test_cmp expect actual
    @@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list --unpacked' '
     +	path1=1.t &&
     +	path2=2.t &&
     +
    -+	printf "%s\0%s\0\0%s\0%s\0\0" "$oid1" "$path1" "$oid2" "$path2" >expect &&
    ++	printf "%s\0path=%s\0%s\0path=%s\0" "$oid1" "$path1" "$oid2" "$path2" \
    ++		>expect &&
     +	git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual &&
     +
     +	test_cmp expect actual
     +'
     +
      test_done
    +
    + ## t/t6017-rev-list-stdin.sh ##
    +@@ t/t6017-rev-list-stdin.sh: test_expect_success '--not via stdin does not influence revisions from command l
    + 	test_cmp expect actual
    + '
    + 
    ++test_expect_success 'NUL-delimited stdin' '
    ++	printf "%s\0%s\0%s\0" "HEAD" "--" "file-1" > input &&
    ++
    ++	git rev-list -z --objects HEAD -- file-1 >expect &&
    ++	git rev-list -z --objects --stdin <input >actual &&
    ++
    ++	test_cmp expect actual
    ++'
    ++
    + test_done
-:  ---------- > 5:  5e4fc41976 rev-list: support NUL-delimited --boundary option
4:  007adbac25 ! 6:  7744966514 rev-list: support NUL-delimited --missing option
    @@ Commit message
         rev-list: support NUL-delimited --missing option
     
         The `--missing={print,print-info}` option for git-rev-list(1) prints
    -    missing objects found while performing the revision walk. Add support
    -    for printing missing objects in a NUL-delimited format when the `-z`
    -    option is enabled.
    +    missing objects found while performing the object walk in the form:
    +
    +            $ git rev-list --missing=print-info <rev>
    +            ?<oid> [SP <token>=<value>]... LF
    +
    +    Add support for printing missing objects in a NUL-delimited format when
    +    the `-z` option is enabled.
     
                 $ git rev-list -z --missing=print-info <rev>
    -            <oid> NUL NUL
    -            ?<oid> [NUL <token>=<value>]... NUL NUL
    +            <oid> NUL missing=yes NUL [<token>=<value> NUL]...
     
         In this mode, values containing special characters or spaces are printed
    -    as-is without being escaped or quoted.
    +    as-is without being escaped or quoted. Instead of prefixing the missing
    +    OID with '?', a separate `missing=yes` token/value pair is appended.
     
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
      ## Documentation/rev-list-options.adoc ##
    -@@ Documentation/rev-list-options.adoc: containing newline characters:
    - <OID> [NUL <object-name>] NUL NUL
    +@@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
    + <OID> NUL [<token>=<value> NUL]...
    + -----------------------------------------------------------------------
    + +
    +-Additional object metadata, such as object paths or boundary objects, is
    +-printed using the `<token>=<value>` form. Token values are printed as-is
    ++Additional object metadata, such as object paths or boundary/missing objects,
    ++is printed using the `<token>=<value>` form. Token values are printed as-is
    + without any encoding/truncation. An OID entry never contains a '=' character
    + and thus is used to signal the start of a new object record. Examples:
    + +
    +@@ Documentation/rev-list-options.adoc: and thus is used to signal the start of a new object record. Examples:
    + <OID> NUL
    + <OID> NUL path=<path> NUL
    + <OID> NUL boundary=yes NUL
    ++<OID> NUL missing=yes NUL [<token>=<value> NUL]...
      -----------------------------------------------------------------------
      +
    --This option is only compatible with `--objects`.
    -+When the `--missing` option is provided, missing objects are printed in the
    -+following form where value is printed as-is without any token specific
    -+encoding:
    -++
    -+-----------------------------------------------------------------------
    -+?<OID> [NUL <token>=<value>]... NUL NUL
    -+-----------------------------------------------------------------------
    -++
    -+This option is only compatible with `--objects` and `--missing`.
    +-This mode is only compatible with the `--objects` and `--boundary` output
    +-options. Also, revision and pathspec argument parsing on stdin with the
    +-`--stdin` option is NUL byte delimited instead of using newlines while in this
    +-mode.
    ++This mode is only compatible with the `--objects`, `--boundary`, and
    ++`--missing` output options. Also, revision and pathspec argument parsing on
    ++stdin with the `--stdin` option is NUL byte delimited instead of using newlines
    ++while in this mode.
      endif::git-rev-list[]
      
      History Simplification
     
      ## builtin/rev-list.c ##
     @@ builtin/rev-list.c: static void print_missing_object(struct missing_objects_map_entry *entry,
    - 				 int print_missing_info)
      {
      	struct strbuf sb = STRBUF_INIT;
    -+	char info_sep = ' ';
    + 
    ++	if (line_term)
    ++		putchar('?');
     +
    -+	if (nul_delim)
    -+		info_sep = '\0';
    ++	printf("%s", oid_to_hex(&entry->entry.oid));
    ++
    ++	if (!line_term)
    ++		printf("%cmissing=yes", info_term);
     +
    -+	printf("?%s", oid_to_hex(&entry->entry.oid));
    - 
      	if (!print_missing_info) {
     -		printf("?%s\n", oid_to_hex(&entry->entry.oid));
    -+		print_object_term(nul_delim);
    ++		putchar(line_term);
      		return;
      	}
      
    @@ builtin/rev-list.c: static void print_missing_object(struct missing_objects_map_
     -		strbuf_addstr(&sb, " path=");
     -		quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
     -		strbuf_addbuf(&sb, &path);
    -+		strbuf_addf(&sb, "%cpath=", info_sep);
    ++		strbuf_addf(&sb, "%cpath=", info_term);
     +
    -+		if (nul_delim) {
    -+			strbuf_addstr(&sb, entry->path);
    -+		} else {
    ++		if (line_term) {
     +			quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
     +			strbuf_addbuf(&sb, &path);
    ++		} else {
    ++			strbuf_addstr(&sb, entry->path);
     +		}
      
      		strbuf_release(&path);
      	}
      	if (entry->type)
     -		strbuf_addf(&sb, " type=%s", type_name(entry->type));
    -+		strbuf_addf(&sb, "%ctype=%s", info_sep, type_name(entry->type));
    ++		strbuf_addf(&sb, "%ctype=%s", info_term, type_name(entry->type));
     +
     +	fwrite(sb.buf, sizeof(char), sb.len, stdout);
    -+	print_object_term(nul_delim);
    ++	putchar(line_term);
      
     -	printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf);
      	strbuf_release(&sb);
      }
      
     @@ builtin/rev-list.c: int cmd_rev_list(int argc,
    - 	if (nul_delim) {
      		if (revs.graph || revs.verbose_header || show_disk_usage ||
      		    info.show_timestamp || info.header_prefix || bisect_list ||
    --		    use_bitmap_index || revs.edge_hint || arg_missing_action)
    -+		    use_bitmap_index || revs.edge_hint)
    + 		    use_bitmap_index || revs.edge_hint || revs.left_right ||
    +-		    revs.cherry_mark || arg_missing_action)
    ++		    revs.cherry_mark)
      			die(_("-z option used with unsupported option"));
      	}
      
    @@ t/t6022-rev-list-missing.sh: do
     +
     +		git rev-list -z --objects --no-object-names \
     +			HEAD ^"$oid" >expect &&
    -+		printf "?%s\0path=%s\0type=%s\0\0" "$oid" "$path" "$type" >>expect &&
    ++		printf "%s\0missing=yes\0path=%s\0type=%s\0" "$oid" "$path" \
    ++			"$type" >>expect &&
     +
     +		mv "$obj_path" "$obj_path.hidden" &&
     +		git rev-list -z --objects --no-object-names \

base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3
-- 
2.49.0.rc2



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()`
  2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
@ 2025-03-13  0:17   ` Justin Tobler
  2025-03-13  0:17   ` [PATCH v2 2/6] rev-list: refactor early option parsing Justin Tobler
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13  0:17 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

The `show_object_with_name()` function only has a single call site.
Inline call to `show_object_with_name()` in `show_object()` so the
explicit function can be cleaned up and live closer to where it is used.
While at it, factor out the code that prints the OID and newline for
both objects with and without a name. In a subsequent commit,
`show_object()` is modified to support printing object information in a
NUL-delimited format.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/rev-list.c | 13 +++++++++----
 revision.c         |  8 --------
 revision.h         |  2 --
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bb26bee0d4..dcd079c16c 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -357,10 +357,15 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
 		return;
 	}
 
-	if (arg_show_object_names)
-		show_object_with_name(stdout, obj, name);
-	else
-		printf("%s\n", oid_to_hex(&obj->oid));
+	printf("%s", oid_to_hex(&obj->oid));
+
+	if (arg_show_object_names) {
+		putchar(' ');
+		for (const char *p = name; *p && *p != '\n'; p++)
+			putchar(*p);
+	}
+
+	putchar('\n');
 }
 
 static void show_edge(struct commit *commit)
diff --git a/revision.c b/revision.c
index c4390f0938..0eaebe4478 100644
--- a/revision.c
+++ b/revision.c
@@ -59,14 +59,6 @@ implement_shared_commit_slab(revision_sources, char *);
 
 static inline int want_ancestry(const struct rev_info *revs);
 
-void show_object_with_name(FILE *out, struct object *obj, const char *name)
-{
-	fprintf(out, "%s ", oid_to_hex(&obj->oid));
-	for (const char *p = name; *p && *p != '\n'; p++)
-		fputc(*p, out);
-	fputc('\n', out);
-}
-
 static void mark_blob_uninteresting(struct blob *blob)
 {
 	if (!blob)
diff --git a/revision.h b/revision.h
index 71e984c452..21c6a69899 100644
--- a/revision.h
+++ b/revision.h
@@ -489,8 +489,6 @@ void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit);
 void mark_tree_uninteresting(struct repository *r, struct tree *tree);
 void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees);
 
-void show_object_with_name(FILE *, struct object *, const char *);
-
 /**
  * Helpers to check if a reference should be excluded.
  */
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v2 2/6] rev-list: refactor early option parsing
  2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
  2025-03-13  0:17   ` [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
@ 2025-03-13  0:17   ` Justin Tobler
  2025-03-13  0:17   ` [PATCH v2 3/6] revision: support NUL-delimited --stdin mode Justin Tobler
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13  0:17 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

Before invoking `setup_revisions()`, the `--missing` and
`--exclude-promisor-objects` options are parsed early. In a subsequent
commit, another option is added that must be parsed early.

Refactor the code to parse both options in a single early pass.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/rev-list.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index dcd079c16c..04d9c893b5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -16,6 +16,7 @@
 #include "object-file.h"
 #include "object-store-ll.h"
 #include "pack-bitmap.h"
+#include "parse-options.h"
 #include "log-tree.h"
 #include "graph.h"
 #include "bisect.h"
@@ -639,19 +640,15 @@ int cmd_rev_list(int argc,
 		if (!strcmp(arg, "--exclude-promisor-objects")) {
 			fetch_if_missing = 0;
 			revs.exclude_promisor_objects = 1;
-			break;
-		}
-	}
-	for (i = 1; i < argc; i++) {
-		const char *arg = argv[i];
-		if (skip_prefix(arg, "--missing=", &arg)) {
-			if (revs.exclude_promisor_objects)
-				die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing");
-			if (parse_missing_action_value(arg))
-				break;
+		} else if (skip_prefix(arg, "--missing=", &arg)) {
+			parse_missing_action_value(arg);
 		}
 	}
 
+	die_for_incompatible_opt2(revs.exclude_promisor_objects,
+				  "--exclude_promisor_objects",
+				  arg_missing_action, "--missing");
+
 	if (arg_missing_action)
 		revs.do_not_die_on_missing_objects = 1;
 
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v2 3/6] revision: support NUL-delimited --stdin mode
  2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
  2025-03-13  0:17   ` [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
  2025-03-13  0:17   ` [PATCH v2 2/6] rev-list: refactor early option parsing Justin Tobler
@ 2025-03-13  0:17   ` Justin Tobler
  2025-03-13  0:17   ` [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13  0:17 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

When `setup_revisions()` parses the `--stdin` option, revision and
pathspec arguments are read from stdin. Each line of input is handled as
a separate argument.

Introduce the `nul_delim_stdin` field to `setup_revision_opt` that, when
enabled, uses a NUL byte to delimit between stdin arguments instead of
newline.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 revision.c | 19 +++++++++++--------
 revision.h |  3 ++-
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/revision.c b/revision.c
index 0eaebe4478..5de6309830 100644
--- a/revision.c
+++ b/revision.c
@@ -2275,10 +2275,10 @@ int handle_revision_arg(const char *arg, struct rev_info *revs, int flags, unsig
 	return ret;
 }
 
-static void read_pathspec_from_stdin(struct strbuf *sb,
-				     struct strvec *prune)
+static void read_pathspec_from_stdin(struct strbuf *sb, struct strvec *prune,
+				     int line_term)
 {
-	while (strbuf_getline(sb, stdin) != EOF)
+	while (strbuf_getdelim_strip_crlf(sb, stdin, line_term) != EOF)
 		strvec_push(prune, sb->buf);
 }
 
@@ -2905,8 +2905,8 @@ static int handle_revision_pseudo_opt(struct rev_info *revs,
 	return 1;
 }
 
-static void read_revisions_from_stdin(struct rev_info *revs,
-				      struct strvec *prune)
+static void read_revisions_from_stdin(struct rev_info *revs, struct strvec *prune,
+				      int line_term)
 {
 	struct strbuf sb;
 	int seen_dashdash = 0;
@@ -2918,7 +2918,7 @@ static void read_revisions_from_stdin(struct rev_info *revs,
 	warn_on_object_refname_ambiguity = 0;
 
 	strbuf_init(&sb, 1000);
-	while (strbuf_getline(&sb, stdin) != EOF) {
+	while (strbuf_getdelim_strip_crlf(&sb, stdin, line_term) != EOF) {
 		if (!sb.len)
 			break;
 
@@ -2946,7 +2946,7 @@ static void read_revisions_from_stdin(struct rev_info *revs,
 			die("bad revision '%s'", sb.buf);
 	}
 	if (seen_dashdash)
-		read_pathspec_from_stdin(&sb, prune);
+		read_pathspec_from_stdin(&sb, prune, line_term);
 
 	strbuf_release(&sb);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -3019,13 +3019,16 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			}
 
 			if (!strcmp(arg, "--stdin")) {
+				int term = opt && opt->nul_delim_stdin ? '\0' : '\n';
+
 				if (revs->disable_stdin) {
 					argv[left++] = arg;
 					continue;
 				}
 				if (revs->read_from_stdin++)
 					die("--stdin given twice?");
-				read_revisions_from_stdin(revs, &prune_data);
+				read_revisions_from_stdin(revs, &prune_data,
+							  term);
 				continue;
 			}
 
diff --git a/revision.h b/revision.h
index 21c6a69899..0e680c3667 100644
--- a/revision.h
+++ b/revision.h
@@ -439,7 +439,8 @@ struct setup_revision_opt {
 	void (*tweak)(struct rev_info *);
 	unsigned int	assume_dashdash:1,
 			allow_exclude_promisor_objects:1,
-			free_removed_argv_elements:1;
+			free_removed_argv_elements:1,
+			nul_delim_stdin:1;
 	unsigned revarg_opt;
 };
 int setup_revisions(int argc, const char **argv, struct rev_info *revs,
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes
  2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
                     ` (2 preceding siblings ...)
  2025-03-13  0:17   ` [PATCH v2 3/6] revision: support NUL-delimited --stdin mode Justin Tobler
@ 2025-03-13  0:17   ` Justin Tobler
  2025-03-13 12:55     ` Patrick Steinhardt
  2025-03-13  0:17   ` [PATCH v2 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 60+ messages in thread
From: Justin Tobler @ 2025-03-13  0:17 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob object on the same line in the form:

        $ git rev-list --objects <rev>
        <tree/blob oid> SP [<path>] LF

Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.

Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info in the
following form:

        $ git rev-list -z --objects <rev>
        <oid> NUL [path=<path> NUL]

In this form, the start of each record is signaled by an OID entry that
is all hexidecimal and does not contain any '='. Additional path info
from `--objects` is appended to the record as a token/value pair
`path=<path>` as-is without any truncation.

In this mode, revision and pathspec arguments provided on stdin with the
`--stdin` option are also separated by a NUL byte instead of being
newline delimited.

For now, the `--objects` and `--stdin` flag are the only options that
can be used in combination with `-z`. In a subsequent commit,
NUL-delimited support for other options is added. Other options that do
not make sense with be used in combination with `-z` are rejected.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 23 ++++++++++++++++++
 builtin/rev-list.c                  | 36 +++++++++++++++++++++++++----
 t/t6000-rev-list-misc.sh            | 35 ++++++++++++++++++++++++++++
 t/t6017-rev-list-stdin.sh           |  9 ++++++++
 4 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 785c0786e0..166d3cd19e 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -361,6 +361,29 @@ ifdef::git-rev-list[]
 --progress=<header>::
 	Show progress reports on stderr as objects are considered. The
 	`<header>` text will be printed with each progress update.
+
+-z::
+	Instead of being newline-delimited, each outputted object and its
+	accompanying metadata is delimited using NUL bytes in the following
+	form:
++
+-----------------------------------------------------------------------
+<OID> NUL [<token>=<value> NUL]...
+-----------------------------------------------------------------------
++
+Additional object metadata, such as object paths, is printed using the
+`<token>=<value>` form. Token values are printed as-is without any
+encoding/truncation. An OID entry never contains a '=' character and thus
+is used to signal the start of a new object record. Examples:
++
+-----------------------------------------------------------------------
+<OID> NUL
+<OID> NUL path=<path> NUL
+-----------------------------------------------------------------------
++
+This mode is only compatible with the `--objects` output option. Also, revision
+and pathspec argument parsing on stdin with the `--stdin` option is NUL byte
+delimited instead of using newlines while in this mode.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 04d9c893b5..f048500679 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -65,6 +65,7 @@ static const char rev_list_usage[] =
 "    --abbrev-commit\n"
 "    --left-right\n"
 "    --count\n"
+"    -z\n"
 "  special purpose:\n"
 "    --bisect\n"
 "    --bisect-vars\n"
@@ -97,6 +98,9 @@ static int arg_show_object_names = 1;
 
 #define DEFAULT_OIDSET_SIZE     (16*1024)
 
+static char line_term = '\n';
+static char info_term = ' ';
+
 static int show_disk_usage;
 static off_t total_disk_usage;
 static int human_readable;
@@ -264,7 +268,7 @@ static void show_commit(struct commit *commit, void *data)
 	if (revs->commit_format == CMIT_FMT_ONELINE)
 		putchar(' ');
 	else if (revs->include_header)
-		putchar('\n');
+		putchar(line_term);
 
 	if (revs->verbose_header) {
 		struct strbuf buf = STRBUF_INIT;
@@ -361,12 +365,16 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
 	printf("%s", oid_to_hex(&obj->oid));
 
 	if (arg_show_object_names) {
-		putchar(' ');
-		for (const char *p = name; *p && *p != '\n'; p++)
-			putchar(*p);
+		if (line_term) {
+			putchar(info_term);
+			for (const char *p = name; *p && *p != '\n'; p++)
+				putchar(*p);
+		} else if (*name) {
+			printf("%cpath=%s", info_term, name);
+		}
 	}
 
-	putchar('\n');
+	putchar(line_term);
 }
 
 static void show_edge(struct commit *commit)
@@ -642,6 +650,10 @@ int cmd_rev_list(int argc,
 			revs.exclude_promisor_objects = 1;
 		} else if (skip_prefix(arg, "--missing=", &arg)) {
 			parse_missing_action_value(arg);
+		} else if (!strcmp(arg, "-z")) {
+			s_r_opt.nul_delim_stdin = 1;
+			line_term = '\0';
+			info_term = '\0';
 		}
 	}
 
@@ -757,6 +769,20 @@ int cmd_rev_list(int argc,
 		usage(rev_list_usage);
 
 	}
+
+	/*
+	 * Reject options currently incompatible with -z. For some options, this
+	 * is not an inherent limitation and support may be implemented in the
+	 * future.
+	 */
+	if (!line_term) {
+		if (revs.graph || revs.verbose_header || show_disk_usage ||
+		    info.show_timestamp || info.header_prefix || bisect_list ||
+		    use_bitmap_index || revs.edge_hint || revs.left_right ||
+		    revs.cherry_mark || arg_missing_action || revs.boundary)
+			die(_("-z option used with unsupported option"));
+	}
+
 	if (revs.commit_format != CMIT_FMT_USERFORMAT)
 		revs.include_header = 1;
 	if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 6289a2e8b0..dfbbc0aee6 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -182,4 +182,39 @@ test_expect_success 'rev-list --unpacked' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list -z' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD) &&
+	oid2=$(git -C repo rev-parse HEAD~) &&
+
+	printf "%s\0%s\0" "$oid1" "$oid2" >expect &&
+	git -C repo rev-list -z HEAD >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'rev-list -z --objects' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD:1.t) &&
+	oid2=$(git -C repo rev-parse HEAD:2.t) &&
+	path1=1.t &&
+	path2=2.t &&
+
+	printf "%s\0path=%s\0%s\0path=%s\0" "$oid1" "$path1" "$oid2" "$path2" \
+		>expect &&
+	git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual &&
+
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t6017-rev-list-stdin.sh b/t/t6017-rev-list-stdin.sh
index 4821b90e74..362a8b126a 100755
--- a/t/t6017-rev-list-stdin.sh
+++ b/t/t6017-rev-list-stdin.sh
@@ -148,4 +148,13 @@ test_expect_success '--not via stdin does not influence revisions from command l
 	test_cmp expect actual
 '
 
+test_expect_success 'NUL-delimited stdin' '
+	printf "%s\0%s\0%s\0" "HEAD" "--" "file-1" > input &&
+
+	git rev-list -z --objects HEAD -- file-1 >expect &&
+	git rev-list -z --objects --stdin <input >actual &&
+
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v2 5/6] rev-list: support NUL-delimited --boundary option
  2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
                     ` (3 preceding siblings ...)
  2025-03-13  0:17   ` [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler
@ 2025-03-13  0:17   ` Justin Tobler
  2025-03-13  0:17   ` [PATCH v2 6/6] rev-list: support NUL-delimited --missing option Justin Tobler
  2025-03-13 23:57   ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler
  6 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13  0:17 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

The `--boundary` option for git-rev-list(1) prints boundary objects
found while performing the object walk in the form:

        $ git rev-list --boundary <rev>
        -<oid> LF

Add support for printing boundary objects in a NUL-delimited format when
the `-z` option is enabled.

        $ git rev-list -z --boundary <rev>
        <oid> NUL boundary=yes NUL

In this mode, instead of prefixing the boundary OID with '-', a separate
`boundary=yes` token/value pair is appended.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 16 +++++++++-------
 builtin/rev-list.c                  |  9 +++++++--
 t/t6000-rev-list-misc.sh            | 16 ++++++++++++++++
 3 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 166d3cd19e..d400d76cf2 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -371,19 +371,21 @@ ifdef::git-rev-list[]
 <OID> NUL [<token>=<value> NUL]...
 -----------------------------------------------------------------------
 +
-Additional object metadata, such as object paths, is printed using the
-`<token>=<value>` form. Token values are printed as-is without any
-encoding/truncation. An OID entry never contains a '=' character and thus
-is used to signal the start of a new object record. Examples:
+Additional object metadata, such as object paths or boundary objects, is
+printed using the `<token>=<value>` form. Token values are printed as-is
+without any encoding/truncation. An OID entry never contains a '=' character
+and thus is used to signal the start of a new object record. Examples:
 +
 -----------------------------------------------------------------------
 <OID> NUL
 <OID> NUL path=<path> NUL
+<OID> NUL boundary=yes NUL
 -----------------------------------------------------------------------
 +
-This mode is only compatible with the `--objects` output option. Also, revision
-and pathspec argument parsing on stdin with the `--stdin` option is NUL byte
-delimited instead of using newlines while in this mode.
+This mode is only compatible with the `--objects` and `--boundary` output
+options. Also, revision and pathspec argument parsing on stdin with the
+`--stdin` option is NUL byte delimited instead of using newlines while in this
+mode.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index f048500679..7c6d4b25b0 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -240,13 +240,18 @@ static void show_commit(struct commit *commit, void *data)
 		fputs(info->header_prefix, stdout);
 
 	if (revs->include_header) {
-		if (!revs->graph)
+		if (!revs->graph && line_term)
 			fputs(get_revision_mark(revs, commit), stdout);
 		if (revs->abbrev_commit && revs->abbrev)
 			fputs(repo_find_unique_abbrev(the_repository, &commit->object.oid, revs->abbrev),
 			      stdout);
 		else
 			fputs(oid_to_hex(&commit->object.oid), stdout);
+
+		if (!line_term) {
+			if (commit->object.flags & BOUNDARY)
+				printf("%cboundary=yes", info_term);
+		}
 	}
 	if (revs->print_parents) {
 		struct commit_list *parents = commit->parents;
@@ -779,7 +784,7 @@ int cmd_rev_list(int argc,
 		if (revs.graph || revs.verbose_header || show_disk_usage ||
 		    info.show_timestamp || info.header_prefix || bisect_list ||
 		    use_bitmap_index || revs.edge_hint || revs.left_right ||
-		    revs.cherry_mark || arg_missing_action || revs.boundary)
+		    revs.cherry_mark || arg_missing_action)
 			die(_("-z option used with unsupported option"));
 	}
 
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index dfbbc0aee6..349bf5ec3d 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -217,4 +217,20 @@ test_expect_success 'rev-list -z --objects' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list -z --boundary' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD) &&
+	oid2=$(git -C repo rev-parse HEAD~) &&
+
+	printf "%s\0%s\0boundary=yes\0" "$oid1" "$oid2" >expect &&
+	git -C repo rev-list -z --boundary HEAD~.. >actual &&
+
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v2 6/6] rev-list: support NUL-delimited --missing option
  2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
                     ` (4 preceding siblings ...)
  2025-03-13  0:17   ` [PATCH v2 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler
@ 2025-03-13  0:17   ` Justin Tobler
  2025-03-13 12:55     ` Patrick Steinhardt
  2025-03-13 23:57   ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler
  6 siblings, 1 reply; 60+ messages in thread
From: Justin Tobler @ 2025-03-13  0:17 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

The `--missing={print,print-info}` option for git-rev-list(1) prints
missing objects found while performing the object walk in the form:

        $ git rev-list --missing=print-info <rev>
        ?<oid> [SP <token>=<value>]... LF

Add support for printing missing objects in a NUL-delimited format when
the `-z` option is enabled.

        $ git rev-list -z --missing=print-info <rev>
        <oid> NUL missing=yes NUL [<token>=<value> NUL]...

In this mode, values containing special characters or spaces are printed
as-is without being escaped or quoted. Instead of prefixing the missing
OID with '?', a separate `missing=yes` token/value pair is appended.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 13 ++++++------
 builtin/rev-list.c                  | 29 ++++++++++++++++++++-------
 t/t6022-rev-list-missing.sh         | 31 +++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index d400d76cf2..145ded5c78 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -371,8 +371,8 @@ ifdef::git-rev-list[]
 <OID> NUL [<token>=<value> NUL]...
 -----------------------------------------------------------------------
 +
-Additional object metadata, such as object paths or boundary objects, is
-printed using the `<token>=<value>` form. Token values are printed as-is
+Additional object metadata, such as object paths or boundary/missing objects,
+is printed using the `<token>=<value>` form. Token values are printed as-is
 without any encoding/truncation. An OID entry never contains a '=' character
 and thus is used to signal the start of a new object record. Examples:
 +
@@ -380,12 +380,13 @@ and thus is used to signal the start of a new object record. Examples:
 <OID> NUL
 <OID> NUL path=<path> NUL
 <OID> NUL boundary=yes NUL
+<OID> NUL missing=yes NUL [<token>=<value> NUL]...
 -----------------------------------------------------------------------
 +
-This mode is only compatible with the `--objects` and `--boundary` output
-options. Also, revision and pathspec argument parsing on stdin with the
-`--stdin` option is NUL byte delimited instead of using newlines while in this
-mode.
+This mode is only compatible with the `--objects`, `--boundary`, and
+`--missing` output options. Also, revision and pathspec argument parsing on
+stdin with the `--stdin` option is NUL byte delimited instead of using newlines
+while in this mode.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 7c6d4b25b0..d7b4dd48ff 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -136,24 +136,39 @@ static void print_missing_object(struct missing_objects_map_entry *entry,
 {
 	struct strbuf sb = STRBUF_INIT;
 
+	if (line_term)
+		putchar('?');
+
+	printf("%s", oid_to_hex(&entry->entry.oid));
+
+	if (!line_term)
+		printf("%cmissing=yes", info_term);
+
 	if (!print_missing_info) {
-		printf("?%s\n", oid_to_hex(&entry->entry.oid));
+		putchar(line_term);
 		return;
 	}
 
 	if (entry->path && *entry->path) {
 		struct strbuf path = STRBUF_INIT;
 
-		strbuf_addstr(&sb, " path=");
-		quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
-		strbuf_addbuf(&sb, &path);
+		strbuf_addf(&sb, "%cpath=", info_term);
+
+		if (line_term) {
+			quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
+			strbuf_addbuf(&sb, &path);
+		} else {
+			strbuf_addstr(&sb, entry->path);
+		}
 
 		strbuf_release(&path);
 	}
 	if (entry->type)
-		strbuf_addf(&sb, " type=%s", type_name(entry->type));
+		strbuf_addf(&sb, "%ctype=%s", info_term, type_name(entry->type));
+
+	fwrite(sb.buf, sizeof(char), sb.len, stdout);
+	putchar(line_term);
 
-	printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf);
 	strbuf_release(&sb);
 }
 
@@ -784,7 +799,7 @@ int cmd_rev_list(int argc,
 		if (revs.graph || revs.verbose_header || show_disk_usage ||
 		    info.show_timestamp || info.header_prefix || bisect_list ||
 		    use_bitmap_index || revs.edge_hint || revs.left_right ||
-		    revs.cherry_mark || arg_missing_action)
+		    revs.cherry_mark)
 			die(_("-z option used with unsupported option"));
 	}
 
diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
index 3e2790d4c8..08e92dd002 100755
--- a/t/t6022-rev-list-missing.sh
+++ b/t/t6022-rev-list-missing.sh
@@ -198,4 +198,35 @@ do
 	'
 done
 
+test_expect_success "-z nul-delimited --missing" '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m first &&
+
+		path="foo bar" &&
+		echo foobar >"$path" &&
+		git add -A &&
+		git commit -m second &&
+
+		oid=$(git rev-parse "HEAD:$path") &&
+		type="$(git cat-file -t $oid)" &&
+
+		obj_path=".git/objects/$(test_oid_to_path $oid)" &&
+
+		git rev-list -z --objects --no-object-names \
+			HEAD ^"$oid" >expect &&
+		printf "%s\0missing=yes\0path=%s\0type=%s\0" "$oid" "$path" \
+			"$type" >>expect &&
+
+		mv "$obj_path" "$obj_path.hidden" &&
+		git rev-list -z --objects --no-object-names \
+			--missing=print-info HEAD >actual &&
+
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-12 22:09   ` Justin Tobler
@ 2025-03-13  5:33     ` Jeff King
  2025-03-13 16:41       ` Justin Tobler
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2025-03-13  5:33 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, christian.couder

On Wed, Mar 12, 2025 at 05:09:41PM -0500, Justin Tobler wrote:

> >      If it is possible to see some effect from "-z" now (I didn't dig
> >      very far), then it may be better to continue to let the diff
> >      options parser handle it, and simply pick the result out of
> >      revs.diffopt.line_termination. As your patch 3 is written, I think
> >      the diff code probably doesn't see it anymore at all.
> 
> As currently implemented, the early parsing of -z doesn't effect the
> diff options parsing in `setup_revisions()`. The early parsing doesn't
> remove the option and thus it continues to be set in the diff options.

Ah, OK. From the diff context I didn't realize it was not in the main
option parsing loop. That makes sense.

> Furthermore, revision and pathspec argument parsing is all handled in
> `setup_revisions()` so if we want to NUL-delimit arguments parsed on
> stdin with -z, we would still need to parse the option early anyway. I
> think it should be fine to leave the early -z option parsing as-is.

Makes sense. And I guess we might not want to have setup_revisions() do
that handling of "-z" for input, as that would make:

  git log --stdin --raw -z

behave differently (since it does not currently change stdin handling,
only the diff output). Though that does mean that these two commands
will behave differently:

  git log --stdin -z
  git rev-list --stdin -z

which seems...not great. My earlier suggestion to tie "-z" to stdin
handling was for consistency with other tools like grep. But if we
already have cases where "-z" is only for output, maybe it is better to
stay consistent with other parts of git. I.e., I was worried about us
painting ourselves into a corner with your patches, but we may have
already done so years ago. ;)

-Peff


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-12 15:56     ` Junio C Hamano
@ 2025-03-13  7:46       ` Patrick Steinhardt
  0 siblings, 0 replies; 60+ messages in thread
From: Patrick Steinhardt @ 2025-03-13  7:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Justin Tobler, git, christian.couder

On Wed, Mar 12, 2025 at 08:56:01AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > On Tue, Mar 11, 2025 at 07:57:20PM -0400, Jeff King wrote:
> >> On Mon, Mar 10, 2025 at 02:28:25PM -0500, Justin Tobler wrote:
> >> 
> >> > To make machine parsing easier, this series introduces a NUL-delimited
> >> > output mode for git-rev-list(1) via a `-z` option following a suggestion
> >> > from Junio in a previous thread[1]. In this mode, instead of LF, each
> >> > object is delimited with two NUL bytes and any object metadata is
> >> > separated with a single NUL byte. Examples:
> >> > 
> >> >         <oid> NUL NUL
> >> >         <oid> [NUL <path>] NUL NUL
> >> >         ?<oid> [NUL <token>=<value>]... NUL NUL
> >> > 
> >> > In this mode, path and value info are printed as-is without any special
> >> > encoding or truncation.
> >> 
> >> I think this is a good direction, but I have two compatibility
> >> questions:
> >> 
> >>   1. What should "git rev-list -z --stdin" do? In most other programs
> >>      with a "-z" option it affects both input and output. I don't
> >>      particularly care about this case myself, but it will be hard to
> >>      change later. So we probably want to decide now.
> >
> > I would lean into the direction of making "-z" change the format both
> > for stdin and stdout. That's what we do in most cases, and in those
> > cases where we didn't we came to regret it (git-cat-file(1)).
> 
> I've seen "-Z", in addition to "-z", used to differentiate between
> input and output in some commands.  If we are not going to do that,
> I agree that making "-z" to affect both input and output is less
> surprising than having to remember which side is still text.

Yeah, "-Z" is what I meant with git-cat-file(1) above. I just think it's
safer to always use NUL as delimiter in machine parsing -- and while one
often thinks initially that it may not be needed either in stdin or
stdout, my learning is that eventually you always find an edge case
where you do need NUL termination in both. So I'd rather want to push
application developers into the right direction and make "-z" adjust
both streams.

Patrick


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes
  2025-03-13  0:17   ` [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler
@ 2025-03-13 12:55     ` Patrick Steinhardt
  2025-03-13 14:44       ` Justin Tobler
  0 siblings, 1 reply; 60+ messages in thread
From: Patrick Steinhardt @ 2025-03-13 12:55 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, christian.couder, peff, ben.knoble

On Wed, Mar 12, 2025 at 07:17:04PM -0500, Justin Tobler wrote:
> diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
> index 785c0786e0..166d3cd19e 100644
> --- a/Documentation/rev-list-options.adoc
> +++ b/Documentation/rev-list-options.adoc
> @@ -361,6 +361,29 @@ ifdef::git-rev-list[]
>  --progress=<header>::
>  	Show progress reports on stderr as objects are considered. The
>  	`<header>` text will be printed with each progress update.
> +
> +-z::
> +	Instead of being newline-delimited, each outputted object and its
> +	accompanying metadata is delimited using NUL bytes in the following
> +	form:
> ++
> +-----------------------------------------------------------------------
> +<OID> NUL [<token>=<value> NUL]...
> +-----------------------------------------------------------------------
> ++
> +Additional object metadata, such as object paths, is printed using the
> +`<token>=<value>` form. Token values are printed as-is without any
> +encoding/truncation. An OID entry never contains a '=' character and thus
> +is used to signal the start of a new object record. Examples:
> ++
> +-----------------------------------------------------------------------
> +<OID> NUL
> +<OID> NUL path=<path> NUL
> +-----------------------------------------------------------------------
> ++
> +This mode is only compatible with the `--objects` output option. Also, revision
> +and pathspec argument parsing on stdin with the `--stdin` option is NUL byte
> +delimited instead of using newlines while in this mode.
>  endif::git-rev-list[]
>  
>  History Simplification

I feel like this last paragraph, where we talk about `--stdin` being
NUL-delimited, should already be mentioned in the first paragraph.

Patrick


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 6/6] rev-list: support NUL-delimited --missing option
  2025-03-13  0:17   ` [PATCH v2 6/6] rev-list: support NUL-delimited --missing option Justin Tobler
@ 2025-03-13 12:55     ` Patrick Steinhardt
  2025-03-13 14:51       ` Justin Tobler
  0 siblings, 1 reply; 60+ messages in thread
From: Patrick Steinhardt @ 2025-03-13 12:55 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, christian.couder, peff, ben.knoble

On Wed, Mar 12, 2025 at 07:17:06PM -0500, Justin Tobler wrote:
> diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
> index d400d76cf2..145ded5c78 100644
> --- a/Documentation/rev-list-options.adoc
> +++ b/Documentation/rev-list-options.adoc
> @@ -371,8 +371,8 @@ ifdef::git-rev-list[]
>  <OID> NUL [<token>=<value> NUL]...
>  -----------------------------------------------------------------------
>  +
> -Additional object metadata, such as object paths or boundary objects, is
> -printed using the `<token>=<value>` form. Token values are printed as-is
> +Additional object metadata, such as object paths or boundary/missing objects,
> +is printed using the `<token>=<value>` form. Token values are printed as-is
>  without any encoding/truncation. An OID entry never contains a '=' character
>  and thus is used to signal the start of a new object record. Examples:
>  +

Nit: I don't think we need to update this paragraph here as it is
written as a non-exhaustive list anyway.

> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index 7c6d4b25b0..d7b4dd48ff 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -136,24 +136,39 @@ static void print_missing_object(struct missing_objects_map_entry *entry,
>  {
>  	struct strbuf sb = STRBUF_INIT;
>  
> +	if (line_term)
> +		putchar('?');
> +
> +	printf("%s", oid_to_hex(&entry->entry.oid));
> +
> +	if (!line_term)
> +		printf("%cmissing=yes", info_term);
> +
>  	if (!print_missing_info) {
> -		printf("?%s\n", oid_to_hex(&entry->entry.oid));
> +		putchar(line_term);
>  		return;
>  	}
>  
>  	if (entry->path && *entry->path) {
>  		struct strbuf path = STRBUF_INIT;

Nit: the variable and its cleanup could be moved closer to where it's
used.

> -		strbuf_addstr(&sb, " path=");
> -		quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
> -		strbuf_addbuf(&sb, &path);
> +		strbuf_addf(&sb, "%cpath=", info_term);
> +
> +		if (line_term) {
> +			quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
> +			strbuf_addbuf(&sb, &path);
> +		} else {
> +			strbuf_addstr(&sb, entry->path);
> +		}
>  
>  		strbuf_release(&path);
>  	}
>  	if (entry->type)
> -		strbuf_addf(&sb, " type=%s", type_name(entry->type));
> +		strbuf_addf(&sb, "%ctype=%s", info_term, type_name(entry->type));
> +
> +	fwrite(sb.buf, sizeof(char), sb.len, stdout);
> +	putchar(line_term);
>  
> -	printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf);
>  	strbuf_release(&sb);
>  }
>  

Patrick


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes
  2025-03-13 12:55     ` Patrick Steinhardt
@ 2025-03-13 14:44       ` Justin Tobler
  0 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13 14:44 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, christian.couder, peff, ben.knoble

On 25/03/13 01:55PM, Patrick Steinhardt wrote:
> On Wed, Mar 12, 2025 at 07:17:04PM -0500, Justin Tobler wrote:
> > diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
> > index 785c0786e0..166d3cd19e 100644
> > --- a/Documentation/rev-list-options.adoc
> > +++ b/Documentation/rev-list-options.adoc
> > @@ -361,6 +361,29 @@ ifdef::git-rev-list[]
> >  --progress=<header>::
> >  	Show progress reports on stderr as objects are considered. The
> >  	`<header>` text will be printed with each progress update.
> > +
> > +-z::
> > +	Instead of being newline-delimited, each outputted object and its
> > +	accompanying metadata is delimited using NUL bytes in the following
> > +	form:
> > ++
> > +-----------------------------------------------------------------------
> > +<OID> NUL [<token>=<value> NUL]...
> > +-----------------------------------------------------------------------
> > ++
> > +Additional object metadata, such as object paths, is printed using the
> > +`<token>=<value>` form. Token values are printed as-is without any
> > +encoding/truncation. An OID entry never contains a '=' character and thus
> > +is used to signal the start of a new object record. Examples:
> > ++
> > +-----------------------------------------------------------------------
> > +<OID> NUL
> > +<OID> NUL path=<path> NUL
> > +-----------------------------------------------------------------------
> > ++
> > +This mode is only compatible with the `--objects` output option. Also, revision
> > +and pathspec argument parsing on stdin with the `--stdin` option is NUL byte
> > +delimited instead of using newlines while in this mode.
> >  endif::git-rev-list[]
> >  
> >  History Simplification
> 
> I feel like this last paragraph, where we talk about `--stdin` being
> NUL-delimited, should already be mentioned in the first paragraph.

That's fair. I'll move the `--stdin` part to the beginning.

-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 6/6] rev-list: support NUL-delimited --missing option
  2025-03-13 12:55     ` Patrick Steinhardt
@ 2025-03-13 14:51       ` Justin Tobler
  0 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13 14:51 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, christian.couder, peff, ben.knoble

On 25/03/13 01:55PM, Patrick Steinhardt wrote:
> On Wed, Mar 12, 2025 at 07:17:06PM -0500, Justin Tobler wrote:
> > diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
> > index d400d76cf2..145ded5c78 100644
> > --- a/Documentation/rev-list-options.adoc
> > +++ b/Documentation/rev-list-options.adoc
> > @@ -371,8 +371,8 @@ ifdef::git-rev-list[]
> >  <OID> NUL [<token>=<value> NUL]...
> >  -----------------------------------------------------------------------
> >  +
> > -Additional object metadata, such as object paths or boundary objects, is
> > -printed using the `<token>=<value>` form. Token values are printed as-is
> > +Additional object metadata, such as object paths or boundary/missing objects,
> > +is printed using the `<token>=<value>` form. Token values are printed as-is
> >  without any encoding/truncation. An OID entry never contains a '=' character
> >  and thus is used to signal the start of a new object record. Examples:
> >  +
> 
> Nit: I don't think we need to update this paragraph here as it is
> written as a non-exhaustive list anyway.

Ok, I'll omit this change and only keep the added example.

> > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> > index 7c6d4b25b0..d7b4dd48ff 100644
> > --- a/builtin/rev-list.c
> > +++ b/builtin/rev-list.c
> > @@ -136,24 +136,39 @@ static void print_missing_object(struct missing_objects_map_entry *entry,
> >  {
> >  	struct strbuf sb = STRBUF_INIT;
> >  
> > +	if (line_term)
> > +		putchar('?');
> > +
> > +	printf("%s", oid_to_hex(&entry->entry.oid));
> > +
> > +	if (!line_term)
> > +		printf("%cmissing=yes", info_term);
> > +
> >  	if (!print_missing_info) {
> > -		printf("?%s\n", oid_to_hex(&entry->entry.oid));
> > +		putchar(line_term);
> >  		return;
> >  	}
> >  
> >  	if (entry->path && *entry->path) {
> >  		struct strbuf path = STRBUF_INIT;
> 
> Nit: the variable and its cleanup could be moved closer to where it's
> used.

Will do. Thanks

-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-13  5:33     ` Jeff King
@ 2025-03-13 16:41       ` Justin Tobler
  2025-03-14  2:49         ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Justin Tobler @ 2025-03-13 16:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, ps, christian.couder

On 25/03/13 01:33AM, Jeff King wrote:
> > Furthermore, revision and pathspec argument parsing is all handled in
> > `setup_revisions()` so if we want to NUL-delimit arguments parsed on
> > stdin with -z, we would still need to parse the option early anyway. I
> > think it should be fine to leave the early -z option parsing as-is.
> 
> Makes sense. And I guess we might not want to have setup_revisions() do
> that handling of "-z" for input, as that would make:
> 
>   git log --stdin --raw -z
> 
> behave differently (since it does not currently change stdin handling,
> only the diff output). 

Yes, we won't want to include this '-z' parsing directly in
`setup_revisions()` or else it would change the behavior of other
commands.

In version two of this series, NUL-delimited stdin handling for
`setup_revisions()` is triggered by setting a `nul_delim_stdin` field in
`setup_revision_opt`. This gives the `setup_revisions()` caller the
ability to control the parsing delimiter itself. 

Only in git-rev-list(1) does the stdin parsing behavior change if '-z'
is also present. The behavior stdin parsing for `git log -z --stdin`
remains unchanged.

> Though that does mean that these two commands
> will behave differently:
> 
>   git log --stdin -z
>   git rev-list --stdin -z
> 
> which seems...not great. My earlier suggestion to tie "-z" to stdin
> handling was for consistency with other tools like grep. But if we
> already have cases where "-z" is only for output, maybe it is better to
> stay consistent with other parts of git. I.e., I was worried about us
> painting ourselves into a corner with your patches, but we may have
> already done so years ago. ;)

I think to some extent Git is already inconsistent here. IMO it would be
preferable for both input and output to use NUL as the delimiter when
machine parsing in git-rev-list(1) as that is the behavior I would
personally expect. I also agree with Patrick's reasoning else where in
this thread[1].

I'm open to discuss further though :)

Thanks,
-Justin

[1]: <Z9KNQ8XliWrrYgAT@pks.im>


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode
  2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
                     ` (5 preceding siblings ...)
  2025-03-13  0:17   ` [PATCH v2 6/6] rev-list: support NUL-delimited --missing option Justin Tobler
@ 2025-03-13 23:57   ` Justin Tobler
  2025-03-13 23:57     ` [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
                       ` (6 more replies)
  6 siblings, 7 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

When walking objects, git-rev-list(1) prints each object entry on a
separate line in the form:

        <oid> LF

Some options, such as `--objects`, may print additional information
about the object on the same line:

        <oid> SP [<path>] LF

In this mode, if the object path contains a newline it is truncated at
the newline.

The `--boundary` option also modifies output by prefixing boundary
objects with `-`:

        -<oid> LF

When the `--missing={print,print-info}` option is provided, information
about any missing objects encountered during the object walk are also
printed in the form:

        ?<oid> [SP <token>=<value>]... LF

where values containing LF or SP are printed in a token specific fashion
so that the resulting encoded value does not contain either of these two
problematic bytes. For example, missing object paths are quoted in the C
style when they contain LF or SP.

To make machine parsing easier, this series introduces a NUL-delimited
output mode for git-rev-list(1) via a `-z` option. In this mode, the
output format for object records is unified such that each object and
its accompanying metadata is formatted without relying on object
metadata order. This format follows the existing `<token>=<value>` used
by the `--missing` option to represent object metadata in the form:

        <oid> NUL [<token>=<value> NUL]...

        # Examples
        <oid> LF                       -> <oid> NUL
        <oid> SP <path> LF             -> <oid> NUL path=<path> NUL
        -<oid> LF                      -> <oid> NUL boundary=yes NUL
        ?<oid> [SP <token>=<value>]... -> <oid> NUL missing=yes NUL [<token>=<value> NUL]...

Note that token value info is printed as-is without any special encoding
or truncation. Prefixes such as '-' and '?' are dropped in favor using a
token/value pair to signal the same information.

While in this mode, if the `--sdtin` option is used, revision and
pathspec arguments read from stdin are separated with a NUL byte instead
of being newline delimited.

For now this series only adds support for use with the `--objects`,
`--boundary` and `--missing` output options. Usage of `-z` with other
options is rejected, so it can potentially be added in the future.

This series is structured as follows:

        - Patches 1 and 2 do some minor preparatory refactors.

        - Patch 3 modifies stdin argument parsing handled by
          `setup_revisions()` to support NUL-delimited arguments.

        - Patch 4 adds the `-z` option to git-rev-list(1) to print
          objects in a NUL-delimited fashion. Arguments parsed on stdin
          while in the mode are also NUL-delimited.

        - Patch 5 teaches the `--boundary` option how to print info in a
          NUL-delimited fashino using the unified output format.

        - Patch 6 teaches the `--missing` option how to print info in a
          NUL-delimited fashion using the unified output format.

Changes since V2:

        - In patch 4, the documentation for the -z option now points out
          the `--stdin` behavior change earlier.

        - Minor code style and documentation changes in patch 6.

Changes since V1:

        - Use unified output format with `<token>=<value>` pairs for
          all object metadata.

        - Add support for the `--boundary` option in NUL-delimited mode.

        - Add support for NUL-delimited stdin argument parsing in
          NUL-delimited mode.

        - Instead of using two NUL bytes to delimit between object
          records, a single NUL byte is used. Now that object metadata
          is always in the form `<token>=<value>`, we know a new object
          record starts when there is an OID entry which will not
          contain '='.

Thanks for taking a look,
-Justin

Justin Tobler (6):
  rev-list: inline `show_object_with_name()` in `show_object()`
  rev-list: refactor early option parsing
  revision: support NUL-delimited --stdin mode
  rev-list: support delimiting objects with NUL bytes
  rev-list: support NUL-delimited --boundary option
  rev-list: support NUL-delimited --missing option

 Documentation/rev-list-options.adoc | 26 ++++++++
 builtin/rev-list.c                  | 94 +++++++++++++++++++++--------
 revision.c                          | 27 ++++-----
 revision.h                          |  5 +-
 t/t6000-rev-list-misc.sh            | 51 ++++++++++++++++
 t/t6017-rev-list-stdin.sh           |  9 +++
 t/t6022-rev-list-missing.sh         | 31 ++++++++++
 7 files changed, 200 insertions(+), 43 deletions(-)

Range-diff against v2:
1:  d2eded3ac7 = 1:  d2eded3ac7 rev-list: inline `show_object_with_name()` in `show_object()`
2:  03cd08c859 = 2:  03cd08c859 rev-list: refactor early option parsing
3:  803a49933a = 3:  803a49933a revision: support NUL-delimited --stdin mode
4:  d3b3c4ef89 ! 4:  8eb7669089 rev-list: support delimiting objects with NUL bytes
    @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
     +
     +-z::
     +	Instead of being newline-delimited, each outputted object and its
    -+	accompanying metadata is delimited using NUL bytes in the following
    -+	form:
    ++	accompanying metadata is delimited using NUL bytes. In this mode, when
    ++	the `--stdin` option is provided, revision and pathspec arguments on
    ++	stdin are also delimited using a NUL byte. Output is printed in the
    ++	following form:
     ++
     +-----------------------------------------------------------------------
     +<OID> NUL [<token>=<value> NUL]...
    @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
     +<OID> NUL path=<path> NUL
     +-----------------------------------------------------------------------
     ++
    -+This mode is only compatible with the `--objects` output option. Also, revision
    -+and pathspec argument parsing on stdin with the `--stdin` option is NUL byte
    -+delimited instead of using newlines while in this mode.
    ++This mode is only compatible with the `--objects` output option.
      endif::git-rev-list[]
      
      History Simplification
5:  5e4fc41976 ! 5:  591a2c7dac rev-list: support NUL-delimited --boundary option
    @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
     +<OID> NUL boundary=yes NUL
      -----------------------------------------------------------------------
      +
    --This mode is only compatible with the `--objects` output option. Also, revision
    --and pathspec argument parsing on stdin with the `--stdin` option is NUL byte
    --delimited instead of using newlines while in this mode.
    +-This mode is only compatible with the `--objects` output option.
     +This mode is only compatible with the `--objects` and `--boundary` output
    -+options. Also, revision and pathspec argument parsing on stdin with the
    -+`--stdin` option is NUL byte delimited instead of using newlines while in this
    -+mode.
    ++options.
      endif::git-rev-list[]
      
      History Simplification
6:  7744966514 ! 6:  669b3b5d9f rev-list: support NUL-delimited --missing option
    @@ Commit message
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
      ## Documentation/rev-list-options.adoc ##
    -@@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
    - <OID> NUL [<token>=<value> NUL]...
    - -----------------------------------------------------------------------
    - +
    --Additional object metadata, such as object paths or boundary objects, is
    --printed using the `<token>=<value>` form. Token values are printed as-is
    -+Additional object metadata, such as object paths or boundary/missing objects,
    -+is printed using the `<token>=<value>` form. Token values are printed as-is
    - without any encoding/truncation. An OID entry never contains a '=' character
    - and thus is used to signal the start of a new object record. Examples:
    - +
     @@ Documentation/rev-list-options.adoc: and thus is used to signal the start of a new object record. Examples:
      <OID> NUL
      <OID> NUL path=<path> NUL
    @@ Documentation/rev-list-options.adoc: and thus is used to signal the start of a n
      -----------------------------------------------------------------------
      +
     -This mode is only compatible with the `--objects` and `--boundary` output
    --options. Also, revision and pathspec argument parsing on stdin with the
    --`--stdin` option is NUL byte delimited instead of using newlines while in this
    --mode.
    +-options.
     +This mode is only compatible with the `--objects`, `--boundary`, and
    -+`--missing` output options. Also, revision and pathspec argument parsing on
    -+stdin with the `--stdin` option is NUL byte delimited instead of using newlines
    -+while in this mode.
    ++`--missing` output options.
      endif::git-rev-list[]
      
      History Simplification
    @@ builtin/rev-list.c: static void print_missing_object(struct missing_objects_map_
      	struct strbuf sb = STRBUF_INIT;
      
     +	if (line_term)
    -+		putchar('?');
    -+
    -+	printf("%s", oid_to_hex(&entry->entry.oid));
    -+
    -+	if (!line_term)
    -+		printf("%cmissing=yes", info_term);
    ++		printf("?%s", oid_to_hex(&entry->entry.oid));
    ++	else
    ++		printf("%s%cmissing=yes", oid_to_hex(&entry->entry.oid),
    ++		       info_term);
     +
      	if (!print_missing_info) {
     -		printf("?%s\n", oid_to_hex(&entry->entry.oid));
    @@ builtin/rev-list.c: static void print_missing_object(struct missing_objects_map_
      	}
      
      	if (entry->path && *entry->path) {
    - 		struct strbuf path = STRBUF_INIT;
    +-		struct strbuf path = STRBUF_INIT;
    ++		strbuf_addf(&sb, "%cpath=", info_term);
    ++
    ++		if (line_term) {
    ++			struct strbuf path = STRBUF_INIT;
      
     -		strbuf_addstr(&sb, " path=");
     -		quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
     -		strbuf_addbuf(&sb, &path);
    -+		strbuf_addf(&sb, "%cpath=", info_term);
    -+
    -+		if (line_term) {
     +			quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
     +			strbuf_addbuf(&sb, &path);
    + 
    +-		strbuf_release(&path);
    ++			strbuf_release(&path);
     +		} else {
     +			strbuf_addstr(&sb, entry->path);
     +		}
    - 
    - 		strbuf_release(&path);
      	}
      	if (entry->type)
     -		strbuf_addf(&sb, " type=%s", type_name(entry->type));

base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3
-- 
2.49.0.rc2



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()`
  2025-03-13 23:57   ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler
@ 2025-03-13 23:57     ` Justin Tobler
  2025-03-13 23:57     ` [PATCH v3 2/6] rev-list: refactor early option parsing Justin Tobler
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

The `show_object_with_name()` function only has a single call site.
Inline call to `show_object_with_name()` in `show_object()` so the
explicit function can be cleaned up and live closer to where it is used.
While at it, factor out the code that prints the OID and newline for
both objects with and without a name. In a subsequent commit,
`show_object()` is modified to support printing object information in a
NUL-delimited format.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/rev-list.c | 13 +++++++++----
 revision.c         |  8 --------
 revision.h         |  2 --
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bb26bee0d4..dcd079c16c 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -357,10 +357,15 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
 		return;
 	}
 
-	if (arg_show_object_names)
-		show_object_with_name(stdout, obj, name);
-	else
-		printf("%s\n", oid_to_hex(&obj->oid));
+	printf("%s", oid_to_hex(&obj->oid));
+
+	if (arg_show_object_names) {
+		putchar(' ');
+		for (const char *p = name; *p && *p != '\n'; p++)
+			putchar(*p);
+	}
+
+	putchar('\n');
 }
 
 static void show_edge(struct commit *commit)
diff --git a/revision.c b/revision.c
index c4390f0938..0eaebe4478 100644
--- a/revision.c
+++ b/revision.c
@@ -59,14 +59,6 @@ implement_shared_commit_slab(revision_sources, char *);
 
 static inline int want_ancestry(const struct rev_info *revs);
 
-void show_object_with_name(FILE *out, struct object *obj, const char *name)
-{
-	fprintf(out, "%s ", oid_to_hex(&obj->oid));
-	for (const char *p = name; *p && *p != '\n'; p++)
-		fputc(*p, out);
-	fputc('\n', out);
-}
-
 static void mark_blob_uninteresting(struct blob *blob)
 {
 	if (!blob)
diff --git a/revision.h b/revision.h
index 71e984c452..21c6a69899 100644
--- a/revision.h
+++ b/revision.h
@@ -489,8 +489,6 @@ void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit);
 void mark_tree_uninteresting(struct repository *r, struct tree *tree);
 void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees);
 
-void show_object_with_name(FILE *, struct object *, const char *);
-
 /**
  * Helpers to check if a reference should be excluded.
  */
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 2/6] rev-list: refactor early option parsing
  2025-03-13 23:57   ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler
  2025-03-13 23:57     ` [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
@ 2025-03-13 23:57     ` Justin Tobler
  2025-03-13 23:57     ` [PATCH v3 3/6] revision: support NUL-delimited --stdin mode Justin Tobler
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

Before invoking `setup_revisions()`, the `--missing` and
`--exclude-promisor-objects` options are parsed early. In a subsequent
commit, another option is added that must be parsed early.

Refactor the code to parse both options in a single early pass.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/rev-list.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index dcd079c16c..04d9c893b5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -16,6 +16,7 @@
 #include "object-file.h"
 #include "object-store-ll.h"
 #include "pack-bitmap.h"
+#include "parse-options.h"
 #include "log-tree.h"
 #include "graph.h"
 #include "bisect.h"
@@ -639,19 +640,15 @@ int cmd_rev_list(int argc,
 		if (!strcmp(arg, "--exclude-promisor-objects")) {
 			fetch_if_missing = 0;
 			revs.exclude_promisor_objects = 1;
-			break;
-		}
-	}
-	for (i = 1; i < argc; i++) {
-		const char *arg = argv[i];
-		if (skip_prefix(arg, "--missing=", &arg)) {
-			if (revs.exclude_promisor_objects)
-				die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing");
-			if (parse_missing_action_value(arg))
-				break;
+		} else if (skip_prefix(arg, "--missing=", &arg)) {
+			parse_missing_action_value(arg);
 		}
 	}
 
+	die_for_incompatible_opt2(revs.exclude_promisor_objects,
+				  "--exclude_promisor_objects",
+				  arg_missing_action, "--missing");
+
 	if (arg_missing_action)
 		revs.do_not_die_on_missing_objects = 1;
 
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 3/6] revision: support NUL-delimited --stdin mode
  2025-03-13 23:57   ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler
  2025-03-13 23:57     ` [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
  2025-03-13 23:57     ` [PATCH v3 2/6] rev-list: refactor early option parsing Justin Tobler
@ 2025-03-13 23:57     ` Justin Tobler
  2025-03-13 23:57     ` [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

When `setup_revisions()` parses the `--stdin` option, revision and
pathspec arguments are read from stdin. Each line of input is handled as
a separate argument.

Introduce the `nul_delim_stdin` field to `setup_revision_opt` that, when
enabled, uses a NUL byte to delimit between stdin arguments instead of
newline.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 revision.c | 19 +++++++++++--------
 revision.h |  3 ++-
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/revision.c b/revision.c
index 0eaebe4478..5de6309830 100644
--- a/revision.c
+++ b/revision.c
@@ -2275,10 +2275,10 @@ int handle_revision_arg(const char *arg, struct rev_info *revs, int flags, unsig
 	return ret;
 }
 
-static void read_pathspec_from_stdin(struct strbuf *sb,
-				     struct strvec *prune)
+static void read_pathspec_from_stdin(struct strbuf *sb, struct strvec *prune,
+				     int line_term)
 {
-	while (strbuf_getline(sb, stdin) != EOF)
+	while (strbuf_getdelim_strip_crlf(sb, stdin, line_term) != EOF)
 		strvec_push(prune, sb->buf);
 }
 
@@ -2905,8 +2905,8 @@ static int handle_revision_pseudo_opt(struct rev_info *revs,
 	return 1;
 }
 
-static void read_revisions_from_stdin(struct rev_info *revs,
-				      struct strvec *prune)
+static void read_revisions_from_stdin(struct rev_info *revs, struct strvec *prune,
+				      int line_term)
 {
 	struct strbuf sb;
 	int seen_dashdash = 0;
@@ -2918,7 +2918,7 @@ static void read_revisions_from_stdin(struct rev_info *revs,
 	warn_on_object_refname_ambiguity = 0;
 
 	strbuf_init(&sb, 1000);
-	while (strbuf_getline(&sb, stdin) != EOF) {
+	while (strbuf_getdelim_strip_crlf(&sb, stdin, line_term) != EOF) {
 		if (!sb.len)
 			break;
 
@@ -2946,7 +2946,7 @@ static void read_revisions_from_stdin(struct rev_info *revs,
 			die("bad revision '%s'", sb.buf);
 	}
 	if (seen_dashdash)
-		read_pathspec_from_stdin(&sb, prune);
+		read_pathspec_from_stdin(&sb, prune, line_term);
 
 	strbuf_release(&sb);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -3019,13 +3019,16 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
 			}
 
 			if (!strcmp(arg, "--stdin")) {
+				int term = opt && opt->nul_delim_stdin ? '\0' : '\n';
+
 				if (revs->disable_stdin) {
 					argv[left++] = arg;
 					continue;
 				}
 				if (revs->read_from_stdin++)
 					die("--stdin given twice?");
-				read_revisions_from_stdin(revs, &prune_data);
+				read_revisions_from_stdin(revs, &prune_data,
+							  term);
 				continue;
 			}
 
diff --git a/revision.h b/revision.h
index 21c6a69899..0e680c3667 100644
--- a/revision.h
+++ b/revision.h
@@ -439,7 +439,8 @@ struct setup_revision_opt {
 	void (*tweak)(struct rev_info *);
 	unsigned int	assume_dashdash:1,
 			allow_exclude_promisor_objects:1,
-			free_removed_argv_elements:1;
+			free_removed_argv_elements:1,
+			nul_delim_stdin:1;
 	unsigned revarg_opt;
 };
 int setup_revisions(int argc, const char **argv, struct rev_info *revs,
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes
  2025-03-13 23:57   ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler
                       ` (2 preceding siblings ...)
  2025-03-13 23:57     ` [PATCH v3 3/6] revision: support NUL-delimited --stdin mode Justin Tobler
@ 2025-03-13 23:57     ` Justin Tobler
  2025-03-19 12:35       ` Christian Couder
  2025-03-13 23:57     ` [PATCH v3 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler
                       ` (2 subsequent siblings)
  6 siblings, 1 reply; 60+ messages in thread
From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob object on the same line in the form:

        $ git rev-list --objects <rev>
        <tree/blob oid> SP [<path>] LF

Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.

Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info in the
following form:

        $ git rev-list -z --objects <rev>
        <oid> NUL [path=<path> NUL]

In this form, the start of each record is signaled by an OID entry that
is all hexidecimal and does not contain any '='. Additional path info
from `--objects` is appended to the record as a token/value pair
`path=<path>` as-is without any truncation.

In this mode, revision and pathspec arguments provided on stdin with the
`--stdin` option are also separated by a NUL byte instead of being
newline delimited.

For now, the `--objects` and `--stdin` flag are the only options that
can be used in combination with `-z`. In a subsequent commit,
NUL-delimited support for other options is added. Other options that do
not make sense with be used in combination with `-z` are rejected.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 23 ++++++++++++++++++
 builtin/rev-list.c                  | 36 +++++++++++++++++++++++++----
 t/t6000-rev-list-misc.sh            | 35 ++++++++++++++++++++++++++++
 t/t6017-rev-list-stdin.sh           |  9 ++++++++
 4 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 785c0786e0..14d82fdfbf 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -361,6 +361,29 @@ ifdef::git-rev-list[]
 --progress=<header>::
 	Show progress reports on stderr as objects are considered. The
 	`<header>` text will be printed with each progress update.
+
+-z::
+	Instead of being newline-delimited, each outputted object and its
+	accompanying metadata is delimited using NUL bytes. In this mode, when
+	the `--stdin` option is provided, revision and pathspec arguments on
+	stdin are also delimited using a NUL byte. Output is printed in the
+	following form:
++
+-----------------------------------------------------------------------
+<OID> NUL [<token>=<value> NUL]...
+-----------------------------------------------------------------------
++
+Additional object metadata, such as object paths, is printed using the
+`<token>=<value>` form. Token values are printed as-is without any
+encoding/truncation. An OID entry never contains a '=' character and thus
+is used to signal the start of a new object record. Examples:
++
+-----------------------------------------------------------------------
+<OID> NUL
+<OID> NUL path=<path> NUL
+-----------------------------------------------------------------------
++
+This mode is only compatible with the `--objects` output option.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 04d9c893b5..f048500679 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -65,6 +65,7 @@ static const char rev_list_usage[] =
 "    --abbrev-commit\n"
 "    --left-right\n"
 "    --count\n"
+"    -z\n"
 "  special purpose:\n"
 "    --bisect\n"
 "    --bisect-vars\n"
@@ -97,6 +98,9 @@ static int arg_show_object_names = 1;
 
 #define DEFAULT_OIDSET_SIZE     (16*1024)
 
+static char line_term = '\n';
+static char info_term = ' ';
+
 static int show_disk_usage;
 static off_t total_disk_usage;
 static int human_readable;
@@ -264,7 +268,7 @@ static void show_commit(struct commit *commit, void *data)
 	if (revs->commit_format == CMIT_FMT_ONELINE)
 		putchar(' ');
 	else if (revs->include_header)
-		putchar('\n');
+		putchar(line_term);
 
 	if (revs->verbose_header) {
 		struct strbuf buf = STRBUF_INIT;
@@ -361,12 +365,16 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
 	printf("%s", oid_to_hex(&obj->oid));
 
 	if (arg_show_object_names) {
-		putchar(' ');
-		for (const char *p = name; *p && *p != '\n'; p++)
-			putchar(*p);
+		if (line_term) {
+			putchar(info_term);
+			for (const char *p = name; *p && *p != '\n'; p++)
+				putchar(*p);
+		} else if (*name) {
+			printf("%cpath=%s", info_term, name);
+		}
 	}
 
-	putchar('\n');
+	putchar(line_term);
 }
 
 static void show_edge(struct commit *commit)
@@ -642,6 +650,10 @@ int cmd_rev_list(int argc,
 			revs.exclude_promisor_objects = 1;
 		} else if (skip_prefix(arg, "--missing=", &arg)) {
 			parse_missing_action_value(arg);
+		} else if (!strcmp(arg, "-z")) {
+			s_r_opt.nul_delim_stdin = 1;
+			line_term = '\0';
+			info_term = '\0';
 		}
 	}
 
@@ -757,6 +769,20 @@ int cmd_rev_list(int argc,
 		usage(rev_list_usage);
 
 	}
+
+	/*
+	 * Reject options currently incompatible with -z. For some options, this
+	 * is not an inherent limitation and support may be implemented in the
+	 * future.
+	 */
+	if (!line_term) {
+		if (revs.graph || revs.verbose_header || show_disk_usage ||
+		    info.show_timestamp || info.header_prefix || bisect_list ||
+		    use_bitmap_index || revs.edge_hint || revs.left_right ||
+		    revs.cherry_mark || arg_missing_action || revs.boundary)
+			die(_("-z option used with unsupported option"));
+	}
+
 	if (revs.commit_format != CMIT_FMT_USERFORMAT)
 		revs.include_header = 1;
 	if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 6289a2e8b0..dfbbc0aee6 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -182,4 +182,39 @@ test_expect_success 'rev-list --unpacked' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list -z' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD) &&
+	oid2=$(git -C repo rev-parse HEAD~) &&
+
+	printf "%s\0%s\0" "$oid1" "$oid2" >expect &&
+	git -C repo rev-list -z HEAD >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'rev-list -z --objects' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD:1.t) &&
+	oid2=$(git -C repo rev-parse HEAD:2.t) &&
+	path1=1.t &&
+	path2=2.t &&
+
+	printf "%s\0path=%s\0%s\0path=%s\0" "$oid1" "$path1" "$oid2" "$path2" \
+		>expect &&
+	git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual &&
+
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t6017-rev-list-stdin.sh b/t/t6017-rev-list-stdin.sh
index 4821b90e74..362a8b126a 100755
--- a/t/t6017-rev-list-stdin.sh
+++ b/t/t6017-rev-list-stdin.sh
@@ -148,4 +148,13 @@ test_expect_success '--not via stdin does not influence revisions from command l
 	test_cmp expect actual
 '
 
+test_expect_success 'NUL-delimited stdin' '
+	printf "%s\0%s\0%s\0" "HEAD" "--" "file-1" > input &&
+
+	git rev-list -z --objects HEAD -- file-1 >expect &&
+	git rev-list -z --objects --stdin <input >actual &&
+
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 5/6] rev-list: support NUL-delimited --boundary option
  2025-03-13 23:57   ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler
                       ` (3 preceding siblings ...)
  2025-03-13 23:57     ` [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler
@ 2025-03-13 23:57     ` Justin Tobler
  2025-03-13 23:57     ` [PATCH v3 6/6] rev-list: support NUL-delimited --missing option Justin Tobler
  2025-03-19 18:34     ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler
  6 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

The `--boundary` option for git-rev-list(1) prints boundary objects
found while performing the object walk in the form:

        $ git rev-list --boundary <rev>
        -<oid> LF

Add support for printing boundary objects in a NUL-delimited format when
the `-z` option is enabled.

        $ git rev-list -z --boundary <rev>
        <oid> NUL boundary=yes NUL

In this mode, instead of prefixing the boundary OID with '-', a separate
`boundary=yes` token/value pair is appended.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 12 +++++++-----
 builtin/rev-list.c                  |  9 +++++++--
 t/t6000-rev-list-misc.sh            | 16 ++++++++++++++++
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 14d82fdfbf..92ac31a8e8 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -373,17 +373,19 @@ ifdef::git-rev-list[]
 <OID> NUL [<token>=<value> NUL]...
 -----------------------------------------------------------------------
 +
-Additional object metadata, such as object paths, is printed using the
-`<token>=<value>` form. Token values are printed as-is without any
-encoding/truncation. An OID entry never contains a '=' character and thus
-is used to signal the start of a new object record. Examples:
+Additional object metadata, such as object paths or boundary objects, is
+printed using the `<token>=<value>` form. Token values are printed as-is
+without any encoding/truncation. An OID entry never contains a '=' character
+and thus is used to signal the start of a new object record. Examples:
 +
 -----------------------------------------------------------------------
 <OID> NUL
 <OID> NUL path=<path> NUL
+<OID> NUL boundary=yes NUL
 -----------------------------------------------------------------------
 +
-This mode is only compatible with the `--objects` output option.
+This mode is only compatible with the `--objects` and `--boundary` output
+options.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index f048500679..7c6d4b25b0 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -240,13 +240,18 @@ static void show_commit(struct commit *commit, void *data)
 		fputs(info->header_prefix, stdout);
 
 	if (revs->include_header) {
-		if (!revs->graph)
+		if (!revs->graph && line_term)
 			fputs(get_revision_mark(revs, commit), stdout);
 		if (revs->abbrev_commit && revs->abbrev)
 			fputs(repo_find_unique_abbrev(the_repository, &commit->object.oid, revs->abbrev),
 			      stdout);
 		else
 			fputs(oid_to_hex(&commit->object.oid), stdout);
+
+		if (!line_term) {
+			if (commit->object.flags & BOUNDARY)
+				printf("%cboundary=yes", info_term);
+		}
 	}
 	if (revs->print_parents) {
 		struct commit_list *parents = commit->parents;
@@ -779,7 +784,7 @@ int cmd_rev_list(int argc,
 		if (revs.graph || revs.verbose_header || show_disk_usage ||
 		    info.show_timestamp || info.header_prefix || bisect_list ||
 		    use_bitmap_index || revs.edge_hint || revs.left_right ||
-		    revs.cherry_mark || arg_missing_action || revs.boundary)
+		    revs.cherry_mark || arg_missing_action)
 			die(_("-z option used with unsupported option"));
 	}
 
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index dfbbc0aee6..349bf5ec3d 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -217,4 +217,20 @@ test_expect_success 'rev-list -z --objects' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list -z --boundary' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD) &&
+	oid2=$(git -C repo rev-parse HEAD~) &&
+
+	printf "%s\0%s\0boundary=yes\0" "$oid1" "$oid2" >expect &&
+	git -C repo rev-list -z --boundary HEAD~.. >actual &&
+
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 6/6] rev-list: support NUL-delimited --missing option
  2025-03-13 23:57   ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler
                       ` (4 preceding siblings ...)
  2025-03-13 23:57     ` [PATCH v3 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler
@ 2025-03-13 23:57     ` Justin Tobler
  2025-03-19 18:34     ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler
  6 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-13 23:57 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, peff, ben.knoble, Justin Tobler

The `--missing={print,print-info}` option for git-rev-list(1) prints
missing objects found while performing the object walk in the form:

        $ git rev-list --missing=print-info <rev>
        ?<oid> [SP <token>=<value>]... LF

Add support for printing missing objects in a NUL-delimited format when
the `-z` option is enabled.

        $ git rev-list -z --missing=print-info <rev>
        <oid> NUL missing=yes NUL [<token>=<value> NUL]...

In this mode, values containing special characters or spaces are printed
as-is without being escaped or quoted. Instead of prefixing the missing
OID with '?', a separate `missing=yes` token/value pair is appended.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc |  5 +++--
 builtin/rev-list.c                  | 31 ++++++++++++++++++++---------
 t/t6022-rev-list-missing.sh         | 31 +++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 92ac31a8e8..f4764b72f5 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -382,10 +382,11 @@ and thus is used to signal the start of a new object record. Examples:
 <OID> NUL
 <OID> NUL path=<path> NUL
 <OID> NUL boundary=yes NUL
+<OID> NUL missing=yes NUL [<token>=<value> NUL]...
 -----------------------------------------------------------------------
 +
-This mode is only compatible with the `--objects` and `--boundary` output
-options.
+This mode is only compatible with the `--objects`, `--boundary`, and
+`--missing` output options.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 7c6d4b25b0..036fcc26d5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -136,24 +136,37 @@ static void print_missing_object(struct missing_objects_map_entry *entry,
 {
 	struct strbuf sb = STRBUF_INIT;
 
+	if (line_term)
+		printf("?%s", oid_to_hex(&entry->entry.oid));
+	else
+		printf("%s%cmissing=yes", oid_to_hex(&entry->entry.oid),
+		       info_term);
+
 	if (!print_missing_info) {
-		printf("?%s\n", oid_to_hex(&entry->entry.oid));
+		putchar(line_term);
 		return;
 	}
 
 	if (entry->path && *entry->path) {
-		struct strbuf path = STRBUF_INIT;
+		strbuf_addf(&sb, "%cpath=", info_term);
+
+		if (line_term) {
+			struct strbuf path = STRBUF_INIT;
 
-		strbuf_addstr(&sb, " path=");
-		quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
-		strbuf_addbuf(&sb, &path);
+			quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
+			strbuf_addbuf(&sb, &path);
 
-		strbuf_release(&path);
+			strbuf_release(&path);
+		} else {
+			strbuf_addstr(&sb, entry->path);
+		}
 	}
 	if (entry->type)
-		strbuf_addf(&sb, " type=%s", type_name(entry->type));
+		strbuf_addf(&sb, "%ctype=%s", info_term, type_name(entry->type));
+
+	fwrite(sb.buf, sizeof(char), sb.len, stdout);
+	putchar(line_term);
 
-	printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf);
 	strbuf_release(&sb);
 }
 
@@ -784,7 +797,7 @@ int cmd_rev_list(int argc,
 		if (revs.graph || revs.verbose_header || show_disk_usage ||
 		    info.show_timestamp || info.header_prefix || bisect_list ||
 		    use_bitmap_index || revs.edge_hint || revs.left_right ||
-		    revs.cherry_mark || arg_missing_action)
+		    revs.cherry_mark)
 			die(_("-z option used with unsupported option"));
 	}
 
diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
index 3e2790d4c8..08e92dd002 100755
--- a/t/t6022-rev-list-missing.sh
+++ b/t/t6022-rev-list-missing.sh
@@ -198,4 +198,35 @@ do
 	'
 done
 
+test_expect_success "-z nul-delimited --missing" '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m first &&
+
+		path="foo bar" &&
+		echo foobar >"$path" &&
+		git add -A &&
+		git commit -m second &&
+
+		oid=$(git rev-parse "HEAD:$path") &&
+		type="$(git cat-file -t $oid)" &&
+
+		obj_path=".git/objects/$(test_oid_to_path $oid)" &&
+
+		git rev-list -z --objects --no-object-names \
+			HEAD ^"$oid" >expect &&
+		printf "%s\0missing=yes\0path=%s\0type=%s\0" "$oid" "$path" \
+			"$type" >>expect &&
+
+		mv "$obj_path" "$obj_path.hidden" &&
+		git rev-list -z --objects --no-object-names \
+			--missing=print-info HEAD >actual &&
+
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-13 16:41       ` Justin Tobler
@ 2025-03-14  2:49         ` Jeff King
  2025-03-14 17:02           ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2025-03-14  2:49 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, christian.couder

On Thu, Mar 13, 2025 at 11:41:56AM -0500, Justin Tobler wrote:

> > Though that does mean that these two commands
> > will behave differently:
> > 
> >   git log --stdin -z
> >   git rev-list --stdin -z
> > 
> > which seems...not great. My earlier suggestion to tie "-z" to stdin
> > handling was for consistency with other tools like grep. But if we
> > already have cases where "-z" is only for output, maybe it is better to
> > stay consistent with other parts of git. I.e., I was worried about us
> > painting ourselves into a corner with your patches, but we may have
> > already done so years ago. ;)
> 
> I think to some extent Git is already inconsistent here. IMO it would be
> preferable for both input and output to use NUL as the delimiter when
> machine parsing in git-rev-list(1) as that is the behavior I would
> personally expect. I also agree with Patrick's reasoning else where in
> this thread[1].
> 
> I'm open to discuss further though :)

Nope, I don't have anything further to add. I just wanted to make sure
my suggestion in the earlier part of the thread was not leading us
accidentally down a bad path. ;) If the inconsistencies have been
considered and we're OK with them, that sounds good to me (and the
option handling in your v2 does seem to faithfully implement that).

Thanks for all your responses.

-Peff


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-14  2:49         ` Jeff King
@ 2025-03-14 17:02           ` Junio C Hamano
  2025-03-14 18:59             ` Jeff King
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2025-03-14 17:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Justin Tobler, git, ps, christian.couder

Jeff King <peff@peff.net> writes:

> On Thu, Mar 13, 2025 at 11:41:56AM -0500, Justin Tobler wrote:
>
>> > Though that does mean that these two commands
>> > will behave differently:
>> > 
>> >   git log --stdin -z
>> >   git rev-list --stdin -z
>> > 
>> > which seems...not great. My earlier suggestion to tie "-z" to stdin
>> > handling was for consistency with other tools like grep. But if we
>> > already have cases where "-z" is only for output, maybe it is better to
>> > stay consistent with other parts of git. I.e., I was worried about us
>> > painting ourselves into a corner with your patches, but we may have
>> > already done so years ago. ;)
>> 
>> I think to some extent Git is already inconsistent here. IMO it would be
>> preferable for both input and output to use NUL as the delimiter when
>> machine parsing in git-rev-list(1) as that is the behavior I would
>> personally expect. I also agree with Patrick's reasoning else where in
>> this thread[1].
>> 
>> I'm open to discuss further though :)
>
> Nope, I don't have anything further to add. I just wanted to make sure
> my suggestion in the earlier part of the thread was not leading us
> accidentally down a bad path. ;) If the inconsistencies have been
> considered and we're OK with them, that sounds good to me (and the
> option handling in your v2 does seem to faithfully implement that).
>
> Thanks for all your responses.

The developers who know that the revision walk infrastructure is
shared between rev-list and log may wish that "-z" to mean the same
thing for these two commands, but the end-users do not have to, no?

After all, "git log" accepts "-z" but "git rev-list" does not in the
current system and nobody complained about the discrepancy so far.

Having said that, at the plumbing level, my preference is to have
two independent options "--nul-delimited-{output,input}".  It does
not prevent us from starting with a single "-z" that works as a
short-hand that flips both on (and is inconsistent with "git log" at
the Porcelain level), but we can make "-z" only for output for
consistency.  As long as we agree on the design to allow us to
control both sides independently, starting with "-z" that is only
for output may be the best way forward.







^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-14 17:02           ` Junio C Hamano
@ 2025-03-14 18:59             ` Jeff King
  2025-03-14 19:53               ` Justin Tobler
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff King @ 2025-03-14 18:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Justin Tobler, git, ps, christian.couder

On Fri, Mar 14, 2025 at 10:02:11AM -0700, Junio C Hamano wrote:

> The developers who know that the revision walk infrastructure is
> shared between rev-list and log may wish that "-z" to mean the same
> thing for these two commands, but the end-users do not have to, no?

I think of them as related in a user-facing way, but it is likely that
my mind is poisoned by knowing too much of Git's internals. :)

> After all, "git log" accepts "-z" but "git rev-list" does not in the
> current system and nobody complained about the discrepancy so far.

Well, rev-list _does_ take "-z", but it just happens that it cannot ever
do anything because you cannot convince it to produce a diff. But even
knowing that is true is probably again a sign of my poisoned mind.

> Having said that, at the plumbing level, my preference is to have
> two independent options "--nul-delimited-{output,input}".  It does
> not prevent us from starting with a single "-z" that works as a
> short-hand that flips both on (and is inconsistent with "git log" at
> the Porcelain level), but we can make "-z" only for output for
> consistency.  As long as we agree on the design to allow us to
> control both sides independently, starting with "-z" that is only
> for output may be the best way forward.

Yeah, I almost suggested earlier having longer, unambiguous names. And
then that punts the issue from "which functionality should be available"
to "which functionality should be mapped to short-and-sweet -z".

I do think it's still worth considering what "-z" should do _now_,
though, because it will be painful/impossible to switch its behavior
later. And people seemed to like the "both input and output" direction.
That would leave the longer names as escape hatches. I.e., I'd expect:

  git rev-list -z --no-nul-delimited-input --stdin

to use newlines for the input and NULs for the output.

-Peff


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-14 18:59             ` Jeff King
@ 2025-03-14 19:53               ` Justin Tobler
  2025-03-14 21:16                 ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Justin Tobler @ 2025-03-14 19:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, ps, christian.couder

On 25/03/14 02:59PM, Jeff King wrote:
> On Fri, Mar 14, 2025 at 10:02:11AM -0700, Junio C Hamano wrote:
> > Having said that, at the plumbing level, my preference is to have
> > two independent options "--nul-delimited-{output,input}".  It does
> > not prevent us from starting with a single "-z" that works as a
> > short-hand that flips both on (and is inconsistent with "git log" at
> > the Porcelain level), but we can make "-z" only for output for
> > consistency.  As long as we agree on the design to allow us to
> > control both sides independently, starting with "-z" that is only
> > for output may be the best way forward.
> 
> Yeah, I almost suggested earlier having longer, unambiguous names. And
> then that punts the issue from "which functionality should be available"
> to "which functionality should be mapped to short-and-sweet -z".
> 
> I do think it's still worth considering what "-z" should do _now_,
> though, because it will be painful/impossible to switch its behavior
> later. And people seemed to like the "both input and output" direction.
> That would leave the longer names as escape hatches. I.e., I'd expect:
> 
>   git rev-list -z --no-nul-delimited-input --stdin
> 
> to use newlines for the input and NULs for the output.

If we want to adopt less ambiguous long options names for NUL-delimited
input/output options as an alternative to "-z", maybe we could do
something like:

    $ git rev-list --nul-delimited={all,input,output}

where the default for the `--nul-delimited` could be both input/output. 

If we want to do something along these lines, I can send another verison
of this series where we drop "-z" in favor of using this option.

-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-14 19:53               ` Justin Tobler
@ 2025-03-14 21:16                 ` Junio C Hamano
  2025-03-19 15:58                   ` Justin Tobler
  0 siblings, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2025-03-14 21:16 UTC (permalink / raw)
  To: Justin Tobler; +Cc: Jeff King, git, ps, christian.couder

Justin Tobler <jltobler@gmail.com> writes:

> If we want to adopt less ambiguous long options names for NUL-delimited
> input/output options as an alternative to "-z", maybe we could do
> something like:
>
>     $ git rev-list --nul-delimited={all,input,output}
>
> where the default for the `--nul-delimited` could be both input/output. 

I'd prefer not to see that route taken, as it does not look any
"less ambiguous" at least to me.  Making individual selections are
almost the same in either syntax, and the only difference is that
--nul-delimited-input --nul-delimited-output can be independently
chosen and given and happen to end up selecting both.

But with --nul-delimited=<value>, you have to plan ahead and choose
"all".  When your script first wants NUL delimited I/O on the output
side, you'd write "output".  When later you want to allow it to
optionally take NUL delimited I/O on the input side, you have to
notice that you have "output" there already and replace it with "all".
If the initial version did not have NUL-delimited output, your change
to add support for NUL-delimited input would be different.

And you also have to remember that it has to be spelled "all" and
not "both" when you replace existing "output".

In other words, I'd prefer to leave independent/orthogonal things as
such, even if such a general design principle may make the result a
bit more verbose, at the plumbing level.

Thanks.




^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes
  2025-03-13 23:57     ` [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler
@ 2025-03-19 12:35       ` Christian Couder
  2025-03-19 16:02         ` Justin Tobler
  0 siblings, 1 reply; 60+ messages in thread
From: Christian Couder @ 2025-03-19 12:35 UTC (permalink / raw)
  To: Justin Tobler; +Cc: git, ps, peff, ben.knoble

On Fri, Mar 14, 2025 at 1:01 AM Justin Tobler <jltobler@gmail.com> wrote:

> For now, the `--objects` and `--stdin` flag are the only options that
> can be used in combination with `-z`. In a subsequent commit,
> NUL-delimited support for other options is added. Other options that do
> not make sense with be used in combination with `-z` are rejected.

s/with be used/when used/

[...]

> +test_expect_success 'rev-list -z' '
> +       test_when_finished rm -rf repo &&
> +
> +       git init repo &&
> +       test_commit -C repo 1 &&
> +       test_commit -C repo 2 &&
> +
> +       oid1=$(git -C repo rev-parse HEAD) &&
> +       oid2=$(git -C repo rev-parse HEAD~) &&

It seems to me that HEAD is at commit 2 and HEAD~ at commit 1 instead
of the other way around.

It looks like there is the same issue in the test added in the next
patch ("[PATCH v3 5/6] rev-list: support NUL-delimited --boundary
option")

> +       printf "%s\0%s\0" "$oid1" "$oid2" >expect &&
> +       git -C repo rev-list -z HEAD >actual &&
> +
> +       test_cmp expect actual
> +'

Otherwise the whole patch series looks good to me.

Thanks.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] rev-list: introduce NUL-delimited output mode
  2025-03-14 21:16                 ` Junio C Hamano
@ 2025-03-19 15:58                   ` Justin Tobler
  0 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-19 15:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, ps, christian.couder

On 25/03/14 02:16PM, Junio C Hamano wrote:
> Justin Tobler <jltobler@gmail.com> writes:
> 
> > If we want to adopt less ambiguous long options names for NUL-delimited
> > input/output options as an alternative to "-z", maybe we could do
> > something like:
> >
> >     $ git rev-list --nul-delimited={all,input,output}
> >
> > where the default for the `--nul-delimited` could be both input/output. 
> 
> I'd prefer not to see that route taken, as it does not look any
> "less ambiguous" at least to me.  Making individual selections are
> almost the same in either syntax, and the only difference is that
> --nul-delimited-input --nul-delimited-output can be independently
> chosen and given and happen to end up selecting both.
> 
> But with --nul-delimited=<value>, you have to plan ahead and choose
> "all".  When your script first wants NUL delimited I/O on the output
> side, you'd write "output".  When later you want to allow it to
> optionally take NUL delimited I/O on the input side, you have to
> notice that you have "output" there already and replace it with "all".
> If the initial version did not have NUL-delimited output, your change
> to add support for NUL-delimited input would be different.
> 
> And you also have to remember that it has to be spelled "all" and
> not "both" when you replace existing "output".
> 
> In other words, I'd prefer to leave independent/orthogonal things as
> such, even if such a general design principle may make the result a
> bit more verbose, at the plumbing level.

That's fair. I agree the explicitness of having two separate options is
nice and, while more verbose, that is probably not a big deal at the
plumbing level.

In my next version I'll return the "-z" option to only setting the
output to be NUL-delimited which would better match the log family of
commands. If we also want to support NUL-delimited stdin parsing, I can
submit a followup series which teaches git-rev-list(1) the
"--nul-delimited-{input,output}" options.

On a side note, I also noticed that git-commit(1) uses the "--null"
option as an alias to "-z". I think it would be nice if going forward
there was greater consistency around the options used to control
input/output delimiters. Maybe "--nul-delimited-{input,output}" could be
that in the future.

Thanks,
-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes
  2025-03-19 12:35       ` Christian Couder
@ 2025-03-19 16:02         ` Justin Tobler
  0 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-19 16:02 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, ps, peff, ben.knoble

On 25/03/19 01:35PM, Christian Couder wrote:
> On Fri, Mar 14, 2025 at 1:01 AM Justin Tobler <jltobler@gmail.com> wrote:
> 
> > +test_expect_success 'rev-list -z' '
> > +       test_when_finished rm -rf repo &&
> > +
> > +       git init repo &&
> > +       test_commit -C repo 1 &&
> > +       test_commit -C repo 2 &&
> > +
> > +       oid1=$(git -C repo rev-parse HEAD) &&
> > +       oid2=$(git -C repo rev-parse HEAD~) &&
> 
> It seems to me that HEAD is at commit 2 and HEAD~ at commit 1 instead
> of the other way around.

In this case, oid1 and oid2 were ordered based on how they would show up
in ouput, but this is somewhat confusing because its not the order they
were committed in.

I'll change it to be in commit order instead.

Thanks,
-Justin


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode
  2025-03-13 23:57   ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler
                       ` (5 preceding siblings ...)
  2025-03-13 23:57     ` [PATCH v3 6/6] rev-list: support NUL-delimited --missing option Justin Tobler
@ 2025-03-19 18:34     ` Justin Tobler
  2025-03-19 18:34       ` [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
                         ` (4 more replies)
  6 siblings, 5 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

When walking objects, git-rev-list(1) prints each object entry on a
separate line in the form:

        <oid> LF

Some options, such as `--objects`, may print additional information
about the object on the same line:

        <oid> SP [<path>] LF

In this mode, if the object path contains a newline it is truncated at
the newline.

The `--boundary` option also modifies output by prefixing boundary
objects with `-`:

        -<oid> LF

When the `--missing={print,print-info}` option is provided, information
about any missing objects encountered during the object walk are also
printed in the form:

        ?<oid> [SP <token>=<value>]... LF

where values containing LF or SP are printed in a token specific fashion
so that the resulting encoded value does not contain either of these two
problematic bytes. For example, missing object paths are quoted in the C
style when they contain LF or SP.

To make machine parsing easier, this series introduces a NUL-delimited
output mode for git-rev-list(1) via a `-z` option. In this mode, the
output format for object records is unified such that each object and
its accompanying metadata is formatted without relying on object
metadata order. This format follows the existing `<token>=<value>` used
by the `--missing` option to represent object metadata in the form:

        <oid> NUL [<token>=<value> NUL]...

        # Examples
        <oid> LF                       -> <oid> NUL
        <oid> SP <path> LF             -> <oid> NUL path=<path> NUL
        -<oid> LF                      -> <oid> NUL boundary=yes NUL
        ?<oid> [SP <token>=<value>]... -> <oid> NUL missing=yes NUL [<token>=<value> NUL]...

Note that token value info is printed as-is without any special encoding
or truncation. Prefixes such as '-' and '?' are dropped in favor using a
token/value pair to signal the same information.

For now this series only adds support for use with the `--objects`,
`--boundary` and `--missing` output options. Usage of `-z` with other
options is rejected, so it can potentially be added in the future.

This series is structured as follows:

        - Patches 1 and 2 do some minor preparatory refactors.

        - Patch 3 adds the `-z` option to git-rev-list(1) to print
          objects in a NUL-delimited fashion.

        - Patch 4 teaches the `--boundary` option how to print info in a
          NUL-delimited fashino using the unified output format.

        - Patch 5 teaches the `--missing` option how to print info in a
          NUL-delimited fashion using the unified output format.

Changes since V3:

        - The -z option now only makes output NUL-delimited. Input
          parsed on stdin via the `--stdin` option remains unchanged.
          This is done to remain more consistent with other log family
          commands. Support for more explicit options to control
          NUL-delimited input/ouput behavior may be added in a future
          series via `--NUL-delimited-{input,output}` options.

        - Changed some variable names in tests that were a little
          confusing.

Changes since V2:

        - In patch 4, the documentation for the -z option now points out
          the `--stdin` behavior change earlier.

        - Minor code style and documentation changes in patch 6.

Changes since V1:

        - Use unified output format with `<token>=<value>` pairs for
          all object metadata.

        - Add support for the `--boundary` option in NUL-delimited mode.

        - Add support for NUL-delimited stdin argument parsing in
          NUL-delimited mode.

        - Instead of using two NUL bytes to delimit between object
          records, a single NUL byte is used. Now that object metadata
          is always in the form `<token>=<value>`, we know a new object
          record starts when there is an OID entry which will not
          contain '='.

Thanks for taking a look,
-Justin

Justin Tobler (5):
  rev-list: inline `show_object_with_name()` in `show_object()`
  rev-list: refactor early option parsing
  rev-list: support delimiting objects with NUL bytes
  rev-list: support NUL-delimited --boundary option
  rev-list: support NUL-delimited --missing option

 Documentation/rev-list-options.adoc | 24 ++++++++
 builtin/rev-list.c                  | 93 +++++++++++++++++++++--------
 revision.c                          |  8 ---
 revision.h                          |  2 -
 t/t6000-rev-list-misc.sh            | 51 ++++++++++++++++
 t/t6022-rev-list-missing.sh         | 31 ++++++++++
 6 files changed, 175 insertions(+), 34 deletions(-)

Range-diff against v3:
1:  d2eded3ac7 = 1:  d2eded3ac7 rev-list: inline `show_object_with_name()` in `show_object()`
2:  03cd08c859 = 2:  03cd08c859 rev-list: refactor early option parsing
3:  803a49933a < -:  ---------- revision: support NUL-delimited --stdin mode
4:  8eb7669089 ! 3:  f6ee01571d rev-list: support delimiting objects with NUL bytes
    @@ Commit message
         from `--objects` is appended to the record as a token/value pair
         `path=<path>` as-is without any truncation.
     
    -    In this mode, revision and pathspec arguments provided on stdin with the
    -    `--stdin` option are also separated by a NUL byte instead of being
    -    newline delimited.
    -
    -    For now, the `--objects` and `--stdin` flag are the only options that
    -    can be used in combination with `-z`. In a subsequent commit,
    -    NUL-delimited support for other options is added. Other options that do
    -    not make sense with be used in combination with `-z` are rejected.
    +    For now, the `--objects` flag is the only options that can be used in
    +    combination with `-z`. In a subsequent commit, NUL-delimited support for
    +    other options is added. Other options that do not make sense when used
    +    in combination with `-z` are rejected.
     
         Signed-off-by: Justin Tobler <jltobler@gmail.com>
     
    @@ Documentation/rev-list-options.adoc: ifdef::git-rev-list[]
     +
     +-z::
     +	Instead of being newline-delimited, each outputted object and its
    -+	accompanying metadata is delimited using NUL bytes. In this mode, when
    -+	the `--stdin` option is provided, revision and pathspec arguments on
    -+	stdin are also delimited using a NUL byte. Output is printed in the
    -+	following form:
    ++	accompanying metadata is delimited using NUL bytes. Output is printed
    ++	in the following form:
     ++
     +-----------------------------------------------------------------------
     +<OID> NUL [<token>=<value> NUL]...
    @@ builtin/rev-list.c: int cmd_rev_list(int argc,
      		} else if (skip_prefix(arg, "--missing=", &arg)) {
      			parse_missing_action_value(arg);
     +		} else if (!strcmp(arg, "-z")) {
    -+			s_r_opt.nul_delim_stdin = 1;
     +			line_term = '\0';
     +			info_term = '\0';
      		}
    @@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list --unpacked' '
     +	test_commit -C repo 1 &&
     +	test_commit -C repo 2 &&
     +
    -+	oid1=$(git -C repo rev-parse HEAD) &&
    -+	oid2=$(git -C repo rev-parse HEAD~) &&
    ++	oid1=$(git -C repo rev-parse HEAD~) &&
    ++	oid2=$(git -C repo rev-parse HEAD) &&
     +
    -+	printf "%s\0%s\0" "$oid1" "$oid2" >expect &&
    ++	printf "%s\0%s\0" "$oid2" "$oid1" >expect &&
     +	git -C repo rev-list -z HEAD >actual &&
     +
     +	test_cmp expect actual
    @@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list --unpacked' '
     +'
     +
      test_done
    -
    - ## t/t6017-rev-list-stdin.sh ##
    -@@ t/t6017-rev-list-stdin.sh: test_expect_success '--not via stdin does not influence revisions from command l
    - 	test_cmp expect actual
    - '
    - 
    -+test_expect_success 'NUL-delimited stdin' '
    -+	printf "%s\0%s\0%s\0" "HEAD" "--" "file-1" > input &&
    -+
    -+	git rev-list -z --objects HEAD -- file-1 >expect &&
    -+	git rev-list -z --objects --stdin <input >actual &&
    -+
    -+	test_cmp expect actual
    -+'
    -+
    - test_done
5:  591a2c7dac ! 4:  ccf6bd8d35 rev-list: support NUL-delimited --boundary option
    @@ t/t6000-rev-list-misc.sh: test_expect_success 'rev-list -z --objects' '
     +	test_commit -C repo 1 &&
     +	test_commit -C repo 2 &&
     +
    -+	oid1=$(git -C repo rev-parse HEAD) &&
    -+	oid2=$(git -C repo rev-parse HEAD~) &&
    ++	oid1=$(git -C repo rev-parse HEAD~) &&
    ++	oid2=$(git -C repo rev-parse HEAD) &&
     +
    -+	printf "%s\0%s\0boundary=yes\0" "$oid1" "$oid2" >expect &&
    ++	printf "%s\0%s\0boundary=yes\0" "$oid2" "$oid1" >expect &&
     +	git -C repo rev-list -z --boundary HEAD~.. >actual &&
     +
     +	test_cmp expect actual
6:  669b3b5d9f = 5:  b1bd245155 rev-list: support NUL-delimited --missing option

base-commit: 87a0bdbf0f72b7561f3cd50636eee33dcb7dbcc3
-- 
2.49.0.rc2



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()`
  2025-03-19 18:34     ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler
@ 2025-03-19 18:34       ` Justin Tobler
  2025-03-19 18:34       ` [PATCH v4 2/5] rev-list: refactor early option parsing Justin Tobler
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

The `show_object_with_name()` function only has a single call site.
Inline call to `show_object_with_name()` in `show_object()` so the
explicit function can be cleaned up and live closer to where it is used.
While at it, factor out the code that prints the OID and newline for
both objects with and without a name. In a subsequent commit,
`show_object()` is modified to support printing object information in a
NUL-delimited format.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/rev-list.c | 13 +++++++++----
 revision.c         |  8 --------
 revision.h         |  2 --
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bb26bee0d4..dcd079c16c 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -357,10 +357,15 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
 		return;
 	}
 
-	if (arg_show_object_names)
-		show_object_with_name(stdout, obj, name);
-	else
-		printf("%s\n", oid_to_hex(&obj->oid));
+	printf("%s", oid_to_hex(&obj->oid));
+
+	if (arg_show_object_names) {
+		putchar(' ');
+		for (const char *p = name; *p && *p != '\n'; p++)
+			putchar(*p);
+	}
+
+	putchar('\n');
 }
 
 static void show_edge(struct commit *commit)
diff --git a/revision.c b/revision.c
index c4390f0938..0eaebe4478 100644
--- a/revision.c
+++ b/revision.c
@@ -59,14 +59,6 @@ implement_shared_commit_slab(revision_sources, char *);
 
 static inline int want_ancestry(const struct rev_info *revs);
 
-void show_object_with_name(FILE *out, struct object *obj, const char *name)
-{
-	fprintf(out, "%s ", oid_to_hex(&obj->oid));
-	for (const char *p = name; *p && *p != '\n'; p++)
-		fputc(*p, out);
-	fputc('\n', out);
-}
-
 static void mark_blob_uninteresting(struct blob *blob)
 {
 	if (!blob)
diff --git a/revision.h b/revision.h
index 71e984c452..21c6a69899 100644
--- a/revision.h
+++ b/revision.h
@@ -489,8 +489,6 @@ void mark_parents_uninteresting(struct rev_info *revs, struct commit *commit);
 void mark_tree_uninteresting(struct repository *r, struct tree *tree);
 void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees);
 
-void show_object_with_name(FILE *, struct object *, const char *);
-
 /**
  * Helpers to check if a reference should be excluded.
  */
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 2/5] rev-list: refactor early option parsing
  2025-03-19 18:34     ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler
  2025-03-19 18:34       ` [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
@ 2025-03-19 18:34       ` Justin Tobler
  2025-03-19 18:34       ` [PATCH v4 3/5] rev-list: support delimiting objects with NUL bytes Justin Tobler
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

Before invoking `setup_revisions()`, the `--missing` and
`--exclude-promisor-objects` options are parsed early. In a subsequent
commit, another option is added that must be parsed early.

Refactor the code to parse both options in a single early pass.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 builtin/rev-list.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index dcd079c16c..04d9c893b5 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -16,6 +16,7 @@
 #include "object-file.h"
 #include "object-store-ll.h"
 #include "pack-bitmap.h"
+#include "parse-options.h"
 #include "log-tree.h"
 #include "graph.h"
 #include "bisect.h"
@@ -639,19 +640,15 @@ int cmd_rev_list(int argc,
 		if (!strcmp(arg, "--exclude-promisor-objects")) {
 			fetch_if_missing = 0;
 			revs.exclude_promisor_objects = 1;
-			break;
-		}
-	}
-	for (i = 1; i < argc; i++) {
-		const char *arg = argv[i];
-		if (skip_prefix(arg, "--missing=", &arg)) {
-			if (revs.exclude_promisor_objects)
-				die(_("options '%s' and '%s' cannot be used together"), "--exclude-promisor-objects", "--missing");
-			if (parse_missing_action_value(arg))
-				break;
+		} else if (skip_prefix(arg, "--missing=", &arg)) {
+			parse_missing_action_value(arg);
 		}
 	}
 
+	die_for_incompatible_opt2(revs.exclude_promisor_objects,
+				  "--exclude_promisor_objects",
+				  arg_missing_action, "--missing");
+
 	if (arg_missing_action)
 		revs.do_not_die_on_missing_objects = 1;
 
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 3/5] rev-list: support delimiting objects with NUL bytes
  2025-03-19 18:34     ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler
  2025-03-19 18:34       ` [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
  2025-03-19 18:34       ` [PATCH v4 2/5] rev-list: refactor early option parsing Justin Tobler
@ 2025-03-19 18:34       ` Justin Tobler
  2025-03-19 18:34       ` [PATCH v4 4/5] rev-list: support NUL-delimited --boundary option Justin Tobler
  2025-03-19 18:34       ` [PATCH v4 5/5] rev-list: support NUL-delimited --missing option Justin Tobler
  4 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob object on the same line in the form:

        $ git rev-list --objects <rev>
        <tree/blob oid> SP [<path>] LF

Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.

Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info in the
following form:

        $ git rev-list -z --objects <rev>
        <oid> NUL [path=<path> NUL]

In this form, the start of each record is signaled by an OID entry that
is all hexidecimal and does not contain any '='. Additional path info
from `--objects` is appended to the record as a token/value pair
`path=<path>` as-is without any truncation.

For now, the `--objects` flag is the only options that can be used in
combination with `-z`. In a subsequent commit, NUL-delimited support for
other options is added. Other options that do not make sense when used
in combination with `-z` are rejected.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 21 +++++++++++++++++
 builtin/rev-list.c                  | 35 ++++++++++++++++++++++++-----
 t/t6000-rev-list-misc.sh            | 35 +++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 5 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 785c0786e0..aef83813b8 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -361,6 +361,27 @@ ifdef::git-rev-list[]
 --progress=<header>::
 	Show progress reports on stderr as objects are considered. The
 	`<header>` text will be printed with each progress update.
+
+-z::
+	Instead of being newline-delimited, each outputted object and its
+	accompanying metadata is delimited using NUL bytes. Output is printed
+	in the following form:
++
+-----------------------------------------------------------------------
+<OID> NUL [<token>=<value> NUL]...
+-----------------------------------------------------------------------
++
+Additional object metadata, such as object paths, is printed using the
+`<token>=<value>` form. Token values are printed as-is without any
+encoding/truncation. An OID entry never contains a '=' character and thus
+is used to signal the start of a new object record. Examples:
++
+-----------------------------------------------------------------------
+<OID> NUL
+<OID> NUL path=<path> NUL
+-----------------------------------------------------------------------
++
+This mode is only compatible with the `--objects` output option.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 04d9c893b5..17de99d9ca 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -65,6 +65,7 @@ static const char rev_list_usage[] =
 "    --abbrev-commit\n"
 "    --left-right\n"
 "    --count\n"
+"    -z\n"
 "  special purpose:\n"
 "    --bisect\n"
 "    --bisect-vars\n"
@@ -97,6 +98,9 @@ static int arg_show_object_names = 1;
 
 #define DEFAULT_OIDSET_SIZE     (16*1024)
 
+static char line_term = '\n';
+static char info_term = ' ';
+
 static int show_disk_usage;
 static off_t total_disk_usage;
 static int human_readable;
@@ -264,7 +268,7 @@ static void show_commit(struct commit *commit, void *data)
 	if (revs->commit_format == CMIT_FMT_ONELINE)
 		putchar(' ');
 	else if (revs->include_header)
-		putchar('\n');
+		putchar(line_term);
 
 	if (revs->verbose_header) {
 		struct strbuf buf = STRBUF_INIT;
@@ -361,12 +365,16 @@ static void show_object(struct object *obj, const char *name, void *cb_data)
 	printf("%s", oid_to_hex(&obj->oid));
 
 	if (arg_show_object_names) {
-		putchar(' ');
-		for (const char *p = name; *p && *p != '\n'; p++)
-			putchar(*p);
+		if (line_term) {
+			putchar(info_term);
+			for (const char *p = name; *p && *p != '\n'; p++)
+				putchar(*p);
+		} else if (*name) {
+			printf("%cpath=%s", info_term, name);
+		}
 	}
 
-	putchar('\n');
+	putchar(line_term);
 }
 
 static void show_edge(struct commit *commit)
@@ -642,6 +650,9 @@ int cmd_rev_list(int argc,
 			revs.exclude_promisor_objects = 1;
 		} else if (skip_prefix(arg, "--missing=", &arg)) {
 			parse_missing_action_value(arg);
+		} else if (!strcmp(arg, "-z")) {
+			line_term = '\0';
+			info_term = '\0';
 		}
 	}
 
@@ -757,6 +768,20 @@ int cmd_rev_list(int argc,
 		usage(rev_list_usage);
 
 	}
+
+	/*
+	 * Reject options currently incompatible with -z. For some options, this
+	 * is not an inherent limitation and support may be implemented in the
+	 * future.
+	 */
+	if (!line_term) {
+		if (revs.graph || revs.verbose_header || show_disk_usage ||
+		    info.show_timestamp || info.header_prefix || bisect_list ||
+		    use_bitmap_index || revs.edge_hint || revs.left_right ||
+		    revs.cherry_mark || arg_missing_action || revs.boundary)
+			die(_("-z option used with unsupported option"));
+	}
+
 	if (revs.commit_format != CMIT_FMT_USERFORMAT)
 		revs.include_header = 1;
 	if (revs.commit_format != CMIT_FMT_UNSPECIFIED) {
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 6289a2e8b0..886e2fc710 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -182,4 +182,39 @@ test_expect_success 'rev-list --unpacked' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list -z' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD~) &&
+	oid2=$(git -C repo rev-parse HEAD) &&
+
+	printf "%s\0%s\0" "$oid2" "$oid1" >expect &&
+	git -C repo rev-list -z HEAD >actual &&
+
+	test_cmp expect actual
+'
+
+test_expect_success 'rev-list -z --objects' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD:1.t) &&
+	oid2=$(git -C repo rev-parse HEAD:2.t) &&
+	path1=1.t &&
+	path2=2.t &&
+
+	printf "%s\0path=%s\0%s\0path=%s\0" "$oid1" "$path1" "$oid2" "$path2" \
+		>expect &&
+	git -C repo rev-list -z --objects HEAD:1.t HEAD:2.t >actual &&
+
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 4/5] rev-list: support NUL-delimited --boundary option
  2025-03-19 18:34     ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler
                         ` (2 preceding siblings ...)
  2025-03-19 18:34       ` [PATCH v4 3/5] rev-list: support delimiting objects with NUL bytes Justin Tobler
@ 2025-03-19 18:34       ` Justin Tobler
  2025-03-19 18:34       ` [PATCH v4 5/5] rev-list: support NUL-delimited --missing option Justin Tobler
  4 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

The `--boundary` option for git-rev-list(1) prints boundary objects
found while performing the object walk in the form:

        $ git rev-list --boundary <rev>
        -<oid> LF

Add support for printing boundary objects in a NUL-delimited format when
the `-z` option is enabled.

        $ git rev-list -z --boundary <rev>
        <oid> NUL boundary=yes NUL

In this mode, instead of prefixing the boundary OID with '-', a separate
`boundary=yes` token/value pair is appended.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc | 12 +++++++-----
 builtin/rev-list.c                  |  9 +++++++--
 t/t6000-rev-list-misc.sh            | 16 ++++++++++++++++
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index aef83813b8..3fc9902d6b 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -371,17 +371,19 @@ ifdef::git-rev-list[]
 <OID> NUL [<token>=<value> NUL]...
 -----------------------------------------------------------------------
 +
-Additional object metadata, such as object paths, is printed using the
-`<token>=<value>` form. Token values are printed as-is without any
-encoding/truncation. An OID entry never contains a '=' character and thus
-is used to signal the start of a new object record. Examples:
+Additional object metadata, such as object paths or boundary objects, is
+printed using the `<token>=<value>` form. Token values are printed as-is
+without any encoding/truncation. An OID entry never contains a '=' character
+and thus is used to signal the start of a new object record. Examples:
 +
 -----------------------------------------------------------------------
 <OID> NUL
 <OID> NUL path=<path> NUL
+<OID> NUL boundary=yes NUL
 -----------------------------------------------------------------------
 +
-This mode is only compatible with the `--objects` output option.
+This mode is only compatible with the `--objects` and `--boundary` output
+options.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 17de99d9ca..bcb880f109 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -240,13 +240,18 @@ static void show_commit(struct commit *commit, void *data)
 		fputs(info->header_prefix, stdout);
 
 	if (revs->include_header) {
-		if (!revs->graph)
+		if (!revs->graph && line_term)
 			fputs(get_revision_mark(revs, commit), stdout);
 		if (revs->abbrev_commit && revs->abbrev)
 			fputs(repo_find_unique_abbrev(the_repository, &commit->object.oid, revs->abbrev),
 			      stdout);
 		else
 			fputs(oid_to_hex(&commit->object.oid), stdout);
+
+		if (!line_term) {
+			if (commit->object.flags & BOUNDARY)
+				printf("%cboundary=yes", info_term);
+		}
 	}
 	if (revs->print_parents) {
 		struct commit_list *parents = commit->parents;
@@ -778,7 +783,7 @@ int cmd_rev_list(int argc,
 		if (revs.graph || revs.verbose_header || show_disk_usage ||
 		    info.show_timestamp || info.header_prefix || bisect_list ||
 		    use_bitmap_index || revs.edge_hint || revs.left_right ||
-		    revs.cherry_mark || arg_missing_action || revs.boundary)
+		    revs.cherry_mark || arg_missing_action)
 			die(_("-z option used with unsupported option"));
 	}
 
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 886e2fc710..33881274a4 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -217,4 +217,20 @@ test_expect_success 'rev-list -z --objects' '
 	test_cmp expect actual
 '
 
+test_expect_success 'rev-list -z --boundary' '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	test_commit -C repo 1 &&
+	test_commit -C repo 2 &&
+
+	oid1=$(git -C repo rev-parse HEAD~) &&
+	oid2=$(git -C repo rev-parse HEAD) &&
+
+	printf "%s\0%s\0boundary=yes\0" "$oid2" "$oid1" >expect &&
+	git -C repo rev-list -z --boundary HEAD~.. >actual &&
+
+	test_cmp expect actual
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 5/5] rev-list: support NUL-delimited --missing option
  2025-03-19 18:34     ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler
                         ` (3 preceding siblings ...)
  2025-03-19 18:34       ` [PATCH v4 4/5] rev-list: support NUL-delimited --boundary option Justin Tobler
@ 2025-03-19 18:34       ` Justin Tobler
  4 siblings, 0 replies; 60+ messages in thread
From: Justin Tobler @ 2025-03-19 18:34 UTC (permalink / raw)
  To: git; +Cc: ps, christian.couder, Justin Tobler

The `--missing={print,print-info}` option for git-rev-list(1) prints
missing objects found while performing the object walk in the form:

        $ git rev-list --missing=print-info <rev>
        ?<oid> [SP <token>=<value>]... LF

Add support for printing missing objects in a NUL-delimited format when
the `-z` option is enabled.

        $ git rev-list -z --missing=print-info <rev>
        <oid> NUL missing=yes NUL [<token>=<value> NUL]...

In this mode, values containing special characters or spaces are printed
as-is without being escaped or quoted. Instead of prefixing the missing
OID with '?', a separate `missing=yes` token/value pair is appended.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---
 Documentation/rev-list-options.adoc |  5 +++--
 builtin/rev-list.c                  | 31 ++++++++++++++++++++---------
 t/t6022-rev-list-missing.sh         | 31 +++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 3fc9902d6b..0e5605a85e 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -380,10 +380,11 @@ and thus is used to signal the start of a new object record. Examples:
 <OID> NUL
 <OID> NUL path=<path> NUL
 <OID> NUL boundary=yes NUL
+<OID> NUL missing=yes NUL [<token>=<value> NUL]...
 -----------------------------------------------------------------------
 +
-This mode is only compatible with the `--objects` and `--boundary` output
-options.
+This mode is only compatible with the `--objects`, `--boundary`, and
+`--missing` output options.
 endif::git-rev-list[]
 
 History Simplification
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bcb880f109..e6ee3f82ee 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -136,24 +136,37 @@ static void print_missing_object(struct missing_objects_map_entry *entry,
 {
 	struct strbuf sb = STRBUF_INIT;
 
+	if (line_term)
+		printf("?%s", oid_to_hex(&entry->entry.oid));
+	else
+		printf("%s%cmissing=yes", oid_to_hex(&entry->entry.oid),
+		       info_term);
+
 	if (!print_missing_info) {
-		printf("?%s\n", oid_to_hex(&entry->entry.oid));
+		putchar(line_term);
 		return;
 	}
 
 	if (entry->path && *entry->path) {
-		struct strbuf path = STRBUF_INIT;
+		strbuf_addf(&sb, "%cpath=", info_term);
+
+		if (line_term) {
+			struct strbuf path = STRBUF_INIT;
 
-		strbuf_addstr(&sb, " path=");
-		quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
-		strbuf_addbuf(&sb, &path);
+			quote_path(entry->path, NULL, &path, QUOTE_PATH_QUOTE_SP);
+			strbuf_addbuf(&sb, &path);
 
-		strbuf_release(&path);
+			strbuf_release(&path);
+		} else {
+			strbuf_addstr(&sb, entry->path);
+		}
 	}
 	if (entry->type)
-		strbuf_addf(&sb, " type=%s", type_name(entry->type));
+		strbuf_addf(&sb, "%ctype=%s", info_term, type_name(entry->type));
+
+	fwrite(sb.buf, sizeof(char), sb.len, stdout);
+	putchar(line_term);
 
-	printf("?%s%s\n", oid_to_hex(&entry->entry.oid), sb.buf);
 	strbuf_release(&sb);
 }
 
@@ -783,7 +796,7 @@ int cmd_rev_list(int argc,
 		if (revs.graph || revs.verbose_header || show_disk_usage ||
 		    info.show_timestamp || info.header_prefix || bisect_list ||
 		    use_bitmap_index || revs.edge_hint || revs.left_right ||
-		    revs.cherry_mark || arg_missing_action)
+		    revs.cherry_mark)
 			die(_("-z option used with unsupported option"));
 	}
 
diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
index 3e2790d4c8..08e92dd002 100755
--- a/t/t6022-rev-list-missing.sh
+++ b/t/t6022-rev-list-missing.sh
@@ -198,4 +198,35 @@ do
 	'
 done
 
+test_expect_success "-z nul-delimited --missing" '
+	test_when_finished rm -rf repo &&
+
+	git init repo &&
+	(
+		cd repo &&
+		git commit --allow-empty -m first &&
+
+		path="foo bar" &&
+		echo foobar >"$path" &&
+		git add -A &&
+		git commit -m second &&
+
+		oid=$(git rev-parse "HEAD:$path") &&
+		type="$(git cat-file -t $oid)" &&
+
+		obj_path=".git/objects/$(test_oid_to_path $oid)" &&
+
+		git rev-list -z --objects --no-object-names \
+			HEAD ^"$oid" >expect &&
+		printf "%s\0missing=yes\0path=%s\0type=%s\0" "$oid" "$path" \
+			"$type" >>expect &&
+
+		mv "$obj_path" "$obj_path.hidden" &&
+		git rev-list -z --objects --no-object-names \
+			--missing=print-info HEAD >actual &&
+
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.49.0.rc2



^ permalink raw reply related	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2025-03-19 18:38 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-10 19:28 [PATCH 0/4] rev-list: introduce NUL-delimited output mode Justin Tobler
2025-03-10 19:28 ` [PATCH 1/4] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
2025-03-10 20:51   ` Junio C Hamano
2025-03-10 19:28 ` [PATCH 2/4] rev-list: refactor early option parsing Justin Tobler
2025-03-10 20:54   ` Junio C Hamano
2025-03-12 21:39     ` Justin Tobler
2025-03-10 19:28 ` [PATCH 3/4] rev-list: support delimiting objects with NUL bytes Justin Tobler
2025-03-10 20:59   ` Junio C Hamano
2025-03-12 21:39     ` Justin Tobler
2025-03-12  7:50   ` Patrick Steinhardt
2025-03-12 21:41     ` Justin Tobler
2025-03-10 19:28 ` [PATCH 4/4] rev-list: support NUL-delimited --missing option Justin Tobler
2025-03-10 20:37 ` [PATCH 0/4] rev-list: introduce NUL-delimited output mode Junio C Hamano
2025-03-10 21:08   ` Junio C Hamano
2025-03-11 23:24     ` Justin Tobler
2025-03-11 23:19   ` Justin Tobler
2025-03-11 23:44     ` Junio C Hamano
2025-03-12  7:37       ` Patrick Steinhardt
2025-03-12 21:45         ` Justin Tobler
2025-03-10 22:38 ` D. Ben Knoble
2025-03-11 22:59   ` Justin Tobler
2025-03-11 23:57 ` Jeff King
2025-03-12  7:42   ` Patrick Steinhardt
2025-03-12 15:56     ` Junio C Hamano
2025-03-13  7:46       ` Patrick Steinhardt
2025-03-12 22:09   ` Justin Tobler
2025-03-13  5:33     ` Jeff King
2025-03-13 16:41       ` Justin Tobler
2025-03-14  2:49         ` Jeff King
2025-03-14 17:02           ` Junio C Hamano
2025-03-14 18:59             ` Jeff King
2025-03-14 19:53               ` Justin Tobler
2025-03-14 21:16                 ` Junio C Hamano
2025-03-19 15:58                   ` Justin Tobler
2025-03-13  0:17 ` [PATCH v2 0/6] " Justin Tobler
2025-03-13  0:17   ` [PATCH v2 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
2025-03-13  0:17   ` [PATCH v2 2/6] rev-list: refactor early option parsing Justin Tobler
2025-03-13  0:17   ` [PATCH v2 3/6] revision: support NUL-delimited --stdin mode Justin Tobler
2025-03-13  0:17   ` [PATCH v2 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler
2025-03-13 12:55     ` Patrick Steinhardt
2025-03-13 14:44       ` Justin Tobler
2025-03-13  0:17   ` [PATCH v2 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler
2025-03-13  0:17   ` [PATCH v2 6/6] rev-list: support NUL-delimited --missing option Justin Tobler
2025-03-13 12:55     ` Patrick Steinhardt
2025-03-13 14:51       ` Justin Tobler
2025-03-13 23:57   ` [PATCH v3 0/6] rev-list: introduce NUL-delimited output mode Justin Tobler
2025-03-13 23:57     ` [PATCH v3 1/6] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
2025-03-13 23:57     ` [PATCH v3 2/6] rev-list: refactor early option parsing Justin Tobler
2025-03-13 23:57     ` [PATCH v3 3/6] revision: support NUL-delimited --stdin mode Justin Tobler
2025-03-13 23:57     ` [PATCH v3 4/6] rev-list: support delimiting objects with NUL bytes Justin Tobler
2025-03-19 12:35       ` Christian Couder
2025-03-19 16:02         ` Justin Tobler
2025-03-13 23:57     ` [PATCH v3 5/6] rev-list: support NUL-delimited --boundary option Justin Tobler
2025-03-13 23:57     ` [PATCH v3 6/6] rev-list: support NUL-delimited --missing option Justin Tobler
2025-03-19 18:34     ` [PATCH v4 0/5] rev-list: introduce NUL-delimited output mode Justin Tobler
2025-03-19 18:34       ` [PATCH v4 1/5] rev-list: inline `show_object_with_name()` in `show_object()` Justin Tobler
2025-03-19 18:34       ` [PATCH v4 2/5] rev-list: refactor early option parsing Justin Tobler
2025-03-19 18:34       ` [PATCH v4 3/5] rev-list: support delimiting objects with NUL bytes Justin Tobler
2025-03-19 18:34       ` [PATCH v4 4/5] rev-list: support NUL-delimited --boundary option Justin Tobler
2025-03-19 18:34       ` [PATCH v4 5/5] rev-list: support NUL-delimited --missing option Justin Tobler

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).