git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/7] commit-graph: split strategies, '--[no-]check-oids'
@ 2020-04-14  4:03 Taylor Blau
  2020-04-14  4:04 ` [PATCH 1/7] t/helper/test-read-graph.c: support commit-graph chains Taylor Blau
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Taylor Blau @ 2020-04-14  4:03 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, martin.agren, peff, szeder.dev

Hi,

This is a reworked version of a couple of series [1, 2] that add various
options to the 'git commit-graph write' builtin. I had developed [1, 2]
while deploying incremental commit-graphs at GitHub, and so there was a
lot in flux about what options we needed and so on.

Now that the dust has settled, I cleaned up all of the patches generated
during that time into a grab-bag series that reflects what we ended up
sticking with.

Namely, I got rid of the '--input=<source>' options ('--input=none' was
confusing at best, and we ended up not needing it), as well as
'--split=merge-all'. We used to use this to roll up all of the
incremental graphs generated during pushes into one big length-1 chain
every N pushes. But, we ended up abandoning this, too, in favor of
'--split=replace', which is described in the fourth patch. I'm not sure
if anyone will be sad to see '--split=merge-all' go, so if you are,
please let me know and I'll drag that patch back in.

Thanks for all of the review that I've received so far, and thanks in
advance for your review on these patches.

[1]: https://lore.kernel.org/git/cover.1580430057.git.me@ttaylorr.com/
[2]: https://lore.kernel.org/git/cover.1584762087.git.me@ttaylorr.com/

Taylor Blau (7):
  t/helper/test-read-graph.c: support commit-graph chains
  builtin/commit-graph.c: support for '--split[=<strategy>]'
  builtin/commit-graph.c: introduce split strategy 'no-merge'
  builtin/commit-graph.c: introduce split strategy 'replace'
  oidset: introduce 'oidset_size'
  commit-graph.h: replace 'commit_hex' with 'commits'
  commit-graph.c: introduce '--[no-]check-oids'

 Documentation/git-commit-graph.txt |  22 +++--
 builtin/commit-graph.c             |  59 +++++++++++--
 commit-graph.c                     | 129 +++++++++++++++++++----------
 commit-graph.h                     |  10 ++-
 oidset.c                           |   5 ++
 oidset.h                           |   5 ++
 t/helper/test-read-graph.c         |  13 +--
 t/t5318-commit-graph.sh            |  30 ++++++-
 t/t5324-split-commit-graph.sh      |  30 +++++++
 9 files changed, 230 insertions(+), 73 deletions(-)

--
2.26.0.106.g9fadedd637

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/7] t/helper/test-read-graph.c: support commit-graph chains
  2020-04-14  4:03 [PATCH 0/7] commit-graph: split strategies, '--[no-]check-oids' Taylor Blau
@ 2020-04-14  4:04 ` Taylor Blau
  2020-04-14  4:04 ` [PATCH 2/7] builtin/commit-graph.c: support for '--split[=<strategy>]' Taylor Blau
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2020-04-14  4:04 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, martin.agren, peff, szeder.dev

In 61df89c8e5 (commit-graph: don't early exit(1) on e.g. "git status",
2019-03-25), the former 'load_commit_graph_one' was refactored into
'open_commit_graph' and 'load_commit_graph_one_fd_st' as a means of
avoiding an early-exit from non-library code.

However, 'load_commit_graph_one' does not support commit-graph chains,
and hence the 'read-graph' test tool does not work with them.

Replace 'load_commit_graph_one' with 'read_commit_graph_one' in order to
support commit-graph chains. In the spirit of 61df89c8e5,
'read_commit_graph_one' does not ever 'die()', making it a suitable
replacement here.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-graph.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c
index f8a461767c..4846040363 100644
--- a/t/helper/test-read-graph.c
+++ b/t/helper/test-read-graph.c
@@ -7,26 +7,15 @@
 int cmd__read_graph(int argc, const char **argv)
 {
 	struct commit_graph *graph = NULL;
-	char *graph_name;
-	int open_ok;
-	int fd;
-	struct stat st;
 	struct object_directory *odb;
 
 	setup_git_directory();
 	odb = the_repository->objects->odb;
 
-	graph_name = get_commit_graph_filename(odb);
-
-	open_ok = open_commit_graph(graph_name, &fd, &st);
-	if (!open_ok)
-		die_errno(_("Could not open commit-graph '%s'"), graph_name);
-
-	graph = load_commit_graph_one_fd_st(fd, &st, odb);
+	graph = read_commit_graph_one(the_repository, odb);
 	if (!graph)
 		return 1;
 
-	FREE_AND_NULL(graph_name);
 
 	printf("header: %08x %d %d %d %d\n",
 		ntohl(*(uint32_t*)graph->data),
-- 
2.26.0.106.g9fadedd637


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/7] builtin/commit-graph.c: support for '--split[=<strategy>]'
  2020-04-14  4:03 [PATCH 0/7] commit-graph: split strategies, '--[no-]check-oids' Taylor Blau
  2020-04-14  4:04 ` [PATCH 1/7] t/helper/test-read-graph.c: support commit-graph chains Taylor Blau
@ 2020-04-14  4:04 ` Taylor Blau
  2020-04-14  4:04 ` [PATCH 3/7] builtin/commit-graph.c: introduce split strategy 'no-merge' Taylor Blau
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2020-04-14  4:04 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, martin.agren, peff, szeder.dev

With '--split', the commit-graph machinery writes new commits in another
incremental commit-graph which is part of the existing chain, and
optionally decides to condense the chain into a single commit-graph.
This is done to ensure that the asymptotic behavior of looking up a
commit in an incremental chain is not dominated by the number of
incrementals in that chain. It can be controlled by the '--max-commits'
and '--size-multiple' options.

In the next two commits, we will introduce additional splitting
strategies that can exert additional control over:

  - when a split commit-graph is and isn't written, and

  - when the existing commit-graph chain is discarded completely and
    replaced with another graph

To prepare for this, make '--split' take an optional strategy (as in
'--split[=<strategy>]'), and add a new enum to describe which strategy
is being used. For now, no strategies are given, and the only enumerated
value is 'COMMIT_GRAPH_SPLIT_UNSPECIFIED', indicating the absence of a
strategy.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-commit-graph.txt | 11 ++++++-----
 builtin/commit-graph.c             | 26 ++++++++++++++++++++++----
 commit-graph.h                     |  5 +++++
 3 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 28d1fee505..10d757c5cc 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -57,11 +57,12 @@ or `--stdin-packs`.)
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
 +
-With the `--split` option, write the commit-graph as a chain of multiple
-commit-graph files stored in `<dir>/info/commit-graphs`. The new commits
-not already in the commit-graph are added in a new "tip" file. This file
-is merged with the existing file if the following merge conditions are
-met:
+With the `--split[=<strategy>]` option, write the commit-graph as a
+chain of multiple commit-graph files stored in
+`<dir>/info/commit-graphs`. Commit-graph layers are merged based on the
+strategy and other splitting options. The new commits not already in the
+commit-graph are added in a new "tip" file. This file is merged with the
+existing file if the following merge conditions are met:
 +
 * If `--size-multiple=<X>` is not specified, let `X` equal 2. If the new
 tip file would have `N` commits and the previous tip has `M` commits and
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index d1ab6625f6..342094fc77 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -9,7 +9,9 @@
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] "
+	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
+	   "[--[no-]progress] <split options>"),
 	NULL
 };
 
@@ -19,7 +21,9 @@ static const char * const builtin_commit_graph_verify_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] "
+	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
+	   "[--[no-]progress] <split options>"),
 	NULL
 };
 
@@ -114,6 +118,18 @@ static int graph_verify(int argc, const char **argv)
 extern int read_replace_refs;
 static struct split_commit_graph_opts split_opts;
 
+static int write_option_parse_split(const struct option *opt, const char *arg,
+				    int unset)
+{
+	opts.split = 1;
+	if (!arg)
+		return 0;
+
+	die(_("unrecognized --split argument, %s"), arg);
+
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
 	struct string_list *pack_indexes = NULL;
@@ -136,8 +152,10 @@ static int graph_write(int argc, const char **argv)
 		OPT_BOOL(0, "append", &opts.append,
 			N_("include all commits already in the commit-graph file")),
 		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
-		OPT_BOOL(0, "split", &opts.split,
-			N_("allow writing an incremental commit-graph file")),
+		OPT_CALLBACK_F(0, "split", &split_opts.flags, NULL,
+			N_("allow writing an incremental commit-graph file"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			write_option_parse_split),
 		OPT_INTEGER(0, "max-commits", &split_opts.max_commits,
 			N_("maximum number of commits in a non-base split commit-graph")),
 		OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple,
diff --git a/commit-graph.h b/commit-graph.h
index e87a6f6360..e799008ff4 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -82,10 +82,15 @@ enum commit_graph_write_flags {
 	COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3)
 };
 
+enum commit_graph_split_flags {
+	COMMIT_GRAPH_SPLIT_UNSPECIFIED = 0
+};
+
 struct split_commit_graph_opts {
 	int size_multiple;
 	int max_commits;
 	timestamp_t expire_time;
+	enum commit_graph_split_flags flags;
 };
 
 /*
-- 
2.26.0.106.g9fadedd637


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/7] builtin/commit-graph.c: introduce split strategy 'no-merge'
  2020-04-14  4:03 [PATCH 0/7] commit-graph: split strategies, '--[no-]check-oids' Taylor Blau
  2020-04-14  4:04 ` [PATCH 1/7] t/helper/test-read-graph.c: support commit-graph chains Taylor Blau
  2020-04-14  4:04 ` [PATCH 2/7] builtin/commit-graph.c: support for '--split[=<strategy>]' Taylor Blau
@ 2020-04-14  4:04 ` Taylor Blau
  2020-04-14  4:04 ` [PATCH 4/7] builtin/commit-graph.c: introduce split strategy 'replace' Taylor Blau
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2020-04-14  4:04 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, martin.agren, peff, szeder.dev

In the previous commit, we laid the groundwork for supporting different
splitting strategies. In this commit, we introduce the first splitting
strategy: 'no-merge'.

Passing '--split=no-merge' is useful for callers which wish to write a
new incremental commit-graph, but do not want to spend effort condensing
the incremental chain [1]. Previously, this was possible by passing
'--size-multiple=0', but this no longer the case following 63020f175f
(commit-graph: prefer default size_mult when given zero, 2020-01-02).

When '--split=no-merge' is given, the commit-graph machinery will never
condense an existing chain, and it will always write a new incremental.

[1]: This might occur when, for example, a server administrator running
some program after each push may want to ensure that each job runs
proportional in time to the size of the push, and does not "jump" when
the commit-graph machinery decides to trigger a merge.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-commit-graph.txt |  5 +++++
 builtin/commit-graph.c             |  7 ++++++-
 commit-graph.c                     | 19 ++++++++++++-------
 commit-graph.h                     |  3 ++-
 t/t5324-split-commit-graph.sh      | 11 +++++++++++
 5 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 10d757c5cc..a4c4a641e5 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -63,6 +63,11 @@ chain of multiple commit-graph files stored in
 strategy and other splitting options. The new commits not already in the
 commit-graph are added in a new "tip" file. This file is merged with the
 existing file if the following merge conditions are met:
+* If `--split=no-merge` is specified, a merge is never performed, and
+the remaining options are ignored. A bare `--split` defers to the
+remaining options. (Note that merging a chain of commit graphs replaces
+the existing chain with a length-1 chain where the first and only
+incremental holds the entire graph).
 +
 * If `--size-multiple=<X>` is not specified, let `X` equal 2. If the new
 tip file would have `N` commits and the previous tip has `M` commits and
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 342094fc77..42182ed71c 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -121,11 +121,16 @@ static struct split_commit_graph_opts split_opts;
 static int write_option_parse_split(const struct option *opt, const char *arg,
 				    int unset)
 {
+	enum commit_graph_split_flags *flags = opt->value;
+
 	opts.split = 1;
 	if (!arg)
 		return 0;
 
-	die(_("unrecognized --split argument, %s"), arg);
+	if (!strcmp(arg, "no-merge"))
+		*flags = COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED;
+	else
+		die(_("unrecognized --split argument, %s"), arg);
 
 	return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index f013a84e29..af3fe20bb5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1529,6 +1529,7 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 {
 	struct commit_graph *g;
 	uint32_t num_commits;
+	enum commit_graph_split_flags flags = COMMIT_GRAPH_SPLIT_UNSPECIFIED;
 	uint32_t i;
 
 	int max_commits = 0;
@@ -1539,21 +1540,25 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 
 		if (ctx->split_opts->size_multiple)
 			size_mult = ctx->split_opts->size_multiple;
+
+		flags = ctx->split_opts->flags;
 	}
 
 	g = ctx->r->objects->commit_graph;
 	num_commits = ctx->commits.nr;
 	ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;
 
-	while (g && (g->num_commits <= size_mult * num_commits ||
-		    (max_commits && num_commits > max_commits))) {
-		if (g->odb != ctx->odb)
-			break;
+	if (flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
+		while (g && (g->num_commits <= size_mult * num_commits ||
+			    (max_commits && num_commits > max_commits))) {
+			if (g->odb != ctx->odb)
+				break;
 
-		num_commits += g->num_commits;
-		g = g->base_graph;
+			num_commits += g->num_commits;
+			g = g->base_graph;
 
-		ctx->num_commit_graphs_after--;
+			ctx->num_commit_graphs_after--;
+		}
 	}
 
 	ctx->new_base_graph = g;
diff --git a/commit-graph.h b/commit-graph.h
index e799008ff4..8752afb88d 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -83,7 +83,8 @@ enum commit_graph_write_flags {
 };
 
 enum commit_graph_split_flags {
-	COMMIT_GRAPH_SPLIT_UNSPECIFIED = 0
+	COMMIT_GRAPH_SPLIT_UNSPECIFIED      = 0,
+	COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 1
 };
 
 struct split_commit_graph_opts {
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 53b2e6b455..1438d63f24 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -351,4 +351,15 @@ test_expect_success 'split across alternate where alternate is not split' '
 	test_cmp commit-graph .git/objects/info/commit-graph
 '
 
+test_expect_success '--split=no-merge always writes an incremental' '
+	test_when_finished rm -rf a b &&
+	rm -rf $graphdir $infodir/commit-graph &&
+	git reset --hard commits/2 &&
+	git rev-list HEAD~1 >a &&
+	git rev-list HEAD >b &&
+	git commit-graph write --split --stdin-commits <a &&
+	git commit-graph write --split=no-merge --stdin-commits <b &&
+	test_line_count = 2 $graphdir/commit-graph-chain
+'
+
 test_done
-- 
2.26.0.106.g9fadedd637


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 4/7] builtin/commit-graph.c: introduce split strategy 'replace'
  2020-04-14  4:03 [PATCH 0/7] commit-graph: split strategies, '--[no-]check-oids' Taylor Blau
                   ` (2 preceding siblings ...)
  2020-04-14  4:04 ` [PATCH 3/7] builtin/commit-graph.c: introduce split strategy 'no-merge' Taylor Blau
@ 2020-04-14  4:04 ` Taylor Blau
  2020-04-14  4:04 ` [PATCH 5/7] oidset: introduce 'oidset_size' Taylor Blau
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2020-04-14  4:04 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, martin.agren, peff, szeder.dev

When using split commit-graphs, it is sometimes useful to completely
replace the commit-graph chain with a new base.

For example, consider a scenario in which a repository builds a new
commit-graph incremental for each push. Occasionally (say, after some
fixed number of pushes), they may wish to rebuild the commit-graph chain
with all reachable commits.

They can do so with

  $ git commit-graph write --reachable

but this removes the chain entirely and replaces it with a single
commit-graph in 'objects/info/commit-graph'. Unfortunately, this means
that the next push will have to move this commit-graph into the first
layer of a new chain, and then write its new commits on top.

Avoid such copying entirely by allowing the caller to specify that they
wish to replace the entirety of their commit-graph chain, while also
specifying that the new commit-graph should become the basis of a fresh,
length-one chain.

This addresses the above situation by making it possible for the caller
to instead write:

  $ git commit-graph write --reachable --split=replace

which writes a new length-one chain to 'objects/info/commit-graphs',
making the commit-graph incremental generated by the subsequent push
relatively cheap by avoiding the aforementioned copy.

In order to do this, remove an assumption in 'write_commit_graph_file'
that chains are always at least two incrementals long.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-commit-graph.txt |  7 ++--
 builtin/commit-graph.c             |  2 ++
 commit-graph.c                     | 53 ++++++++++++++++++++++--------
 commit-graph.h                     |  3 +-
 t/t5324-split-commit-graph.sh      | 19 +++++++++++
 5 files changed, 66 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index a4c4a641e5..46f7f7c573 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -64,9 +64,10 @@ strategy and other splitting options. The new commits not already in the
 commit-graph are added in a new "tip" file. This file is merged with the
 existing file if the following merge conditions are met:
 * If `--split=no-merge` is specified, a merge is never performed, and
-the remaining options are ignored. A bare `--split` defers to the
-remaining options. (Note that merging a chain of commit graphs replaces
-the existing chain with a length-1 chain where the first and only
+the remaining options are ignored. `--split=replace` overwrites the
+existing chain with a new one. A bare `--split` defers to the remaining
+options. (Note that merging a chain of commit graphs replaces the
+existing chain with a length-1 chain where the first and only
 incremental holds the entire graph).
 +
 * If `--size-multiple=<X>` is not specified, let `X` equal 2. If the new
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 42182ed71c..922f876bfa 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -129,6 +129,8 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 
 	if (!strcmp(arg, "no-merge"))
 		*flags = COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED;
+	else if (!strcmp(arg, "replace"))
+		*flags = COMMIT_GRAPH_SPLIT_REPLACE;
 	else
 		die(_("unrecognized --split argument, %s"), arg);
 
diff --git a/commit-graph.c b/commit-graph.c
index af3fe20bb5..c598508d7f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -866,7 +866,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 
 			if (edge_value >= 0)
 				edge_value += ctx->new_num_commits_in_base;
-			else {
+			else if (ctx->new_base_graph) {
 				uint32_t pos;
 				if (find_commit_in_graph(parent->item,
 							 ctx->new_base_graph,
@@ -897,7 +897,7 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 
 			if (edge_value >= 0)
 				edge_value += ctx->new_num_commits_in_base;
-			else {
+			else if (ctx->new_base_graph) {
 				uint32_t pos;
 				if (find_commit_in_graph(parent->item,
 							 ctx->new_base_graph,
@@ -964,7 +964,7 @@ static void write_graph_chunk_extra_edges(struct hashfile *f,
 
 			if (edge_value >= 0)
 				edge_value += ctx->new_num_commits_in_base;
-			else {
+			else if (ctx->new_base_graph) {
 				uint32_t pos;
 				if (find_commit_in_graph(parent->item,
 							 ctx->new_base_graph,
@@ -1037,6 +1037,8 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 {
 	int i;
 	struct commit *commit;
+	enum commit_graph_split_flags flags = ctx->split_opts ?
+		ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED;
 
 	if (ctx->report_progress)
 		ctx->progress = start_delayed_progress(
@@ -1066,8 +1068,9 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 		if (!commit)
 			continue;
 		if (ctx->split) {
-			if (!parse_commit(commit) &&
-			    commit->graph_pos == COMMIT_NOT_FROM_GRAPH)
+			if ((!parse_commit(commit) &&
+			     commit->graph_pos == COMMIT_NOT_FROM_GRAPH) ||
+			    flags == COMMIT_GRAPH_SPLIT_REPLACE)
 				add_missing_parents(ctx, commit);
 		} else if (!parse_commit_no_graph(commit))
 			add_missing_parents(ctx, commit);
@@ -1287,6 +1290,8 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx)
 static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
 {
 	uint32_t i;
+	enum commit_graph_split_flags flags = ctx->split_opts ?
+		ctx->split_opts->flags : COMMIT_GRAPH_SPLIT_UNSPECIFIED;
 
 	ctx->num_extra_edges = 0;
 	if (ctx->report_progress)
@@ -1303,11 +1308,14 @@ static void copy_oids_to_commits(struct write_commit_graph_context *ctx)
 		ALLOC_GROW(ctx->commits.list, ctx->commits.nr + 1, ctx->commits.alloc);
 		ctx->commits.list[ctx->commits.nr] = lookup_commit(ctx->r, &ctx->oids.list[i]);
 
-		if (ctx->split &&
+		if (ctx->split && flags != COMMIT_GRAPH_SPLIT_REPLACE &&
 		    ctx->commits.list[ctx->commits.nr]->graph_pos != COMMIT_NOT_FROM_GRAPH)
 			continue;
 
-		parse_commit_no_graph(ctx->commits.list[ctx->commits.nr]);
+		if (ctx->split && flags == COMMIT_GRAPH_SPLIT_REPLACE)
+			parse_commit(ctx->commits.list[ctx->commits.nr]);
+		else
+			parse_commit_no_graph(ctx->commits.list[ctx->commits.nr]);
 
 		num_parents = commit_list_count(ctx->commits.list[ctx->commits.nr]->parents);
 		if (num_parents > 2)
@@ -1488,8 +1496,12 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 		}
 
 		if (ctx->base_graph_name) {
-			const char *dest = ctx->commit_graph_filenames_after[
-						ctx->num_commit_graphs_after - 2];
+			const char *dest;
+			int idx = ctx->num_commit_graphs_after - 1;
+			if (ctx->num_commit_graphs_after > 1)
+				idx--;
+
+			dest = ctx->commit_graph_filenames_after[idx];
 
 			if (strcmp(ctx->base_graph_name, dest)) {
 				result = rename(ctx->base_graph_name, dest);
@@ -1546,9 +1558,13 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 
 	g = ctx->r->objects->commit_graph;
 	num_commits = ctx->commits.nr;
-	ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;
+	if (flags == COMMIT_GRAPH_SPLIT_REPLACE)
+		ctx->num_commit_graphs_after = 1;
+	else
+		ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;
 
-	if (flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED) {
+	if (flags != COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED &&
+	    flags != COMMIT_GRAPH_SPLIT_REPLACE) {
 		while (g && (g->num_commits <= size_mult * num_commits ||
 			    (max_commits && num_commits > max_commits))) {
 			if (g->odb != ctx->odb)
@@ -1561,7 +1577,11 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
 		}
 	}
 
-	ctx->new_base_graph = g;
+	if (flags != COMMIT_GRAPH_SPLIT_REPLACE)
+		ctx->new_base_graph = g;
+	else if (ctx->num_commit_graphs_after != 1)
+		BUG("split_graph_merge_strategy: num_commit_graphs_after "
+		    "should be 1 with --split=replace");
 
 	if (ctx->num_commit_graphs_after == 2) {
 		char *old_graph_name = get_commit_graph_filename(g->odb);
@@ -1768,6 +1788,7 @@ int write_commit_graph(struct object_directory *odb,
 	struct write_commit_graph_context *ctx;
 	uint32_t i, count_distinct = 0;
 	int res = 0;
+	int replace = 0;
 
 	if (!commit_graph_compatible(the_repository))
 		return 0;
@@ -1802,6 +1823,9 @@ int write_commit_graph(struct object_directory *odb,
 				g = g->base_graph;
 			}
 		}
+
+		if (ctx->split_opts)
+			replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE;
 	}
 
 	ctx->approx_nr_objects = approximate_object_count();
@@ -1862,13 +1886,14 @@ int write_commit_graph(struct object_directory *odb,
 		goto cleanup;
 	}
 
-	if (!ctx->commits.nr)
+	if (!ctx->commits.nr && !replace)
 		goto cleanup;
 
 	if (ctx->split) {
 		split_graph_merge_strategy(ctx);
 
-		merge_commit_graphs(ctx);
+		if (!replace)
+			merge_commit_graphs(ctx);
 	} else
 		ctx->num_commit_graphs_after = 1;
 
diff --git a/commit-graph.h b/commit-graph.h
index 8752afb88d..718433d79b 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -84,7 +84,8 @@ enum commit_graph_write_flags {
 
 enum commit_graph_split_flags {
 	COMMIT_GRAPH_SPLIT_UNSPECIFIED      = 0,
-	COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 1
+	COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 1,
+	COMMIT_GRAPH_SPLIT_REPLACE          = 2
 };
 
 struct split_commit_graph_opts {
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 1438d63f24..e5d8d64170 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -362,4 +362,23 @@ test_expect_success '--split=no-merge always writes an incremental' '
 	test_line_count = 2 $graphdir/commit-graph-chain
 '
 
+test_expect_success '--split=replace replaces the chain' '
+	rm -rf $graphdir $infodir/commit-graph &&
+	git reset --hard commits/3 &&
+	git rev-list -1 HEAD~2 >a &&
+	git rev-list -1 HEAD~1 >b &&
+	git rev-list -1 HEAD >c &&
+	git commit-graph write --split=no-merge --stdin-commits <a &&
+	git commit-graph write --split=no-merge --stdin-commits <b &&
+	git commit-graph write --split=no-merge --stdin-commits <c &&
+	test_line_count = 3 $graphdir/commit-graph-chain &&
+	git commit-graph write --stdin-commits --split=replace <b &&
+	test_path_is_missing $infodir/commit-graph &&
+	test_path_is_file $graphdir/commit-graph-chain &&
+	ls $graphdir/graph-*.graph >graph-files &&
+	test_line_count = 1 graph-files &&
+	verify_chain_files_exist $graphdir &&
+	graph_read_expect 2
+'
+
 test_done
-- 
2.26.0.106.g9fadedd637


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 5/7] oidset: introduce 'oidset_size'
  2020-04-14  4:03 [PATCH 0/7] commit-graph: split strategies, '--[no-]check-oids' Taylor Blau
                   ` (3 preceding siblings ...)
  2020-04-14  4:04 ` [PATCH 4/7] builtin/commit-graph.c: introduce split strategy 'replace' Taylor Blau
@ 2020-04-14  4:04 ` Taylor Blau
  2020-04-14  4:04 ` [PATCH 6/7] commit-graph.h: replace 'commit_hex' with 'commits' Taylor Blau
  2020-04-14  4:04 ` [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids' Taylor Blau
  6 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2020-04-14  4:04 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, martin.agren, peff, szeder.dev

Occasionally, it may be useful for callers to know the number of object
IDs in an oidset. Right now, the only way to compute this is to call
'kh_size' on the internal 'kh_set_oid_t'.

Similar to how we wrap other 'kh_*' functions over the 'oidset' type,
let's allow callers to compute this value by introducing 'oidset_size'.

We will add its first caller in the subsequent commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 oidset.c | 5 +++++
 oidset.h | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/oidset.c b/oidset.c
index f63ce818f6..15d4e18c37 100644
--- a/oidset.c
+++ b/oidset.c
@@ -36,6 +36,11 @@ void oidset_clear(struct oidset *set)
 	oidset_init(set, 0);
 }
 
+int oidset_size(struct oidset *set)
+{
+	return kh_size(&set->set);
+}
+
 void oidset_parse_file(struct oidset *set, const char *path)
 {
 	FILE *fp;
diff --git a/oidset.h b/oidset.h
index d8a106b127..8261c36b91 100644
--- a/oidset.h
+++ b/oidset.h
@@ -54,6 +54,11 @@ int oidset_insert(struct oidset *set, const struct object_id *oid);
  */
 int oidset_remove(struct oidset *set, const struct object_id *oid);
 
+/**
+ * Returns the number of oids in the set.
+ */
+int oidset_size(struct oidset *set);
+
 /**
  * Remove all entries from the oidset, freeing any resources associated with
  * it.
-- 
2.26.0.106.g9fadedd637


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 6/7] commit-graph.h: replace 'commit_hex' with 'commits'
  2020-04-14  4:03 [PATCH 0/7] commit-graph: split strategies, '--[no-]check-oids' Taylor Blau
                   ` (4 preceding siblings ...)
  2020-04-14  4:04 ` [PATCH 5/7] oidset: introduce 'oidset_size' Taylor Blau
@ 2020-04-14  4:04 ` Taylor Blau
  2020-04-14  4:04 ` [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids' Taylor Blau
  6 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2020-04-14  4:04 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, martin.agren, peff, szeder.dev

The 'write_commit_graph()' function takes in either a string list of
pack indices, or a string list of hexadecimal commit OIDs. These
correspond to the '--stdin-packs' and '--stdin-commits' mode(s) from
'git commit-graph write'.

Using a string_list of hexadecimal commit IDs is not the most efficient
use of memory, since we can instead use the 'struct oidset', which is
more well-suited for this case.

This has another benefit which will become apparent in the following
commit. This is that we are about to disambiguate the kinds of errors we
produce with '--stdin-commits' into "non-hex input" and "hex-input, but
referring to a non-commit object". By having 'write_commit_graph' take
in a 'struct oidset *' of commits, we place the burden on the caller (in
this case, the builtin) to handle the first case, and the commit-graph
machinery can handle the second case.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/commit-graph.c  | 19 ++++++++++---
 commit-graph.c          | 59 +++++++++++++++++++++++------------------
 commit-graph.h          |  3 ++-
 t/t5318-commit-graph.sh |  2 +-
 4 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 922f876bfa..c69716aa7e 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -140,7 +140,7 @@ static int write_option_parse_split(const struct option *opt, const char *arg,
 static int graph_write(int argc, const char **argv)
 {
 	struct string_list *pack_indexes = NULL;
-	struct string_list *commit_hex = NULL;
+	struct oidset commits = OIDSET_INIT;
 	struct object_directory *odb = NULL;
 	struct string_list lines;
 	int result = 0;
@@ -213,7 +213,20 @@ static int graph_write(int argc, const char **argv)
 		if (opts.stdin_packs)
 			pack_indexes = &lines;
 		if (opts.stdin_commits) {
-			commit_hex = &lines;
+			struct string_list_item *item;
+			oidset_init(&commits, lines.nr);
+			for_each_string_list_item(item, &lines) {
+				struct object_id oid;
+				const char *end;
+
+				if (parse_oid_hex(item->string, &oid, &end)) {
+					error(_("unexpected non-hex object ID: "
+						"%s"), item->string);
+					return 1;
+				}
+
+				oidset_insert(&commits, &oid);
+			}
 			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 		}
 
@@ -222,7 +235,7 @@ static int graph_write(int argc, const char **argv)
 
 	if (write_commit_graph(odb,
 			       pack_indexes,
-			       commit_hex,
+			       opts.stdin_commits ? &commits : NULL,
 			       flags,
 			       &split_opts))
 		result = 1;
diff --git a/commit-graph.c b/commit-graph.c
index c598508d7f..f60346baee 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1136,13 +1136,13 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
-static int add_ref_to_list(const char *refname,
-			   const struct object_id *oid,
-			   int flags, void *cb_data)
+static int add_ref_to_set(const char *refname,
+			  const struct object_id *oid,
+			  int flags, void *cb_data)
 {
-	struct string_list *list = (struct string_list *)cb_data;
+	struct oidset *commits = (struct oidset *)cb_data;
 
-	string_list_append(list, oid_to_hex(oid));
+	oidset_insert(commits, oid);
 	return 0;
 }
 
@@ -1150,14 +1150,14 @@ int write_commit_graph_reachable(struct object_directory *odb,
 				 enum commit_graph_write_flags flags,
 				 const struct split_commit_graph_opts *split_opts)
 {
-	struct string_list list = STRING_LIST_INIT_DUP;
+	struct oidset commits = OIDSET_INIT;
 	int result;
 
-	for_each_ref(add_ref_to_list, &list);
-	result = write_commit_graph(odb, NULL, &list,
+	for_each_ref(add_ref_to_set, &commits);
+	result = write_commit_graph(odb, NULL, &commits,
 				    flags, split_opts);
 
-	string_list_clear(&list, 0);
+	oidset_clear(&commits);
 	return result;
 }
 
@@ -1206,39 +1206,46 @@ static int fill_oids_from_packs(struct write_commit_graph_context *ctx,
 	return 0;
 }
 
-static int fill_oids_from_commit_hex(struct write_commit_graph_context *ctx,
-				     struct string_list *commit_hex)
+static int fill_oids_from_commits(struct write_commit_graph_context *ctx,
+				  struct oidset *commits)
 {
-	uint32_t i;
+	uint32_t i = 0;
 	struct strbuf progress_title = STRBUF_INIT;
+	struct oidset_iter iter;
+	struct object_id *oid;
+
+	if (!oidset_size(commits))
+		return 0;
 
 	if (ctx->report_progress) {
 		strbuf_addf(&progress_title,
 			    Q_("Finding commits for commit graph from %d ref",
 			       "Finding commits for commit graph from %d refs",
-			       commit_hex->nr),
-			    commit_hex->nr);
+			       oidset_size(commits)),
+			    oidset_size(commits));
 		ctx->progress = start_delayed_progress(
 					progress_title.buf,
-					commit_hex->nr);
+					oidset_size(commits));
 	}
-	for (i = 0; i < commit_hex->nr; i++) {
-		const char *end;
-		struct object_id oid;
+
+	oidset_iter_init(commits, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
 		struct commit *result;
 
-		display_progress(ctx->progress, i + 1);
-		if (!parse_oid_hex(commit_hex->items[i].string, &oid, &end) &&
-		    (result = lookup_commit_reference_gently(ctx->r, &oid, 1))) {
+		display_progress(ctx->progress, ++i);
+
+		result = lookup_commit_reference_gently(ctx->r, oid, 1);
+		if (result) {
 			ALLOC_GROW(ctx->oids.list, ctx->oids.nr + 1, ctx->oids.alloc);
 			oidcpy(&ctx->oids.list[ctx->oids.nr], &(result->object.oid));
 			ctx->oids.nr++;
 		} else if (ctx->check_oids) {
 			error(_("invalid commit object id: %s"),
-			    commit_hex->items[i].string);
+			      oid_to_hex(oid));
 			return -1;
 		}
 	}
+
 	stop_progress(&ctx->progress);
 	strbuf_release(&progress_title);
 
@@ -1781,7 +1788,7 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx)
 
 int write_commit_graph(struct object_directory *odb,
 		       struct string_list *pack_indexes,
-		       struct string_list *commit_hex,
+		       struct oidset *commits,
 		       enum commit_graph_write_flags flags,
 		       const struct split_commit_graph_opts *split_opts)
 {
@@ -1857,12 +1864,12 @@ int write_commit_graph(struct object_directory *odb,
 			goto cleanup;
 	}
 
-	if (commit_hex) {
-		if ((res = fill_oids_from_commit_hex(ctx, commit_hex)))
+	if (commits) {
+		if ((res = fill_oids_from_commits(ctx, commits)))
 			goto cleanup;
 	}
 
-	if (!pack_indexes && !commit_hex)
+	if (!pack_indexes && !commits)
 		fill_oids_from_all_packs(ctx);
 
 	close_reachable(ctx);
diff --git a/commit-graph.h b/commit-graph.h
index 718433d79b..98ef121924 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -6,6 +6,7 @@
 #include "string-list.h"
 #include "cache.h"
 #include "object-store.h"
+#include "oidset.h"
 
 #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
 #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD"
@@ -106,7 +107,7 @@ int write_commit_graph_reachable(struct object_directory *odb,
 				 const struct split_commit_graph_opts *split_opts);
 int write_commit_graph(struct object_directory *odb,
 		       struct string_list *pack_indexes,
-		       struct string_list *commit_hex,
+		       struct oidset *commits,
 		       enum commit_graph_write_flags flags,
 		       const struct split_commit_graph_opts *split_opts);
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 9bf920ae17..e874a12696 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -43,7 +43,7 @@ test_expect_success 'create commits and repack' '
 test_expect_success 'exit with correct error on bad input to --stdin-commits' '
 	cd "$TRASH_DIRECTORY/full" &&
 	echo HEAD | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
-	test_i18ngrep "invalid commit object id" stderr &&
+	test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
 	# valid tree OID, but not a commit OID
 	git rev-parse HEAD^{tree} | test_expect_code 1 git commit-graph write --stdin-commits 2>stderr &&
 	test_i18ngrep "invalid commit object id" stderr
-- 
2.26.0.106.g9fadedd637


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-04-14  4:03 [PATCH 0/7] commit-graph: split strategies, '--[no-]check-oids' Taylor Blau
                   ` (5 preceding siblings ...)
  2020-04-14  4:04 ` [PATCH 6/7] commit-graph.h: replace 'commit_hex' with 'commits' Taylor Blau
@ 2020-04-14  4:04 ` Taylor Blau
  2020-04-15  4:29   ` Taylor Blau
  6 siblings, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2020-04-14  4:04 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, martin.agren, peff, szeder.dev

When operating on a stream of commit OIDs on stdin, 'git commit-graph
write' checks that each OID refers to an object that is indeed a commit.
This is convenient to make sure that the given input is well-formed, but
can sometimes be undesirable.

For example, server operators may wish to feed the refnames that were
updated during a push to 'git commit-graph write --input=stdin-commits',
and silently discard refs that don't point at commits. This can be done
by combing the output of 'git for-each-ref' with '--format
%(*objecttype)', but this requires opening up a potentially large number
of objects.  Instead, it is more convenient to feed the updated refs to
the commit-graph machinery, and let it throw out refs that don't point
to commits.

Introduce '--[no-]check-oids' to make such a behavior possible. With
'--check-oids' (the default behavior to retain backwards compatibility),
'git commit-graph write' will barf on a non-commit line in its input.
With 'no-check-oids', such lines will be silently ignored, making the
above possible by specifying this option.

No matter which is supplied, 'git commit-graph write' retains the
behavior from the previous commit of rejecting non-OID inputs like
"HEAD" and "refs/heads/foo" as before.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-commit-graph.txt |  5 +++++
 builtin/commit-graph.c             | 11 ++++++++---
 t/t5318-commit-graph.sh            | 28 ++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 46f7f7c573..91e8027b86 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -82,6 +82,11 @@ tip with the previous tip.
 Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
 be the current time. After writing the split commit-graph, delete all
 unused commit-graph whose modified times are older than `datetime`.
++
+The `--[no-]check-oids` option decides whether or not OIDs are required
+to be commits. By default, `--check-oids` is implied, generating an
+error on non-commit objects. If `--no-check-oids` is given, non-commits
+are silently discarded.
 
 'verify'::
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index c69716aa7e..2d0a8e822a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -11,7 +11,7 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
 	N_("git commit-graph write [--object-dir <objdir>] [--append] "
 	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--[no-]progress] <split options>"),
+	   "[--[no-]progress] [--[no-]check-oids] <split options>"),
 	NULL
 };
 
@@ -23,7 +23,7 @@ static const char * const builtin_commit_graph_verify_usage[] = {
 static const char * const builtin_commit_graph_write_usage[] = {
 	N_("git commit-graph write [--object-dir <objdir>] [--append] "
 	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--[no-]progress] <split options>"),
+	   "[--[no-]progress] [--[no-]check-oids] <split options>"),
 	NULL
 };
 
@@ -36,6 +36,7 @@ static struct opts_commit_graph {
 	int split;
 	int shallow;
 	int progress;
+	int check_oids;
 } opts;
 
 static struct object_directory *find_odb(struct repository *r,
@@ -163,6 +164,8 @@ static int graph_write(int argc, const char **argv)
 			N_("allow writing an incremental commit-graph file"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			write_option_parse_split),
+		OPT_BOOL(0, "check-oids", &opts.check_oids,
+			N_("require OIDs to be commits")),
 		OPT_INTEGER(0, "max-commits", &split_opts.max_commits,
 			N_("maximum number of commits in a non-base split commit-graph")),
 		OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple,
@@ -173,6 +176,7 @@ static int graph_write(int argc, const char **argv)
 	};
 
 	opts.progress = isatty(2);
+	opts.check_oids = 1;
 	split_opts.size_multiple = 2;
 	split_opts.max_commits = 0;
 	split_opts.expire_time = 0;
@@ -227,7 +231,8 @@ static int graph_write(int argc, const char **argv)
 
 				oidset_insert(&commits, &oid);
 			}
-			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
+			if (opts.check_oids)
+				flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 		}
 
 		UNLEAK(buf);
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e874a12696..2cbd301abe 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -49,6 +49,34 @@ test_expect_success 'exit with correct error on bad input to --stdin-commits' '
 	test_i18ngrep "invalid commit object id" stderr
 '
 
+graph_expect_commits() {
+	test-tool read-graph >got
+	if ! grep "num_commits: $1" got
+	then
+		echo "graph_expect_commits: expected $1 commit(s), got:"
+		cat got
+		false
+	fi
+}
+
+test_expect_success 'ignores non-commit OIDs to --input=stdin-commits with --no-check-oids' '
+	test_when_finished rm -rf "$objdir/info/commit-graph" &&
+	cd "$TRASH_DIRECTORY/full" &&
+	# write a graph to ensure layers are/are not added appropriately
+	git rev-parse HEAD~1 >base &&
+	git commit-graph write --stdin-commits <base &&
+	graph_expect_commits 2 &&
+	# bad input is rejected
+	echo HEAD >bad &&
+	test_expect_code 1 git commit-graph write --stdin-commits <bad 2>err &&
+	test_i18ngrep "invalid commit object id" err &&
+	graph_expect_commits 2 &&
+	# update with valid commit OID, ignore tree OID
+	git rev-parse HEAD HEAD^{tree} >in &&
+	git commit-graph write --stdin-commits --no-check-oids <in &&
+	graph_expect_commits 3
+'
+
 graph_git_two_modes() {
 	git -c core.commitGraph=true $1 >output
 	git -c core.commitGraph=false $1 >expect
-- 
2.26.0.106.g9fadedd637

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-04-14  4:04 ` [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids' Taylor Blau
@ 2020-04-15  4:29   ` Taylor Blau
  2020-04-15  4:31     ` Taylor Blau
  0 siblings, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2020-04-15  4:29 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, martin.agren, peff, szeder.dev

Whoops. I sent the wrong version of this patch. It should be the below:

--- >8 ---

Subject: [PATCH] shallow.c: use 'reset_repository_shallow' when appropriate

In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily,
2019-01-10), the author noted that 'is_repository_shallow' produces
visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'.

This is a problem for e.g., fetching with '--update-shallow' in a
shallow repsoitory with 'fetch.writeCommitGraph' enabled, since the
update to '.git/shallow' will cause Git to think that the repository
*isn't* shallow when it is, thereby circumventing the commit-graph
compatability check.

This causes problems in shallow repositories with at least shallow refs
that have at least one ancestor (since the client won't have those
object(s), and therefore can't take the reachability closure over
commits to be written to the commit-graph).

Address this by introducing 'reset_repository_shallow()', and calling it
when the shallow file is updated, forcing 'is_repository_shallow' to
re-evaluate whether the repository is still shallow after fetching in
the above scenario.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit.h     |  1 +
 fetch-pack.c |  1 +
 shallow.c    | 15 ++++++++-------
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/commit.h b/commit.h
index 008a0fa4a0..ee1ba139d4 100644
--- a/commit.h
+++ b/commit.h
@@ -251,6 +251,7 @@ int register_shallow(struct repository *r, const struct object_id *oid);
 int unregister_shallow(const struct object_id *oid);
 int for_each_commit_graft(each_commit_graft_fn, void *);
 int is_repository_shallow(struct repository *r);
+void reset_repository_shallow(struct repository *r);
 struct commit_list *get_shallow_commits(struct object_array *heads,
 					int depth, int shallow_flag, int not_shallow_flag);
 struct commit_list *get_shallow_commits_by_rev_list(
diff --git a/fetch-pack.c b/fetch-pack.c
index 1734a573b0..051902ef6d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1630,6 +1630,7 @@ static void update_shallow(struct fetch_pack_args *args,
 		if (*alternate_shallow_file == '\0') { /* --unshallow */
 			unlink_or_warn(git_path_shallow(the_repository));
 			rollback_lock_file(&shallow_lock);
+			reset_repository_shallow(the_repository);
 		} else
 			commit_lock_file(&shallow_lock);
 		alternate_shallow_file = NULL;
diff --git a/shallow.c b/shallow.c
index 7fd04afed1..fac383dec9 100644
--- a/shallow.c
+++ b/shallow.c
@@ -40,13 +40,6 @@ int register_shallow(struct repository *r, const struct object_id *oid)

 int is_repository_shallow(struct repository *r)
 {
-	/*
-	 * NEEDSWORK: This function updates
-	 * r->parsed_objects->{is_shallow,shallow_stat} as a side effect but
-	 * there is no corresponding function to clear them when the shallow
-	 * file is updated.
-	 */
-
 	FILE *fp;
 	char buf[1024];
 	const char *path = r->parsed_objects->alternate_shallow_file;
@@ -79,6 +72,12 @@ int is_repository_shallow(struct repository *r)
 	return r->parsed_objects->is_shallow;
 }

+void reset_repository_shallow(struct repository *r)
+{
+	r->parsed_objects->is_shallow = -1;
+	stat_validity_clear(r->parsed_objects->shallow_stat);
+}
+
 /*
  * TODO: use "int" elemtype instead of "int *" when/if commit-slab
  * supports a "valid" flag.
@@ -362,6 +361,7 @@ void setup_alternate_shallow(struct lock_file *shallow_lock,
 		 * shallow file".
 		 */
 		*alternate_shallow_file = "";
+	reset_repository_shallow(the_repository);
 	strbuf_release(&sb);
 }

@@ -411,6 +411,7 @@ void prune_shallow(unsigned options)
 			die_errno("failed to write to %s",
 				  get_lock_file_path(&shallow_lock));
 		commit_lock_file(&shallow_lock);
+		reset_repository_shallow(the_repository);
 	} else {
 		unlink(git_path_shallow(the_repository));
 		rollback_lock_file(&shallow_lock);
--
2.26.0.106.g9fadedd637

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-04-15  4:29   ` Taylor Blau
@ 2020-04-15  4:31     ` Taylor Blau
  2020-04-22 10:55       ` SZEDER Gábor
  0 siblings, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2020-04-15  4:31 UTC (permalink / raw)
  To: git; +Cc: dstolee, gitster, martin.agren, peff, szeder.dev

On Tue, Apr 14, 2020 at 10:29:30PM -0600, Taylor Blau wrote:
> Whoops. I sent the wrong version of this patch. It should be the below:

Double whoops. I was on the wrong branch, and hit send too early. *This*
is the version of the patch that I meant to send ;).

--- >8 ---

Subject: [PATCH] commit-graph.c: introduce '--[no-]check-oids'

When operating on a stream of commit OIDs on stdin, 'git commit-graph
write' checks that each OID refers to an object that is indeed a commit.
This is convenient to make sure that the given input is well-formed, but
can sometimes be undesirable.

For example, server operators may wish to feed the refnames that were
updated during a push to 'git commit-graph write --input=stdin-commits',
and silently discard refs that don't point at commits. This can be done
by combing the output of 'git for-each-ref' with '--format
%(*objecttype)', but this requires opening up a potentially large number
of objects.  Instead, it is more convenient to feed the updated refs to
the commit-graph machinery, and let it throw out refs that don't point
to commits.

Introduce '--[no-]check-oids' to make such a behavior possible. With
'--check-oids' (the default behavior to retain backwards compatibility),
'git commit-graph write' will barf on a non-commit line in its input.
With 'no-check-oids', such lines will be silently ignored, making the
above possible by specifying this option.

No matter which is supplied, 'git commit-graph write' retains the
behavior from the previous commit of rejecting non-OID inputs like
"HEAD" and "refs/heads/foo" as before.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-commit-graph.txt |  5 +++++
 builtin/commit-graph.c             | 11 ++++++++---
 commit-graph.c                     |  2 +-
 t/t5318-commit-graph.sh            | 28 ++++++++++++++++++++++++++++
 4 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 46f7f7c573..91e8027b86 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -82,6 +82,11 @@ tip with the previous tip.
 Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
 be the current time. After writing the split commit-graph, delete all
 unused commit-graph whose modified times are older than `datetime`.
++
+The `--[no-]check-oids` option decides whether or not OIDs are required
+to be commits. By default, `--check-oids` is implied, generating an
+error on non-commit objects. If `--no-check-oids` is given, non-commits
+are silently discarded.

 'verify'::

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index c69716aa7e..2d0a8e822a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -11,7 +11,7 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
 	N_("git commit-graph write [--object-dir <objdir>] [--append] "
 	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--[no-]progress] <split options>"),
+	   "[--[no-]progress] [--[no-]check-oids] <split options>"),
 	NULL
 };

@@ -23,7 +23,7 @@ static const char * const builtin_commit_graph_verify_usage[] = {
 static const char * const builtin_commit_graph_write_usage[] = {
 	N_("git commit-graph write [--object-dir <objdir>] [--append] "
 	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--[no-]progress] <split options>"),
+	   "[--[no-]progress] [--[no-]check-oids] <split options>"),
 	NULL
 };

@@ -36,6 +36,7 @@ static struct opts_commit_graph {
 	int split;
 	int shallow;
 	int progress;
+	int check_oids;
 } opts;

 static struct object_directory *find_odb(struct repository *r,
@@ -163,6 +164,8 @@ static int graph_write(int argc, const char **argv)
 			N_("allow writing an incremental commit-graph file"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			write_option_parse_split),
+		OPT_BOOL(0, "check-oids", &opts.check_oids,
+			N_("require OIDs to be commits")),
 		OPT_INTEGER(0, "max-commits", &split_opts.max_commits,
 			N_("maximum number of commits in a non-base split commit-graph")),
 		OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple,
@@ -173,6 +176,7 @@ static int graph_write(int argc, const char **argv)
 	};

 	opts.progress = isatty(2);
+	opts.check_oids = 1;
 	split_opts.size_multiple = 2;
 	split_opts.max_commits = 0;
 	split_opts.expire_time = 0;
@@ -227,7 +231,8 @@ static int graph_write(int argc, const char **argv)

 				oidset_insert(&commits, &oid);
 			}
-			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
+			if (opts.check_oids)
+				flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 		}

 		UNLEAK(buf);
diff --git a/commit-graph.c b/commit-graph.c
index f60346baee..b8737f0ce9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -145,7 +145,7 @@ static int verify_commit_graph_lite(struct commit_graph *g)
 	 *
 	 * There should only be very basic checks here to ensure that
 	 * we don't e.g. segfault in fill_commit_in_graph(), but
-	 * because this is a very hot codepath nothing that e.g. loops
+	 e because this is a very hot codepath nothing that e.g. loops
 	 * over g->num_commits, or runs a checksum on the commit-graph
 	 * itself.
 	 */
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e874a12696..7960cefa1b 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -49,6 +49,34 @@ test_expect_success 'exit with correct error on bad input to --stdin-commits' '
 	test_i18ngrep "invalid commit object id" stderr
 '

+graph_expect_commits() {
+	test-tool read-graph >got
+	if ! grep "num_commits: $1" got
+	then
+		echo "graph_expect_commits: expected $1 commit(s), got:"
+		cat got
+		false
+	fi
+}
+
+test_expect_success 'ignores non-commit OIDs to --input=stdin-commits with --no-check-oids' '
+	test_when_finished rm -rf "$objdir/info/commit-graph" &&
+	cd "$TRASH_DIRECTORY/full" &&
+	# write a graph to ensure layers are/are not added appropriately
+	git rev-parse HEAD~1 >base &&
+	git commit-graph write --stdin-commits <base &&
+	graph_expect_commits 2 &&
+	# bad input is rejected
+	echo HEAD >bad &&
+	test_expect_code 1 git commit-graph write --stdin-commits <bad 2>err &&
+	test_i18ngrep "unexpected non-hex object ID: HEAD" err &&
+	graph_expect_commits 2 &&
+	# update with valid commit OID, ignore tree OID
+	git rev-parse HEAD HEAD^{tree} >in &&
+	git commit-graph write --stdin-commits --no-check-oids <in &&
+	graph_expect_commits 3
+'
+
 graph_git_two_modes() {
 	git -c core.commitGraph=true $1 >output
 	git -c core.commitGraph=false $1 >expect
--
2.26.0.106.g9fadedd637


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-04-15  4:31     ` Taylor Blau
@ 2020-04-22 10:55       ` SZEDER Gábor
  2020-04-22 23:39         ` Taylor Blau
  0 siblings, 1 reply; 19+ messages in thread
From: SZEDER Gábor @ 2020-04-22 10:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, martin.agren, peff

On Tue, Apr 14, 2020 at 10:31:37PM -0600, Taylor Blau wrote:
> On Tue, Apr 14, 2020 at 10:29:30PM -0600, Taylor Blau wrote:
> > Whoops. I sent the wrong version of this patch. It should be the below:
> 
> Double whoops. I was on the wrong branch, and hit send too early. *This*
> is the version of the patch that I meant to send ;).
> 
> --- >8 ---
> 
> Subject: [PATCH] commit-graph.c: introduce '--[no-]check-oids'
> 
> When operating on a stream of commit OIDs on stdin, 'git commit-graph
> write' checks that each OID refers to an object that is indeed a commit.
> This is convenient to make sure that the given input is well-formed, but
> can sometimes be undesirable.
> 
> For example, server operators may wish to feed the refnames that were

s/the refnames/full commit object IDs pointed to by refs/

or something similar.

> updated during a push to 'git commit-graph write --input=stdin-commits',
> and silently discard refs that don't point at commits.

s/refs/<something along the lines of the above>/

> This can be done
> by combing the output of 'git for-each-ref' with '--format
> %(*objecttype)', but this requires opening up a potentially large number
> of objects.  Instead, it is more convenient to feed the updated refs to

s/refs/.../

> the commit-graph machinery, and let it throw out refs that don't point

s/refs/.../

> to commits.
> 
> Introduce '--[no-]check-oids' to make such a behavior possible. With
> '--check-oids' (the default behavior to retain backwards compatibility),
> 'git commit-graph write' will barf on a non-commit line in its input.
> With 'no-check-oids', such lines will be silently ignored, making the

s/no-check-oids/--no-check-oids/

> above possible by specifying this option.
> 
> No matter which is supplied, 'git commit-graph write' retains the
> behavior from the previous commit of rejecting non-OID inputs like
> "HEAD" and "refs/heads/foo" as before.

See? :)  This is why all those s/// are necessary.

> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/git-commit-graph.txt |  5 +++++
>  builtin/commit-graph.c             | 11 ++++++++---
>  commit-graph.c                     |  2 +-
>  t/t5318-commit-graph.sh            | 28 ++++++++++++++++++++++++++++
>  4 files changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 46f7f7c573..91e8027b86 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -82,6 +82,11 @@ tip with the previous tip.
>  Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
>  be the current time. After writing the split commit-graph, delete all
>  unused commit-graph whose modified times are older than `datetime`.
> ++
> +The `--[no-]check-oids` option decides whether or not OIDs are required
> +to be commits. By default, `--check-oids` is implied, generating an
> +error on non-commit objects. If `--no-check-oids` is given, non-commits
> +are silently discarded.

What happens with OIDs of tags, in particular with OIDs of tags that
can be peeled down to commit objects?  According to (my (too
pedantic?) interpretation of) the above description they will trigger
an error with '--check-oids' or will be ignored with
'--no-check-oids'.  The implementation, however, accepts those oids
and peels them down to commit objects; I think this is the right
behaviour.

What happens with OIDs that name non-existing objects?

>  'verify'::
> 
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index c69716aa7e..2d0a8e822a 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -11,7 +11,7 @@ static char const * const builtin_commit_graph_usage[] = {
>  	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
>  	N_("git commit-graph write [--object-dir <objdir>] [--append] "
>  	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
> -	   "[--[no-]progress] <split options>"),
> +	   "[--[no-]progress] [--[no-]check-oids] <split options>"),
>  	NULL
>  };
> 
> @@ -23,7 +23,7 @@ static const char * const builtin_commit_graph_verify_usage[] = {
>  static const char * const builtin_commit_graph_write_usage[] = {
>  	N_("git commit-graph write [--object-dir <objdir>] [--append] "
>  	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
> -	   "[--[no-]progress] <split options>"),
> +	   "[--[no-]progress] [--[no-]check-oids] <split options>"),
>  	NULL
>  };
> 
> @@ -36,6 +36,7 @@ static struct opts_commit_graph {
>  	int split;
>  	int shallow;
>  	int progress;
> +	int check_oids;
>  } opts;
> 
>  static struct object_directory *find_odb(struct repository *r,
> @@ -163,6 +164,8 @@ static int graph_write(int argc, const char **argv)
>  			N_("allow writing an incremental commit-graph file"),
>  			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
>  			write_option_parse_split),
> +		OPT_BOOL(0, "check-oids", &opts.check_oids,
> +			N_("require OIDs to be commits")),
>  		OPT_INTEGER(0, "max-commits", &split_opts.max_commits,
>  			N_("maximum number of commits in a non-base split commit-graph")),
>  		OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple,
> @@ -173,6 +176,7 @@ static int graph_write(int argc, const char **argv)
>  	};
> 
>  	opts.progress = isatty(2);
> +	opts.check_oids = 1;
>  	split_opts.size_multiple = 2;
>  	split_opts.max_commits = 0;
>  	split_opts.expire_time = 0;
> @@ -227,7 +231,8 @@ static int graph_write(int argc, const char **argv)
> 
>  				oidset_insert(&commits, &oid);
>  			}
> -			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
> +			if (opts.check_oids)
> +				flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
>  		}
> 
>  		UNLEAK(buf);
> diff --git a/commit-graph.c b/commit-graph.c
> index f60346baee..b8737f0ce9 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -145,7 +145,7 @@ static int verify_commit_graph_lite(struct commit_graph *g)
>  	 *
>  	 * There should only be very basic checks here to ensure that
>  	 * we don't e.g. segfault in fill_commit_in_graph(), but
> -	 * because this is a very hot codepath nothing that e.g. loops
> +	 e because this is a very hot codepath nothing that e.g. loops

Bogus hunk, perhaps?

>  	 * over g->num_commits, or runs a checksum on the commit-graph
>  	 * itself.
>  	 */
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index e874a12696..7960cefa1b 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -49,6 +49,34 @@ test_expect_success 'exit with correct error on bad input to --stdin-commits' '
>  	test_i18ngrep "invalid commit object id" stderr
>  '
> 
> +graph_expect_commits() {
> +	test-tool read-graph >got
> +	if ! grep "num_commits: $1" got
> +	then
> +		echo "graph_expect_commits: expected $1 commit(s), got:"
> +		cat got
> +		false
> +	fi
> +}
> +
> +test_expect_success 'ignores non-commit OIDs to --input=stdin-commits with --no-check-oids' '
> +	test_when_finished rm -rf "$objdir/info/commit-graph" &&
> +	cd "$TRASH_DIRECTORY/full" &&
> +	# write a graph to ensure layers are/are not added appropriately
> +	git rev-parse HEAD~1 >base &&
> +	git commit-graph write --stdin-commits <base &&
> +	graph_expect_commits 2 &&
> +	# bad input is rejected
> +	echo HEAD >bad &&
> +	test_expect_code 1 git commit-graph write --stdin-commits <bad 2>err &&
> +	test_i18ngrep "unexpected non-hex object ID: HEAD" err &&
> +	graph_expect_commits 2 &&
> +	# update with valid commit OID, ignore tree OID
> +	git rev-parse HEAD HEAD^{tree} >in &&
> +	git commit-graph write --stdin-commits --no-check-oids <in &&
> +	graph_expect_commits 3
> +'
> +
>  graph_git_two_modes() {
>  	git -c core.commitGraph=true $1 >output
>  	git -c core.commitGraph=false $1 >expect
> --
> 2.26.0.106.g9fadedd637
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-04-22 10:55       ` SZEDER Gábor
@ 2020-04-22 23:39         ` Taylor Blau
  2020-04-24 10:59           ` SZEDER Gábor
  0 siblings, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2020-04-22 23:39 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Taylor Blau, git, dstolee, gitster, martin.agren, peff

On Wed, Apr 22, 2020 at 12:55:36PM +0200, SZEDER Gábor wrote:
> On Tue, Apr 14, 2020 at 10:31:37PM -0600, Taylor Blau wrote:
> > On Tue, Apr 14, 2020 at 10:29:30PM -0600, Taylor Blau wrote:
> > > Whoops. I sent the wrong version of this patch. It should be the below:
> >
> > Double whoops. I was on the wrong branch, and hit send too early. *This*
> > is the version of the patch that I meant to send ;).
> >
> > --- >8 ---
> >
> > Subject: [PATCH] commit-graph.c: introduce '--[no-]check-oids'
> >
> > When operating on a stream of commit OIDs on stdin, 'git commit-graph
> > write' checks that each OID refers to an object that is indeed a commit.
> > This is convenient to make sure that the given input is well-formed, but
> > can sometimes be undesirable.
> >
> > For example, server operators may wish to feed the refnames that were
>
> s/the refnames/full commit object IDs pointed to by refs/
>
> or something similar.
>
> > updated during a push to 'git commit-graph write --input=stdin-commits',
> > and silently discard refs that don't point at commits.
>
> s/refs/<something along the lines of the above>/
>
> > This can be done
> > by combing the output of 'git for-each-ref' with '--format
> > %(*objecttype)', but this requires opening up a potentially large number
> > of objects.  Instead, it is more convenient to feed the updated refs to
>
> s/refs/.../
>
> > the commit-graph machinery, and let it throw out refs that don't point
>
> s/refs/.../
>
> > to commits.
> >
> > Introduce '--[no-]check-oids' to make such a behavior possible. With
> > '--check-oids' (the default behavior to retain backwards compatibility),
> > 'git commit-graph write' will barf on a non-commit line in its input.
> > With 'no-check-oids', such lines will be silently ignored, making the
>
> s/no-check-oids/--no-check-oids/
>
> > above possible by specifying this option.
> >
> > No matter which is supplied, 'git commit-graph write' retains the
> > behavior from the previous commit of rejecting non-OID inputs like
> > "HEAD" and "refs/heads/foo" as before.
>
> See? :)  This is why all those s/// are necessary.

Ah :). All good points; thanks for your suggestions.

> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  Documentation/git-commit-graph.txt |  5 +++++
> >  builtin/commit-graph.c             | 11 ++++++++---
> >  commit-graph.c                     |  2 +-
> >  t/t5318-commit-graph.sh            | 28 ++++++++++++++++++++++++++++
> >  4 files changed, 42 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> > index 46f7f7c573..91e8027b86 100644
> > --- a/Documentation/git-commit-graph.txt
> > +++ b/Documentation/git-commit-graph.txt
> > @@ -82,6 +82,11 @@ tip with the previous tip.
> >  Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
> >  be the current time. After writing the split commit-graph, delete all
> >  unused commit-graph whose modified times are older than `datetime`.
> > ++
> > +The `--[no-]check-oids` option decides whether or not OIDs are required
> > +to be commits. By default, `--check-oids` is implied, generating an
> > +error on non-commit objects. If `--no-check-oids` is given, non-commits
> > +are silently discarded.
>
> What happens with OIDs of tags, in particular with OIDs of tags that
> can be peeled down to commit objects?  According to (my (too
> pedantic?) interpretation of) the above description they will trigger
> an error with '--check-oids' or will be ignored with
> '--no-check-oids'.  The implementation, however, accepts those oids
> and peels them down to commit objects; I think this is the right
> behaviour.

That's right, and certainly merits a mention in the documentation. I've
added that...

> What happens with OIDs that name non-existing objects?

...these are silently discarded. I think that you could make a
compelling argument in either direction on this one, but I'm slightly
swayed towards "discard these, too", since '--no-check-oids' is
literally saying "don't check these".

I guess that pushes us into the territory of whether or not "check" is
the right verb. "verify"? "scrutinize" :)? Do you have any thoughts
here?

If you're otherwise satisfied with this series, here's the updated
patch.

-- >8 --

Subject: [PATCH] commit-graph.c: introduce '--[no-]check-oids'

When operating on a stream of commit OIDs on stdin, 'git commit-graph
write' checks that each OID refers to an object that is indeed a commit.
This is convenient to make sure that the given input is well-formed, but
can sometimes be undesirable.

For example, server operators may wish to feed the full commit object
IDs pointed to by refs that were updated during a push to 'git
commit-graph write --input=stdin-commits', and silently discard any
input that doesn't point at a commit. This can be done by combing the
output of 'git for-each-ref' with '--format %(*objecttype)', but this
requires opening up a potentially large number of objects.  Instead, it
is more convenient to feed the updated object IDs to the commit-graph
machinery, and let it throw out whatever remains.  to commits.

Introduce '--[no-]check-oids' to make such a behavior possible. With
'--check-oids' (the default behavior to retain backwards compatibility),
'git commit-graph write' will barf on a non-commit line in its input.
With '--no-check-oids', such lines will be silently ignored, making the
above possible by specifying this option.

No matter which is supplied, 'git commit-graph write' retains the
behavior from the previous commit of rejecting non-OID inputs like
"HEAD" and "refs/heads/foo" as before.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-commit-graph.txt |  6 ++++++
 builtin/commit-graph.c             | 11 ++++++++---
 t/t5318-commit-graph.sh            | 28 ++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 46f7f7c573..6bdbe42766 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -82,6 +82,12 @@ tip with the previous tip.
 Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
 be the current time. After writing the split commit-graph, delete all
 unused commit-graph whose modified times are older than `datetime`.
++
+The `--[no-]check-oids` option decides whether or not OIDs are required
+to be commits. By default, `--check-oids` is implied, generating an
+error on non-commit objects. If `--no-check-oids` is given, non-commits
+and non-existent objects are silently discarded. In either case, tags
+are peeled down to the object they reference.

 'verify'::

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index c69716aa7e..2d0a8e822a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -11,7 +11,7 @@ static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
 	N_("git commit-graph write [--object-dir <objdir>] [--append] "
 	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--[no-]progress] <split options>"),
+	   "[--[no-]progress] [--[no-]check-oids] <split options>"),
 	NULL
 };

@@ -23,7 +23,7 @@ static const char * const builtin_commit_graph_verify_usage[] = {
 static const char * const builtin_commit_graph_write_usage[] = {
 	N_("git commit-graph write [--object-dir <objdir>] [--append] "
 	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
-	   "[--[no-]progress] <split options>"),
+	   "[--[no-]progress] [--[no-]check-oids] <split options>"),
 	NULL
 };

@@ -36,6 +36,7 @@ static struct opts_commit_graph {
 	int split;
 	int shallow;
 	int progress;
+	int check_oids;
 } opts;

 static struct object_directory *find_odb(struct repository *r,
@@ -163,6 +164,8 @@ static int graph_write(int argc, const char **argv)
 			N_("allow writing an incremental commit-graph file"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			write_option_parse_split),
+		OPT_BOOL(0, "check-oids", &opts.check_oids,
+			N_("require OIDs to be commits")),
 		OPT_INTEGER(0, "max-commits", &split_opts.max_commits,
 			N_("maximum number of commits in a non-base split commit-graph")),
 		OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple,
@@ -173,6 +176,7 @@ static int graph_write(int argc, const char **argv)
 	};

 	opts.progress = isatty(2);
+	opts.check_oids = 1;
 	split_opts.size_multiple = 2;
 	split_opts.max_commits = 0;
 	split_opts.expire_time = 0;
@@ -227,7 +231,8 @@ static int graph_write(int argc, const char **argv)

 				oidset_insert(&commits, &oid);
 			}
-			flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
+			if (opts.check_oids)
+				flags |= COMMIT_GRAPH_WRITE_CHECK_OIDS;
 		}

 		UNLEAK(buf);
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e874a12696..7960cefa1b 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -49,6 +49,34 @@ test_expect_success 'exit with correct error on bad input to --stdin-commits' '
 	test_i18ngrep "invalid commit object id" stderr
 '

+graph_expect_commits() {
+	test-tool read-graph >got
+	if ! grep "num_commits: $1" got
+	then
+		echo "graph_expect_commits: expected $1 commit(s), got:"
+		cat got
+		false
+	fi
+}
+
+test_expect_success 'ignores non-commit OIDs to --input=stdin-commits with --no-check-oids' '
+	test_when_finished rm -rf "$objdir/info/commit-graph" &&
+	cd "$TRASH_DIRECTORY/full" &&
+	# write a graph to ensure layers are/are not added appropriately
+	git rev-parse HEAD~1 >base &&
+	git commit-graph write --stdin-commits <base &&
+	graph_expect_commits 2 &&
+	# bad input is rejected
+	echo HEAD >bad &&
+	test_expect_code 1 git commit-graph write --stdin-commits <bad 2>err &&
+	test_i18ngrep "unexpected non-hex object ID: HEAD" err &&
+	graph_expect_commits 2 &&
+	# update with valid commit OID, ignore tree OID
+	git rev-parse HEAD HEAD^{tree} >in &&
+	git commit-graph write --stdin-commits --no-check-oids <in &&
+	graph_expect_commits 3
+'
+
 graph_git_two_modes() {
 	git -c core.commitGraph=true $1 >output
 	git -c core.commitGraph=false $1 >expect
--
2.26.0.113.ge9739cdccc


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-04-22 23:39         ` Taylor Blau
@ 2020-04-24 10:59           ` SZEDER Gábor
  2020-05-01 22:38             ` Taylor Blau
  0 siblings, 1 reply; 19+ messages in thread
From: SZEDER Gábor @ 2020-04-24 10:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, martin.agren, peff

On Wed, Apr 22, 2020 at 05:39:30PM -0600, Taylor Blau wrote:
> > > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> > > index 46f7f7c573..91e8027b86 100644
> > > --- a/Documentation/git-commit-graph.txt
> > > +++ b/Documentation/git-commit-graph.txt
> > > @@ -82,6 +82,11 @@ tip with the previous tip.
> > >  Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
> > >  be the current time. After writing the split commit-graph, delete all
> > >  unused commit-graph whose modified times are older than `datetime`.
> > > ++
> > > +The `--[no-]check-oids` option decides whether or not OIDs are required
> > > +to be commits. By default, `--check-oids` is implied, generating an
> > > +error on non-commit objects. If `--no-check-oids` is given, non-commits
> > > +are silently discarded.
> >
> > What happens with OIDs of tags, in particular with OIDs of tags that
> > can be peeled down to commit objects?  According to (my (too
> > pedantic?) interpretation of) the above description they will trigger
> > an error with '--check-oids' or will be ignored with
> > '--no-check-oids'.  The implementation, however, accepts those oids
> > and peels them down to commit objects; I think this is the right
> > behaviour.
> 
> That's right, and certainly merits a mention in the documentation. I've
> added that...
> 
> > What happens with OIDs that name non-existing objects?
> 
> ...these are silently discarded. I think that you could make a
> compelling argument in either direction on this one, but I'm slightly
> swayed towards "discard these, too", since '--no-check-oids' is
> literally saying "don't check these".

I don't want to argue either way, but I'd argue for making a conscious
decision that is justified in the commit message and documented in the
docs.

So, the option is '--stdin-commits' or '--input=stdin-commits', but
it's not only about commits.  Now, allowing OIDs of tags pointing to
commits and peeling them makes obvious sense, because we want commits
reachable from those tags included in the commit-graph file.  Allowing
OIDs of tags pointing to non-commits and silently ignoring them makes
sense (to better support your 'git f-e-r ... | git c-g write ...' use
case), though it's not that obvious (after all I managed to overlook
it back then, that's why we are now here discussing these
'--check-oids' patches).

But I'm not sure about silently ignoring OIDs pointing to non-existent
objects, because those might be a sign of some issues in whatever is
generating the list of objects to be fed to 'git c-g write'.  E.g. there
could be a race between 'git for-each-ref' listing OIDs and some other
processes pruning them.  Does this worth worrying about?  Dunno...
but at least let's think it through, and record in the commit message
why we made that decision, whatever that decision might be.

> I guess that pushes us into the territory of whether or not "check" is
> the right verb. "verify"? 

Oh, I didn't think about this, but now that you mention it we have
'--verify' in 'git rev-parse', 'git tag' and elsewhere, and we have
'verify-commit', 'verify-path' and 'verify-tag' commands.  So
'--verify-oids' might be more consistent.  I kind of like the 'oids'
suffix in the option name, though I don't know what else we might want
to verify in this command in the future...

> "scrutinize" :)?

Huhh, erm, no ;)

> If you're otherwise satisfied with this series, here's the updated
> patch.

I haven't yet looked closely at the rest of the series...  The
documentation update in the updated patch below looks good to me,
thanks.

> -- >8 --
> 
> Subject: [PATCH] commit-graph.c: introduce '--[no-]check-oids'
> 
> When operating on a stream of commit OIDs on stdin, 'git commit-graph
> write' checks that each OID refers to an object that is indeed a commit.
> This is convenient to make sure that the given input is well-formed, but
> can sometimes be undesirable.
> 
> For example, server operators may wish to feed the full commit object
> IDs pointed to by refs that were updated during a push to 'git
> commit-graph write --input=stdin-commits', and silently discard any
> input that doesn't point at a commit. This can be done by combing the
> output of 'git for-each-ref' with '--format %(*objecttype)', but this
> requires opening up a potentially large number of objects.  Instead, it
> is more convenient to feed the updated object IDs to the commit-graph
> machinery, and let it throw out whatever remains.  to commits.

Either the bulk of a sentence is missing, or there is a stray(?) "to
commits." at the end of this line.

> Introduce '--[no-]check-oids' to make such a behavior possible. With
> '--check-oids' (the default behavior to retain backwards compatibility),
> 'git commit-graph write' will barf on a non-commit line in its input.
> With '--no-check-oids', such lines will be silently ignored, making the
> above possible by specifying this option.
> 
> No matter which is supplied, 'git commit-graph write' retains the
> behavior from the previous commit of rejecting non-OID inputs like
> "HEAD" and "refs/heads/foo" as before.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/git-commit-graph.txt |  6 ++++++
>  builtin/commit-graph.c             | 11 ++++++++---
>  t/t5318-commit-graph.sh            | 28 ++++++++++++++++++++++++++++
>  3 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 46f7f7c573..6bdbe42766 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -82,6 +82,12 @@ tip with the previous tip.
>  Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
>  be the current time. After writing the split commit-graph, delete all
>  unused commit-graph whose modified times are older than `datetime`.
> ++
> +The `--[no-]check-oids` option decides whether or not OIDs are required
> +to be commits. By default, `--check-oids` is implied, generating an
> +error on non-commit objects. If `--no-check-oids` is given, non-commits
> +and non-existent objects are silently discarded. In either case, tags
> +are peeled down to the object they reference.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-04-24 10:59           ` SZEDER Gábor
@ 2020-05-01 22:38             ` Taylor Blau
  2020-05-03  9:40               ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Taylor Blau @ 2020-05-01 22:38 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Taylor Blau, git, dstolee, gitster, martin.agren, peff

On Fri, Apr 24, 2020 at 12:59:57PM +0200, SZEDER Gábor wrote:
> On Wed, Apr 22, 2020 at 05:39:30PM -0600, Taylor Blau wrote:
> > > > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> > > > index 46f7f7c573..91e8027b86 100644
> > > > --- a/Documentation/git-commit-graph.txt
> > > > +++ b/Documentation/git-commit-graph.txt
> > > > @@ -82,6 +82,11 @@ tip with the previous tip.
> > > >  Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
> > > >  be the current time. After writing the split commit-graph, delete all
> > > >  unused commit-graph whose modified times are older than `datetime`.
> > > > ++
> > > > +The `--[no-]check-oids` option decides whether or not OIDs are required
> > > > +to be commits. By default, `--check-oids` is implied, generating an
> > > > +error on non-commit objects. If `--no-check-oids` is given, non-commits
> > > > +are silently discarded.
> > >
> > > What happens with OIDs of tags, in particular with OIDs of tags that
> > > can be peeled down to commit objects?  According to (my (too
> > > pedantic?) interpretation of) the above description they will trigger
> > > an error with '--check-oids' or will be ignored with
> > > '--no-check-oids'.  The implementation, however, accepts those oids
> > > and peels them down to commit objects; I think this is the right
> > > behaviour.
> >
> > That's right, and certainly merits a mention in the documentation. I've
> > added that...
> >
> > > What happens with OIDs that name non-existing objects?
> >
> > ...these are silently discarded. I think that you could make a
> > compelling argument in either direction on this one, but I'm slightly
> > swayed towards "discard these, too", since '--no-check-oids' is
> > literally saying "don't check these".
>
> I don't want to argue either way, but I'd argue for making a conscious
> decision that is justified in the commit message and documented in the
> docs.

Me either, I very much welcome your consistently thoughtful replies
(even if my extreme delay in responding to this one would suggest
otherwise... ;)).

> So, the option is '--stdin-commits' or '--input=stdin-commits', but
> it's not only about commits.  Now, allowing OIDs of tags pointing to
> commits and peeling them makes obvious sense, because we want commits
> reachable from those tags included in the commit-graph file.  Allowing
> OIDs of tags pointing to non-commits and silently ignoring them makes
> sense (to better support your 'git f-e-r ... | git c-g write ...' use
> case), though it's not that obvious (after all I managed to overlook
> it back then, that's why we are now here discussing these
> '--check-oids' patches).
>
> But I'm not sure about silently ignoring OIDs pointing to non-existent
> objects, because those might be a sign of some issues in whatever is
> generating the list of objects to be fed to 'git c-g write'.  E.g. there
> could be a race between 'git for-each-ref' listing OIDs and some other
> processes pruning them.  Does this worth worrying about?  Dunno...
> but at least let's think it through, and record in the commit message
> why we made that decision, whatever that decision might be.

Yeah, I think that the most reasonable behavior is definitely that we
should complain about non-existent objects over 'git commit-graph write
--stdin-commits' no matter if '--[no-]check-oids' is given or not.

But, let's step back for a minute. What are we actually hoping to
accomplish with '--check-oids'? I wrote this patch because I wanted a
way to support 'git for-each-ref' piping into 'git commit-graph write'
without having to juggle which tags peel down to commits and which
don't.

Now, I figured that it would be unreasonable to change the default
behavior of 'git commit-graph write --stdin-commits' (which is to
complain and error out in this circumstance), so I added
'--no-check-oids' as a way to avoid that behavior for callers that want
that.

But, are there ever any callers that *wouldn't* want this behavior? As
far as I can tell, probably not. We're only going to be permitting
*more* inputs to 'git commit-graph write', and I seriously doubt that
anybody is depending on the above behavior. (Of course, if that's not
the case, I'd love for somebody to speak up here and we can continue
the course on this patch).

So, I propose the following:

  * We drop the idea of '--[no-]{check,verify}-oids', and always
    silently ignore non-commit inputs, retaining the existing behavior
    of always complaining about things that aren't valid hex OIDs, such
    as "HEAD".

  * We always error out on missing or corrupt commit OIDs, including
    valid OIDs that don't resolve to any object, or resolve to a tag
    that can't be fully peeled.

Does that seem reasonable?

> > I guess that pushes us into the territory of whether or not "check" is
> > the right verb. "verify"?
>
> Oh, I didn't think about this, but now that you mention it we have
> '--verify' in 'git rev-parse', 'git tag' and elsewhere, and we have
> 'verify-commit', 'verify-path' and 'verify-tag' commands.  So
> '--verify-oids' might be more consistent.  I kind of like the 'oids'
> suffix in the option name, though I don't know what else we might want
> to verify in this command in the future...
>
> > "scrutinize" :)?
>
> Huhh, erm, no ;)
>
> > If you're otherwise satisfied with this series, here's the updated
> > patch.
>
> I haven't yet looked closely at the rest of the series...  The
> documentation update in the updated patch below looks good to me,
> thanks.
>
> > -- >8 --
> >
> > Subject: [PATCH] commit-graph.c: introduce '--[no-]check-oids'
> >
> > When operating on a stream of commit OIDs on stdin, 'git commit-graph
> > write' checks that each OID refers to an object that is indeed a commit.
> > This is convenient to make sure that the given input is well-formed, but
> > can sometimes be undesirable.
> >
> > For example, server operators may wish to feed the full commit object
> > IDs pointed to by refs that were updated during a push to 'git
> > commit-graph write --input=stdin-commits', and silently discard any
> > input that doesn't point at a commit. This can be done by combing the
> > output of 'git for-each-ref' with '--format %(*objecttype)', but this
> > requires opening up a potentially large number of objects.  Instead, it
> > is more convenient to feed the updated object IDs to the commit-graph
> > machinery, and let it throw out whatever remains.  to commits.
>
> Either the bulk of a sentence is missing, or there is a stray(?) "to
> commits." at the end of this line.
>
> > Introduce '--[no-]check-oids' to make such a behavior possible. With
> > '--check-oids' (the default behavior to retain backwards compatibility),
> > 'git commit-graph write' will barf on a non-commit line in its input.
> > With '--no-check-oids', such lines will be silently ignored, making the
> > above possible by specifying this option.
> >
> > No matter which is supplied, 'git commit-graph write' retains the
> > behavior from the previous commit of rejecting non-OID inputs like
> > "HEAD" and "refs/heads/foo" as before.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  Documentation/git-commit-graph.txt |  6 ++++++
> >  builtin/commit-graph.c             | 11 ++++++++---
> >  t/t5318-commit-graph.sh            | 28 ++++++++++++++++++++++++++++
> >  3 files changed, 42 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> > index 46f7f7c573..6bdbe42766 100644
> > --- a/Documentation/git-commit-graph.txt
> > +++ b/Documentation/git-commit-graph.txt
> > @@ -82,6 +82,12 @@ tip with the previous tip.
> >  Finally, if `--expire-time=<datetime>` is not specified, let `datetime`
> >  be the current time. After writing the split commit-graph, delete all
> >  unused commit-graph whose modified times are older than `datetime`.
> > ++
> > +The `--[no-]check-oids` option decides whether or not OIDs are required
> > +to be commits. By default, `--check-oids` is implied, generating an
> > +error on non-commit objects. If `--no-check-oids` is given, non-commits
> > +and non-existent objects are silently discarded. In either case, tags
> > +are peeled down to the object they reference.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-05-01 22:38             ` Taylor Blau
@ 2020-05-03  9:40               ` Jeff King
  2020-05-03 16:55                 ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2020-05-03  9:40 UTC (permalink / raw)
  To: Taylor Blau; +Cc: SZEDER Gábor, git, dstolee, gitster, martin.agren

On Fri, May 01, 2020 at 04:38:48PM -0600, Taylor Blau wrote:

> But, are there ever any callers that *wouldn't* want this behavior? As
> far as I can tell, probably not. We're only going to be permitting
> *more* inputs to 'git commit-graph write', and I seriously doubt that
> anybody is depending on the above behavior. (Of course, if that's not
> the case, I'd love for somebody to speak up here and we can continue
> the course on this patch).
> 
> So, I propose the following:
> 
>   * We drop the idea of '--[no-]{check,verify}-oids', and always
>     silently ignore non-commit inputs, retaining the existing behavior
>     of always complaining about things that aren't valid hex OIDs, such
>     as "HEAD".
> 
>   * We always error out on missing or corrupt commit OIDs, including
>     valid OIDs that don't resolve to any object, or resolve to a tag
>     that can't be fully peeled.
> 
> Does that seem reasonable?

FWIW, I think that is the best direction. If anybody is depending on the
"commit-graph write will complain about non-commits" behavior, they
could only be doing so for a few versions; prior to v2.24.0 we did not.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-05-03  9:40               ` Jeff King
@ 2020-05-03 16:55                 ` Junio C Hamano
  2020-05-04 14:59                   ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2020-05-03 16:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, SZEDER Gábor, git, dstolee, martin.agren

Jeff King <peff@peff.net> writes:

> On Fri, May 01, 2020 at 04:38:48PM -0600, Taylor Blau wrote:
> ...
>> So, I propose the following:
>> 
>>   * We drop the idea of '--[no-]{check,verify}-oids', and always
>>     silently ignore non-commit inputs, retaining the existing behavior
>>     of always complaining about things that aren't valid hex OIDs, such
>>     as "HEAD".
>> 
>>   * We always error out on missing or corrupt commit OIDs, including
>>     valid OIDs that don't resolve to any object, or resolve to a tag
>>     that can't be fully peeled.
>> 
>> Does that seem reasonable?
>
> FWIW, I think that is the best direction. If anybody is depending on the
> "commit-graph write will complain about non-commits" behavior, they
> could only be doing so for a few versions; prior to v2.24.0 we did not.

If we had it for the past 180 days or so, that's not like " people
have seen it for only a brief time", but working it around shouldn't
be too difficult---they need to validate the input they feed to the
command themselves (or do they need to do more?).

Thanks.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-05-03 16:55                 ` Junio C Hamano
@ 2020-05-04 14:59                   ` Jeff King
  2020-05-04 16:29                     ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2020-05-04 14:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, SZEDER Gábor, git, dstolee, martin.agren

On Sun, May 03, 2020 at 09:55:39AM -0700, Junio C Hamano wrote:

> >> Does that seem reasonable?
> >
> > FWIW, I think that is the best direction. If anybody is depending on the
> > "commit-graph write will complain about non-commits" behavior, they
> > could only be doing so for a few versions; prior to v2.24.0 we did not.
> 
> If we had it for the past 180 days or so, that's not like " people
> have seen it for only a brief time", but working it around shouldn't
> be too difficult---they need to validate the input they feed to the
> command themselves (or do they need to do more?).

Yeah, my point wasn't so much that it was brief as that we've had it
both ways, and nobody was complaining about it before v2.24.0 (the
type-restriction change came as a side effect of another tightening).

But yeah, if somebody really wants that validation, they can do it
themselves with "cat-file --batch-check". Or even for-each-ref directly:

  git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
  awk '/commit/ { print $1 }' |
  git commit-graph write --stdin-commits

If you're using --stdin-commits, you're presumably processing the input
anyway (since otherwise you'd just be using --reachable).

I suppose you could argue the other way, too (that the user could be
filtering out non-commits). But so far we have one data point in either
direction, and it wants the more forgiving behavior. :)

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-05-04 14:59                   ` Jeff King
@ 2020-05-04 16:29                     ` Junio C Hamano
  2020-05-04 22:16                       ` Taylor Blau
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2020-05-04 16:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, SZEDER Gábor, git, dstolee, martin.agren

Jeff King <peff@peff.net> writes:

> On Sun, May 03, 2020 at 09:55:39AM -0700, Junio C Hamano wrote:
>
>> >> Does that seem reasonable?
>> >
>> > FWIW, I think that is the best direction. If anybody is depending on the
>> > "commit-graph write will complain about non-commits" behavior, they
>> > could only be doing so for a few versions; prior to v2.24.0 we did not.
>> 
>> If we had it for the past 180 days or so, that's not like " people
>> have seen it for only a brief time", but working it around shouldn't
>> be too difficult---they need to validate the input they feed to the
>> command themselves (or do they need to do more?).
>
> Yeah, my point wasn't so much that it was brief as that we've had it
> both ways, and nobody was complaining about it before v2.24.0 (the
> type-restriction change came as a side effect of another tightening).
>
> But yeah, if somebody really wants that validation, they can do it
> themselves with "cat-file --batch-check". Or even for-each-ref directly:
>
>   git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
>   awk '/commit/ { print $1 }' |
>   git commit-graph write --stdin-commits
>
> If you're using --stdin-commits, you're presumably processing the input
> anyway (since otherwise you'd just be using --reachable).
>
> I suppose you could argue the other way, too (that the user could be
> filtering out non-commits). But so far we have one data point in either
> direction, and it wants the more forgiving behavior. :)

Yup.  I agree that Taylor outlined the best direction going forward.

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids'
  2020-05-04 16:29                     ` Junio C Hamano
@ 2020-05-04 22:16                       ` Taylor Blau
  0 siblings, 0 replies; 19+ messages in thread
From: Taylor Blau @ 2020-05-04 22:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Taylor Blau, SZEDER Gábor, git, dstolee, martin.agren

On Mon, May 04, 2020 at 09:29:29AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > On Sun, May 03, 2020 at 09:55:39AM -0700, Junio C Hamano wrote:
> >
> >> >> Does that seem reasonable?
> >> >
> >> > FWIW, I think that is the best direction. If anybody is depending on the
> >> > "commit-graph write will complain about non-commits" behavior, they
> >> > could only be doing so for a few versions; prior to v2.24.0 we did not.
> >>
> >> If we had it for the past 180 days or so, that's not like " people
> >> have seen it for only a brief time", but working it around shouldn't
> >> be too difficult---they need to validate the input they feed to the
> >> command themselves (or do they need to do more?).
> >
> > Yeah, my point wasn't so much that it was brief as that we've had it
> > both ways, and nobody was complaining about it before v2.24.0 (the
> > type-restriction change came as a side effect of another tightening).
> >
> > But yeah, if somebody really wants that validation, they can do it
> > themselves with "cat-file --batch-check". Or even for-each-ref directly:
> >
> >   git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
> >   awk '/commit/ { print $1 }' |
> >   git commit-graph write --stdin-commits
> >
> > If you're using --stdin-commits, you're presumably processing the input
> > anyway (since otherwise you'd just be using --reachable).
> >
> > I suppose you could argue the other way, too (that the user could be
> > filtering out non-commits). But so far we have one data point in either
> > direction, and it wants the more forgiving behavior. :)
>
> Yup.  I agree that Taylor outlined the best direction going forward.

Thanks, both. I'll send a series to address this shortly. (I've been
sitting on it in ttaylorr/git for a couple of days over the weekend,
but a few other things today have gotten in the way of me sending it to
the list.)

> Thanks.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-05-04 22:16 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-14  4:03 [PATCH 0/7] commit-graph: split strategies, '--[no-]check-oids' Taylor Blau
2020-04-14  4:04 ` [PATCH 1/7] t/helper/test-read-graph.c: support commit-graph chains Taylor Blau
2020-04-14  4:04 ` [PATCH 2/7] builtin/commit-graph.c: support for '--split[=<strategy>]' Taylor Blau
2020-04-14  4:04 ` [PATCH 3/7] builtin/commit-graph.c: introduce split strategy 'no-merge' Taylor Blau
2020-04-14  4:04 ` [PATCH 4/7] builtin/commit-graph.c: introduce split strategy 'replace' Taylor Blau
2020-04-14  4:04 ` [PATCH 5/7] oidset: introduce 'oidset_size' Taylor Blau
2020-04-14  4:04 ` [PATCH 6/7] commit-graph.h: replace 'commit_hex' with 'commits' Taylor Blau
2020-04-14  4:04 ` [PATCH 7/7] commit-graph.c: introduce '--[no-]check-oids' Taylor Blau
2020-04-15  4:29   ` Taylor Blau
2020-04-15  4:31     ` Taylor Blau
2020-04-22 10:55       ` SZEDER Gábor
2020-04-22 23:39         ` Taylor Blau
2020-04-24 10:59           ` SZEDER Gábor
2020-05-01 22:38             ` Taylor Blau
2020-05-03  9:40               ` Jeff King
2020-05-03 16:55                 ` Junio C Hamano
2020-05-04 14:59                   ` Jeff King
2020-05-04 16:29                     ` Junio C Hamano
2020-05-04 22:16                       ` Taylor Blau

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).