git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges
@ 2023-03-06 14:06 Derrick Stolee via GitGitGadget
  2023-03-06 14:06 ` [PATCH 1/8] ahead-behind: create empty builtin Derrick Stolee via GitGitGadget
                   ` (10 more replies)
  0 siblings, 11 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-06 14:06 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Derrick Stolee

This series introduces the 'git ahead-behind' builtin, which has been used
at $DAYJOB for many years, but took many forms before landing on the current
version.

The main goal of the builtin is to compare multiple references against a
common base reference. The comparison counts the commits on each side of
the symmetric difference of their reachable sets. A commit C is
"ahead" of a commit B by the number of commits in B..C (reachable from C but
not reachable from B). Similarly, the commit C is "behind" the commit B by
the number of commits in C..B (reachable from B but not reachable from C).
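
To make the definition concrete, here is a minimal sketch that computes both
counts on a toy commit graph by building the two reachable sets and taking
their set differences. Everything in it (struct toy_commit, toy_reachable(),
toy_ahead_behind(), the 32-commit limit) is a hypothetical illustration and
not code from this series; the series itself uses a single commit walk rather
than two full traversals.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 32	/* one bit per commit in a uint32_t */
#define TOY_NO_PARENT (-1)

/* Hypothetical toy commit: identified by its array index, at most two parents. */
struct toy_commit {
	int parents[2];
};

/* Bitmask of every commit reachable from 'start', including 'start' itself. */
static uint32_t toy_reachable(const struct toy_commit *graph, int start)
{
	uint32_t seen;
	int stack[TOY_MAX_COMMITS];
	int top = 0;

	assert(start >= 0 && start < TOY_MAX_COMMITS);
	seen = UINT32_C(1) << start;
	stack[top++] = start;

	while (top > 0) {
		int cur = stack[--top];
		int i;

		for (i = 0; i < 2; i++) {
			int p = graph[cur].parents[i];

			if (p == TOY_NO_PARENT || (seen & (UINT32_C(1) << p)))
				continue;
			/* Mark before pushing so each commit is pushed at most once. */
			seen |= UINT32_C(1) << p;
			stack[top++] = p;
		}
	}
	return seen;
}

/* Number of set bits in 'v'. */
static int toy_count_bits(uint32_t v)
{
	int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* ahead = |base..tip|, behind = |tip..base| */
static void toy_ahead_behind(const struct toy_commit *graph, int base, int tip,
			     int *ahead, int *behind)
{
	uint32_t from_base = toy_reachable(graph, base);
	uint32_t from_tip = toy_reachable(graph, tip);

	*ahead = toy_count_bits(from_tip & ~from_base);
	*behind = toy_count_bits(from_base & ~from_tip);
}
```

For a history 0 <- 1 <- 2 with a side branch 1 <- 3 <- 4, calling
toy_ahead_behind() with base 2 and tip 4 gives two commits ahead (3 and 4)
and one behind (2).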

These numbers can be computed by 'git rev-list --count B..C' and 'git
rev-list --count C..B', but there are common needs that benefit from doing
the checks in a single process:

 1. Our "branches" page lists ahead/behind counts for each listed branch as
    compared to the repo's default branch. This can be done with a single
    'git ahead-behind' process.
 2. When a branch is updated, a background job checks if any pull requests
    that target that branch should be closed because their branches were
    merged implicitly by that update. These queries can be batched into 'git
    ahead-behind' calls.

In that second example, we don't need the full ahead/behind counts: although
it is sufficient to look for branches that are "zero commits ahead" (meaning
they are reachable from the base), a reachability check alone answers the
question. This builtin therefore has an extra '--contains' mode that only
checks reachability from the base to each of the tips. 'git ahead-behind
--contains' is sort of the reverse of 'git branch --contains'.
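
As a rough illustration of why counting is unnecessary here, the sketch below
answers "is this tip reachable from the base?" with a walk that stops as soon
as the tip is found. The names (struct toy_commit, toy_base_contains()) and
the array-index graph are assumptions made for the example; the real
--contains mode in patch 8 lives in commit-reach.c and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 32	/* one bit per commit in a uint32_t */
#define TOY_NO_PARENT (-1)

/* Hypothetical toy commit: identified by its array index, at most two parents. */
struct toy_commit {
	int parents[2];
};

/*
 * Return 1 if 'tip' is reachable from 'base' (the tip is "zero commits
 * ahead" of the base), and 0 otherwise. The walk ends early on a hit.
 */
static int toy_base_contains(const struct toy_commit *graph, int base, int tip)
{
	uint32_t seen;
	int stack[TOY_MAX_COMMITS];
	int top = 0;

	assert(base >= 0 && base < TOY_MAX_COMMITS);
	seen = UINT32_C(1) << base;
	stack[top++] = base;

	while (top > 0) {
		int cur = stack[--top];
		int i;

		if (cur == tip)
			return 1;

		for (i = 0; i < 2; i++) {
			int p = graph[cur].parents[i];

			if (p == TOY_NO_PARENT || (seen & (UINT32_C(1) << p)))
				continue;
			/* Mark before pushing so each commit is pushed at most once. */
			seen |= UINT32_C(1) << p;
			stack[top++] = p;
		}
	}
	return 0;
}
```

With the history 0 <- 1 <- 2 and a side branch 1 <- 3 <- 4, base 2 contains
tip 1 but does not contain tip 4, so only a pull request for a branch at
commit 1 would count as implicitly merged.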

The series starts with some basic boilerplate and argument parsing, along
with error conditions for missing objects. To avoid TOCTOU races, an
'--ignore-missing' option allows being flexible when a tip reference does
not exist. This is all covered in patches 1-3.

Patches 4-6 introduce a new method: ensure_generations_valid(). Patch 4
refactors the existing generation number computations to make them
more generic, and patch 5 updates the definition of
commit_graph_generation() slightly, making way for patch 6 to implement the
method. With an existing commit-graph file, the commits that are not present
in the file are considered as having generation number "infinity". This is
useful for most of our reachability queries to this point, since those
commits are "above" the ones tracked by the commit-graph. When these commits
are low in number, then there is very little performance cost and zero
correctness cost.

However, we will see that the ahead/behind computation requires accurate
generation numbers to avoid overcounting. Thus, ensure_generations_valid()
is a way to specify a list of commits that need generation numbers computed
before continuing. It's a no-op if all of those commits are in the
commit-graph file. It's expensive if the commit-graph doesn't exist.
However, 'git ahead-behind' computations are likely to be slow no matter
what without a commit-graph, so assuming an existing commit-graph file is
reasonable. If we find sufficient desire to have an implementation that does
not have this requirement, we could create a second implementation and
toggle to it when generation_numbers_enabled() returns false.

Patch 7 implements the ahead-behind algorithm, as well as integrating the
builtin with that implementation. It's a long commit message, so hopefully
it explains the algorithm sufficiently.

Patch 8 implements the --contains option, which is another algorithm, but
more similar to other depth-first searches that already exist in
commit-reach.c.

Thanks, -Stolee

Derrick Stolee (7):
  ahead-behind: create empty builtin
  ahead-behind: parse tip references
  ahead-behind: implement --ignore-missing option
  commit-graph: combine generation computations
  commit-graph: return generation from memory
  ahead-behind: implement ahead_behind() logic
  ahead-behind: add --contains mode

Taylor Blau (1):
  commit-graph: introduce `ensure_generations_valid()`

 .gitignore                         |   1 +
 Documentation/git-ahead-behind.txt |  78 +++++++++++
 Makefile                           |   1 +
 builtin.h                          |   1 +
 builtin/ahead-behind.c             | 121 +++++++++++++++++
 commit-graph.c                     | 209 +++++++++++++++++++----------
 commit-graph.h                     |   7 +
 commit-reach.c                     | 205 ++++++++++++++++++++++++++++
 commit-reach.h                     |  37 +++++
 git.c                              |   1 +
 t/perf/p1500-graph-walks.sh        |  29 ++++
 t/t4218-ahead-behind.sh            | 162 ++++++++++++++++++++++
 t/t5318-commit-graph.sh            |   2 +-
 t/t6600-test-reach.sh              | 120 +++++++++++++++++
 14 files changed, 904 insertions(+), 70 deletions(-)
 create mode 100644 Documentation/git-ahead-behind.txt
 create mode 100644 builtin/ahead-behind.c
 create mode 100755 t/perf/p1500-graph-walks.sh
 create mode 100755 t/t4218-ahead-behind.sh


base-commit: d15644fe0226af7ffc874572d968598564a230dd
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1489%2Fderrickstolee%2Fstolee%2Fupstream-ahead-behind-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1489/derrickstolee/stolee/upstream-ahead-behind-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1489
-- 
gitgitgadget


* [PATCH 1/8] ahead-behind: create empty builtin
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
@ 2023-03-06 14:06 ` Derrick Stolee via GitGitGadget
  2023-03-06 18:48   ` Junio C Hamano
  2023-03-06 14:06 ` [PATCH 2/8] ahead-behind: parse tip references Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-06 14:06 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The 'git ahead-behind' builtin _will_ allow users to specify multiple
tip revisions relative to a common base and _will_ report the number of
commits on each side of the symmetric difference between each tip and
the base. However, that algorithm is not implemented yet and instead
this change introduces the builtin and the basic boilerplate for a new
builtin.

This builtin could be replaced with multiple invocations of 'git
rev-list --count <base>..<tip>' (for ahead values) and 'git rev-list
--count <tip>..<base>' (for behind values). However, it is important to
be able to batch these calls into a single process.

For example, we will be able to track all local branches relative to an
upstream branch using an invocation such as

  git for-each-ref --format='%(refname)' refs/heads/* |
    git ahead-behind --base=origin/main --stdin

This would report each local branch and how far ahead or behind it is
relative to the remote branch 'origin/main'. This could be used to
signal some branches are very old and need to be updated via 'git
rebase' or deleted. We will see in future changes how such commit
counting can be done efficiently within a single process (and a single
commit walk) instead of multiple processes.

For now, only 'git ahead-behind -h' works, and the builtin reports
failure and shows the usage if the '--base' option is skipped. The
documentation is light. These will be updated in the coming changes.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 .gitignore                         |  1 +
 Documentation/git-ahead-behind.txt | 62 ++++++++++++++++++++++++++++++
 Makefile                           |  1 +
 builtin.h                          |  1 +
 builtin/ahead-behind.c             | 30 +++++++++++++++
 git.c                              |  1 +
 t/t4218-ahead-behind.sh            | 17 ++++++++
 7 files changed, 113 insertions(+)
 create mode 100644 Documentation/git-ahead-behind.txt
 create mode 100644 builtin/ahead-behind.c
 create mode 100755 t/t4218-ahead-behind.sh

diff --git a/.gitignore b/.gitignore
index e875c590545..cc064a4817a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -14,6 +14,7 @@
 /bin-wrappers/
 /git
 /git-add
+/git-ahead-behind
 /git-am
 /git-annotate
 /git-apply
diff --git a/Documentation/git-ahead-behind.txt b/Documentation/git-ahead-behind.txt
new file mode 100644
index 00000000000..0e2f989a1a0
--- /dev/null
+++ b/Documentation/git-ahead-behind.txt
@@ -0,0 +1,62 @@
+git-ahead-behind(1)
+===================
+
+NAME
+----
+git-ahead-behind - Count the commits on each side of a revision range
+
+SYNOPSIS
+--------
+[verse]
+'git ahead-behind' --base=<ref> [ --stdin | <revs> ]
+
+DESCRIPTION
+-----------
+
+Given a list of commit ranges, report the number of commits reachable from
+each of the sides of the range, but not the other. Consider a commit range
+specified as `<base>...<tip>`, allowing for the following definitions:
+
+* The `<tip>` is *ahead* of `<base>` by the number of commits reachable
+  from `<tip>` but not reachable from `<base>`. This is the same as the
+  number of the commits in the range `<base>..<tip>`.
+
+* The `<tip>` is *behind* `<base>` by the number of commits reachable from
+  `<base>` but not reachable from `<tip>`. This is the same as the number
+  of commits in the range `<tip>..<base>`.
+
+The sum of the ahead and behind counts equals the number of commits in the
+symmetric difference, the range `<base>...<tip>`.
+
+Multiple revisions may be specified, and they are all compared against a
+common base revision, as specified by the `--base` option. The values are
+reported to stdout one line at a time as follows:
+
+---
+  <rev> <ahead> <behind>
+---
+
+There will be exactly one line per input revision, but the lines may be
+in an arbitrary order.
+
+
+OPTIONS
+-------
+--base=<ref>::
+	Specify that `<ref>` should be used as a common base for all
+	provided revisions that are not specified in the form of a range.
+
+--stdin::
+	Read revision tips and ranges from stdin instead of from the
+	command-line.
+
+
+SEE ALSO
+--------
+linkgit:git-branch[1]
+linkgit:git-rev-list[1]
+linkgit:git-tag[1]
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 50ee51fde32..691f84e8d4e 100644
--- a/Makefile
+++ b/Makefile
@@ -1199,6 +1199,7 @@ LIB_OBJS += xdiff-interface.o
 LIB_OBJS += zlib.o
 
 BUILTIN_OBJS += builtin/add.o
+BUILTIN_OBJS += builtin/ahead-behind.o
 BUILTIN_OBJS += builtin/am.o
 BUILTIN_OBJS += builtin/annotate.o
 BUILTIN_OBJS += builtin/apply.o
diff --git a/builtin.h b/builtin.h
index 46cc7897898..1ae168fa3e3 100644
--- a/builtin.h
+++ b/builtin.h
@@ -108,6 +108,7 @@ void setup_auto_pager(const char *cmd, int def);
 int is_builtin(const char *s);
 
 int cmd_add(int argc, const char **argv, const char *prefix);
+int cmd_ahead_behind(int argc, const char **argv, const char *prefix);
 int cmd_am(int argc, const char **argv, const char *prefix);
 int cmd_annotate(int argc, const char **argv, const char *prefix);
 int cmd_apply(int argc, const char **argv, const char *prefix);
diff --git a/builtin/ahead-behind.c b/builtin/ahead-behind.c
new file mode 100644
index 00000000000..a56cc565def
--- /dev/null
+++ b/builtin/ahead-behind.c
@@ -0,0 +1,30 @@
+#include "builtin.h"
+#include "parse-options.h"
+#include "config.h"
+
+static const char * const ahead_behind_usage[] = {
+	N_("git ahead-behind --base=<ref> [ --stdin | <revs> ]"),
+	NULL
+};
+
+int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
+{
+	const char *base_ref = NULL;
+	int from_stdin = 0;
+
+	struct option ahead_behind_opts[] = {
+		OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")),
+		OPT_BOOL(0 , "stdin", &from_stdin, N_("read rev names from stdin")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, NULL, ahead_behind_opts,
+			     ahead_behind_usage, PARSE_OPT_KEEP_UNKNOWN_OPT);
+
+	if (!base_ref)
+		usage_with_options(ahead_behind_usage, ahead_behind_opts);
+
+	git_config(git_default_config, NULL);
+
+	return 0;
+}
diff --git a/git.c b/git.c
index 6171fd6769d..64e3d493561 100644
--- a/git.c
+++ b/git.c
@@ -467,6 +467,7 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 
 static struct cmd_struct commands[] = {
 	{ "add", cmd_add, RUN_SETUP | NEED_WORK_TREE },
+	{ "ahead-behind", cmd_ahead_behind, RUN_SETUP },
 	{ "am", cmd_am, RUN_SETUP | NEED_WORK_TREE },
 	{ "annotate", cmd_annotate, RUN_SETUP },
 	{ "apply", cmd_apply, RUN_SETUP_GENTLY },
diff --git a/t/t4218-ahead-behind.sh b/t/t4218-ahead-behind.sh
new file mode 100755
index 00000000000..bc08f1207a0
--- /dev/null
+++ b/t/t4218-ahead-behind.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+
+test_description='git ahead-behind command-line options'
+
+. ./test-lib.sh
+
+test_expect_success 'git ahead-behind -h' '
+	test_must_fail git ahead-behind -h >out &&
+	grep "usage:" out
+'
+
+test_expect_success 'git ahead-behind without --base' '
+	test_must_fail git ahead-behind HEAD 2>err &&
+	grep "usage:" err
+'
+
+test_done
-- 
gitgitgadget



* [PATCH 2/8] ahead-behind: parse tip references
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
  2023-03-06 14:06 ` [PATCH 1/8] ahead-behind: create empty builtin Derrick Stolee via GitGitGadget
@ 2023-03-06 14:06 ` Derrick Stolee via GitGitGadget
  2023-03-07  0:43   ` Taylor Blau
  2023-03-06 14:06 ` [PATCH 3/8] ahead-behind: implement --ignore-missing option Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-06 14:06 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Before implementing the logic to compute the ahead/behind counts, parse
the unknown options as commits and place them in a string_list.

Be sure to error out when the reference is not found.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/ahead-behind.c  | 39 +++++++++++++++++++++++++++++++++++++++
 t/t4218-ahead-behind.sh | 10 ++++++++++
 2 files changed, 49 insertions(+)

diff --git a/builtin/ahead-behind.c b/builtin/ahead-behind.c
index a56cc565def..c1212cc8d46 100644
--- a/builtin/ahead-behind.c
+++ b/builtin/ahead-behind.c
@@ -1,16 +1,31 @@
 #include "builtin.h"
 #include "parse-options.h"
 #include "config.h"
+#include "commit.h"
 
 static const char * const ahead_behind_usage[] = {
 	N_("git ahead-behind --base=<ref> [ --stdin | <revs> ]"),
 	NULL
 };
 
+static int handle_arg(struct string_list *tips, const char *arg)
+{
+	struct string_list_item *item;
+	struct commit *c = lookup_commit_reference_by_name(arg);
+
+	if (!c)
+		return error(_("could not resolve '%s'"), arg);
+
+	item = string_list_append(tips, arg);
+	item->util = c;
+	return 0;
+}
+
 int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
 {
 	const char *base_ref = NULL;
 	int from_stdin = 0;
+	struct string_list tips = STRING_LIST_INIT_DUP;
 
 	struct option ahead_behind_opts[] = {
 		OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")),
@@ -26,5 +41,29 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	if (from_stdin) {
+		struct strbuf line = STRBUF_INIT;
+
+		while (strbuf_getline(&line, stdin) != EOF) {
+			if (!line.len)
+				break;
+
+			if (handle_arg(&tips, line.buf))
+				return 1;
+		}
+
+		strbuf_release(&line);
+	} else {
+		int i;
+		for (i = 0; i < argc; ++i) {
+			if (handle_arg(&tips, argv[i]))
+				return 1;
+		}
+	}
+
+	/* Early return for no tips. */
+	if (!tips.nr)
+		return 0;
+
 	return 0;
 }
diff --git a/t/t4218-ahead-behind.sh b/t/t4218-ahead-behind.sh
index bc08f1207a0..3b8b9dc9887 100755
--- a/t/t4218-ahead-behind.sh
+++ b/t/t4218-ahead-behind.sh
@@ -14,4 +14,14 @@ test_expect_success 'git ahead-behind without --base' '
 	grep "usage:" err
 '
 
+test_expect_success 'git ahead-behind with broken tip' '
+	test_must_fail git ahead-behind --base=HEAD bogus 2>err &&
+	grep "could not resolve '\''bogus'\''" err
+'
+
+test_expect_success 'git ahead-behind without tips' '
+	git ahead-behind --base=HEAD 2>err &&
+	test_must_be_empty err
+'
+
 test_done
-- 
gitgitgadget



* [PATCH 3/8] ahead-behind: implement --ignore-missing option
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
  2023-03-06 14:06 ` [PATCH 1/8] ahead-behind: create empty builtin Derrick Stolee via GitGitGadget
  2023-03-06 14:06 ` [PATCH 2/8] ahead-behind: parse tip references Derrick Stolee via GitGitGadget
@ 2023-03-06 14:06 ` Derrick Stolee via GitGitGadget
  2023-03-07  0:46   ` Taylor Blau
  2023-03-06 14:06 ` [PATCH 4/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-06 14:06 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When parsing the tip revisions from the ahead-behind inputs, it is
important to check that those tips exist before adding them to the list
for computation. The previous change caused the builtin to return with
errors if the revisions could not be resolved.

However, when running 'git ahead-behind' in an environment with
concurrent edits, such as a Git server, the references could be
deleted from underneath the caller between reading the reference list
and starting the 'git ahead-behind' process. Avoid this race by allowing
the caller to specify '--ignore-missing' and continue using the
information that is still available.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-ahead-behind.txt | 6 ++++++
 builtin/ahead-behind.c             | 8 +++++++-
 t/t4218-ahead-behind.sh            | 6 ++++++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-ahead-behind.txt b/Documentation/git-ahead-behind.txt
index 0e2f989a1a0..2dd5147f6b2 100644
--- a/Documentation/git-ahead-behind.txt
+++ b/Documentation/git-ahead-behind.txt
@@ -50,6 +50,12 @@ OPTIONS
 	Read revision tips and ranges from stdin instead of from the
 	command-line.
 
+--ignore-missing::
+	When parsing tip references, ignore any references that are not
+	found. This is useful when operating in an environment where a
+	reference could be deleted between reading the reference and
+	starting the `git ahead-behind` process.
+
 
 SEE ALSO
 --------
diff --git a/builtin/ahead-behind.c b/builtin/ahead-behind.c
index c1212cc8d46..e4f65fc0548 100644
--- a/builtin/ahead-behind.c
+++ b/builtin/ahead-behind.c
@@ -8,13 +8,18 @@ static const char * const ahead_behind_usage[] = {
 	NULL
 };
 
+static int ignore_missing;
+
 static int handle_arg(struct string_list *tips, const char *arg)
 {
 	struct string_list_item *item;
 	struct commit *c = lookup_commit_reference_by_name(arg);
 
-	if (!c)
+	if (!c) {
+		if (ignore_missing)
+			return 0;
 		return error(_("could not resolve '%s'"), arg);
+	}
 
 	item = string_list_append(tips, arg);
 	item->util = c;
@@ -30,6 +35,7 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
 	struct option ahead_behind_opts[] = {
 		OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")),
 		OPT_BOOL(0 , "stdin", &from_stdin, N_("read rev names from stdin")),
+		OPT_BOOL(0 , "ignore-missing", &ignore_missing, N_("ignore missing tip references")),
 		OPT_END()
 	};
 
diff --git a/t/t4218-ahead-behind.sh b/t/t4218-ahead-behind.sh
index 3b8b9dc9887..56f16515896 100755
--- a/t/t4218-ahead-behind.sh
+++ b/t/t4218-ahead-behind.sh
@@ -19,6 +19,12 @@ test_expect_success 'git ahead-behind with broken tip' '
 	grep "could not resolve '\''bogus'\''" err
 '
 
+test_expect_success 'git ahead-behind with broken tip and --ignore-missing' '
+	git ahead-behind --base=HEAD --ignore-missing bogus 2>err >out &&
+	test_must_be_empty err &&
+	test_must_be_empty out
+'
+
 test_expect_success 'git ahead-behind without tips' '
 	git ahead-behind --base=HEAD 2>err &&
 	test_must_be_empty err
-- 
gitgitgadget



* [PATCH 4/8] commit-graph: combine generation computations
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2023-03-06 14:06 ` [PATCH 3/8] ahead-behind: implement --ignore-missing option Derrick Stolee via GitGitGadget
@ 2023-03-06 14:06 ` Derrick Stolee via GitGitGadget
  2023-03-06 14:06 ` [PATCH 5/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-06 14:06 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

This patch extracts the common code used to compute topological levels
and corrected committer dates into a common routine,
compute_reachable_generation_numbers_1().

This new routine gets and sets the generation number for a given commit
through a vtable of callbacks (the compute_generation_info struct).

Computing the generation number itself is done in
compute_generation_from_max(), which selects its implementation based
on the generation version requested and issues a BUG() for unrecognized
generation versions.

This patch cleans up the two places that currently compute topological
levels and corrected commit dates by reducing the amount of duplicated
code. It also makes it possible to introduce a function which
dynamically computes those values for commits that aren't stored in a
commit-graph, which will be required for the forthcoming ahead-behind
rewrite.
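
The shape of that refactoring can be sketched on a toy graph as follows. This
is an illustration under stated assumptions, not the code in this patch: the
toy_* names, the array-index commits and the array-backed storage are all
hypothetical, the version 1 clamp to GENERATION_NUMBER_V1_MAX is left out,
and generation 0 stands in for "not computed yet".

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 16
#define TOY_NO_PARENT (-1)
#define TOY_GEN_ZERO 0	/* "not computed yet" in this sketch */

/* Hypothetical toy commit: array index as identity, plus a committer date. */
struct toy_commit {
	int parents[2];
	uint64_t date;
};

/* The "vtable": where generations are read from and written to. */
struct toy_generation_info {
	const struct toy_commit *graph;
	uint64_t (*get_generation)(int commit, void *data);
	void (*set_generation)(int commit, uint64_t gen, void *data);
	void *data;
};

/* Version 1: topological level. Version 2: corrected commit date. */
static uint64_t toy_generation_from_max(const struct toy_commit *c,
					uint64_t max_gen, int version)
{
	if (version == 2 && c->date && c->date > max_gen)
		max_gen = c->date - 1;
	return max_gen + 1;
}

/* Iterative depth-first walk: a commit is finished once all parents are. */
static void toy_compute_generation(struct toy_generation_info *info,
				   int start, int version)
{
	int stack[TOY_MAX_COMMITS];
	int top = 0;

	stack[top++] = start;
	while (top > 0) {
		int cur = stack[top - 1];
		uint64_t max_gen = 0;
		int all_parents_computed = 1;
		int i;

		for (i = 0; i < 2; i++) {
			int p = info->graph[cur].parents[i];
			uint64_t gen;

			if (p == TOY_NO_PARENT)
				continue;
			gen = info->get_generation(p, info->data);
			if (gen == TOY_GEN_ZERO) {
				/* Compute this parent first, then revisit 'cur'. */
				all_parents_computed = 0;
				assert(top < TOY_MAX_COMMITS);
				stack[top++] = p;
				break;
			}
			if (gen > max_gen)
				max_gen = gen;
		}

		if (all_parents_computed) {
			top--;
			info->set_generation(cur,
				toy_generation_from_max(&info->graph[cur],
							max_gen, version),
				info->data);
		}
	}
}

/* One possible storage backend: a plain array indexed by commit. */
static uint64_t toy_get_from_array(int commit, void *data)
{
	return ((uint64_t *)data)[commit];
}

static void toy_set_in_array(int commit, uint64_t gen, void *data)
{
	((uint64_t *)data)[commit] = gen;
}
```

Swapping the two callbacks is what lets one walk serve both the topological
level slab and the corrected commit date storage. In the toy version, a
commit whose date is older than its parent's generation (clock skew) still
gets a generation strictly larger than that of every parent.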

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c | 171 +++++++++++++++++++++++++++++++------------------
 1 file changed, 107 insertions(+), 64 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index c11b59f28b3..deccf984a0d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1446,24 +1446,53 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
-static void compute_topological_levels(struct write_commit_graph_context *ctx)
+struct compute_generation_info {
+	struct repository *r;
+	struct packed_commit_list *commits;
+	struct progress *progress;
+	int progress_cnt;
+
+	timestamp_t (*get_generation)(struct commit *c, void *data);
+	void (*set_generation)(struct commit *c, timestamp_t gen, void *data);
+	void *data;
+};
+
+static timestamp_t compute_generation_from_max(struct commit *c,
+					       timestamp_t max_gen,
+					       int generation_version)
+{
+	switch (generation_version) {
+	case 1: /* topological levels */
+		if (max_gen > GENERATION_NUMBER_V1_MAX - 1)
+			max_gen = GENERATION_NUMBER_V1_MAX - 1;
+		return max_gen + 1;
+
+	case 2: /* corrected commit date */
+		if (c->date && c->date > max_gen)
+			max_gen = c->date - 1;
+		return max_gen + 1;
+
+	default:
+		BUG("attempting unimplemented version");
+	}
+}
+
+static void compute_reachable_generation_numbers_1(
+			struct compute_generation_info *info,
+			int generation_version)
 {
 	int i;
 	struct commit_list *list = NULL;
 
-	if (ctx->report_progress)
-		ctx->progress = start_delayed_progress(
-					_("Computing commit graph topological levels"),
-					ctx->commits.nr);
-	for (i = 0; i < ctx->commits.nr; i++) {
-		struct commit *c = ctx->commits.list[i];
-		uint32_t level;
+	for (i = 0; i < info->commits->nr; i++) {
+		struct commit *c = info->commits->list[i];
+		timestamp_t gen;
+		repo_parse_commit(info->r, c);
+		gen = info->get_generation(c, info->data);
 
-		repo_parse_commit(ctx->r, c);
-		level = *topo_level_slab_at(ctx->topo_levels, c);
+		display_progress(info->progress, info->progress_cnt + 1);
 
-		display_progress(ctx->progress, i + 1);
-		if (level != GENERATION_NUMBER_ZERO)
+		if (gen != GENERATION_NUMBER_ZERO && gen != GENERATION_NUMBER_INFINITY)
 			continue;
 
 		commit_list_insert(c, &list);
@@ -1471,38 +1500,91 @@ static void compute_topological_levels(struct write_commit_graph_context *ctx)
 			struct commit *current = list->item;
 			struct commit_list *parent;
 			int all_parents_computed = 1;
-			uint32_t max_level = 0;
+			timestamp_t max_gen = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
-				repo_parse_commit(ctx->r, parent->item);
-				level = *topo_level_slab_at(ctx->topo_levels, parent->item);
+				repo_parse_commit(info->r, parent->item);
+				gen = info->get_generation(parent->item, info->data);
 
-				if (level == GENERATION_NUMBER_ZERO) {
+				if (gen == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
 				}
 
-				if (level > max_level)
-					max_level = level;
+				if (gen > max_gen)
+					max_gen = gen;
 			}
 
 			if (all_parents_computed) {
 				pop_commit(&list);
-
-				if (max_level > GENERATION_NUMBER_V1_MAX - 1)
-					max_level = GENERATION_NUMBER_V1_MAX - 1;
-				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
+				gen = compute_generation_from_max(
+						current, max_gen,
+						generation_version);
+				info->set_generation(current, gen, info->data);
 			}
 		}
 	}
+}
+
+static timestamp_t get_topo_level(struct commit *c, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	return *topo_level_slab_at(ctx->topo_levels, c);
+}
+
+static void set_topo_level(struct commit *c, timestamp_t t, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	*topo_level_slab_at(ctx->topo_levels, c) = (uint32_t)t;
+	display_progress(ctx->progress, ctx->progress_cnt + 1);
+}
+
+static void compute_topological_levels(struct write_commit_graph_context *ctx)
+{
+	struct compute_generation_info info = {
+		.r = ctx->r,
+		.progress = ctx->progress,
+		.commits = &ctx->commits,
+		.get_generation = get_topo_level,
+		.set_generation = set_topo_level,
+		.data = ctx,
+	};
+
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+					_("Computing commit graph topological levels"),
+					ctx->commits.nr);
+
+	compute_reachable_generation_numbers_1(&info, 1);
+
 	stop_progress(&ctx->progress);
 }
 
+static timestamp_t get_generation_from_graph_data(struct commit *c, void *data)
+{
+	return commit_graph_data_at(c)->generation;
+}
+
+static void set_generation_v2(struct commit *c, timestamp_t t, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	struct commit_graph_data *g = commit_graph_data_at(c);
+	g->generation = t;
+	display_progress(ctx->progress, ctx->progress_cnt + 1);
+}
+
 static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 {
 	int i;
-	struct commit_list *list = NULL;
+	struct compute_generation_info info = {
+		.r = ctx->r,
+		.progress = ctx->progress,
+		.commits = &ctx->commits,
+		.get_generation = get_generation_from_graph_data,
+		.set_generation = set_generation_v2,
+		.data = ctx,
+	};
 
 	if (ctx->report_progress)
 		ctx->progress = start_delayed_progress(
@@ -1517,47 +1599,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 		}
 	}
 
-	for (i = 0; i < ctx->commits.nr; i++) {
-		struct commit *c = ctx->commits.list[i];
-		timestamp_t corrected_commit_date;
-
-		repo_parse_commit(ctx->r, c);
-		corrected_commit_date = commit_graph_data_at(c)->generation;
-
-		display_progress(ctx->progress, i + 1);
-		if (corrected_commit_date != GENERATION_NUMBER_ZERO)
-			continue;
-
-		commit_list_insert(c, &list);
-		while (list) {
-			struct commit *current = list->item;
-			struct commit_list *parent;
-			int all_parents_computed = 1;
-			timestamp_t max_corrected_commit_date = 0;
-
-			for (parent = current->parents; parent; parent = parent->next) {
-				repo_parse_commit(ctx->r, parent->item);
-				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
-
-				if (corrected_commit_date == GENERATION_NUMBER_ZERO) {
-					all_parents_computed = 0;
-					commit_list_insert(parent->item, &list);
-					break;
-				}
-
-				if (corrected_commit_date > max_corrected_commit_date)
-					max_corrected_commit_date = corrected_commit_date;
-			}
-
-			if (all_parents_computed) {
-				pop_commit(&list);
-
-				if (current->date && current->date > max_corrected_commit_date)
-					max_corrected_commit_date = current->date - 1;
-				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
-			}
-		}
-	}
+	compute_reachable_generation_numbers_1(&info, 2);
 
 	for (i = 0; i < ctx->commits.nr; i++) {
 		struct commit *c = ctx->commits.list[i];
@@ -1565,6 +1607,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX)
 			ctx->num_generation_data_overflows++;
 	}
+
 	stop_progress(&ctx->progress);
 }
 
-- 
gitgitgadget



* [PATCH 5/8] commit-graph: return generation from memory
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2023-03-06 14:06 ` [PATCH 4/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
@ 2023-03-06 14:06 ` Derrick Stolee via GitGitGadget
  2023-03-06 14:06 ` [PATCH 6/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-06 14:06 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The commit_graph_generation() method used to report a value of
GENERATION_NUMBER_INFINITY if the commit_graph_data_slab had an instance
for the given commit but the graph_pos indicated the commit was not in
the commit-graph file.

Instead, trust the 'generation' member if the commit has a value in the
slab _and_ the 'generation' member is non-zero. Otherwise, treat it as
GENERATION_NUMBER_INFINITY.

This only makes a difference for a very old case for the commit-graph:
the very first Git release to write commit-graph files wrote zeroes in
the topological level positions. If we are parsing a commit-graph with
all zeroes, those commits will now appear to have
GENERATION_NUMBER_INFINITY (as if they were not parsed from the
commit-graph).

I attempted several variations to work around the need for providing an
uninitialized 'generation' member, but this was the best one I found. It
does require a change to a verification test in t5318 because it reports
a different error than the one about non-zero generation numbers.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c          | 9 ++++-----
 t/t5318-commit-graph.sh | 2 +-
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index deccf984a0d..f04b02be1bb 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -111,17 +111,16 @@ uint32_t commit_graph_position(const struct commit *c)
 	return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH;
 }
 
+
 timestamp_t commit_graph_generation(const struct commit *c)
 {
 	struct commit_graph_data *data =
 		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
 
-	if (!data)
-		return GENERATION_NUMBER_INFINITY;
-	else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
-		return GENERATION_NUMBER_INFINITY;
+	if (data && data->generation)
+		return data->generation;
 
-	return data->generation;
+	return GENERATION_NUMBER_INFINITY;
 }
 
 static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 049c5fc8ead..b6e12115786 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -630,7 +630,7 @@ test_expect_success 'detect incorrect generation number' '
 
 test_expect_success 'detect incorrect generation number' '
 	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
-		"non-zero generation number"
+		"commit-graph generation for commit"
 '
 
 test_expect_success 'detect incorrect commit date' '
-- 
gitgitgadget



* [PATCH 6/8] commit-graph: introduce `ensure_generations_valid()`
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2023-03-06 14:06 ` [PATCH 5/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
@ 2023-03-06 14:06 ` Taylor Blau via GitGitGadget
  2023-03-06 18:52   ` Junio C Hamano
  2023-03-06 14:06 ` [PATCH 7/8] ahead-behind: implement ahead_behind() logic Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 90+ messages in thread
From: Taylor Blau via GitGitGadget @ 2023-03-06 14:06 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Derrick Stolee, Taylor Blau

From: Taylor Blau <me@ttaylorr.com>

Use the just-introduced compute_reachable_generation_numbers_1() to
implement a function which dynamically computes topological levels (or
corrected commit dates) for out-of-graph commits.

This will be useful for the ahead-behind algorithm we are about to
introduce, which needs accurate topological levels on _all_ commits
reachable from the tips in order to avoid over-counting.
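
As a rough illustration of what "computing topological levels" means
here, the sketch below fills in levels on a toy history. It is not the
code from this patch: 'struct toy_commit' and toy_ensure_level() are
hypothetical names, parents are plain array indices, and only
topological levels (not corrected commit dates) are modeled. Recursion
is used for brevity; an explicit stack would be a better fit for deep
histories.

```c
#include <stddef.h>

#define TOY_MAX_PARENTS 2

/* Hypothetical toy commit: parents are indices into the same array. */
struct toy_commit {
	int parents[TOY_MAX_PARENTS]; /* -1 marks an unused slot */
	unsigned int level;           /* 0 means "not computed yet" */
};

/*
 * Fill in the topological level of commit 'i' and of everything it
 * can reach: one more than the largest parent level, or 1 for a root.
 * Commits that already carry a non-zero level are left untouched.
 */
static unsigned int toy_ensure_level(struct toy_commit *commits, int i)
{
	unsigned int max_parent = 0;
	int k;

	if (commits[i].level)
		return commits[i].level;

	for (k = 0; k < TOY_MAX_PARENTS; k++) {
		int p = commits[i].parents[k];
		unsigned int level;

		if (p < 0)
			continue;
		level = toy_ensure_level(commits, p);
		if (level > max_parent)
			max_parent = level;
	}

	commits[i].level = max_parent + 1;
	return commits[i].level;
}
```

Calling toy_ensure_level() on a tip leaves every commit reachable from
that tip with a non-zero level, which is the property the counting walk
depends on.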

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c | 29 +++++++++++++++++++++++++++++
 commit-graph.h |  7 +++++++
 2 files changed, 36 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index f04b02be1bb..a573d1b89ff 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1610,6 +1610,35 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
+static void set_generation_in_graph_data(struct commit *c, timestamp_t t,
+					 void *data)
+{
+	commit_graph_data_at(c)->generation = t;
+}
+
+/*
+ * After this method, all commits reachable from those in the given
+ * list will have non-zero, non-infinite generation numbers.
+ */
+void ensure_generations_valid(struct commit **commits, size_t nr)
+{
+	struct repository *r = the_repository;
+	int generation_version = get_configured_generation_version(r);
+	struct packed_commit_list list = {
+		.list = commits,
+		.alloc = nr,
+		.nr = nr,
+	};
+	struct compute_generation_info info = {
+		.r = r,
+		.commits = &list,
+		.get_generation = get_generation_from_graph_data,
+		.set_generation = set_generation_in_graph_data,
+	};
+
+	compute_reachable_generation_numbers_1(&info, generation_version);
+}
+
 static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx)
 {
 	trace2_data_intmax("commit-graph", ctx->r, "filter-computed",
diff --git a/commit-graph.h b/commit-graph.h
index 37faee6b66d..a529c62b518 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -190,4 +190,11 @@ struct commit_graph_data {
  */
 timestamp_t commit_graph_generation(const struct commit *);
 uint32_t commit_graph_position(const struct commit *);
+
+/*
+ * After this method, all commits reachable from those in the given
+ * list will have non-zero, non-infinite generation numbers.
+ */
+void ensure_generations_valid(struct commit **commits, size_t nr);
+
 #endif
-- 
gitgitgadget



* [PATCH 7/8] ahead-behind: implement ahead_behind() logic
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2023-03-06 14:06 ` [PATCH 6/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
@ 2023-03-06 14:06 ` Derrick Stolee via GitGitGadget
  2023-03-07  1:05   ` Taylor Blau
  2023-03-06 14:06 ` [PATCH 8/8] ahead-behind: add --contains mode Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-06 14:06 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Fully implement the commit-counting logic behind the ahead-behind
builtin as a new ahead_behind() method in commit-reach.h. Add tests
for the functionality in both t4218-ahead-behind.sh and
t6600-test-reach.sh. The tests in t4218 are simple, but completely
cover a small diamond-shaped commit history. The tests in t6600 use
that script's more complicated commit history and its test setup to
check three repository states: no commit-graph, a complete
commit-graph, and a half-filled commit-graph. These extra states are
particularly helpful to check because ahead_behind() relies upon
ensure_generations_valid().

The interface for ahead_behind() uses two arrays. The first array of
commits contains the list of all starting points for the walk. This
includes all tip commits _and_ base commits. The second array, using the
new ahead_behind_count struct, indicates which commits from that initial
array form the base/tip pair for the ahead/behind count it will store.

While the ahead-behind builtin currently only supports one base, this
implementation of ahead_behind() allows multiple bases, if desired. Even
with multiple bases, there is only one commit walk used for counting the
ahead/behind values, saving time when the base/tip ranges overlap
significantly.
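
To make the two-array layout concrete, here is a small sketch of how a
caller with several bases might fill in the index pairs. It is only an
illustration under stated assumptions: 'struct toy_pair' and
toy_fill_pairs() are hypothetical names, and the commit array is
assumed to hold all tips first, followed by all bases.

```c
#include <stddef.h>

/* Hypothetical mirror of the index members of ahead_behind_count. */
struct toy_pair {
	size_t tip_index;
	size_t base_index;
};

/*
 * Assume the commit array is laid out as all tips followed by all
 * bases. Emit one pair per (base, tip) combination and return the
 * number of pairs written; 'pairs' must have room for
 * tips_nr * bases_nr entries.
 */
static size_t toy_fill_pairs(struct toy_pair *pairs,
			     size_t tips_nr, size_t bases_nr)
{
	size_t b, t, n = 0;

	for (b = 0; b < bases_nr; b++) {
		for (t = 0; t < tips_nr; t++) {
			pairs[n].tip_index = t;
			pairs[n].base_index = tips_nr + b;
			n++;
		}
	}
	return n;
}
```

With one base this reduces to the layout the builtin uses: every pair
points at the single base stored in the last slot of the commit array.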

This interface for ahead_behind() also makes it very easy to call
ensure_generations_valid() on the entire array of bases and tips. This
call is necessary because it is critical that the walk that counts
ahead/behind values never walks a commit more than once. Without
generation numbers on every commit, there is a possibility that a
commit date skew could cause the walk to revisit a commit and then
double-count it. For this reason, it is strongly recommended that 'git
ahead-behind' is only run in a repository with a commit-graph file that
covers most of the reachable commits, storing precomputed generation
numbers. If no commit-graph exists, this walk will be much slower as it
must walk all reachable commits in ensure_generations_valid() before
performing the counting logic.

It is possible to detect if generation numbers are available at run time
and redirect the implementation to another algorithm that does not
require this property. However, that implementation requires a commit
walk per base/tip pair _and_ can be slower due to the commit date
heuristics required. Such an implementation could be considered in the
future if there is a reason to include it, but most Git hosts should
already be generating a commit-graph file as part of repository
maintenance. Most Git clients should also be generating commit-graph
files as part of background maintenance or automatic GCs.

Now, let's discuss the ahead/behind counting algorithm.

Each commit in the input commit list is associated with a bit position
indicating "the ith commit can reach this commit". Each of these commits
is associated with a bitmap with its position flipped on and then
placed in a queue for walking commit history. We walk commits by popping
the commit with maximum generation number out of the queue, guaranteeing
that we will never walk a child of that commit in any future steps.

As we walk, we load the bitmap for the current commit and perform two
main steps. The _second_ step examines each parent of the current commit
and adds the current commit's bitmap bits to each parent's bitmap. (We
create a new bitmap for the parent if this is our first time seeing that
parent.) After adding the bits to the parent's bitmap, the parent is
added to the walk queue. Due to this passing of bits to parents, the
current commit has a guarantee that the ith bit is enabled on its bitmap
if and only if the ith commit can reach the current commit.

The first step of the walk is to examine the bitmap on the current
commit and decide which ranges the commit is in or not. Due to the "bit
pushing" in the second step, we have a guarantee that the ith bit of the
current commit's bitmap is on if and only if the ith starting commit can
reach it. For each ahead_behind_count struct, check the base_index and
tip_index to see if those bits are enabled on the current bitmap. If
exactly one bit is enabled, then increment the corresponding 'ahead' or
'behind' count.  This increment is the reason we _absolutely need_ to
walk commits at most once.

The only subtle thing to do with this walk is to check to see if a
parent has all bits on in its bitmap, in which case it becomes "stale"
and is marked with the STALE bit. This allows queue_has_nonstale() to be
the terminating condition of the walk, which greatly reduces the number
of commits walked if all of the commits are nearby in history. It avoids
walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup() to add commits to the
priority queue without adding them multiple times. This uses the PARENT2
flag. Thus, we must clear both the STALE and PARENT2 bits of all
commits, in case ahead_behind() is called multiple times in the same
process.
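
The walk described above, including the stale cutoff, can be sketched
on a toy history as follows. This is a simplified model and not the
code in this patch: 'struct toy_commit', 'struct toy_count' and
toy_ahead_behind() are hypothetical names, a 32-bit word stands in for
the per-commit bitmap, a linear scan stands in for the priority queue,
and "stale" is modeled as "all bits are on" rather than as an object
flag.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 16

/* Hypothetical toy commit; parents are indices into the history array. */
struct toy_commit {
	int parents[2];   /* -1 marks an unused slot */
	unsigned int gen; /* strictly greater than every parent's gen */
};

/* Hypothetical mirror of struct ahead_behind_count. */
struct toy_count {
	size_t tip_index; /* positions in the 'starts' array */
	size_t base_index;
	int ahead;
	int behind;
};

/* Assumes history_nr <= TOY_MAX_COMMITS and starts_nr < 32. */
static void toy_ahead_behind(const struct toy_commit *history, size_t history_nr,
			     const int *starts, size_t starts_nr,
			     struct toy_count *counts, size_t counts_nr)
{
	uint32_t bits[TOY_MAX_COMMITS] = { 0 };
	unsigned char queued[TOY_MAX_COMMITS] = { 0 };
	uint32_t all = ((uint32_t)1 << starts_nr) - 1;
	size_t i;

	for (i = 0; i < counts_nr; i++)
		counts[i].ahead = counts[i].behind = 0;

	/* Bit i means "the ith starting commit can reach this commit". */
	for (i = 0; i < starts_nr; i++) {
		bits[starts[i]] |= (uint32_t)1 << i;
		queued[starts[i]] = 1;
	}

	for (;;) {
		int c = -1, nonstale = 0, k;

		/* Pick the queued commit with the maximum generation. */
		for (i = 0; i < history_nr; i++) {
			if (!queued[i])
				continue;
			if (bits[i] != all)
				nonstale = 1;
			if (c < 0 || history[i].gen > history[c].gen)
				c = (int)i;
		}
		/* Stop once the queue is empty or holds only stale commits. */
		if (!nonstale)
			break;
		queued[c] = 0;

		/* First step: count the commit for each base/tip pair. */
		for (i = 0; i < counts_nr; i++) {
			int from_tip = (bits[c] >> counts[i].tip_index) & 1;
			int from_base = (bits[c] >> counts[i].base_index) & 1;

			if (from_tip && !from_base)
				counts[i].ahead++;
			else if (from_base && !from_tip)
				counts[i].behind++;
		}

		/* Second step: pass the bits down and queue each parent. */
		for (k = 0; k < 2; k++) {
			int p = history[c].parents[k];

			if (p < 0)
				continue;
			bits[p] |= bits[c];
			queued[p] = 1;
		}
	}
}
```

Because every parent has a strictly smaller generation than its
children, a commit is popped only after all of its queued descendants,
so it is counted at most once. On the diamond history from t4218 with
'left' as the base, the toy walk stops at the root commit, which has
every bit set by the time it is the only commit left in the queue.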

There is no previous implementation of ahead-behind in core Git to
compare against. A previous implementation in another fork of Git used a
single process to essentially do the same walk as 'git rev-list --count
<base>..<tip>' for every base/tip pair given as input. The single-walk
implementation in this change was a significant improvement over that
implementation.
Another version from that fork used reachability bitmaps for the
comparison, but that implementation was slower than the current commit
walk implementation in almost all cases.

To best present _some_ amount of evidence for this performance gain,
create a new performance test, p1500-graph-walks.sh. This script could
be used for other walks than just ahead-behind in the future, but let's
limit to ahead-behind now.

To establish a baseline, create one test that checks 'git
ahead-behind' against up to 50 tips and another that uses 'git rev-list
--count' in a loop. Be sure to write a commit-graph before running the
performance tests.

Using the Git source code as the repository, we see a pronounced
improvement:

Test                                            this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git ahead-behind   0.08(0.07+0.01)
1500.3: ahead-behind counts: git rev-list       1.11(0.92+0.18)

But the de-facto performance benchmark is the Linux kernel repository,
which presents these values for my copy:

Test                                            this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git ahead-behind   0.27(0.25+0.02)
1500.3: ahead-behind counts: git rev-list       4.53(3.92+0.60)

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/ahead-behind.c      | 23 +++++++++
 commit-reach.c              | 95 +++++++++++++++++++++++++++++++++++++
 commit-reach.h              | 30 ++++++++++++
 t/perf/p1500-graph-walks.sh | 25 ++++++++++
 t/t4218-ahead-behind.sh     | 67 ++++++++++++++++++++++++++
 t/t6600-test-reach.sh       | 62 ++++++++++++++++++++++++
 6 files changed, 302 insertions(+)
 create mode 100755 t/perf/p1500-graph-walks.sh

diff --git a/builtin/ahead-behind.c b/builtin/ahead-behind.c
index e4f65fc0548..c06b95b5f37 100644
--- a/builtin/ahead-behind.c
+++ b/builtin/ahead-behind.c
@@ -2,6 +2,7 @@
 #include "parse-options.h"
 #include "config.h"
 #include "commit.h"
+#include "commit-reach.h"
 
 static const char * const ahead_behind_usage[] = {
 	N_("git ahead-behind --base=<ref> [ --stdin | <revs> ]"),
@@ -29,8 +30,12 @@ static int handle_arg(struct string_list *tips, const char *arg)
 int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
 {
 	const char *base_ref = NULL;
+	struct commit *base;
 	int from_stdin = 0;
 	struct string_list tips = STRING_LIST_INIT_DUP;
+	struct commit **commits;
+	struct ahead_behind_count *counts;
+	size_t i;
 
 	struct option ahead_behind_opts[] = {
 		OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")),
@@ -71,5 +76,23 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
 	if (!tips.nr)
 		return 0;
 
+	ALLOC_ARRAY(commits, tips.nr + 1);
+	ALLOC_ARRAY(counts, tips.nr);
+
+	for (i = 0; i < tips.nr; i++) {
+		commits[i] = tips.items[i].util;
+		counts[i].tip_index = i;
+		counts[i].base_index = tips.nr;
+	}
+	commits[tips.nr] = base;
+
+	ahead_behind(commits, tips.nr + 1, counts, tips.nr);
+
+	for (i = 0; i < tips.nr; i++)
+		printf("%s %d %d\n", tips.items[i].string,
+		       counts[i].ahead, counts[i].behind);
+
+	free(counts);
+	free(commits);
 	return 0;
 }
diff --git a/commit-reach.c b/commit-reach.c
index 2e33c599a82..87ccc2cd4f5 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -8,6 +8,7 @@
 #include "revision.h"
 #include "tag.h"
 #include "commit-reach.h"
+#include "ewah/ewok.h"
 
 /* Remember to update object flag allocation in object.h */
 #define PARENT1		(1u<<16)
@@ -941,3 +942,97 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 
 	return found_commits;
 }
+
+define_commit_slab(bit_arrays, struct bitmap *);
+static struct bit_arrays bit_arrays;
+
+static void insert_no_dup(struct prio_queue *queue, struct commit *c)
+{
+	if (c->object.flags & PARENT2)
+		return;
+	prio_queue_put(queue, c);
+	c->object.flags |= PARENT2;
+}
+
+static struct bitmap *init_bit_array(struct commit *c, int width)
+{
+	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
+	if (!*bitmap)
+		*bitmap = bitmap_word_alloc(width);
+	return *bitmap;
+}
+
+static void free_bit_array(struct commit *c)
+{
+	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
+	if (!*bitmap)
+		return;
+	bitmap_free(*bitmap);
+	*bitmap = NULL;
+}
+
+void ahead_behind(struct commit **commits, size_t commits_nr,
+		  struct ahead_behind_count *counts, size_t counts_nr)
+{
+	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
+	size_t i;
+
+	if (!commits_nr || !counts_nr)
+		return;
+
+	for (i = 0; i < counts_nr; i++) {
+		counts[i].ahead = 0;
+		counts[i].behind = 0;
+	}
+
+	ensure_generations_valid(commits, commits_nr);
+
+	init_bit_arrays(&bit_arrays);
+
+	for (i = 0; i < commits_nr; i++) {
+		struct commit *c = commits[i];
+		struct bitmap *bitmap = init_bit_array(c, width);
+
+		bitmap_set(bitmap, i);
+		insert_no_dup(&queue, c);
+	}
+
+	while (queue_has_nonstale(&queue)) {
+		struct commit *c = prio_queue_get(&queue);
+		struct commit_list *p;
+		struct bitmap *bitmap_c = init_bit_array(c, width);
+
+		for (i = 0; i < counts_nr; i++) {
+			int reach_from_tip = bitmap_get(bitmap_c, counts[i].tip_index);
+			int reach_from_base = bitmap_get(bitmap_c, counts[i].base_index);
+
+			if (reach_from_tip ^ reach_from_base) {
+				if (reach_from_base)
+					counts[i].behind++;
+				else
+					counts[i].ahead++;
+			}
+		}
+
+		for (p = c->parents; p; p = p->next) {
+			struct bitmap *bitmap_p;
+
+			parse_commit(p->item);
+
+			bitmap_p = init_bit_array(p->item, width);
+			bitmap_or(bitmap_p, bitmap_c);
+
+			if (bitmap_popcount(bitmap_p) == commits_nr)
+				p->item->object.flags |= STALE;
+
+			insert_no_dup(&queue, p->item);
+		}
+
+		free_bit_array(c);
+	}
+
+	repo_clear_commit_marks(the_repository, PARENT2 | STALE);
+	clear_bit_arrays(&bit_arrays);
+	clear_prio_queue(&queue);
+}
diff --git a/commit-reach.h b/commit-reach.h
index 148b56fea50..1780f9317bf 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -104,4 +104,34 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 					 struct commit **to, int nr_to,
 					 unsigned int reachable_flag);
 
+struct ahead_behind_count {
+	/**
+	 * As input, the *_index members indicate which positions in
+	 * the 'commits' array correspond to the tip and base of this
+	 * comparison.
+	 */
+	size_t tip_index;
+	size_t base_index;
+
+	/**
+	 * These values store the computed counts for each side of the
+	 * symmetric difference:
+	 *
+	 * 'ahead' stores the number of commits reachable from the tip
+	 * and not reachable from the base.
+	 *
+	 * 'behind' stores the number of commits reachable from the base
+	 * and not reachable from the tip.
+	 */
+	int ahead;
+	int behind;
+};
+
+/**
+ * Given an array of commits and an array of ahead_behind_count pairs,
+ * compute the ahead/behind counts for each pair.
+ */
+void ahead_behind(struct commit **commits, size_t commits_nr,
+		  struct ahead_behind_count *counts, size_t counts_nr);
+
 #endif
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
new file mode 100755
index 00000000000..c9ac4b7e6e2
--- /dev/null
+++ b/t/perf/p1500-graph-walks.sh
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+test_description='Commit walk performance tests'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
+	sort -r allrefs | head -n 50 >refs &&
+	git commit-graph write --reachable
+'
+
+test_perf 'ahead-behind counts: git ahead-behind' '
+	git ahead-behind --base=HEAD --stdin <refs
+'
+
+test_perf 'ahead-behind counts: git rev-list' '
+	for r in $(cat refs)
+	do
+		git rev-list --count "HEAD..$r" || return 1
+	done
+'
+
+test_done
diff --git a/t/t4218-ahead-behind.sh b/t/t4218-ahead-behind.sh
index 56f16515896..6658c919fdf 100755
--- a/t/t4218-ahead-behind.sh
+++ b/t/t4218-ahead-behind.sh
@@ -4,6 +4,16 @@ test_description='git ahead-behind command-line options'
 
 . ./test-lib.sh
 
+test_expect_success 'setup simple history' '
+	test_commit base &&
+	git checkout -b right &&
+	test_commit right &&
+	git checkout -b left base &&
+	test_commit left &&
+	git checkout -b merge &&
+	git merge right -m "merge"
+'
+
 test_expect_success 'git ahead-behind -h' '
 	test_must_fail git ahead-behind -h >out &&
 	grep "usage:" out
@@ -14,6 +24,11 @@ test_expect_success 'git ahead-behind without --base' '
 	grep "usage:" err
 '
 
+test_expect_success 'git ahead-behind with broken --base' '
+	test_must_fail git ahead-behind --base=bogus HEAD 2>err &&
+	grep "could not resolve '\''bogus'\''" err
+'
+
 test_expect_success 'git ahead-behind with broken tip' '
 	test_must_fail git ahead-behind --base=HEAD bogus 2>err &&
 	grep "could not resolve '\''bogus'\''" err
@@ -30,4 +45,56 @@ test_expect_success 'git ahead-behind without tips' '
 	test_must_be_empty err
 '
 
+test_expect_success 'git ahead-behind --base=base' '
+	git ahead-behind --base=base base left right merge >actual &&
+
+	cat >expect <<-EOF &&
+	base 0 0
+	left 1 0
+	right 1 0
+	merge 3 0
+	EOF
+
+	test_cmp expect actual
+'
+
+test_expect_success 'git ahead-behind --base=left' '
+	git ahead-behind --base=left base left right merge >actual &&
+
+	cat >expect <<-EOF &&
+	base 0 1
+	left 0 0
+	right 1 1
+	merge 2 0
+	EOF
+
+	test_cmp expect actual
+'
+
+test_expect_success 'git ahead-behind --base=right' '
+	git ahead-behind --base=right base left right merge >actual &&
+
+	cat >expect <<-EOF &&
+	base 0 1
+	left 1 1
+	right 0 0
+	merge 2 0
+	EOF
+
+	test_cmp expect actual
+'
+
+test_expect_success 'git ahead-behind --base=merge' '
+	git ahead-behind --base=merge base left right merge >actual &&
+
+	cat >expect <<-EOF &&
+	base 0 3
+	left 0 2
+	right 0 2
+	merge 0 0
+	EOF
+
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 338a9c46a24..951e07100f6 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -443,4 +443,66 @@ test_expect_success 'get_reachable_subset:none' '
 	test_all_modes get_reachable_subset
 '
 
+test_expect_success 'ahead-behind:linear' '
+	cat >input <<-\EOF &&
+	commit-1-1
+	commit-1-3
+	commit-1-5
+	commit-1-8
+	EOF
+	cat >expect <<-\EOF &&
+	commit-1-1 0 8
+	commit-1-3 0 6
+	commit-1-5 0 4
+	commit-1-8 0 1
+	EOF
+	run_all_modes git ahead-behind --base=commit-1-9 --stdin
+'
+
+test_expect_success 'ahead-behind:all' '
+	cat >input <<-\EOF &&
+	commit-1-1
+	commit-2-4
+	commit-4-2
+	commit-4-4
+	EOF
+	cat >expect <<-\EOF &&
+	commit-1-1 0 24
+	commit-2-4 0 17
+	commit-4-2 0 17
+	commit-4-4 0 9
+	EOF
+	run_all_modes git ahead-behind --base=commit-5-5 --stdin
+'
+
+test_expect_success 'ahead-behind:some' '
+	cat >input <<-\EOF &&
+	commit-1-1
+	commit-5-3
+	commit-4-8
+	commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	commit-1-1 0 53
+	commit-5-3 0 39
+	commit-4-8 8 30
+	commit-9-9 27 0
+	EOF
+	run_all_modes git ahead-behind --base=commit-9-6 --stdin
+'
+
+test_expect_success 'ahead-behind:none' '
+	cat >input <<-\EOF &&
+	commit-7-5
+	commit-4-8
+	commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	commit-7-5 7 4
+	commit-4-8 16 16
+	commit-9-9 49 0
+	EOF
+	run_all_modes git ahead-behind --base=commit-8-4 --stdin
+'
+
 test_done
-- 
gitgitgadget



* [PATCH 8/8] ahead-behind: add --contains mode
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2023-03-06 14:06 ` [PATCH 7/8] ahead-behind: implement ahead_behind() logic Derrick Stolee via GitGitGadget
@ 2023-03-06 14:06 ` Derrick Stolee via GitGitGadget
  2023-03-06 18:26 ` [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Junio C Hamano
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-06 14:06 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The 'git ahead-behind' builtin can answer a list of questions that do
not require the full ahead/behind counts. For example, instead of using
'git branch --contains=<ref>' to get the full list of branches that
contain the commit at <ref>, a list of tips could be passed to 'git
ahead-behind --base=<ref>' and the rows that report a "behind" value of
zero show which tips can reach <ref>.

By contrast, the rows that report an "ahead" value of zero show which
tips are reachable from the given base. No existing command batches
this type of query. While extracting the information from 'git
ahead-behind' is not terribly difficult, it does more work than
required to answer this query: it _counts_.
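
The two readings of a "<tip> <ahead> <behind>" row can be written down
as a pair of predicates. This is only a sketch: 'struct toy_row' and
both helper names are hypothetical and exist purely to spell out which
zero means what.

```c
/* Hypothetical holder for one "<tip> <ahead> <behind>" output row. */
struct toy_row {
	int ahead;
	int behind;
};

/* "behind" is zero: nothing in the base is missing from the tip. */
static int toy_tip_contains_base(const struct toy_row *row)
{
	return row->behind == 0;
}

/* "ahead" is zero: nothing in the tip is missing from the base. */
static int toy_tip_reachable_from_base(const struct toy_row *row)
{
	return row->ahead == 0;
}
```

For example, with 'left' as the base in the t4218 history, the row
"merge 2 0" satisfies the first predicate and the row "base 0 1"
satisfies the second.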

Add a new '--contains' mode to 'git ahead-behind' that removes the
counting behavior and focuses instead on the reachability concern. The
output of the builtin changes in this mode: instead of reporting "<tip>
<ahead> <behind>" for every input tip, it will instead report "<tip>"
for the input tips that are reachable from the specified base.

The algorithm is implemented in commit-reach.c in the method
tips_reachable_from_base(). This method takes a string_list of tips and
sets the 'util' member of each item to 1 if the base commit can reach
that tip.

Like other reachability queries in commit-reach.c, the fastest way to
search for "can A reach B?" is to do a depth-first search up to the
generation number of B, preferring to explore first parents before later
parents. While we must walk all reachable commits up to that generation
number when the answer is "no", the depth-first search can answer "yes"
much faster than other approaches in most cases.

This search becomes trickier when there are multiple targets for the
depth-first search. The commits with lower generation number are more
likely to be within the history of the start commit, but we don't want
to waste time searching commits of low generation number if the commit
target with lowest generation number has already been found.

The trick here is to take the input commits and sort them by generation
number in ascending order. Track the index within this order as
min_generation_index. When we find a commit, if its index in the list is
equal to min_generation_index, then we can increase the generation
number boundary of our search to the next-lowest value in the list.

With this mechanism, the number of commits to search is minimized with
respect to the depth-first search heuristic. We will walk all commits up
to the minimum generation number of a commit that is _not_ reachable
from the start, but we will walk only the necessary portion of the
depth-first search for the reachable commits of lower generation.
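
The search with a moving generation boundary can be sketched on a toy
history as follows. This is a simplified model written for
illustration, not the code in this patch: all 'toy_' names are
hypothetical, parents are array indices, and the walk is cut off when a
commit is popped from the stack.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_COMMITS 16

/* Hypothetical toy commit; parents are indices into the history array. */
struct toy_commit {
	int parents[2];   /* -1 marks an unused slot */
	unsigned int gen; /* strictly greater than every parent's gen */
};

struct toy_target {
	int commit;       /* index into the history array */
	size_t index;     /* position in the caller's 'tips' array */
	unsigned int gen;
};

static int toy_cmp_gen(const void *va, const void *vb)
{
	const struct toy_target *a = va;
	const struct toy_target *b = vb;

	if (a->gen > b->gen)
		return 1;
	if (a->gen < b->gen)
		return -1;
	return 0;
}

/*
 * Set found[i] to 1 for every tips[i] reachable from 'base'.
 * Assumes nr and the history size are at most TOY_MAX_COMMITS.
 */
static void toy_tips_reachable(const struct toy_commit *history, int base,
			       const int *tips, size_t nr, unsigned char *found)
{
	struct toy_target targets[TOY_MAX_COMMITS];
	unsigned char seen[TOY_MAX_COMMITS] = { 0 };
	int stack[TOY_MAX_COMMITS];
	size_t top = 0, min_index = 0, j;

	if (!nr)
		return;
	for (j = 0; j < nr; j++) {
		targets[j].commit = tips[j];
		targets[j].index = j;
		targets[j].gen = history[tips[j]].gen;
		found[j] = 0;
	}
	/* Sort with generation number ascending. */
	qsort(targets, nr, sizeof(*targets), toy_cmp_gen);

	seen[base] = 1;
	stack[top++] = base;

	while (top) {
		int c = stack[--top];
		int k;

		/* Too low to reach any target that is still missing. */
		if (history[c].gen < targets[min_index].gen)
			continue;

		for (j = min_index; j < nr; j++) {
			if (history[c].gen < targets[j].gen)
				break;
			if (targets[j].commit == c)
				found[targets[j].index] = 1;
		}

		/* Raise the boundary past targets that are already found. */
		while (min_index < nr && found[targets[min_index].index])
			min_index++;
		if (min_index == nr)
			return; /* every tip was found */

		/* Push in reverse so the first parent is explored first. */
		for (k = 1; k >= 0; k--) {
			int p = history[c].parents[k];

			if (p < 0 || seen[p])
				continue;
			seen[p] = 1;
			stack[top++] = p;
		}
	}
}
```

The boundary only ever moves up, so a commit skipped for having too low
a generation never needs to be revisited later in the same search.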

We can test this completely in the simple repo example in t4218 and more
substantially in the larger repository example in t6600. We can also add
a performance test to demonstrate the speedup relative to the 'git
ahead-behind' builtin without the '--contains' option.

For the Git source code repository, I was able to measure a speedup,
even though both are quite fast.

Test                                                       this tree
--------------------------------------------------------------------------
1500.2: ahead-behind counts: git ahead-behind              0.06(0.06+0.00)
1500.3: ahead-behind counts: git rev-list                  1.08(0.90+0.18)
1500.4: ahead-behind counts: git ahead-behind --contains   0.02(0.02+0.00)

In the Linux kernel repository, the impact is more pronounced:

Test                                                       this tree
--------------------------------------------------------------------------
1500.2: ahead-behind counts: git ahead-behind              0.26(0.25+0.01)
1500.3: ahead-behind counts: git rev-list                  4.58(3.92+0.66)
1500.4: ahead-behind counts: git ahead-behind --contains   0.02(0.00+0.02)

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-ahead-behind.txt |  12 +++-
 builtin/ahead-behind.c             |  27 ++++++-
 commit-reach.c                     | 110 +++++++++++++++++++++++++++++
 commit-reach.h                     |   7 ++
 t/perf/p1500-graph-walks.sh        |   4 ++
 t/t4218-ahead-behind.sh            |  62 ++++++++++++++++
 t/t6600-test-reach.sh              |  58 +++++++++++++++
 7 files changed, 277 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-ahead-behind.txt b/Documentation/git-ahead-behind.txt
index 2dd5147f6b2..1652a51d719 100644
--- a/Documentation/git-ahead-behind.txt
+++ b/Documentation/git-ahead-behind.txt
@@ -8,7 +8,7 @@ git-ahead-behind - Count the commits on each side of a revision range
 SYNOPSIS
 --------
 [verse]
-'git ahead-behind' --base=<ref> [ --stdin | <revs> ]
+'git ahead-behind' --base=<ref> [ --contains ] [ --stdin | <revs> ]
 
 DESCRIPTION
 -----------
@@ -39,6 +39,9 @@ reported to stdout one line at a time as follows:
 There will be exactly one line per input revision, but the lines may be
 in an arbitrary order.
 
+If the `--contains` option is provided, then the output will list the
+`<tip>` refs that are reachable from the provided `<base>`, one per line.
+
 
 OPTIONS
 -------
@@ -50,6 +53,13 @@ OPTIONS
 	Read revision tips and ranges from stdin instead of from the
 	command-line.
 
+--contains::
+	Specify that instead of counting the ahead/behind values, only
+	indicate whether each tip reference is reachable from the base. In
+	this mode, the output format changes to include only the name of
+	each tip, one per line, and only the tips reachable from the base
+	are included in the output.
+
 --ignore-missing::
 	When parsing tip references, ignore any references that are not
 	found. This is useful when operating in an environment where a
diff --git a/builtin/ahead-behind.c b/builtin/ahead-behind.c
index c06b95b5f37..4efd324d5d9 100644
--- a/builtin/ahead-behind.c
+++ b/builtin/ahead-behind.c
@@ -5,7 +5,7 @@
 #include "commit-reach.h"
 
 static const char * const ahead_behind_usage[] = {
-	N_("git ahead-behind --base=<ref> [ --stdin | <revs> ]"),
+	N_("git ahead-behind --base=<ref> [ --contains ] [ --stdin | <revs> ]"),
 	NULL
 };
 
@@ -31,7 +31,7 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
 {
 	const char *base_ref = NULL;
 	struct commit *base;
-	int from_stdin = 0;
+	int from_stdin = 0, contains = 0;
 	struct string_list tips = STRING_LIST_INIT_DUP;
 	struct commit **commits;
 	struct ahead_behind_count *counts;
@@ -41,6 +41,7 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
 		OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")),
 		OPT_BOOL(0 , "stdin", &from_stdin, N_("read rev names from stdin")),
 		OPT_BOOL(0 , "ignore-missing", &ignore_missing, N_("ignore missing tip references")),
+		OPT_BOOL(0 , "contains", &contains, N_("only check that tips are reachable from the base")),
 		OPT_END()
 	};
 
@@ -52,6 +53,10 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	base = lookup_commit_reference_by_name(base_ref);
+	if (!base)
+		die(_("could not resolve '%s'"), base_ref);
+
 	if (from_stdin) {
 		struct strbuf line = STRBUF_INIT;
 
@@ -76,6 +81,24 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
 	if (!tips.nr)
 		return 0;
 
+	if (contains) {
+		struct string_list_item *item;
+
+		/* clear out */
+		for_each_string_list_item(item, &tips)
+			item->util = NULL;
+
+		tips_reachable_from_base(base, &tips);
+
+		for_each_string_list_item(item, &tips) {
+			if (item->util)
+				printf("%s\n", item->string);
+		}
+
+		return 0;
+	}
+	/* else: not --contains, but normal ahead-behind counting. */
+
 	ALLOC_ARRAY(commits, tips.nr + 1);
 	ALLOC_ARRAY(counts, tips.nr);
 
diff --git a/commit-reach.c b/commit-reach.c
index 87ccc2cd4f5..a7a2c045551 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1036,3 +1036,113 @@ void ahead_behind(struct commit **commits, size_t commits_nr,
 	clear_bit_arrays(&bit_arrays);
 	clear_prio_queue(&queue);
 }
+
+struct commit_and_index {
+	struct commit *commit;
+	unsigned int index;
+	timestamp_t generation;
+};
+
+static int compare_commit_and_index_by_generation(const void *va, const void *vb)
+{
+	const struct commit_and_index *a = (const struct commit_and_index *)va;
+	const struct commit_and_index *b = (const struct commit_and_index *)vb;
+
+	if (a->generation > b->generation)
+		return 1;
+	if (a->generation < b->generation)
+		return -1;
+	return 0;
+}
+
+void tips_reachable_from_base(struct commit *base,
+			      struct string_list *tips)
+{
+	unsigned int i;
+	struct commit_and_index *commits;
+	unsigned int min_generation_index = 0;
+	timestamp_t min_generation;
+	struct commit_list *stack = NULL;
+
+	if (!base || !tips || !tips->nr)
+		return;
+
+	/*
+	 * Do a depth-first search starting at 'base' to search for the
+	 * tips. Stop at the lowest (un-found) generation number. When
+	 * finding the lowest commit, increase the minimum generation
+	 * number to the next lowest (un-found) generation number.
+	 */
+
+	CALLOC_ARRAY(commits, tips->nr);
+
+	for (i = 0; i < tips->nr; i++) {
+		commits[i].commit = lookup_commit_reference_by_name(tips->items[i].string);
+		commits[i].index = i;
+		commits[i].generation = commit_graph_generation(commits[i].commit);
+	}
+
+	/* Sort with generation number ascending. */
+	QSORT(commits, tips->nr, compare_commit_and_index_by_generation);
+	min_generation = commits[0].generation;
+
+	parse_commit(base);
+	commit_list_insert(base, &stack);
+
+	while (stack) {
+		unsigned int j;
+		int explored_all_parents = 1;
+		struct commit_list *p;
+		struct commit *c = stack->item;
+		timestamp_t c_gen = commit_graph_generation(c);
+
+		/* Does it match any of our tips? */
+		for (j = min_generation_index; j < tips->nr; j++) {
+			if (c_gen < commits[j].generation)
+				break;
+
+			if (commits[j].commit == c) {
+				tips->items[commits[j].index].util = (void *)(uintptr_t)1;
+
+				if (j == min_generation_index) {
+					unsigned int k = j + 1;
+					while (k < tips->nr &&
+					       tips->items[commits[k].index].util)
+						k++;
+
+					/* Terminate early if all found. */
+					if (k >= tips->nr)
+						goto done;
+
+					min_generation_index = k;
+					min_generation = commits[k].generation;
+				}
+			}
+		}
+
+		for (p = c->parents; p; p = p->next) {
+			parse_commit(p->item);
+
+			/* Have we already explored this parent? */
+			if (p->item->object.flags & SEEN)
+				continue;
+
+			/* Is it below the current minimum generation? */
+			if (commit_graph_generation(p->item) < min_generation)
+				continue;
+
+			/* Ok, we will explore from here on. */
+			p->item->object.flags |= SEEN;
+			explored_all_parents = 0;
+			commit_list_insert(p->item, &stack);
+			break;
+		}
+
+		if (explored_all_parents)
+			pop_commit(&stack);
+	}
+
+done:
+	free(commits);
+	repo_clear_commit_marks(the_repository, SEEN);
+}
diff --git a/commit-reach.h b/commit-reach.h
index 1780f9317bf..fa8994f5696 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -134,4 +134,11 @@ struct ahead_behind_count {
 void ahead_behind(struct commit **commits, size_t commits_nr,
 		  struct ahead_behind_count *counts, size_t counts_nr);
 
+/*
+ * Populate the "util" of each string_list item with the boolean value
+ * corresponding to "can 'base' reach this tip?"
+ */
+void tips_reachable_from_base(struct commit *base,
+			      struct string_list *tips);
+
 #endif
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index c9ac4b7e6e2..56de6c5b13d 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -22,4 +22,8 @@ test_perf 'ahead-behind counts: git rev-list' '
 	done
 '
 
+test_perf 'ahead-behind counts: git ahead-behind --contains' '
+	git ahead-behind --contains --base=HEAD --stdin <refs
+'
+
 test_done
diff --git a/t/t4218-ahead-behind.sh b/t/t4218-ahead-behind.sh
index 6658c919fdf..c333cb623f8 100755
--- a/t/t4218-ahead-behind.sh
+++ b/t/t4218-ahead-behind.sh
@@ -40,6 +40,19 @@ test_expect_success 'git ahead-behind with broken tip and --ignore-missing' '
 	test_must_be_empty out
 '
 
+test_expect_success 'git ahead-behind --contains with broken tip' '
+	test_must_fail git ahead-behind --contains \
+		--base=HEAD bogus 2>err &&
+	grep "could not resolve '\''bogus'\''" err
+'
+
+test_expect_success 'git ahead-behind --contains with broken tip and --ignore-missing' '
+	git ahead-behind --base=HEAD --contains \
+		--ignore-missing bogus 2>err >out &&
+	test_must_be_empty err &&
+	test_must_be_empty out
+'
+
 test_expect_success 'git ahead-behind without tips' '
 	git ahead-behind --base=HEAD 2>err &&
 	test_must_be_empty err
@@ -97,4 +110,53 @@ test_expect_success 'git ahead-behind --base=merge' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git ahead-behind --contains --base=base' '
+	git ahead-behind --contains --base=base \
+		base left right merge >actual &&
+
+	cat >expect <<-EOF &&
+	base
+	EOF
+
+	test_cmp expect actual
+'
+
+test_expect_success 'git ahead-behind --contains --base=left' '
+	git ahead-behind --contains --base=left \
+		base left right merge >actual &&
+
+	cat >expect <<-EOF &&
+	base
+	left
+	EOF
+
+	test_cmp expect actual
+'
+
+test_expect_success 'git ahead-behind --contains --base=right' '
+	git ahead-behind --contains --base=right \
+		base left right merge >actual &&
+
+	cat >expect <<-EOF &&
+	base
+	right
+	EOF
+
+	test_cmp expect actual
+'
+
+test_expect_success 'git ahead-behind --contains --base=merge' '
+	git ahead-behind --contains --base=merge \
+		base left right merge >actual &&
+
+	cat >expect <<-EOF &&
+	base
+	left
+	right
+	merge
+	EOF
+
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 951e07100f6..2fdad7b3619 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -505,4 +505,62 @@ test_expect_success 'ahead-behind:none' '
 	run_all_modes git ahead-behind --base=commit-8-4 --stdin
 '
 
+test_expect_success 'ahead-behind--contains:all' '
+	cat >input <<-\EOF &&
+	commit-1-1
+	commit-2-4
+	commit-4-2
+	commit-4-4
+	EOF
+	cat >expect <<-\EOF &&
+	commit-1-1
+	commit-2-4
+	commit-4-2
+	commit-4-4
+	EOF
+	run_all_modes git ahead-behind --contains --base=commit-5-5 \
+		--stdin --use-bitmap-index
+'
+
+test_expect_success 'ahead-behind--contains:some' '
+	cat >input <<-\EOF &&
+	commit-1-1
+	commit-5-3
+	commit-4-8
+	commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	commit-1-1
+	commit-5-3
+	EOF
+	run_all_modes git ahead-behind --contains --base=commit-9-6 \
+		--stdin --use-bitmap-index
+'
+
+test_expect_success 'ahead-behind--contains:some, reordered' '
+	cat >input <<-\EOF &&
+	commit-4-8
+	commit-5-3
+	commit-9-9
+	commit-1-1
+	EOF
+	cat >expect <<-\EOF &&
+	commit-5-3
+	commit-1-1
+	EOF
+	run_all_modes git ahead-behind --contains --base=commit-9-6 \
+		--stdin --use-bitmap-index
+'
+
+test_expect_success 'ahead-behind--contains:none' '
+	cat >input <<-\EOF &&
+	commit-7-5
+	commit-4-8
+	commit-9-9
+	EOF
+	>expect &&
+	run_all_modes git ahead-behind --contains --base=commit-8-4 \
+		--stdin --use-bitmap-index
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2023-03-06 14:06 ` [PATCH 8/8] ahead-behind: add --contains mode Derrick Stolee via GitGitGadget
@ 2023-03-06 18:26 ` Junio C Hamano
  2023-03-06 20:18   ` Derrick Stolee
  2023-03-07  0:36   ` Taylor Blau
  2023-03-07  0:33 ` Taylor Blau
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
  10 siblings, 2 replies; 90+ messages in thread
From: Junio C Hamano @ 2023-03-06 18:26 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, vdye, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> These numbers can be computed by 'git rev-list --count B..C' and 'git
> rev-list --count C..B', but there are common needs that benefit from having
> the checks being done in the same process:

This makes readers wonder if "git rev-list --count B...C" should be
the end-user facing UI for this new feature, perhaps?

Of course if you are checking how C0, C1, C2,... relate to a single
B, the existing rev-list syntax would not work, and makes a totally
new subcommand a possibility.
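
To make the "one B, many C's" case concrete, here is a minimal sketch
in C on a toy DAG. It is not Git's implementation: the toy_* names,
the fixed-size parent arrays and the 32-commit limit are assumptions
made up for illustration, and it walks once per tip instead of doing
the single combined walk the series proposes. It only shows what the
two numbers mean: "ahead" is the size of B..C and "behind" is the
size of C..B.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 32
#define TOY_MAX_PARENTS 2

/* Hypothetical toy commit: parents are array indices, -1 if unused. */
struct toy_commit {
	int parents[TOY_MAX_PARENTS];
};

/* Bitmask of every commit reachable from 'start', including itself. */
static uint32_t toy_reachable(const struct toy_commit *dag, int nr, int start)
{
	uint32_t seen = 0;
	int stack[TOY_MAX_COMMITS];
	int top = 0;

	assert(nr <= TOY_MAX_COMMITS && start >= 0 && start < nr);
	stack[top++] = start;
	seen |= UINT32_C(1) << start;

	while (top) {
		int c = stack[--top];
		int i;

		for (i = 0; i < TOY_MAX_PARENTS; i++) {
			int p = dag[c].parents[i];

			if (p < 0 || (seen & (UINT32_C(1) << p)))
				continue;
			/* Mark on push so each commit is stacked once. */
			seen |= UINT32_C(1) << p;
			stack[top++] = p;
		}
	}
	return seen;
}

/* Count the set bits in 'x'. */
static unsigned int toy_popcount(uint32_t x)
{
	unsigned int n = 0;

	for (; x; x &= x - 1)
		n++;
	return n;
}

/*
 * For each tip: ahead[i] = |base..tip|, behind[i] = |tip..base|.
 * The base is walked once; each tip is walked separately.
 */
static void toy_ahead_behind(const struct toy_commit *dag, int nr, int base,
			     const int *tips, int tips_nr,
			     unsigned int *ahead, unsigned int *behind)
{
	uint32_t from_base = toy_reachable(dag, nr, base);
	int i;

	for (i = 0; i < tips_nr; i++) {
		uint32_t from_tip = toy_reachable(dag, nr, tips[i]);

		ahead[i] = toy_popcount(from_tip & ~from_base);
		behind[i] = toy_popcount(from_base & ~from_tip);
	}
}
```

Calling toy_ahead_behind() with one base and an array of tips fills
one (ahead, behind) pair per tip, which is the shape of query that a
single "B...C" range cannot express.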

>  2. When a branch is updated, a background job checks if any pull requests
>     that target that branch should be closed because their branches were
>     merged implicitly by that update. These queries can be batched into 'git
>     ahead-behind' calls.
>
> In that second example, we don't need the full ahead/behind counts (although
> it is sufficient to look for branches that are "zero commits ahead", meaning
> they are reachable from the base), so this builtin has an extra '--contains'
> mode that only checks reachability from the base to each of the tips. 'git
> ahead-behind --contains' is sort of the reverse of 'git branch --contains'.

I thought that the reverse of "git branch --contains" was "git
branch --merged".  "git branch --merged maint ??/\*" is how I cull
topic branches that have already served their purpose.  

Isn't closing pull requests because they have been already merged
the same idea?  "git for-each-ref --merged main refs/pull/\*" or
something, perhaps?

All of the above are from only reading the cover letter.  I'm sure
I'll have more thoughts or even change my mind after reading the
patches.

Thanks.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 1/8] ahead-behind: create empty builtin
  2023-03-06 14:06 ` [PATCH 1/8] ahead-behind: create empty builtin Derrick Stolee via GitGitGadget
@ 2023-03-06 18:48   ` Junio C Hamano
  2023-03-07  0:40     ` Taylor Blau
  0 siblings, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2023-03-06 18:48 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, vdye, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> For example, we will be able to track all local branches relative to an
> upstream branch using an invocation such as
>
>   git for-each-ref --format=%(refname) refs/heads/* |
>     git ahead-behind --base=origin/main --stdin

Stepping back a bit, this motivating example makes me wonder if

 $ git for-each-ref --format='%(refname) %(aheadbehind)' refs/heads/\*

that computes the ahead-behind number for each ref (that matches the
pattern) based on their own "upstream" (presumably each branch is
configured to track the same, or different, upstreams), or
overriding @{upstream}, a specified base, i.e.

 $ git for-each-ref --format='%(refname) %(aheadbehind:origin/main)' refs/heads/\*

would be a more intuitive interface to the end-users.

It would probably work well in conjunction with

    git for-each-ref --format='%(refname)' --merged origin/main refs/heads/\*

which is a way to list local branches that are already merged into
the upstream, to have the feature appear in the same command,
perhaps?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 6/8] commit-graph: introduce `ensure_generations_valid()`
  2023-03-06 14:06 ` [PATCH 6/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
@ 2023-03-06 18:52   ` Junio C Hamano
  2023-03-07  0:50     ` Taylor Blau
  0 siblings, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2023-03-06 18:52 UTC (permalink / raw)
  To: Taylor Blau via GitGitGadget; +Cc: git, me, vdye, Derrick Stolee

"Taylor Blau via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Taylor Blau <me@ttaylorr.com>
>
> Use the just-introduced compute_reachable_generation_numbers_1() to
> implement a function which dynamically computes topological levels (or
> corrected commit dates) for out-of-graph commits.
>
> This will be useful for the ahead-behind algorithm we are about to
> introduce, which needs accurate topological levels on _all_ commits
> reachable from the tips in order to avoid over-counting.

Interesting and nice to see it done with so small a change thanks to
the previous refactoring.
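
For readers less familiar with the term, a topological level is 1 for
a root commit and one more than the largest parent level otherwise.
The sketch below is a toy illustration of filling in missing levels
on demand, assuming a made-up array-based DAG; the toy_* names and
the "0 means not computed" convention are assumptions for this
example and are not the commit-graph code quoted in this message.

```c
#include <assert.h>

#define TOY_MAX_COMMITS 32
#define TOY_MAX_PARENTS 2

/* Hypothetical toy commit with a lazily computed level. */
struct toy_level_commit {
	int parents[TOY_MAX_PARENTS];	/* array indices, -1 if unused */
	unsigned int level;		/* 0 means "not computed yet" */
};

/*
 * After this call, 'start' and every commit reachable from it has a
 * non-zero level. Uses an explicit stack rather than recursion; the
 * stack always holds a path in the DAG, so it never exceeds 'nr'.
 */
static void toy_ensure_levels(struct toy_level_commit *dag, int nr, int start)
{
	int stack[TOY_MAX_COMMITS];
	int top = 0;

	assert(nr <= TOY_MAX_COMMITS && start >= 0 && start < nr);
	stack[top++] = start;

	while (top) {
		int c = stack[top - 1];
		unsigned int max_parent = 0;
		int all_known = 1;
		int i;

		if (dag[c].level) {
			/* Already computed, nothing to do. */
			top--;
			continue;
		}

		for (i = 0; i < TOY_MAX_PARENTS; i++) {
			int p = dag[c].parents[i];

			if (p < 0)
				continue;
			if (!dag[p].level) {
				/* Compute this parent first, then revisit 'c'. */
				all_known = 0;
				stack[top++] = p;
				break;
			}
			if (dag[p].level > max_parent)
				max_parent = dag[p].level;
		}

		if (all_known) {
			dag[c].level = max_parent + 1;
			top--;
		}
	}
}
```

Commits that are not reachable from 'start' keep level 0 in this
sketch, which mirrors the idea of only paying for the part of history
that the requested tips can reach.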

>
> Co-authored-by: Derrick Stolee <derrickstolee@github.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  commit-graph.c | 29 +++++++++++++++++++++++++++++
>  commit-graph.h |  7 +++++++
>  2 files changed, 36 insertions(+)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index f04b02be1bb..a573d1b89ff 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1610,6 +1610,35 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>  	stop_progress(&ctx->progress);
>  }
>  
> +static void set_generation_in_graph_data(struct commit *c, timestamp_t t,
> +					 void *data)
> +{
> +	commit_graph_data_at(c)->generation = t;
> +}
> +
> +/*
> + * After this method, all commits reachable from those in the given
> + * list will have non-zero, non-infinite generation numbers.
> + */
> +void ensure_generations_valid(struct commit **commits, size_t nr)
> +{
> +	struct repository *r = the_repository;
> +	int generation_version = get_configured_generation_version(r);
> +	struct packed_commit_list list = {
> +		.list = commits,
> +		.alloc = nr,
> +		.nr = nr,
> +	};
> +	struct compute_generation_info info = {
> +		.r = r,
> +		.commits = &list,
> +		.get_generation = get_generation_from_graph_data,
> +		.set_generation = set_generation_in_graph_data,
> +	};
> +
> +	compute_reachable_generation_numbers_1(&info, generation_version);
> +}
> +
>  static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx)
>  {
>  	trace2_data_intmax("commit-graph", ctx->r, "filter-computed",
> diff --git a/commit-graph.h b/commit-graph.h
> index 37faee6b66d..a529c62b518 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -190,4 +190,11 @@ struct commit_graph_data {
>   */
>  timestamp_t commit_graph_generation(const struct commit *);
>  uint32_t commit_graph_position(const struct commit *);
> +
> +/*
> + * After this method, all commits reachable from those in the given
> + * list will have non-zero, non-infinite generation numbers.
> + */
> +void ensure_generations_valid(struct commit **commits, size_t nr);
> +
>  #endif

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges
  2023-03-06 18:26 ` [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Junio C Hamano
@ 2023-03-06 20:18   ` Derrick Stolee
  2023-03-06 22:24     ` Junio C Hamano
  2023-03-07  0:36   ` Taylor Blau
  1 sibling, 1 reply; 90+ messages in thread
From: Derrick Stolee @ 2023-03-06 20:18 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, me, vdye

On 3/6/2023 1:26 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> These numbers can be computed by 'git rev-list --count B..C' and 'git
>> rev-list --count C..B', but there are common needs that benefit from having
>> the checks being done in the same process:
> 
> This makes readers wonder if "git rev-list --count B...C" should be
> the end-user facing UI for this new feature, perhaps?
> 
> Of course if you are checking how C0, C1, C2,... relate to a single
> B, the existing rev-list syntax would not work, and makes a totally
> new subcommand a possibility.
> 
>>  2. When a branch is updated, a background job checks if any pull requests
>>     that target that branch should be closed because their branches were
>>     merged implicitly by that update. These queries can be batched into 'git
>>     ahead-behind' calls.
>>
>> In that second example, we don't need the full ahead/behind counts (although
>> it is sufficient to look for branches that are "zero commits ahead", meaning
>> they are reachable from the base), so this builtin has an extra '--contains'
>> mode that only checks reachability from the base to each of the tips. 'git
>> ahead-behind --contains' is sort of the reverse of 'git branch --contains'.
> 
> I thought that the reverse of "git branch --contains" was "git
> branch --merged".  "git branch --merged maint ??/\*" is how I cull
> topic branches that have already served their purpose.  
> 
> Isn't closing pull requests because they have been already merged
> the same idea?  "git for-each-ref --merged main refs/pull/\*" or
> something, perhaps?

You are definitely on to something, and I was not aware of --merged as
an option to either of these.

'git branch --merged' has the limitation that tags cannot be used.

'git for-each-ref --merged' is probably sufficient. The only difference
being that it would be nice to specify the matching refs over stdin
with --stdin to avoid long argument lists.

With this in mind, I can update the performance test to look like this
(after updating the setup step to add branches for each line in 'refs')


test_perf 'batch reachability: git ahead-behind --contains' '
	git ahead-behind --contains --base=HEAD --stdin <refs
'

test_perf 'batch reachability: git branch --merged' '
	xargs git branch --merged=HEAD <branches
'

test_perf 'batch reachability: git for-each-ref --merged' '
	xargs git for-each-ref --merged=HEAD <refs
'

And get decent results on all cases with the Linux kernel repository:

Test                                                      this tree      
-------------------------------------------------------------------------
1500.2: ahead-behind counts: git ahead-behind             0.26(0.24+0.01)
1500.3: ahead-behind counts: git rev-list                 4.46(3.91+0.54)
1500.4: batch reachability: git ahead-behind --contains   0.02(0.01+0.01)
1500.5: batch reachability: git branch --merged           0.14(0.13+0.00)
1500.6: batch reachability: git for-each-ref --merged     0.14(0.13+0.00)

So, there is benefit in using this tips_reachable_from_base() method in
the two existing 'git (branch|for-each-ref) --merged' computations. The
API boundary selected in this series might not be the most appropriate
for those builtins, so let's kick out patch 8 from this series for now
and I'll revisit it separately.
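
As a rough illustration of why a single walk from the base can answer
many "is this tip merged?" questions cheaply, here is a simplified C
sketch on a toy DAG. It is not the tips_reachable_from_base() code
from patch 8: the toy_* names, the array layout and the precomputed
'level' field are assumptions for this example. It keeps the two
ideas of that function, though: never descend below the smallest
level among tips that are still un-found, and stop as soon as every
tip has been found.

```c
#include <assert.h>

#define TOY_MAX_COMMITS 32
#define TOY_MAX_PARENTS 2

/* Hypothetical toy commit; 'level' must be precomputed and >= 1. */
struct toy_gen_commit {
	int parents[TOY_MAX_PARENTS];	/* array indices, -1 if unused */
	unsigned int level;		/* topological level */
};

/* Smallest level among un-found tips, or 0 when all are found. */
static unsigned int toy_min_unfound_level(const struct toy_gen_commit *dag,
					  const int *tips, const int *found,
					  int tips_nr)
{
	unsigned int min = 0;
	int i;

	for (i = 0; i < tips_nr; i++) {
		unsigned int l = dag[tips[i]].level;

		if (!found[i] && (!min || l < min))
			min = l;
	}
	return min;
}

/* Set found[i] to 1 if 'base' can reach tips[i], else to 0. */
static void toy_tips_reachable_from_base(const struct toy_gen_commit *dag,
					 int nr, int base, const int *tips,
					 int *found, int tips_nr)
{
	int stack[TOY_MAX_COMMITS];
	char seen[TOY_MAX_COMMITS] = { 0 };
	unsigned int min_level;
	int top = 0;
	int i;

	assert(nr <= TOY_MAX_COMMITS && base >= 0 && base < nr);
	for (i = 0; i < tips_nr; i++)
		found[i] = 0;

	min_level = toy_min_unfound_level(dag, tips, found, tips_nr);
	if (!min_level)
		return;		/* no tips to look for */

	stack[top++] = base;
	seen[base] = 1;

	while (top) {
		int c = stack[--top];
		int hit = 0;

		for (i = 0; i < tips_nr; i++) {
			if (tips[i] == c && !found[i]) {
				found[i] = 1;
				hit = 1;
			}
		}
		if (hit) {
			min_level = toy_min_unfound_level(dag, tips, found,
							  tips_nr);
			if (!min_level)
				return;	/* terminate early: all found */
		}

		for (i = 0; i < TOY_MAX_PARENTS; i++) {
			int p = dag[c].parents[i];

			if (p < 0 || seen[p])
				continue;
			/*
			 * Ancestors have strictly smaller levels, so a
			 * commit below the cutoff cannot reach any tip
			 * that is still un-found.
			 */
			if (dag[p].level < min_level)
				continue;
			seen[p] = 1;
			stack[top++] = p;
		}
	}
}
```

The cutoff only ever increases as tips are found, so a parent that
was skipped once never needs to be reconsidered.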

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges
  2023-03-06 20:18   ` Derrick Stolee
@ 2023-03-06 22:24     ` Junio C Hamano
  0 siblings, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2023-03-06 22:24 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git, me, vdye

Derrick Stolee <derrickstolee@github.com> writes:

> 'git for-each-ref --merged' is probably sufficient. The only difference
> being that it would be nice to specify the matching refs over stdin
> with --stdin to avoid long argument lists.

Yeah, if you have a list of concrete refs maintained externally
(as opposed to the example I responded to, where you generate the
refs by telling for-each-ref what pattern they should match), having
--stdin would be a good thing.

> So, there is benefit in using this tips_reachable_from_base() method in
> the two existing 'git (branch|for-each-ref) --merged' computations. The
> API boundary selected in this series might not be the most appropriate
> for those builtins, so let's kick out patch 8 from this series for now
> and I'll revisit it separately.

Yup, if the reachability API refactoring in these patches can also
help the "for-each-ref" listing (on which "git branch" and "git tag"
base their listing behaviour), it would be very good.

Thanks.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2023-03-06 18:26 ` [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Junio C Hamano
@ 2023-03-07  0:33 ` Taylor Blau
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 90+ messages in thread
From: Taylor Blau @ 2023-03-07  0:33 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, vdye, Derrick Stolee

On Mon, Mar 06, 2023 at 02:06:30PM +0000, Derrick Stolee via GitGitGadget wrote:
> This series introduces the 'git ahead-behind' builtin, which has been used
> at $DAYJOB for many years, but took many forms before landing on the current
> version.

Thanks for a helpful summary of all of the details here. I am of course
familiar with your use of this builtin at our common $DAYJOB, but it was
nice to see a from-scratch explanation of what you're trying to do here.

Now let's take a look at how the patches came together... ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges
  2023-03-06 18:26 ` [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Junio C Hamano
  2023-03-06 20:18   ` Derrick Stolee
@ 2023-03-07  0:36   ` Taylor Blau
  2023-03-09  9:20     ` Jeff King
  1 sibling, 1 reply; 90+ messages in thread
From: Taylor Blau @ 2023-03-07  0:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git, vdye, Derrick Stolee

On Mon, Mar 06, 2023 at 10:26:26AM -0800, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > These numbers can be computed by 'git rev-list --count B..C' and 'git
> > rev-list --count C..B', but there are common needs that benefit from having
> > the checks being done in the same process:
>
> This makes readers wonder if "git rev-list --count B...C" should be
> the end-user facing UI for this new feature, perhaps?
>
> Of course if you are checking how C0, C1, C2,... relate to a single
> B, the existing rev-list syntax would not work, and makes a totally
> new subcommand a possibility.

Yeah. You could imagine that `rev-list --count` might do something
fancy like coalescing

    git rev-list --count B...C1 B...C2 B...C3

into a single walk. But I am not sure that just because `rev-list
--count` provides similar functionality that we should fold in the
proposed `ahead-behind` interface into that flag.

On the other hand, I could see a compelling argument for a slightly
different syntax (maybe `--count-ahead-behind` or
`--count=ahead-behind`) that would fold this functionality into
`rev-list`.

And that is the sort of thing that we would want to settle on sooner
rather than later, since it's fairly baked in once we decide one way or
another and then merge this up.

My personal feeling is that we ought to avoid (further) overloading
`rev-list` absent a compelling reason to do so. But I am definitely
open to other thoughts here.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 1/8] ahead-behind: create empty builtin
  2023-03-06 18:48   ` Junio C Hamano
@ 2023-03-07  0:40     ` Taylor Blau
  2023-03-08 22:14       ` Derrick Stolee
  0 siblings, 1 reply; 90+ messages in thread
From: Taylor Blau @ 2023-03-07  0:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git, vdye, Derrick Stolee

On Mon, Mar 06, 2023 at 10:48:45AM -0800, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > For example, we will be able to track all local branches relative to an
> > upstream branch using an invocation such as
> >
> >   git for-each-ref --format=%(refname) refs/heads/* |
> >     git ahead-behind --base=origin/main --stdin
>
> Stepping back a bit, this motivating example makes me wonder if
>
>  $ git for-each-ref --format='%(refname) %(aheadbehind)' refs/heads/\*

One disadvantage to using for-each-ref here is that we are bound to use
all of the ref-sorting code, so callers can't see intermediate results
until the entire walk is complete.

I can't remember enough of the details about the custom traversal we use
here to know if that would even matter or not (i.e., do we need to
traverse through the whole set of objects entirely before outputting a
single result anyway?). But something to think about nonetheless.

At the very least, it is quite a cute idea (especially something like
'%(aheadbehind:origin/main)') ;-).

> that computes the ahead-behind number for each ref (that matches the
> pattern) based on their own "upstream" (presumably each branch is
> configured to track the same, or different, upstreams), or
> overriding @{upstream}, a specified base, i.e.
>
>  $ git for-each-ref --format='%(refname) %(aheadbehind:origin/main)' refs/heads/\*
>
> would be a more intuitive interface to the end-users.
>
> It would probably work well in conjunction with
>
>     git for-each-ref --format='%(refname)' --merged origin/main refs/heads/\*
>
> which is a way to list local branches that are already merged into
> the upstream, to have the feature appear in the same command,
> perhaps?

One thing that we had talked about internally[^1] was the idea of
specifying multiple bases. IOW, having some way to invoke the
ahead-behind builtin that gives some set of tips with a common base B1,
and another set of tips (which could--but doesn't have to--intersect
with the first) and a common base to compare *them* to, say, B2.

There are some technical reasons that we might want to consider such a
thing at least motivated by GitHub's proposed future use of it. But they
are kind of technical and not that interesting to this discussion, so I
wouldn't be sad if we didn't have a way to specify multiple bases.

OTOH, it would be nice to avoid painting ourselves into a corner from a
UI-perspective if we can avoid it.

Thanks,
Taylor

[^1]: ...and couldn't decide if it was going to be a nice future
addition or simply another case of YAGNI ;-).

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 2/8] ahead-behind: parse tip references
  2023-03-06 14:06 ` [PATCH 2/8] ahead-behind: parse tip references Derrick Stolee via GitGitGadget
@ 2023-03-07  0:43   ` Taylor Blau
  0 siblings, 0 replies; 90+ messages in thread
From: Taylor Blau @ 2023-03-07  0:43 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, vdye, Derrick Stolee

On Mon, Mar 06, 2023 at 02:06:32PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> Before implementing the logic to compute the ahead/behind counts, parse
> the unknown options as commits and place them in a string_list.
>
> Be sure to error out when the reference is not found.
>
> Co-authored-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>

This all looks reasonable (and forging my S-o-b here and elsewhere
throughout this series is all fine). I have seen most of this code
before at least in its final state, but the intermediate bits are new to
me.

And they all look fine and familiar, except...

> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/ahead-behind.c  | 39 +++++++++++++++++++++++++++++++++++++++
>  t/t4218-ahead-behind.sh | 10 ++++++++++
>  2 files changed, 49 insertions(+)
>
> diff --git a/builtin/ahead-behind.c b/builtin/ahead-behind.c
> index a56cc565def..c1212cc8d46 100644
> --- a/builtin/ahead-behind.c
> +++ b/builtin/ahead-behind.c
> @@ -1,16 +1,31 @@
>  #include "builtin.h"
>  #include "parse-options.h"
>  #include "config.h"
> +#include "commit.h"
>
>  static const char * const ahead_behind_usage[] = {
>  	N_("git ahead-behind --base=<ref> [ --stdin | <revs> ]"),
>  	NULL
>  };
>
> +static int handle_arg(struct string_list *tips, const char *arg)
> +{
> +	struct string_list_item *item;
> +	struct commit *c = lookup_commit_reference_by_name(arg);
> +
> +	if (!c)
> +		return error(_("could not resolve '%s'"), arg);
> +
> +	item = string_list_append(tips, arg);
> +	item->util = c;
> +	return 0;
> +}
> +
>  int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
>  {
>  	const char *base_ref = NULL;
>  	int from_stdin = 0;
> +	struct string_list tips = STRING_LIST_INIT_DUP;
>
>  	struct option ahead_behind_opts[] = {
>  		OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")),
> @@ -26,5 +41,29 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
>
>  	git_config(git_default_config, NULL);
>
> +	if (from_stdin) {
> +		struct strbuf line = STRBUF_INIT;
> +
> +		while (strbuf_getline(&line, stdin) != EOF) {
> +			if (!line.len)
> +				break;
> +
> +			if (handle_arg(&tips, line.buf))
> +				return 1;
> +		}
> +
> +		strbuf_release(&line);
> +	} else {
> +		int i;
> +		for (i = 0; i < argc; ++i) {
> +			if (handle_arg(&tips, argv[i]))
> +				return 1;
> +		}
> +	}
> +
> +	/* Early return for no tips. */
> +	if (!tips.nr)
> +		return 0;
> +

...are we missing a call to `string_list_clear()` here on `&tips`?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 3/8] ahead-behind: implement --ignore-missing option
  2023-03-06 14:06 ` [PATCH 3/8] ahead-behind: implement --ignore-missing option Derrick Stolee via GitGitGadget
@ 2023-03-07  0:46   ` Taylor Blau
  0 siblings, 0 replies; 90+ messages in thread
From: Taylor Blau @ 2023-03-07  0:46 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, vdye, Derrick Stolee

On Mon, Mar 06, 2023 at 02:06:33PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> When parsing the tip revisions from the ahead-behind inputs, it is
> important to check that those tips exist before adding them to the list
> for computation. The previous change caused the builtin to return with
> errors if the revisions could not be resolved.
>
> However, when running 'git ahead-behind' in an environment with
> concurrent edits, such as a Git server, then the references could be
> deleted from underneath the caller between reading the reference list
> and starting the 'git ahead-behind' process. Avoid this race by allowing
> the caller to specify '--ignore-missing' and continue using the
> information that is still available.

Well explained, thanks for writing this all out :-).
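
The behaviour described in the quoted message can be summarised with
a small stand-alone C sketch. It is only an illustration under stated
assumptions: the toy_ref table stands in for the ref store, and
toy_lookup() and toy_collect_tips() are hypothetical helpers, not
functions from the patch. A name that no longer resolves is either a
hard error or silently skipped, depending on the flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the ref store: a name and a commit id. */
struct toy_ref {
	const char *name;
	int commit;
};

/* Return the commit id for 'name', or -1 if it does not resolve. */
static int toy_lookup(const struct toy_ref *refs, size_t refs_nr,
		      const char *name)
{
	size_t i;

	for (i = 0; i < refs_nr; i++)
		if (!strcmp(refs[i].name, name))
			return refs[i].commit;
	return -1;
}

/*
 * Resolve each name into 'out' (which must have room for names_nr
 * entries). Returns the number of tips collected, or -1 on the first
 * missing name when 'ignore_missing' is 0.
 */
static int toy_collect_tips(const struct toy_ref *refs, size_t refs_nr,
			    const char *const *names, size_t names_nr,
			    int ignore_missing, int *out)
{
	int n = 0;
	size_t i;

	assert(names && out);
	for (i = 0; i < names_nr; i++) {
		int c = toy_lookup(refs, refs_nr, names[i]);

		if (c < 0) {
			if (ignore_missing)
				continue;	/* ref vanished: skip it */
			return -1;		/* ref vanished: give up */
		}
		out[n++] = c;
	}
	return n;
}
```

With the flag set, a ref deleted between listing and lookup costs the
caller one missing row instead of the whole batch.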

> diff --git a/builtin/ahead-behind.c b/builtin/ahead-behind.c
> index c1212cc8d46..e4f65fc0548 100644
> --- a/builtin/ahead-behind.c
> +++ b/builtin/ahead-behind.c
> @@ -8,13 +8,18 @@ static const char * const ahead_behind_usage[] = {
>  	NULL
>  };
>
> +static int ignore_missing;
> +
>  static int handle_arg(struct string_list *tips, const char *arg)
>  {
>  	struct string_list_item *item;
>  	struct commit *c = lookup_commit_reference_by_name(arg);
>
> -	if (!c)
> +	if (!c) {
> +		if (ignore_missing)
> +			return 0;
>  		return error(_("could not resolve '%s'"), arg);
> +	}

And the diff makes complete sense here, too.

>  	item = string_list_append(tips, arg);
>  	item->util = c;
> @@ -30,6 +35,7 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
>  	struct option ahead_behind_opts[] = {
>  		OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")),
>  		OPT_BOOL(0 , "stdin", &from_stdin, N_("read rev names from stdin")),
> +		OPT_BOOL(0 , "ignore-missing", &ignore_missing, N_("ignore missing tip references")),

The spacing between "0" and the comma and "ignore-missing" (as well as with
"stdin" above, though I didn't notice it when reading the previous
patch) is a little funky.

I suspect that it is carried over from some historical typo from many
years ago, but probably worth fixing while we're thinking about it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH 6/8] commit-graph: introduce `ensure_generations_valid()`
  2023-03-06 18:52   ` Junio C Hamano
@ 2023-03-07  0:50     ` Taylor Blau
  0 siblings, 0 replies; 90+ messages in thread
From: Taylor Blau @ 2023-03-07  0:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau via GitGitGadget, git, vdye, Derrick Stolee

On Mon, Mar 06, 2023 at 10:52:20AM -0800, Junio C Hamano wrote:
> "Taylor Blau via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Taylor Blau <me@ttaylorr.com>
> >
> > Use the just-introduced compute_reachable_generation_numbers_1() to
> > implement a function which dynamically computes topological levels (or
> > corrected commit dates) for out-of-graph commits.
> >
> > This will be useful for the ahead-behind algorithm we are about to
> > introduce, which needs accurate topological levels on _all_ commits
> > reachable from the tips in order to avoid over-counting.
>
> Interesting and nice to see it done with so small a change thanks to
> the previous refactoring.

Stolee did all of the hard work ;-). But chiming in to say that I
remember all of these changes well and they look faithful to my original
read of them (as well as what I wrote in this patch).

So, LGTM so far.

Thanks,
Taylor

* Re: [PATCH 7/8] ahead-behind: implement ahead_behind() logic
  2023-03-06 14:06 ` [PATCH 7/8] ahead-behind: implement ahead_behind() logic Derrick Stolee via GitGitGadget
@ 2023-03-07  1:05   ` Taylor Blau
  2023-03-09 17:32     ` Derrick Stolee
  0 siblings, 1 reply; 90+ messages in thread
From: Taylor Blau @ 2023-03-07  1:05 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, vdye, Derrick Stolee

On Mon, Mar 06, 2023 at 02:06:37PM +0000, Derrick Stolee via GitGitGadget wrote:
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>

Having read and worked with this code before, I don't have a ton of
substance to add here. But it was interesting to reread, and I left a
few sprinklings here and there of some things that we may want to
consider for v2.

Before that, though, IIRC we wrote most of this together, so I would be
happy to have my:

    Co-authored-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Taylor Blau <me@ttaylorr.com>

above your S-o-b here. But you've done so much work since we originally
wrote this together that I don't mind being dropped here. Up to you :-).

> @@ -71,5 +76,23 @@ int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
>  	if (!tips.nr)
>  		return 0;
>
> +	ALLOC_ARRAY(commits, tips.nr + 1);
> +	ALLOC_ARRAY(counts, tips.nr);
> +
> +	for (i = 0; i < tips.nr; i++) {
> +		commits[i] = tips.items[i].util;
> +		counts[i].tip_index = i;
> +		counts[i].base_index = tips.nr;
> +	}
> +	commits[tips.nr] = base;
> +
> +	ahead_behind(commits, tips.nr + 1, counts, tips.nr);
> +
> +	for (i = 0; i < tips.nr; i++)
> +		printf("%s %d %d\n", tips.items[i].string,
> +		       counts[i].ahead, counts[i].behind);
> +
> +	free(counts);
> +	free(commits);
>  	return 0;
>  }

I have to say, the interface looks particularly well designed when you
see the patches come together in this fashion. The builtin is doing
basically no work except collating the user's input, passing it off to
ahead_behind(), and then spitting out the results.

Very nice ;-).

> diff --git a/commit-reach.c b/commit-reach.c
> index 2e33c599a82..87ccc2cd4f5 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -8,6 +8,7 @@
>  #include "revision.h"
>  #include "tag.h"
>  #include "commit-reach.h"
> +#include "ewah/ewok.h"
>
>  /* Remember to update object flag allocation in object.h */

There is a new use of PARENT2 (which we hardcode here as bit 17) below,
but it is already covered as part of the object flag allocation table in
object.h. So this comment has done its job over the years ;-).

>  #define PARENT1		(1u<<16)
> @@ -941,3 +942,97 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
>
>  	return found_commits;
>  }
> +
> +define_commit_slab(bit_arrays, struct bitmap *);
> +static struct bit_arrays bit_arrays;
> +
> +static void insert_no_dup(struct prio_queue *queue, struct commit *c)
> +{
> +	if (c->object.flags & PARENT2)
> +		return;
> +	prio_queue_put(queue, c);
> +	c->object.flags |= PARENT2;
> +}

You mentioned this in the patch message, but:

It may be worth noting here (or in the call to repo_clear_commit_marks()
below) that the PARENT2 flag is used to detect and avoid duplicates in
this list.
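
To spell out what I mean by "detect and avoid duplicates", here is a tiny
standalone sketch. It is not the code from the patch: toy_commit,
toy_queue, SEEN_FLAG (standing in for PARENT2) and the fixed-size array
are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SEEN_FLAG (1u << 17)	/* hypothetical stand-in for PARENT2 */
#define QUEUE_MAX 16

struct toy_commit {
	unsigned flags;
};

struct toy_queue {
	struct toy_commit *items[QUEUE_MAX];
	size_t nr;
};

/* Append 'c' unless it was ever queued before; returns 1 if it was added. */
static int toy_insert_no_dup(struct toy_queue *q, struct toy_commit *c)
{
	if (c->flags & SEEN_FLAG)
		return 0;
	assert(q->nr < QUEUE_MAX);
	q->items[q->nr++] = c;
	c->flags |= SEEN_FLAG;
	return 1;
}

/* Clear the marker once the walk is done so later walks start clean. */
static void toy_clear_marks(struct toy_commit **commits, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		commits[i]->flags &= ~SEEN_FLAG;
}
```

The flag records "was ever queued", not "is currently queued", which is
why it has to be cleared at the end of the walk.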

> +static struct bitmap *init_bit_array(struct commit *c, int width)
> +{
> +	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
> +	if (!*bitmap)
> +		*bitmap = bitmap_word_alloc(width);
> +	return *bitmap;
> +}
> +
> +static void free_bit_array(struct commit *c)
> +{
> +	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
> +	if (!*bitmap)
> +		return;
> +	bitmap_free(*bitmap);
> +	*bitmap = NULL;
> +}
> +
> +void ahead_behind(struct commit **commits, size_t commits_nr,
> +		  struct ahead_behind_count *counts, size_t counts_nr)
> +{
> +	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> +	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
> +	size_t i;
> +
> +	if (!commits_nr || !counts_nr)
> +		return;
> +
> +	for (i = 0; i < counts_nr; i++) {
> +		counts[i].ahead = 0;
> +		counts[i].behind = 0;
> +	}
> +
> +	ensure_generations_valid(commits, commits_nr);
> +
> +	init_bit_arrays(&bit_arrays);
> +
> +	for (i = 0; i < commits_nr; i++) {
> +		struct commit *c = commits[i];
> +		struct bitmap *bitmap = init_bit_array(c, width);
> +
> +		bitmap_set(bitmap, i);
> +		insert_no_dup(&queue, c);
> +	}
> +
> +	while (queue_has_nonstale(&queue)) {
> +		struct commit *c = prio_queue_get(&queue);
> +		struct commit_list *p;
> +		struct bitmap *bitmap_c = init_bit_array(c, width);
> +
> +		for (i = 0; i < counts_nr; i++) {
> +			int reach_from_tip = bitmap_get(bitmap_c, counts[i].tip_index);
> +			int reach_from_base = bitmap_get(bitmap_c, counts[i].base_index);

Since we're XORing these, I'd hate to get bit by bitmap_get returning
something other than 0 or 1. It doesn't, since the return value (for any
"pos" for which it holds that `EWAH_BLOCK(pos) < self->word_alloc`) is:

    (self->words[EWAH_BLOCK(pos)] & EWAH_MASK(pos)) != 0

so we'll always be guaranteed to get zero or one. But if we returned instead:

    self->words[EWAH_BLOCK(pos)] & EWAH_MASK(pos)

...this code would break in a very annoying and hard-to-debug way ;-).

I wonder if we might do a little bit of belt-and-suspenders here by calling
these like:

    int reach_from_tip  = !!(bitmap_get(bitmap_c, counts[i].tip_index));
    int reach_from_base = !!(bitmap_get(bitmap_c, counts[i].base_index));

where the "!!(...)" is new.
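
For anybody following along, a standalone sketch of the failure mode
(toy single-word lookups, not the real ewah API; get_normalized,
get_raw and differs are names invented for this example):

```c
#include <assert.h>

/* Toy lookup that normalizes its result; 'pos' must be below 31. */
static int get_normalized(unsigned word, unsigned pos)
{
	assert(pos < 31);
	return (word & (1u << pos)) != 0;	/* always 0 or 1 */
}

/* Toy lookup that returns the masked word as-is. */
static int get_raw(unsigned word, unsigned pos)
{
	assert(pos < 31);
	return (int)(word & (1u << pos));	/* 0 or the mask itself */
}

/* Returns 1 if the XOR of the two values is nonzero. */
static int differs(int reach_from_tip, int reach_from_base)
{
	return (reach_from_tip ^ reach_from_base) != 0;
}
```

With both bits set, the normalized lookups XOR to zero as expected,
while the raw lookups XOR two different masks and report a difference
that is not there. Wrapping the raw results in "!!" restores the
expected answer.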

> +			if (reach_from_tip ^ reach_from_base) {
> +				if (reach_from_base)
> +					counts[i].behind++;
> +				else
> +					counts[i].ahead++;
> +			}
> +		}

I have gone back and forth so many times on this code :-). I think the
XORs are fine, though.

> +		for (p = c->parents; p; p = p->next) {
> +			struct bitmap *bitmap_p;
> +
> +			parse_commit(p->item);
> +
> +			bitmap_p = init_bit_array(p->item, width);
> +			bitmap_or(bitmap_p, bitmap_c);
> +
> +			if (bitmap_popcount(bitmap_p) == commits_nr)
> +				p->item->object.flags |= STALE;
> +
> +			insert_no_dup(&queue, p->item);

Do we care about inserting p->item when the above condition is met? IOW,
would it be OK to instead write:

    if (bitmap_popcount(bitmap_p) == commits_nr)
      p->item->object.flags |= STALE;
    else
      insert_no_dup(&queue, p->item);

> diff --git a/commit-reach.h b/commit-reach.h
> index 148b56fea50..1780f9317bf 100644
> --- a/commit-reach.h
> +++ b/commit-reach.h
> @@ -104,4 +104,34 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
>  					 struct commit **to, int nr_to,
>  					 unsigned int reachable_flag);
>
> +struct ahead_behind_count {
> +	/**
> +	 * As input, the *_index members indicate which positions in
> +	 * the 'tips' array correspond to the tip and base of this
> +	 * comparison.
> +	 */
> +	size_t tip_index;
> +	size_t base_index;
> +
> +	/**
> +	 * These values store the computed counts for each side of the
> +	 * symmetric difference:
> +	 *
> +	 * 'ahead' stores the number of commits reachable from the tip
> +	 * and not reachable from the base.
> +	 *
> +	 * 'behind' stores the number of commits reachable from the base
> +	 * and not reachable from the tip.
> +	 */
> +	int ahead;
> +	int behind;
> +};

Should these be unsigned values? I don't think we have a sensible
interpretation for what a negative "ahead" or "behind" count would mean.
I guess being "behind" by "N" means you're "ahead" by "-N", but I don't
think it's practical ;-).

> +
> +/**

Here and elsewhere, these kinds of doc-comments are a little
non-standard, and IIRC the opening should instead be "/*" (with one
asterisk instead of two).

Thanks,
Taylor

* Re: [PATCH 1/8] ahead-behind: create empty builtin
  2023-03-07  0:40     ` Taylor Blau
@ 2023-03-08 22:14       ` Derrick Stolee
  2023-03-08 22:56         ` Junio C Hamano
  0 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee @ 2023-03-08 22:14 UTC (permalink / raw)
  To: Taylor Blau, Junio C Hamano; +Cc: Derrick Stolee via GitGitGadget, git, vdye

On 3/6/2023 7:40 PM, Taylor Blau wrote:
> On Mon, Mar 06, 2023 at 10:48:45AM -0800, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> For example, we will be able to track all local branches relative to an
>>> upstream branch using an invocation such as
>>>
>>>   git for-each-ref --format=%(refname) refs/heads/* |
>>>     git ahead-behind --base=origin/main --stdin
>>
>> Stepping back a bit, this motivating example makes me wonder if
>>
>>  $ git for-each-ref --format='%(refname) %(aheadbehind)' refs/heads/\*
> 
> One disadvantage to using for-each-ref here is that we are bound to use
> all of the ref-sorting code, so callers can't see intermediate results
> until the entire walk is complete.
> 
> I can't remember enough of the details about the custom traversal we use
> here to know if that would even matter or not (i.e., do we need to
> traverse through the whole set of objects entirely before outputting a
> single result anyway?). But something to think about nonetheless.
> 
> At the very least, it is quite a cute idea (especially something like
> '%(aheadbehind:origin/main)') ;-).
> 
>> that computes the ahead-behind number for each ref (that matches the
>> pattern) based on their own "upstream" (presumably each branch is
>> configured to track the same, or different, upstreams), or
>> overriding @{upstream}, a specified base, i.e.
>>
>>  $ git for-each-ref --format='%(refname) %(aheadbehind:origin/main)' refs/heads/\*
>>
>> would be a more intuitive interface to the end-users.
>>
>> It would probably work well in conjunction with
>>
>>     git for-each-ref --format='%(refname)' --merged origin/main refs/heads/\*
>>
>> which is a way to list local branches that are already merged into
>> the upstream, to have the feature appear in the same command,
>> perhaps?
> 
> One thing that we had talked about internally[^1] was the idea of
> specifying multiple bases. IOW, having some way to invoke the
> ahead-behind builtin that gives some set of tips with a common base B1,
> and another set of tips (which could--but doesn't have to--intersect
> with the first) and a common base to compare *them* to, say, B2.
> 
> There are some technical reasons that we might want to consider such a
> thing at least motivated by GitHub's proposed future use of it. But they
> are kind of technical and not that interesting to this discussion, so I
> wouldn't be sad if we didn't have a way to specify multiple bases.
> 
> OTOH, it would be nice to avoid painting ourselves into a corner from a
> UI-perspective if we can avoid it.
> 
> Thanks,
> Taylor
> 
> [^1]: ...and couldn't decide if it was going to be a nice future
> addition or simply another case of YAGNI ;-).

This use of 'git for-each-ref --format=""' actually fixes some of the
issues I had with how to specify multiple bases. I'm not sure there is
a huge need for it, except that if we allow a "%(ahead-behind:<ref>)"
format token, then we would need to support multiple bases.

Thankfully, the implementation in this series is already prepared for
that, so the following diff implements this format token:

--- >8 ---

 builtin/for-each-ref.c | 50 ++++++++++++++++++++++++++++++++++++++++++
 ref-filter.c           | 23 +++++++++++++++++++
 ref-filter.h           | 15 ++++++++++++-
 3 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 6f62f40d126..c8dd21d7e13 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -5,6 +5,7 @@
 #include "object.h"
 #include "parse-options.h"
 #include "ref-filter.h"
+#include "commit-reach.h"
 
 static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [<options>] [<pattern>]"),
@@ -14,6 +15,51 @@ static char const * const for_each_ref_usage[] = {
 	NULL
 };
 
+static void compute_ahead_behind(struct ref_format *format,
+				 struct ref_array *array)
+{
+	struct commit **commits;
+	size_t commits_nr = format->bases.nr + array->nr;
+
+	if (!format->bases.nr || !array->nr)
+		return;
+
+	ALLOC_ARRAY(commits, commits_nr);
+	for (size_t i = 0; i < format->bases.nr; i++) {
+		const char *name = format->bases.items[i].string;
+		commits[i] = lookup_commit_reference_by_name(name);
+		if (!commits[i])
+			die("failed to find '%s'", name);
+	}
+
+	ALLOC_ARRAY(array->counts, st_mult(format->bases.nr, array->nr));
+
+	commits_nr = format->bases.nr;
+	array->counts_nr = 0;
+	for (size_t i = 0; i < array->nr; i++) {
+		const char *name = array->items[i]->refname;
+		commits[commits_nr] = lookup_commit_reference_by_name(name);
+
+		if (!commits[commits_nr]) {
+			warning(_("could not find '%s'"), name);
+			continue;
+		}
+
+		CALLOC_ARRAY(array->items[i]->counts, format->bases.nr);
+		for (size_t j = 0; j < format->bases.nr; j++) {
+			struct ahead_behind_count *count;
+			count = &array->counts[array->counts_nr++];
+			count->tip_index = format->bases.nr + i;
+			count->base_index = j;
+
+			array->items[i]->counts[j] = count;
+		}
+		commits_nr++;
+	}
+
+	ahead_behind(commits, commits_nr, array->counts, array->counts_nr);
+}
+
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -78,6 +124,10 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	filter.name_patterns = argv;
 	filter.match_as_path = 1;
 	filter_refs(&array, &filter, FILTER_REFS_ALL);
+
+	/* Do ahead-behind things, if necessary. */
+	compute_ahead_behind(&format, &array);
+
 	ref_array_sort(sorting, &array);
 
 	if (!maxcount || array.nr < maxcount)
diff --git a/ref-filter.c b/ref-filter.c
index f8203c6b052..1706b9dd0d5 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -158,6 +158,7 @@ enum atom_type {
 	ATOM_THEN,
 	ATOM_ELSE,
 	ATOM_REST,
+	ATOM_AHEADBEHIND,
 };
 
 /*
@@ -586,6 +587,16 @@ static int rest_atom_parser(struct ref_format *format, struct used_atom *atom,
 	return 0;
 }
 
+static int ahead_behind_atom_parser(struct ref_format *format, struct used_atom *atom,
+				    const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		return strbuf_addf_ret(err, -1, _("expected format: %%(ahead-behind:<ref>)"));
+
+	string_list_append(&format->bases, arg);
+	return 0;
+}
+
 static int head_atom_parser(struct ref_format *format, struct used_atom *atom,
 			    const char *arg, struct strbuf *err)
 {
@@ -645,6 +656,7 @@ static struct {
 	[ATOM_THEN] = { "then", SOURCE_NONE },
 	[ATOM_ELSE] = { "else", SOURCE_NONE },
 	[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
+	[ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser },
 	/*
 	 * Please update $__git_ref_fieldlist in git-completion.bash
 	 * when you add new atoms
@@ -1848,6 +1860,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 	struct object *obj;
 	int i;
 	struct object_info empty = OBJECT_INFO_INIT;
+	int ahead_behind_atoms = 0;
 
 	CALLOC_ARRAY(ref->value, used_atom_cnt);
 
@@ -1978,6 +1991,16 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 			else
 				v->s = xstrdup("");
 			continue;
+		} else if (atom_type == ATOM_AHEADBEHIND) {
+			if (ref->counts) {
+				const struct ahead_behind_count *count;
+				count = ref->counts[ahead_behind_atoms++];
+				v->s = xstrfmt("%d %d", count->ahead, count->behind);
+			} else {
+				/* Not a commit. */
+				v->s = xstrdup("");
+			}
+			continue;
 		} else
 			continue;
 
diff --git a/ref-filter.h b/ref-filter.h
index aa0eea4ecf5..937a857ddee 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -5,6 +5,7 @@
 #include "refs.h"
 #include "commit.h"
 #include "parse-options.h"
+#include "string-list.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -24,6 +25,7 @@
 
 struct atom_value;
 struct ref_sorting;
+struct ahead_behind_count;
 
 enum ref_sorting_order {
 	REF_SORTING_REVERSE = 1<<0,
@@ -40,6 +42,8 @@ struct ref_array_item {
 	const char *symref;
 	struct commit *commit;
 	struct atom_value *value;
+	struct ahead_behind_count **counts;
+
 	char refname[FLEX_ARRAY];
 };
 
@@ -47,6 +51,9 @@ struct ref_array {
 	int nr, alloc;
 	struct ref_array_item **items;
 	struct rev_info *revs;
+
+	struct ahead_behind_count *counts;
+	size_t counts_nr;
 };
 
 struct ref_filter {
@@ -80,9 +87,15 @@ struct ref_format {
 
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
+
+	/* List of bases for ahead-behind counts. */
+	struct string_list bases;
 };
 
-#define REF_FORMAT_INIT { .use_color = -1 }
+#define REF_FORMAT_INIT {             \
+	.use_color = -1,              \
+	.bases = STRING_LIST_INIT_DUP, \
+}
 
 /*  Macros for checking --merged and --no-merged options */
 #define _OPT_MERGED_NO_MERGED(option, filter, h) \
-- 
2.40.0.vfs.0.0.3.g5872ac9aaa4

--- >8 ---
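
To make the index layout in compute_ahead_behind() easier to see, here
is a standalone sketch of it. It assumes every tip resolves to a commit
(the diff above warns and skips the ones that do not), and toy_count,
toy_fill_pairs and toy_pair_pos are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

struct toy_count {
	size_t tip_index;
	size_t base_index;
};

/*
 * Fill 'counts' (room for nr_bases * nr_tips entries) for a commits
 * array laid out as [base 0 .. base nr_bases-1, tip 0 .. tip nr_tips-1].
 * Returns the number of entries written.
 */
static size_t toy_fill_pairs(struct toy_count *counts,
			     size_t nr_bases, size_t nr_tips)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_tips; i++) {
		for (size_t j = 0; j < nr_bases; j++) {
			counts[n].tip_index = nr_bases + i;
			counts[n].base_index = j;
			n++;
		}
	}
	assert(n == nr_bases * nr_tips);
	return n;
}

/* Position in 'counts' of the entry comparing tip 'i' against base 'j'. */
static size_t toy_pair_pos(size_t nr_bases, size_t i, size_t j)
{
	return i * nr_bases + j;
}
```

With two bases and three tips this gives six (tip, base) pairs, and the
entries for one tip sit next to each other in the order the bases were
given in the format string.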

I can already see some things I want to change about this quick
and dirty implementation, but it gets the point across. This
"test" can be added to the end of t6302 for some demonstration:

test_expect_success 'ahead-behind' '
	git for-each-ref --format="%(refname) %(ahead-behind:HEAD)" &&
	git for-each-ref --format="%(refname) %(ahead-behind:HEAD) %(ahead-behind:refs/heads/side)"
'

What I have yet to determine is that 'git for-each-ref' does
not have significant overhead due to how its implementation is
built around listing "all refs that match" versus an explicit
input list of refs. There's also the concept of '--stdin' that
would be interesting to interact with.

I'll continue to investigate this path and report back when I
have more of this information. This is as far as I could get
today.

Thanks,
-Stolee

* Re: [PATCH 1/8] ahead-behind: create empty builtin
  2023-03-08 22:14       ` Derrick Stolee
@ 2023-03-08 22:56         ` Junio C Hamano
  0 siblings, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2023-03-08 22:56 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Derrick Stolee via GitGitGadget, git, vdye

Derrick Stolee <derrickstolee@github.com> writes:

> What I have yet to determine is that 'git for-each-ref' does
> not have significant overhead due to how its implementation is
> built around listing "all refs that match" versus an explicit
> input list of refs. There's also the concept of '--stdin' that
> would be interesting to interact with.

Yeah, we could add --no-sort to allow streaming better and --stdin
to feed list of refs to work on, if the end-user facing interface
based on --format is what people find reasonable.

> I'll continue to investigate this path and report back when I
> have more of this information. This is as far as I could get
> today.

Thanks.  It is a very interesting experiment.


* Re: [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges
  2023-03-07  0:36   ` Taylor Blau
@ 2023-03-09  9:20     ` Jeff King
  2023-03-09 21:51       ` Junio C Hamano
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff King @ 2023-03-09  9:20 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git, vdye,
	Derrick Stolee

On Mon, Mar 06, 2023 at 07:36:23PM -0500, Taylor Blau wrote:

> > This makes readers wonder if "git rev-list --count B...C" should be
> > the end-user facing UI for this new feature, perhaps?
> >
> > Of course if you are checking how C0, C1, C2,... relate to a single
> > B, the existing rev-list syntax would not work, and makes a totally
> > new subcommand a possibility.
> 
> Yeah. You could imagine that `rev-list --count` might do something
> fancy like coalescing
> 
>     git rev-list --count B...C1 B...C2 B...C3
> 
> into a single walk. But I am not sure that just because `rev-list
> --count` provides similar functionality that we should fold in the
> proposed `ahead-behind` interface into that flag.

It does coalesce all of that into a single walk. The problem is somewhat
the opposite: it only has a notion of two "sides" for a symmetric
traversal: left and right. But in your example there are many sides, and
we have to remember which is which.

I think getting the answer from one walk would require an arbitrary
number of bits to paint down each path. Certainly the ahead-behind that
Vicent and I wrote long ago didn't do that (IIRC it mostly relied on
doing multiple traversals in the same process, which amortized the cost
of commit parsing; that's not really an issue these days with commit
graphs).

Peeking at patch 7 of Stolee's series...yep. That's exactly what it
does. :)

I wondered how much it would matter on top of a naive loop of
single-traversals, now that we have commit graphs. It looks like there's
still quite a nice speedup from the numbers in patch 7 (though the
totally naive "loop of rev-list" is incurring extra startup overhead,
too).

> My personal feeling is that we ought to avoid (further) overloading
> `rev-list` absent of a compelling reason to do so. But I am definitely
> open to other thoughts here.

So I think this actually is what "git rev-list --left-right --count
old...new" does now. But extending it to multiple sets in one traversal
means you need:

  - being able to ask for individual left-right markers for each pair,
    not treating all lefts and all rights together

  - don't stop traversing when you hit an UNINTERESTING commit if there
    are still bits to paint. In a single-pair traversal, those two are
    the same thing (we stop at the merge base), but with multiple pairs
    you may have to keep walking past a commit that is excluded from one
    pair, but not another. This _might_ be doable if you assume all of
    the left-hand bases are the same, but I didn't think hard enough to
    feel confident in that. But even so, that only solves cases like
    "how do these branches compare to HEAD" (which I think is what
    GitHub does). But it doesn't allow "how do these branches compare to
    their respective @{upstream} refs".

So I don't think it would be impossible to make this a mode of rev-list.
And that mode might even provide flexibility for other similar
operations, like a mass "git rev-list --cherry-mark"[1]. But it is a
pretty big departure from the current rev-list traversal (to my mind,
especially the "keep walking past UNINTERESTING" part). I don't mind it
as its own command.

-Peff

[1] The reason you might want a mass cherry-mark is basically doing
    something like the "branches" page, but in a workflow where upstream
    applies patches, like git.git. There you may want to ask about
    "origin/next...$branch" for all of your branches to see which ones
    have been merged where.

* Re: [PATCH 7/8] ahead-behind: implement ahead_behind() logic
  2023-03-07  1:05   ` Taylor Blau
@ 2023-03-09 17:32     ` Derrick Stolee
  0 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2023-03-09 17:32 UTC (permalink / raw)
  To: Taylor Blau, Derrick Stolee via GitGitGadget; +Cc: git, gitster, vdye

On 3/6/2023 8:05 PM, Taylor Blau wrote:
> On Mon, Mar 06, 2023 at 02:06:37PM +0000, Derrick Stolee via GitGitGadget wrote:
>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> 
> Having read and worked with this code before, I don't have a ton of
> substance to add here. But it was interesting to reread, and I left a
> few sprinklings here and there of some things that we may want to
> consider for v2.
> 
> Before that, though, IIRC we wrote most of this together, so I would be
> happy to have my:
> 
>     Co-authored-by: Taylor Blau <me@ttaylorr.com>
>     Signed-off-by: Taylor Blau <me@ttaylorr.com>
> 
> above your S-o-b here. But you've done so much work since we originally
> wrote this together that I don't mind being dropped here. Up to you :-).

Sounds good. Sorry for forgetting that collaboration.

>> +static void insert_no_dup(struct prio_queue *queue, struct commit *c)
>> +{
>> +	if (c->object.flags & PARENT2)
>> +		return;
>> +	prio_queue_put(queue, c);
>> +	c->object.flags |= PARENT2;
>> +}
> 
> You mentioned this in the patch message, but:
> 
> It may be worth noting here (or in the call to repo_clear_commit_marks()
> below) that the PARENT2 flag is used to detect and avoid duplicates in
> this list.

I'll add a comment before we clear the bits, to be clear about
why we need each bit.
 
>> +	while (queue_has_nonstale(&queue)) {
>> +		struct commit *c = prio_queue_get(&queue);
>> +		struct commit_list *p;
>> +		struct bitmap *bitmap_c = init_bit_array(c, width);
>> +
>> +		for (i = 0; i < counts_nr; i++) {
>> +			int reach_from_tip = bitmap_get(bitmap_c, counts[i].tip_index);
>> +			int reach_from_base = bitmap_get(bitmap_c, counts[i].base_index);
> 
> Since we're XORing these, I'd hate to get bit by bitmap_get returning
> something other than 0 or 1. It doesn't, since the return value (for any
> "pos" for which it holds that `EWAH_BLOCK(pos) < self->word_alloc`) is:

> I wonder if we might do a little of belt-and-suspenders here by calling
> these like:
> 
>     int reach_from_tip  = !!(bitmap_get(bitmap_c, counts[i].tip_index));
>     int reach_from_base = !!(bitmap_get(bitmap_c, counts[i].base_index));
> 
> where the "!!(...)" is new.

Can't hurt.
 
>> +			if (bitmap_popcount(bitmap_p) == commits_nr)
>> +				p->item->object.flags |= STALE;
>> +
>> +			insert_no_dup(&queue, p->item);
> 
> Do we care about inserting p->item when the above condition is met? IOW,
> would it be OK to instead write:
> 
>     if (bitmap_popcount(bitmap_p) == commits_nr)
>       p->item->object.flags |= STALE;
>     else
>       insert_no_dup(&queue, p->item);

We need to push p->item to the queue, even if stale, because it
may need to be walked in order to pass the bitmaps (and the STALE
bit) to other commits that were reached by only a subset of the
tips.

Here's an example:

     A B
     |/ \
     C   D
     |  /
     E /
     |/
     F

If A and B are the starting commits, then C is stale when we
walk it, so its parent E would be stale. Your proposed change
would not add it to the queue, and thus F would never see that
it is stale and would be counted as reachable from B but not A.
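
The same walk can be reproduced in a small standalone program. This is
only a toy sketch of the argument above, not the code from the patch:
walk(), the array-based queue, and the hard-coded generation numbers
are assumptions made for illustration, with A as the tip and B as the
base.

```c
#include <assert.h>

enum { A, B, C, D, E, F, NODES };

/* Parent lists for the example graph; -1 terminates each list. */
static const int parents[NODES][3] = {
	[A] = { C, -1 },
	[B] = { C, D, -1 },
	[C] = { E, -1 },
	[D] = { F, -1 },
	[E] = { F, -1 },
	[F] = { -1 },
};

/* Generation numbers: one more than the largest parent generation. */
static const int generation[NODES] = {
	[A] = 4, [B] = 4, [C] = 3, [D] = 2, [E] = 2, [F] = 1,
};

struct result {
	unsigned ahead, behind;
};

static struct result walk(int enqueue_stale)
{
	unsigned bits[NODES] = { 0 };	/* bit 0: from A, bit 1: from B */
	int queued[NODES] = { 0 };	/* currently in the queue */
	int seen[NODES] = { 0 };	/* ever queued (the PARENT2 role) */
	int stale[NODES] = { 0 };
	struct result r = { 0, 0 };

	bits[A] = 1;
	bits[B] = 2;
	queued[A] = seen[A] = queued[B] = seen[B] = 1;

	for (;;) {
		int c = -1, nonstale = 0;

		/* Pick the queued commit with the highest generation. */
		for (int i = 0; i < NODES; i++) {
			if (!queued[i])
				continue;
			if (!stale[i])
				nonstale = 1;
			if (c < 0 || generation[i] > generation[c])
				c = i;
		}
		if (!nonstale)
			break;
		assert(c >= 0);
		queued[c] = 0;

		if (bits[c] == 1)
			r.ahead++;	/* reachable from A only */
		else if (bits[c] == 2)
			r.behind++;	/* reachable from B only */

		for (const int *p = parents[c]; *p >= 0; p++) {
			bits[*p] |= bits[c];
			if (bits[*p] == 3)
				stale[*p] = 1;
			if (seen[*p])
				continue;
			if (stale[*p] && !enqueue_stale)
				continue;	/* the proposed "else" variant */
			queued[*p] = seen[*p] = 1;
		}
	}
	return r;
}
```

Calling walk(1) models the patch as written and gives ahead=1 (A) and
behind=2 (B and D). Calling walk(0) models the "else" variant: E is
never queued, F only ever sees B's bit, and behind comes out as 3.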

>> diff --git a/commit-reach.h b/commit-reach.h
>> index 148b56fea50..1780f9317bf 100644
>> --- a/commit-reach.h
>> +++ b/commit-reach.h
>> @@ -104,4 +104,34 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
>>  					 struct commit **to, int nr_to,
>>  					 unsigned int reachable_flag);
>>
>> +struct ahead_behind_count {
>> +	/**
>> +	 * As input, the *_index members indicate which positions in
>> +	 * the 'tips' array correspond to the tip and base of this
>> +	 * comparison.
>> +	 */
>> +	size_t tip_index;
>> +	size_t base_index;
>> +
>> +	/**
>> +	 * These values store the computed counts for each side of the
>> +	 * symmetric difference:
>> +	 *
>> +	 * 'ahead' stores the number of commits reachable from the tip
>> +	 * and not reachable from the base.
>> +	 *
>> +	 * 'behind' stores the number of commits reachable from the base
>> +	 * and not reachable from the tip.
>> +	 */
>> +	int ahead;
>> +	int behind;
>> +};
> 
> Should these be unsigned values? I don't think we have a sensible
> interpretation for what a negative "ahead" or "behind" count would mean.
> I guess being "behind" by "N" means you're "ahead" by "-N", but I don't
> think it's practical ;-).

Unsigned sounds good.
 
>> +
>> +/**
> 
> Here and elsewhere, these kinds of doc-comments are a little
> non-standard, and IIRC the opening should instead be "/*" (with one
> asterisk instead of two).

I think double-asterisk is the preferred choice for new things, but
commit-reach.h only uses a single asterisk so I'll change this to
be consistent.

Thanks,
-Stolee

* Re: [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges
  2023-03-09  9:20     ` Jeff King
@ 2023-03-09 21:51       ` Junio C Hamano
  0 siblings, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2023-03-09 21:51 UTC (permalink / raw)
  To: Jeff King
  Cc: Taylor Blau, Derrick Stolee via GitGitGadget, git, vdye,
	Derrick Stolee

Jeff King <peff@peff.net> writes:

>> Yeah. You could imagine that `rev-list --count` might do something
>> fancy like coalescing
>> 
>>     git rev-list --count B...C1 B...C2 B...C3
>> 
>> into a single walk. But I am not sure that just because `rev-list
>> --count` provides similar functionality that we should fold in the
>> proposed `ahead-behind` interface into that flag.
>
> It does coalesce all of that into a single walk. The problem is somewhat
> the opposite: it only has a notion of two "sides" for a symmetric
> traversal: left and right. But in your example there are many sides, and
> we have to remember which is which.

Yeah, this reminds me of what I had to do in "show-branch", where
each tip gets assigned a bit in the object->flags (which means it
can only traverse from a very small limited number of tips, like 30
or so), which I once planned to extend to arbitrary number of tips
by storing these bits in commit slab, but it never materialized.
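
As a rough illustration of the "one bit per tip, stored outside
object->flags" idea, here is a standalone sketch. It uses plain arrays
of 64-bit words rather than a commit slab or the ewah bitmap API, and
all of the toy_* names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WORD_BITS 64

/* Number of 64-bit words needed to give each of 'nr_tips' its own bit. */
static size_t toy_words_for(size_t nr_tips)
{
	return (nr_tips + TOY_WORD_BITS - 1) / TOY_WORD_BITS;
}

/* Mark tip number 'pos' in a bit array of 'nr_words' words. */
static void toy_set(uint64_t *words, size_t nr_words, size_t pos)
{
	assert(pos / TOY_WORD_BITS < nr_words);
	words[pos / TOY_WORD_BITS] |= UINT64_C(1) << (pos % TOY_WORD_BITS);
}

/* Returns 1 if tip number 'pos' is marked, 0 otherwise. */
static int toy_get(const uint64_t *words, size_t nr_words, size_t pos)
{
	assert(pos / TOY_WORD_BITS < nr_words);
	return (int)((words[pos / TOY_WORD_BITS] >> (pos % TOY_WORD_BITS)) & 1);
}

/* Paint a child's tip bits onto a parent: parent |= child. */
static void toy_or(uint64_t *parent, const uint64_t *child, size_t nr_words)
{
	for (size_t i = 0; i < nr_words; i++)
		parent[i] |= child[i];
}
```

Thirty tips fit in a single word, much like the flag bits, but the
same code handles 65 or 6500 tips by growing the per-commit array
instead of running out of bits.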

> So I don't think it would be impossible to make this a mode of rev-list.
> And that mode might even provide flexibility for other similar
> operations, like a mass "git rev-list --cherry-mark"[1]. But it is a
> pretty big departure from the current rev-list traversal (to my mind,
> especially the "keep walking past UNINTERESTING" part). I don't mind it
> as its own command.

I agree this is not a good fit for the mental model of rev-list or
the revision.c::get_revision() traversal.

Thanks.

* [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option
  2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2023-03-07  0:33 ` Taylor Blau
@ 2023-03-10 17:20 ` Derrick Stolee via GitGitGadget
  2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
                     ` (10 more replies)
  10 siblings, 11 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-10 17:20 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Jeff King, Derrick Stolee

At $DAYJOB, we have used a custom 'ahead-behind' builtin in our fork of Git
for lots of reasons. The main goal of the builtin is to compare multiple
references against a common base reference. The comparison is the number of
commits that are in each side of the symmetric difference of their reachable
sets. A commit C is "ahead" of a commit B by the number of commits in B..C
(reachable from C but not reachable from B). Similarly, the commit C is
"behind" the commit B by the number of commits in C..B (reachable from B but
not reachable from C).

These numbers can be computed by 'git rev-list --count B..C' and 'git
rev-list --count C..B', but there are common needs that benefit from having
the checks being done in the same process:

 1. Our "branches" page lists ahead/behind counts for each listed branch as
    compared to the repo's default branch. This can be done with a single
    'git ahead-behind' process.
 2. When a branch is updated, a background job checks if any pull requests
    that target that branch should be closed because their branches were
    merged implicitly by that update. These queries can be batched into 'git
    ahead-behind' calls.

In that second example, we don't need the full ahead/behind counts (although
it is sufficient to look for branches that are "zero commits ahead", meaning
they are reachable from the base), and instead reachability is the critical
piece.

This series contributes the custom algorithms we used for our 'git
ahead-behind' builtin, but as extensions to 'git for-each-ref':

 * Add a new "%(ahead-behind:)" format token to for-each-ref which allows
   outputting the ahead/behind values in the format string for a matching
   ref.
 * Add a new algorithm that speeds up the 'git for-each-ref --merged='
   option. This also applies to the 'git branch --merged=' option.

The idea to use 'git for-each-ref' instead of creating a new builtin is from
Junio, and simplifies this series significantly compared to v1. I was
initially concerned about the overhead of 'git for-each-ref' and its
generality and sorting, but I was not able to measure any important
difference between this implementation and our internal 'git ahead-behind'
implementation. In particular, when a pattern is given to 'git for-each-ref'
that looks like an exact ref, it navigates directly to the ref instead of
scanning all references for matches.

However, for our specific uses, we like to batch a list of exact references
that could be very long. We introduce a new --stdin option here.
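
As a rough illustration of what reading patterns from standard input
involves, the sketch below collects one pattern per line into a
NULL-terminated array, the same shape as an 'argv'-style pattern list. It is
a standalone approximation using POSIX getline(), not the code from patch 1;
`read_patterns()` and `free_patterns()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read one pattern per line from 'in' and return a
 * NULL-terminated array of heap-allocated strings. The number of patterns
 * is stored in '*nr_out' when it is non-NULL.
 */
static char **read_patterns(FILE *in, size_t *nr_out)
{
	size_t nr = 0, alloc = 16;
	char **patterns = calloc(alloc, sizeof(*patterns));
	char *line = NULL;
	size_t cap = 0;
	ssize_t len;

	if (!patterns)
		abort();

	while ((len = getline(&line, &cap, in)) != -1) {
		/* Strip the trailing newline, if any. */
		if (len && line[len - 1] == '\n')
			line[--len] = '\0';

		/* Always keep one slot free for the terminating NULL. */
		if (nr + 2 > alloc) {
			char **grown;

			alloc *= 2;
			grown = realloc(patterns, alloc * sizeof(*patterns));
			if (!grown)
				abort();
			patterns = grown;
		}

		patterns[nr] = strdup(line);
		if (!patterns[nr])
			abort();
		nr++;
	}
	free(line);

	/* The terminator goes at index 'nr', directly after the last entry. */
	patterns[nr] = NULL;
	if (nr_out)
		*nr_out = nr;
	return patterns;
}

/* Release an array returned by read_patterns(). */
static void free_patterns(char **patterns)
{
	if (!patterns)
		return;
	for (size_t i = 0; patterns[i]; i++)
		free(patterns[i]);
	free(patterns);
}
```

An empty input yields an array whose first element is NULL, which matches
the "empty pattern list means the complete ref set" behavior described for
the --stdin option.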

To keep things close to the v1 outline, I replaced the existing patches with
closely-related ones, when possible.

Patch 1 adds the --stdin option to 'git for-each-ref'. (This is similar to
the boilerplate patch from v1.)

Patch 2 adds a test to explicitly check that 'git for-each-ref' will still
succeed when all input refs are missing. (This is similar to the
--ignore-missing patch from v1.)

Patches 3-5 introduce a new method: ensure_generations_valid(). Patch 3 does
some refactoring of the existing generation number computations to make them
more generic, and patch 4 updates the definition of
commit_graph_generation() slightly, making way for patch 5 to implement the
method. With an existing commit-graph file, the commits that are not present
in the file are considered as having generation number "infinity". This is
useful for most of our reachability queries to this point, since those
commits are "above" the ones tracked by the commit-graph. When these commits
are low in number, then there is very little performance cost and zero
correctness cost. (These patches match v1 exactly.)

However, we will see that the ahead/behind computation requires accurate
generation numbers to avoid overcounting. Thus, ensure_generations_valid()
is a way to specify a list of commits that need generation numbers computed
before continuing. It's a no-op if all of those commits are in the
commit-graph file. It's expensive if the commit-graph doesn't exist.
However, '%(ahead-behind:)' computations are likely to be slow no matter
what without a commit-graph, so assuming an existing commit-graph file is
reasonable. If we find sufficient desire to have an implementation that does
not have this requirement, we could create a second implementation and
toggle to it when generation_numbers_enabled() returns false.
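
The idea of filling in missing generation numbers on demand can be sketched
as follows. This is a simplified model, not the code from patches 3-5: it
uses the topological-level definition (a root has generation 1, any other
commit has one more than the maximum of its parents), and the `gen_commit`
struct, `GEN_UNKNOWN` marker, and `toy_ensure_generation()` name are
assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define GEN_MAX_COMMITS 64
#define GEN_UNKNOWN 0u

/* Hypothetical toy commit with a cached generation number. */
struct gen_commit {
	int parent[2];           /* parent indices, -1 for "no parent" */
	unsigned int generation; /* GEN_UNKNOWN until computed */
};

/*
 * Compute the generation of 'start' and of any ancestor that still lacks
 * one. Commits whose generation is already known (for example, because
 * they came from a commit-graph file) are not walked again, so the call
 * is a no-op when 'start' is already valid.
 */
static void toy_ensure_generation(struct gen_commit *commits, size_t nr,
				  int start)
{
	int stack[GEN_MAX_COMMITS];
	size_t depth = 0;

	assert(nr <= GEN_MAX_COMMITS);
	assert(start >= 0 && (size_t)start < nr);

	if (commits[start].generation != GEN_UNKNOWN)
		return;

	/* Iterative depth-first search; the stack is a path in the DAG. */
	stack[depth++] = start;
	while (depth) {
		int c = stack[depth - 1];
		unsigned int max_parent = 0;
		int pending = 0;

		for (int i = 0; i < 2; i++) {
			int p = commits[c].parent[i];

			if (p < 0)
				continue;
			if (commits[p].generation == GEN_UNKNOWN) {
				/* Resolve this parent first, then revisit 'c'. */
				stack[depth++] = p;
				pending = 1;
				break;
			}
			if (commits[p].generation > max_parent)
				max_parent = commits[p].generation;
		}
		if (pending)
			continue;

		commits[c].generation = max_parent + 1;
		depth--;
	}
}
```

The cost of such a walk is proportional to the number of commits that lack
a generation number, which is why it is cheap when most of the history is
already covered by a commit-graph file and expensive when none of it is.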

Patch 6 implements the ahead-behind algorithm, but it is not connected to a
builtin. It's a long commit message, so hopefully it explains the algorithm
sufficiently. (The difference from v1 is that it no longer integrates with a
builtin and there are no new tests. It also uses 'unsigned int' and is
correctly co-authored by Taylor.)

Patch 7 integrates the ahead-behind algorithm with the ref-filter code,
including parsing the "ahead-behind" token. This finally adds tests that
check both ahead_behind() and ensure_generations_valid() via
t6600-test-reach.sh. (This patch is essentially completely new in v2.)

Patch 8 implements the tips_reachable_from_bases() method, and uses it within
the ref-filter code to speed up 'git for-each-ref --merged' and 'git branch
--merged'. (The interface is slightly different than v1, due to the needs of
the new caller.)
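
A reachability-only query of this kind can use generation numbers to prune
the walk: a tip can only be reached through commits whose generation is at
least as large as its own. The sketch below shows that pruning for a single
base. It is an approximation under stated assumptions, not the algorithm in
patch 8: the `reach_commit` struct and `toy_tips_reachable()` are
hypothetical, generations are assumed to be valid topological levels, and
stopping early once every tip has been found is not modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define REACH_MAX_COMMITS 64

/* Hypothetical toy commit with a valid generation number. */
struct reach_commit {
	int parent[2];           /* parent indices, -1 for "no parent" */
	unsigned int generation; /* topological level, assumed valid */
};

/*
 * For each entry of 'tips', set reachable[i] to 1 if that commit is
 * reachable from 'base' and to 0 otherwise.
 */
static void toy_tips_reachable(const struct reach_commit *commits, int base,
			       const int *tips, size_t tips_nr,
			       unsigned char *reachable)
{
	uint64_t seen;
	unsigned int min_gen = UINT_MAX;
	int stack[REACH_MAX_COMMITS];
	size_t depth = 0;

	assert(base >= 0 && base < REACH_MAX_COMMITS);

	/* No tip can be found below the smallest tip generation. */
	for (size_t i = 0; i < tips_nr; i++)
		if (commits[tips[i]].generation < min_gen)
			min_gen = commits[tips[i]].generation;

	seen = UINT64_C(1) << base;
	stack[depth++] = base;
	while (depth) {
		int c = stack[--depth];

		for (int i = 0; i < 2; i++) {
			int p = commits[c].parent[i];

			if (p < 0 || (seen & (UINT64_C(1) << p)))
				continue;
			/* Prune: nothing below min_gen can be or reach a tip. */
			if (commits[p].generation < min_gen)
				continue;
			seen |= UINT64_C(1) << p;
			stack[depth++] = p;
		}
	}

	for (size_t i = 0; i < tips_nr; i++)
		reachable[i] = !!(seen & (UINT64_C(1) << tips[i]));
}
```

Unlike the ahead/behind computation, nothing here is counted, so the walk
can skip every commit that lies below the lowest tip.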

Thanks, -Stolee

Derrick Stolee (7):
  for-each-ref: add --stdin option
  for-each-ref: explicitly test no matches
  commit-graph: combine generation computations
  commit-graph: return generation from memory
  commit-reach: implement ahead_behind() logic
  for-each-ref: add ahead-behind format atom
  commit-reach: add tips_reachable_from_bases()

Taylor Blau (1):
  commit-graph: introduce `ensure_generations_valid()`

 Documentation/git-for-each-ref.txt |  12 +-
 builtin/branch.c                   |   1 +
 builtin/for-each-ref.c             |  32 ++++-
 builtin/tag.c                      |   1 +
 commit-graph.c                     | 208 ++++++++++++++++++----------
 commit-graph.h                     |   7 +
 commit-reach.c                     | 210 +++++++++++++++++++++++++++++
 commit-reach.h                     |  38 ++++++
 ref-filter.c                       |  89 +++++++++---
 ref-filter.h                       |  25 +++-
 t/perf/p1500-graph-walks.sh        |  50 +++++++
 t/t3203-branch-output.sh           |  14 ++
 t/t5318-commit-graph.sh            |   2 +-
 t/t6300-for-each-ref.sh            |  50 +++++++
 t/t6301-for-each-ref-errors.sh     |  12 ++
 t/t6600-test-reach.sh              | 169 +++++++++++++++++++++++
 t/t7004-tag.sh                     |  28 ++++
 17 files changed, 859 insertions(+), 89 deletions(-)
 create mode 100755 t/perf/p1500-graph-walks.sh


base-commit: 725f57037d81e24eacfda6e59a19c60c0b4c8062
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1489%2Fderrickstolee%2Fstolee%2Fupstream-ahead-behind-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1489/derrickstolee/stolee/upstream-ahead-behind-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1489

Range-diff vs v1:

 1:  0fd18b6d740 < -:  ----------- ahead-behind: create empty builtin
 2:  08fc0710017 ! 1:  a1d9e0f6ff6 ahead-behind: parse tip references
     @@ Metadata
      Author: Derrick Stolee <derrickstolee@github.com>
      
       ## Commit message ##
     -    ahead-behind: parse tip references
     +    for-each-ref: add --stdin option
      
     -    Before implementing the logic to compute the ahead/behind counts, parse
     -    the unknown options as commits and place them in a string_list.
     +    When a user wishes to input a large list of patterns to 'git
     +    for-each-ref' (likely a long list of exact refs) there are frequently
     +    system limits on the number of command-line arguments.
      
     -    Be sure to error out when the reference is not found.
     +    Add a new --stdin option to instead read the patterns from standard
     +    input. Add tests that check that any unrecognized arguments are
     +    considered an error when --stdin is provided. Also, an empty pattern
     +    list is interpreted as the complete ref set.
     +
     +    When reading from stdin, we populate the filter.name_patterns array
     +    dynamically as opposed to pointing to the 'argv' array directly. This
     +    requires a careful cast while freeing the individual strings,
     +    conditioned on the --stdin option.
      
     -    Co-authored-by: Taylor Blau <me@ttaylorr.com>
     -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     - ## builtin/ahead-behind.c ##
     -@@
     - #include "builtin.h"
     - #include "parse-options.h"
     - #include "config.h"
     -+#include "commit.h"
     - 
     - static const char * const ahead_behind_usage[] = {
     - 	N_("git ahead-behind --base=<ref> [ --stdin | <revs> ]"),
     - 	NULL
     - };
     + ## Documentation/git-for-each-ref.txt ##
     +@@ Documentation/git-for-each-ref.txt: SYNOPSIS
     + --------
     + [verse]
     + 'git for-each-ref' [--count=<count>] [--shell|--perl|--python|--tcl]
     +-		   [(--sort=<key>)...] [--format=<format>] [<pattern>...]
     ++		   [(--sort=<key>)...] [--format=<format>]
     ++		   [ --stdin | <pattern>... ]
     + 		   [--points-at=<object>]
     + 		   [--merged[=<object>]] [--no-merged[=<object>]]
     + 		   [--contains[=<object>]] [--no-contains[=<object>]]
     +@@ Documentation/git-for-each-ref.txt: OPTIONS
     + 	literally, in the latter case matching completely or from the
     + 	beginning up to a slash.
       
     -+static int handle_arg(struct string_list *tips, const char *arg)
     -+{
     -+	struct string_list_item *item;
     -+	struct commit *c = lookup_commit_reference_by_name(arg);
     -+
     -+	if (!c)
     -+		return error(_("could not resolve '%s'"), arg);
     -+
     -+	item = string_list_append(tips, arg);
     -+	item->util = c;
     -+	return 0;
     -+}
     ++--stdin::
     ++	If `--stdin` is supplied, then the list of patterns is read from
     ++	standard input instead of from the argument list.
      +
     - int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
     - {
     - 	const char *base_ref = NULL;
     - 	int from_stdin = 0;
     -+	struct string_list tips = STRING_LIST_INIT_DUP;
     + --count=<count>::
     + 	By default the command shows all refs that match
     + 	`<pattern>`.  This option makes it stop after showing
     +
     + ## builtin/for-each-ref.c ##
     +@@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
     + 	struct ref_format format = REF_FORMAT_INIT;
     + 	struct strbuf output = STRBUF_INIT;
     + 	struct strbuf err = STRBUF_INIT;
     ++	int from_stdin = 0;
       
     - 	struct option ahead_behind_opts[] = {
     - 		OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")),
     -@@ builtin/ahead-behind.c: int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
     + 	struct option opts[] = {
     + 		OPT_BIT('s', "shell", &format.quote_style,
     +@@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
     + 		OPT_CONTAINS(&filter.with_commit, N_("print only refs which contain the commit")),
     + 		OPT_NO_CONTAINS(&filter.no_commit, N_("print only refs which don't contain the commit")),
     + 		OPT_BOOL(0, "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
     ++		OPT_BOOL(0, "stdin", &from_stdin, N_("read reference patterns from stdin")),
     + 		OPT_END(),
     + 	};
       
     - 	git_config(git_default_config, NULL);
     +@@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
     + 	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
     + 	filter.ignore_case = icase;
       
     +-	filter.name_patterns = argv;
      +	if (from_stdin) {
      +		struct strbuf line = STRBUF_INIT;
     ++		size_t nr = 0, alloc = 16;
      +
     -+		while (strbuf_getline(&line, stdin) != EOF) {
     -+			if (!line.len)
     -+				break;
     ++		if (argv[0])
     ++			die(_("unknown arguments supplied with --stdin"));
      +
     -+			if (handle_arg(&tips, line.buf))
     -+				return 1;
     ++		CALLOC_ARRAY(filter.name_patterns, alloc);
     ++
     ++		while (strbuf_getline(&line, stdin) != EOF) {
     ++			ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
     ++			filter.name_patterns[nr++] = strbuf_detach(&line, NULL);
      +		}
      +
     -+		strbuf_release(&line);
     ++		/* Add a terminating NULL string. */
     ++		ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
     ++		filter.name_patterns[nr] = NULL;
      +	} else {
     -+		int i;
     -+		for (i = 0; i < argc; ++i) {
     -+			if (handle_arg(&tips, argv[i]))
     -+				return 1;
     -+		}
     ++		filter.name_patterns = argv;
      +	}
      +
     -+	/* Early return for no tips. */
     -+	if (!tips.nr)
     -+		return 0;
     -+
     + 	filter.match_as_path = 1;
     + 	filter_refs(&array, &filter, FILTER_REFS_ALL);
     + 	ref_array_sort(sorting, &array);
     +@@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
     + 	free_commit_list(filter.with_commit);
     + 	free_commit_list(filter.no_commit);
     + 	ref_sorting_release(sorting);
     ++	if (from_stdin) {
     ++		for (size_t i = 0; filter.name_patterns[i]; i++)
     ++			free((char *)filter.name_patterns[i]);
     ++		free(filter.name_patterns);
     ++	}
       	return 0;
       }
      
     - ## t/t4218-ahead-behind.sh ##
     -@@ t/t4218-ahead-behind.sh: test_expect_success 'git ahead-behind without --base' '
     - 	grep "usage:" err
     - '
     + ## t/t6300-for-each-ref.sh ##
     +@@ t/t6300-for-each-ref.sh: sig_crlf="$(printf "%s" "$sig" | append_cr; echo dummy)"
     + sig_crlf=${sig_crlf%dummy}
     + test_atom refs/tags/fake-sig-crlf contents:signature "$sig_crlf"
       
     -+test_expect_success 'git ahead-behind with broken tip' '
     -+	test_must_fail git ahead-behind --base=HEAD bogus 2>err &&
     -+	grep "could not resolve '\''bogus'\''" err
     ++test_expect_success 'git for-each-ref --stdin: empty' '
     ++	>in &&
     ++	git for-each-ref --format="%(refname)" --stdin <in >actual &&
     ++	git for-each-ref --format="%(refname)" >expect &&
     ++	test_cmp expect actual
      +'
      +
     -+test_expect_success 'git ahead-behind without tips' '
     -+	git ahead-behind --base=HEAD 2>err &&
     -+	test_must_be_empty err
     ++test_expect_success 'git for-each-ref --stdin: fails if extra args' '
     ++	>in &&
     ++	test_must_fail git for-each-ref --format="%(refname)" \
     ++		--stdin refs/heads/extra <in 2>err &&
     ++	grep "unknown arguments supplied with --stdin" err
     ++'
     ++
     ++test_expect_success 'git for-each-ref --stdin: matches' '
     ++	cat >in <<-EOF &&
     ++	refs/tags/multi*
     ++	refs/heads/amb*
     ++	EOF
     ++
     ++	cat >expect <<-EOF &&
     ++	refs/heads/ambiguous
     ++	refs/tags/multi-ref1-100000-user1
     ++	refs/tags/multi-ref1-100000-user2
     ++	refs/tags/multi-ref1-200000-user1
     ++	refs/tags/multi-ref1-200000-user2
     ++	refs/tags/multi-ref2-100000-user1
     ++	refs/tags/multi-ref2-100000-user2
     ++	refs/tags/multi-ref2-200000-user1
     ++	refs/tags/multi-ref2-200000-user2
     ++	refs/tags/multiline
     ++	EOF
     ++
     ++	git for-each-ref --format="%(refname)" --stdin <in >actual &&
     ++	test_cmp expect actual
      +'
      +
       test_done
 3:  b1d022c7cac < -:  ----------- ahead-behind: implement --ignore-missing option
 -:  ----------- > 2:  2f162a2f39f for-each-ref: explicitly test no matches
 4:  853891c0b14 = 3:  db28e82d2a6 commit-graph: combine generation computations
 5:  c6e6581e0ea ! 4:  3cf33801443 commit-graph: return generation from memory
     @@ Commit message
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## commit-graph.c ##
     -@@ commit-graph.c: uint32_t commit_graph_position(const struct commit *c)
     - 	return data ? data->graph_pos : COMMIT_NOT_FROM_GRAPH;
     - }
     - 
     -+
     - timestamp_t commit_graph_generation(const struct commit *c)
     - {
     +@@ commit-graph.c: timestamp_t commit_graph_generation(const struct commit *c)
       	struct commit_graph_data *data =
       		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
       
 6:  f31994ac78a = 5:  34dffd836b1 commit-graph: introduce `ensure_generations_valid()`
 7:  b8c55ecf88d ! 6:  9831c23eadb ahead-behind: implement ahead_behind() logic
     @@ Metadata
      Author: Derrick Stolee <derrickstolee@github.com>
      
       ## Commit message ##
     -    ahead-behind: implement ahead_behind() logic
     +    commit-reach: implement ahead_behind() logic
      
     -    Fully implement the commit-counting logic behind the ahead-behind
     -    builtin as a new ahead_behind() method in commit-reach.h. Add tests
     -    for the functionality in both t4218-ahead-behind.sh and
     -    t6600-test-reach.sh. The tests in t4218 are rather simple, but cover a
     -    simple diamond commit history completely while the tests in t6600 make
     -    use of the more complicated commit history and the test setup to check
     -    three repository states: no commit-graph, a complete commit-graph, and a
     -    half-filled commit-graph. These extra states are particularly helpful to
     -    check due to the implementation of ahead_behind() and how it relies upon
     -    ensure_generations_valid().
     +    Fully implement the commit-counting logic required to determine
     +    ahead/behind counts for a batch of commit pairs. This is a new library
     +    method within commit-reach.h. This method will be linked to the
     +    for-each-ref builtin in the next change.
      
          The interface for ahead_behind() uses two arrays. The first array of
          commits contains the list of all starting points for the walk. This
     @@ Commit message
          new ahead_behind_count struct, indicates which commits from that initial
          array form the base/tip pair for the ahead/behind count it will store.
      
     -    While the ahead-behind builtin currently only supports one base, this
     -    implementation of ahead_behind() allows multiple bases, if desired. Even
     -    with multiple bases, there is only one commit walk used for counting the
     -    ahead/behind values, saving time when the base/tip ranges overlap
     -    significantly.
     +    This implementation of ahead_behind() allows multiple bases, if desired.
     +    Even with multiple bases, there is only one commit walk used for
     +    counting the ahead/behind values, saving time when the base/tip ranges
     +    overlap significantly.
      
          This interface for ahead_behind() also makes it very easy to call
          ensure_generations_valid() on the entire array of bases and tips. This
     @@ Commit message
          commits, in case ahead_behind() is called multiple times in the same
          process.
      
     -    There is no previous implementation of ahead-behind to compare against.
     -    A previous implementation in another fork of Git used a single process
     -    to essentially do the same walk as 'git rev-list --count <base>..<tip>'
     -    for every base/tip pair given as input. The single-walk implementation
     -    in this change was a significant improvement over that implementation.
     -    Another version from that fork used reachability bitmaps for the
     -    comparison, but that implementation was slower than the current commit
     -    walk implementation in almost all cases.
     -
     -    To best present _some_ amount of evidence for this performance gain,
     -    create a new performance test, p1500-graph-walks.sh. This script could
     -    be used for other walks than just ahead-behind in the future, but let's
     -    limit to ahead-behind now.
     -
     -    To gain some amount of a baseline, create one test that checks 'git
     -    ahead-behind' against up to 50 tips and another that uses 'git rev-list
     -    --count' in a loop. Be sure to write a commit-graph before running the
     -    performance tests.
     -
     -    Using the Git source code as the repository, we see a pronounced
     -    improvement:
     -
     -    Test                                            this tree
     -    ---------------------------------------------------------------
     -    1500.2: ahead-behind counts: git ahead-behind   0.08(0.07+0.01)
     -    1500.3: ahead-behind counts: git rev-list       1.11(0.92+0.18)
     -
     -    But the de-facto performance benchmark is the Linux kernel repository,
     -    which presents these values for my copy:
     -
     -    Test                                            this tree
     -    ---------------------------------------------------------------
     -    1500.2: ahead-behind counts: git ahead-behind   0.27(0.25+0.02)
     -    1500.3: ahead-behind counts: git rev-list       4.53(3.92+0.60)
     -
     +    Co-authored-by: Taylor Blau <me@ttaylorr.com>
     +    Signed-off-by: Taylor Blau <me@ttaylorr.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     - ## builtin/ahead-behind.c ##
     -@@
     - #include "parse-options.h"
     - #include "config.h"
     - #include "commit.h"
     -+#include "commit-reach.h"
     - 
     - static const char * const ahead_behind_usage[] = {
     - 	N_("git ahead-behind --base=<ref> [ --stdin | <revs> ]"),
     -@@ builtin/ahead-behind.c: static int handle_arg(struct string_list *tips, const char *arg)
     - int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
     - {
     - 	const char *base_ref = NULL;
     -+	struct commit *base;
     - 	int from_stdin = 0;
     - 	struct string_list tips = STRING_LIST_INIT_DUP;
     -+	struct commit **commits;
     -+	struct ahead_behind_count *counts;
     -+	size_t i;
     - 
     - 	struct option ahead_behind_opts[] = {
     - 		OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")),
     -@@ builtin/ahead-behind.c: int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
     - 	if (!tips.nr)
     - 		return 0;
     - 
     -+	ALLOC_ARRAY(commits, tips.nr + 1);
     -+	ALLOC_ARRAY(counts, tips.nr);
     -+
     -+	for (i = 0; i < tips.nr; i++) {
     -+		commits[i] = tips.items[i].util;
     -+		counts[i].tip_index = i;
     -+		counts[i].base_index = tips.nr;
     -+	}
     -+	commits[tips.nr] = base;
     -+
     -+	ahead_behind(commits, tips.nr + 1, counts, tips.nr);
     -+
     -+	for (i = 0; i < tips.nr; i++)
     -+		printf("%s %d %d\n", tips.items[i].string,
     -+		       counts[i].ahead, counts[i].behind);
     -+
     -+	free(counts);
     -+	free(commits);
     - 	return 0;
     - }
     -
       ## commit-reach.c ##
      @@
       #include "revision.h"
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +		struct bitmap *bitmap_c = init_bit_array(c, width);
      +
      +		for (i = 0; i < counts_nr; i++) {
     -+			int reach_from_tip = bitmap_get(bitmap_c, counts[i].tip_index);
     -+			int reach_from_base = bitmap_get(bitmap_c, counts[i].base_index);
     ++			int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
     ++			int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index);
      +
      +			if (reach_from_tip ^ reach_from_base) {
      +				if (reach_from_base)
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +		free_bit_array(c);
      +	}
      +
     ++	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
      +	repo_clear_commit_marks(the_repository, PARENT2 | STALE);
      +	clear_bit_arrays(&bit_arrays);
      +	clear_prio_queue(&queue);
     @@ commit-reach.h: struct commit_list *get_reachable_subset(struct commit **from, i
      +	 * 'behind' stores the number of commits reachable from the base
      +	 * and not reachable from the tip.
      +	 */
     -+	int ahead;
     -+	int behind;
     ++	unsigned int ahead;
     ++	unsigned int behind;
      +};
      +
     -+/**
     ++/*
      + * Given an array of commits and an array of ahead_behind_count pairs,
      + * compute the ahead/behind counts for each pair.
      + */
     @@ commit-reach.h: struct commit_list *get_reachable_subset(struct commit **from, i
      +		  struct ahead_behind_count *counts, size_t counts_nr);
      +
       #endif
     -
     - ## t/perf/p1500-graph-walks.sh (new) ##
     -@@
     -+#!/bin/sh
     -+
     -+test_description='Commit walk performance tests'
     -+. ./perf-lib.sh
     -+
     -+test_perf_large_repo
     -+
     -+test_expect_success 'setup' '
     -+	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
     -+	sort -r allrefs | head -n 50 >refs &&
     -+	git commit-graph write --reachable
     -+'
     -+
     -+test_perf 'ahead-behind counts: git ahead-behind' '
     -+	git ahead-behind --base=HEAD --stdin <refs
     -+'
     -+
     -+test_perf 'ahead-behind counts: git rev-list' '
     -+	for r in $(cat refs)
     -+	do
     -+		git rev-list --count "HEAD..$r" || return 1
     -+	done
     -+'
     -+
     -+test_done
     -
     - ## t/t4218-ahead-behind.sh ##
     -@@ t/t4218-ahead-behind.sh: test_description='git ahead-behind command-line options'
     - 
     - . ./test-lib.sh
     - 
     -+test_expect_success 'setup simple history' '
     -+	test_commit base &&
     -+	git checkout -b right &&
     -+	test_commit right &&
     -+	git checkout -b left base &&
     -+	test_commit left &&
     -+	git checkout -b merge &&
     -+	git merge right -m "merge"
     -+'
     -+
     - test_expect_success 'git ahead-behind -h' '
     - 	test_must_fail git ahead-behind -h >out &&
     - 	grep "usage:" out
     -@@ t/t4218-ahead-behind.sh: test_expect_success 'git ahead-behind without --base' '
     - 	grep "usage:" err
     - '
     - 
     -+test_expect_success 'git ahead-behind with broken --base' '
     -+	test_must_fail git ahead-behind --base=bogus HEAD 2>err &&
     -+	grep "could not resolve '\''bogus'\''" err
     -+'
     -+
     - test_expect_success 'git ahead-behind with broken tip' '
     - 	test_must_fail git ahead-behind --base=HEAD bogus 2>err &&
     - 	grep "could not resolve '\''bogus'\''" err
     -@@ t/t4218-ahead-behind.sh: test_expect_success 'git ahead-behind without tips' '
     - 	test_must_be_empty err
     - '
     - 
     -+test_expect_success 'git ahead-behind --base=base' '
     -+	git ahead-behind --base=base base left right merge >actual &&
     -+
     -+	cat >expect <<-EOF &&
     -+	base 0 0
     -+	left 1 0
     -+	right 1 0
     -+	merge 3 0
     -+	EOF
     -+
     -+	test_cmp expect actual
     -+'
     -+
     -+test_expect_success 'git ahead-behind --base=left' '
     -+	git ahead-behind --base=left base left right merge >actual &&
     -+
     -+	cat >expect <<-EOF &&
     -+	base 0 1
     -+	left 0 0
     -+	right 1 1
     -+	merge 2 0
     -+	EOF
     -+
     -+	test_cmp expect actual
     -+'
     -+
     -+test_expect_success 'git ahead-behind --base=right' '
     -+	git ahead-behind --base=right base left right merge >actual &&
     -+
     -+	cat >expect <<-EOF &&
     -+	base 0 1
     -+	left 1 1
     -+	right 0 0
     -+	merge 2 0
     -+	EOF
     -+
     -+	test_cmp expect actual
     -+'
     -+
     -+test_expect_success 'git ahead-behind --base=merge' '
     -+	git ahead-behind --base=merge base left right merge >actual &&
     -+
     -+	cat >expect <<-EOF &&
     -+	base 0 3
     -+	left 0 2
     -+	right 0 2
     -+	merge 0 0
     -+	EOF
     -+
     -+	test_cmp expect actual
     -+'
     -+
     - test_done
     -
     - ## t/t6600-test-reach.sh ##
     -@@ t/t6600-test-reach.sh: test_expect_success 'get_reachable_subset:none' '
     - 	test_all_modes get_reachable_subset
     - '
     - 
     -+test_expect_success 'ahead-behind:linear' '
     -+	cat >input <<-\EOF &&
     -+	commit-1-1
     -+	commit-1-3
     -+	commit-1-5
     -+	commit-1-8
     -+	EOF
     -+	cat >expect <<-\EOF &&
     -+	commit-1-1 0 8
     -+	commit-1-3 0 6
     -+	commit-1-5 0 4
     -+	commit-1-8 0 1
     -+	EOF
     -+	run_all_modes git ahead-behind --base=commit-1-9 --stdin
     -+'
     -+
     -+test_expect_success 'ahead-behind:all' '
     -+	cat >input <<-\EOF &&
     -+	commit-1-1
     -+	commit-2-4
     -+	commit-4-2
     -+	commit-4-4
     -+	EOF
     -+	cat >expect <<-\EOF &&
     -+	commit-1-1 0 24
     -+	commit-2-4 0 17
     -+	commit-4-2 0 17
     -+	commit-4-4 0 9
     -+	EOF
     -+	run_all_modes git ahead-behind --base=commit-5-5 --stdin
     -+'
     -+
     -+test_expect_success 'ahead-behind:some' '
     -+	cat >input <<-\EOF &&
     -+	commit-1-1
     -+	commit-5-3
     -+	commit-4-8
     -+	commit-9-9
     -+	EOF
     -+	cat >expect <<-\EOF &&
     -+	commit-1-1 0 53
     -+	commit-5-3 0 39
     -+	commit-4-8 8 30
     -+	commit-9-9 27 0
     -+	EOF
     -+	run_all_modes git ahead-behind --base=commit-9-6 --stdin
     -+'
     -+
     -+test_expect_success 'ahead-behind:none' '
     -+	cat >input <<-\EOF &&
     -+	commit-7-5
     -+	commit-4-8
     -+	commit-9-9
     -+	EOF
     -+	cat >expect <<-\EOF &&
     -+	commit-7-5 7 4
     -+	commit-4-8 16 16
     -+	commit-9-9 49 0
     -+	EOF
     -+	run_all_modes git ahead-behind --base=commit-8-4 --stdin
     -+'
     -+
     - test_done
 -:  ----------- > 7:  82dd6f44a33 for-each-ref: add ahead-behind format atom
 8:  07eb2cbb699 ! 8:  f3fb6833bd7 ahead-behind: add --contains mode
     @@ Metadata
      Author: Derrick Stolee <derrickstolee@github.com>
      
       ## Commit message ##
     -    ahead-behind: add --contains mode
     +    commit-reach: add tips_reachable_from_bases()
      
     -    The 'git ahead-behind' builtin can answer a list of questions that do
     -    not require the full ahead/behind counts. For example, instead of using
     -    'git branch --contains=<ref>' to get the full list of branches that
     -    contain the commit at <ref>, a list of tips could be passed to 'git
     -    ahead-behind --base=<ref>' and the rows that report a "behind" value of
     -    zero show which tips can reach <ref>.
     +    Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use
     +    the ref-filter machinery to select references or branches (respectively)
     +    that are reachable from a set of commits presented by one or more
     +    --merged arguments. This happens within reach_filter(), which uses the
     +    revision-walk machinery to walk history in a standard way.
      
     -    By contract, the rows that report an "ahead" value of zero show which
     -    tips are reachable from the given base. This type of query does not have
     -    an existing equivalent for batching this request. While extracting the
     -    information from 'git ahead-behind' is not terribly difficult, it does
     -    more work than required to answer this query: it _counts_.
     +    However, the commit-reach.c file is full of custom searches that are
     +    more efficient, especially for reachability queries that can terminate
     +    early when reachability is discovered. Add a new
     +    tips_reachable_from_bases() method to commit-reach.c and call it from
     +    within reach_filter() in ref-filter.c. This affects both 'git branch'
     +    and 'git for-each-ref' as tested in p1500-graph-walks.sh.
      
     -    Add a new '--contains' mode to 'git ahead-behind' that removes the
     -    counting behavior and focuses instead on the reachability concern. The
     -    output of the builtin changes in this mode: instead of reporting "<tip>
     -    <ahead> <behind>" for every input tip, it will instead report "<tip>"
     -    for the input tips that are reachable from the specified base.
     +    For the Linux kernel repository, we take an already-fast algorithm and
     +    make it even faster:
     +
     +    Test                                            HEAD~1  HEAD
     +    -------------------------------------------------------------------
     +    1500.5: contains: git for-each-ref --merged     0.13    0.02 -84.6%
     +    1500.6: contains: git branch --merged           0.14    0.02 -85.7%
     +    1500.7: contains: git tag --merged              0.15    0.03 -80.0%
     +
     +    (Note that we remove the iterative 'git rev-list' test from p1500
     +    because it no longer makes sense as a comparison to 'git for-each-ref'
     +    and would just waste time running it for these comparisons.)
      
          The algorithm is implemented in commit-reach.c in the method
          tips_reachable_from_base(). This method takes a string_list of tips and
     @@ Commit message
          from the start, but we will walk only the necessary portion of the
          depth-first search for the reachable commits of lower generation.
      
     -    We can test this completely in the simple repo example in t4218 and more
     -    substantially in the larger repository example in t6600. We can also add
     -    a performance test to demonstrate the speedup relative to the 'git
     -    ahead-behind' builtin without the '--contains' option.
     -
     -    For the Git source code repository, I was able to measure a speedup,
     -    even though both are quite fast.
     -
     -    Test                                                       this tree
     -    --------------------------------------------------------------------------
     -    1500.2: ahead-behind counts: git ahead-behind              0.06(0.06+0.00)
     -    1500.3: ahead-behind counts: git rev-list                  1.08(0.90+0.18)
     -    1500.4: ahead-behind counts: git ahead-behind --contains   0.02(0.02+0.00)
     -
     -    In the Linux kernel repository, the impact is more pronounced:
     -
     -    Test                                                       this tree
     -    --------------------------------------------------------------------------
     -    1500.2: ahead-behind counts: git ahead-behind              0.26(0.25+0.01)
     -    1500.3: ahead-behind counts: git rev-list                  4.58(3.92+0.66)
     -    1500.4: ahead-behind counts: git ahead-behind --contains   0.02(0.00+0.02)
     +    Add extra tests for this behavior in t6600-test-reach.sh as the
     +    interesting data shape of that repository can sometimes demonstrate
     +    corner case bugs.
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     - ## Documentation/git-ahead-behind.txt ##
     -@@ Documentation/git-ahead-behind.txt: git-ahead-behind - Count the commits on each side of a revision range
     - SYNOPSIS
     - --------
     - [verse]
     --'git ahead-behind' --base=<ref> [ --stdin | <revs> ]
     -+'git ahead-behind' --base=<ref> [ --contains ] [ --stdin | <revs> ]
     - 
     - DESCRIPTION
     - -----------
     -@@ Documentation/git-ahead-behind.txt: reported to stdout one line at a time as follows:
     - There will be exactly one line per input revision, but the lines may be
     - in an arbitrary order.
     - 
     -+If the `--contains` option is provided, then the output will list the
     -+`<tip>` refs are reachable from the provided `<base>`, one per line.
     -+
     - 
     - OPTIONS
     - -------
     -@@ Documentation/git-ahead-behind.txt: OPTIONS
     - 	Read revision tips and ranges from stdin instead of from the
     - 	command-line.
     - 
     -+--contains::
     -+	Specify that instead of counting the ahead/behind values, only
     -+	indicate whether each tip reference is reachable from the base. In
     -+	this mode, the output format changes to include only the name of
     -+	each tip by name, one per line, and only the tips reachable from
     -+	the base are included in the output.
     -+
     - --ignore-missing::
     - 	When parsing tip references, ignore any references that are not
     - 	found. This is useful when operating in an environment where a
     -
     - ## builtin/ahead-behind.c ##
     -@@
     - #include "commit-reach.h"
     - 
     - static const char * const ahead_behind_usage[] = {
     --	N_("git ahead-behind --base=<ref> [ --stdin | <revs> ]"),
     -+	N_("git ahead-behind --base=<ref> [ --contains ] [ --stdin | <revs> ]"),
     - 	NULL
     - };
     - 
     -@@ builtin/ahead-behind.c: int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
     - {
     - 	const char *base_ref = NULL;
     - 	struct commit *base;
     --	int from_stdin = 0;
     -+	int from_stdin = 0, contains = 0;
     - 	struct string_list tips = STRING_LIST_INIT_DUP;
     - 	struct commit **commits;
     - 	struct ahead_behind_count *counts;
     -@@ builtin/ahead-behind.c: int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
     - 		OPT_STRING('b', "base", &base_ref, N_("base"), N_("base reference to process")),
     - 		OPT_BOOL(0 , "stdin", &from_stdin, N_("read rev names from stdin")),
     - 		OPT_BOOL(0 , "ignore-missing", &ignore_missing, N_("ignore missing tip references")),
     -+		OPT_BOOL(0 , "contains", &contains, N_("only check that tips are reachable from the base")),
     - 		OPT_END()
     - 	};
     - 
     -@@ builtin/ahead-behind.c: int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
     - 
     - 	git_config(git_default_config, NULL);
     - 
     -+	base = lookup_commit_reference_by_name(base_ref);
     -+	if (!base)
     -+		die(_("could not resolve '%s'"), base_ref);
     -+
     - 	if (from_stdin) {
     - 		struct strbuf line = STRBUF_INIT;
     - 
     -@@ builtin/ahead-behind.c: int cmd_ahead_behind(int argc, const char **argv, const char *prefix)
     - 	if (!tips.nr)
     - 		return 0;
     - 
     -+	if (contains) {
     -+		struct string_list_item *item;
     -+
     -+		/* clear out */
     -+		for_each_string_list_item(item, &tips)
     -+			item->util = NULL;
     -+
     -+		tips_reachable_from_base(base, &tips);
     -+
     -+		for_each_string_list_item(item, &tips) {
     -+			if (item->util)
     -+				printf("%s\n", item->string);
     -+		}
     -+
     -+		return 0;
     -+	}
     -+	/* else: not --contains, but normal ahead-behind counting. */
     -+
     - 	ALLOC_ARRAY(commits, tips.nr + 1);
     - 	ALLOC_ARRAY(counts, tips.nr);
     - 
     -
       ## commit-reach.c ##
      @@ commit-reach.c: void ahead_behind(struct commit **commits, size_t commits_nr,
       	clear_bit_arrays(&bit_arrays);
     @@ commit-reach.c: void ahead_behind(struct commit **commits, size_t commits_nr,
      +	return 0;
      +}
      +
     -+void tips_reachable_from_base(struct commit *base,
     -+			      struct string_list *tips)
     ++void tips_reachable_from_bases(struct commit_list *bases,
     ++			       struct commit **tips, size_t tips_nr,
     ++			       int mark)
      +{
     -+	unsigned int i;
     ++	size_t i;
      +	struct commit_and_index *commits;
      +	unsigned int min_generation_index = 0;
      +	timestamp_t min_generation;
      +	struct commit_list *stack = NULL;
      +
     -+	if (!base || !tips || !tips->nr)
     ++	if (!bases || !tips || !tips_nr)
      +		return;
      +
      +	/*
     -+	 * Do a depth-first search starting at 'base' to search for the
     ++	 * Do a depth-first search starting at 'bases' to search for the
      +	 * tips. Stop at the lowest (un-found) generation number. When
      +	 * finding the lowest commit, increase the minimum generation
      +	 * number to the next lowest (un-found) generation number.
      +	 */
      +
     -+	CALLOC_ARRAY(commits, tips->nr);
     ++	CALLOC_ARRAY(commits, tips_nr);
      +
     -+	for (i = 0; i < tips->nr; i++) {
     -+		commits[i].commit = lookup_commit_reference_by_name(tips->items[i].string);
     ++	for (i = 0; i < tips_nr; i++) {
     ++		commits[i].commit = tips[i];
      +		commits[i].index = i;
     -+		commits[i].generation = commit_graph_generation(commits[i].commit);
     ++		commits[i].generation = commit_graph_generation(tips[i]);
      +	}
      +
      +	/* Sort with generation number ascending. */
     -+	QSORT(commits, tips->nr, compare_commit_and_index_by_generation);
     ++	QSORT(commits, tips_nr, compare_commit_and_index_by_generation);
      +	min_generation = commits[0].generation;
      +
     -+	parse_commit(base);
     -+	commit_list_insert(base, &stack);
     ++	while (bases) {
     ++		parse_commit(bases->item);
     ++		commit_list_insert(bases->item, &stack);
     ++		bases = bases->next;
     ++	}
      +
      +	while (stack) {
      +		unsigned int j;
     @@ commit-reach.c: void ahead_behind(struct commit **commits, size_t commits_nr,
      +		struct commit *c = stack->item;
      +		timestamp_t c_gen = commit_graph_generation(c);
      +
     -+		/* Does it match any of our bases? */
     -+		for (j = min_generation_index; j < tips->nr; j++) {
     ++		/* Does it match any of our tips? */
     ++		for (j = min_generation_index; j < tips_nr; j++) {
      +			if (c_gen < commits[j].generation)
      +				break;
      +
      +			if (commits[j].commit == c) {
     -+				tips->items[commits[j].index].util = (void *)(uintptr_t)1;
     ++				tips[commits[j].index]->object.flags |= mark;
      +
      +				if (j == min_generation_index) {
      +					unsigned int k = j + 1;
     -+					while (k < tips->nr &&
     -+					       tips->items[commits[k].index].util)
     ++					while (k < tips_nr &&
     ++					       (tips[commits[k].index]->object.flags & mark))
      +						k++;
      +
      +					/* Terminate early if all found. */
     -+					if (k >= tips->nr)
     ++					if (k >= tips_nr)
      +						goto done;
      +
      +					min_generation_index = k;
     @@ commit-reach.h: struct ahead_behind_count {
       		  struct ahead_behind_count *counts, size_t counts_nr);
       
      +/*
     -+ * Populate the "util" of each string_list item with the boolean value
     -+ * corresponding to "can 'base' reach this tip?"
     ++ * For all tip commits, add 'mark' to their flags if and only if they
     ++ * are reachable from one of the commits in 'bases'.
      + */
     -+void tips_reachable_from_base(struct commit *base,
     -+			      struct string_list *tips);
     ++void tips_reachable_from_bases(struct commit_list *bases,
     ++			       struct commit **tips, size_t tips_nr,
     ++			       int mark);
      +
       #endif
      
     - ## t/perf/p1500-graph-walks.sh ##
     -@@ t/perf/p1500-graph-walks.sh: test_perf 'ahead-behind counts: git rev-list' '
     - 	done
     - '
     + ## ref-filter.c ##
     +@@ ref-filter.c: static void reach_filter(struct ref_array *array,
     + 			 struct commit_list *check_reachable,
     + 			 int include_reached)
     + {
     +-	struct rev_info revs;
     + 	int i, old_nr;
     + 	struct commit **to_clear;
     +-	struct commit_list *cr;
     + 
     + 	if (!check_reachable)
     + 		return;
     + 
     + 	CALLOC_ARRAY(to_clear, array->nr);
     +-
     +-	repo_init_revisions(the_repository, &revs, NULL);
     +-
     + 	for (i = 0; i < array->nr; i++) {
     + 		struct ref_array_item *item = array->items[i];
     +-		add_pending_object(&revs, &item->commit->object, item->refname);
     + 		to_clear[i] = item->commit;
     + 	}
     + 
     +-	for (cr = check_reachable; cr; cr = cr->next) {
     +-		struct commit *merge_commit = cr->item;
     +-		merge_commit->object.flags |= UNINTERESTING;
     +-		add_pending_object(&revs, &merge_commit->object, "");
     +-	}
     +-
     +-	revs.limited = 1;
     +-	if (prepare_revision_walk(&revs))
     +-		die(_("revision walk setup failed"));
     ++	tips_reachable_from_bases(check_reachable,
     ++				  to_clear, array->nr,
     ++				  UNINTERESTING);
     + 
     + 	old_nr = array->nr;
     + 	array->nr = 0;
     +@@ ref-filter.c: static void reach_filter(struct ref_array *array,
     + 		clear_commit_marks(merge_commit, ALL_REV_FLAGS);
     + 	}
     + 
     +-	release_revisions(&revs);
     + 	free(to_clear);
     + }
       
     -+test_perf 'ahead-behind counts: git ahead-behind --contains' '
     -+	git ahead-behind --contains --base=HEAD --stdin <refs
     -+'
     -+
     - test_done
      
     - ## t/t4218-ahead-behind.sh ##
     -@@ t/t4218-ahead-behind.sh: test_expect_success 'git ahead-behind with broken tip and --ignore-missing' '
     - 	test_must_be_empty out
     + ## t/perf/p1500-graph-walks.sh ##
     +@@ t/perf/p1500-graph-walks.sh: test_perf 'ahead-behind counts: git tag' '
     + 	xargs git tag -l --format="%(ahead-behind:HEAD)" <tags
       '
       
     -+test_expect_success 'git ahead-behind --contains with broken tip' '
     -+	test_must_fail git ahead-behind --contains \
     -+		--base=HEAD bogus 2>err &&
     -+	grep "could not resolve '\''bogus'\''" err
     +-test_perf 'ahead-behind counts: git rev-list' '
     +-	for r in $(cat refs)
     +-	do
     +-		git rev-list --count "HEAD..$r" || return 1
     +-	done
     ++test_perf 'contains: git for-each-ref --merged' '
     ++	git for-each-ref --merged=HEAD --stdin <refs
      +'
      +
     -+test_expect_success 'git ahead-behind --contains with broken tip and --ignore-missing' '
     -+	git ahead-behind --base=HEAD --contains \
     -+		--ignore-missing bogus 2>err >out &&
     -+	test_must_be_empty err &&
     -+	test_must_be_empty out
     ++test_perf 'contains: git branch --merged' '
     ++	xargs git branch --merged=HEAD <branches
      +'
      +
     - test_expect_success 'git ahead-behind without tips' '
     - 	git ahead-behind --base=HEAD 2>err &&
     - 	test_must_be_empty err
     -@@ t/t4218-ahead-behind.sh: test_expect_success 'git ahead-behind --base=merge' '
     - 	test_cmp expect actual
     ++test_perf 'contains: git tag --merged' '
     ++	xargs git tag --merged=HEAD <tags
       '
       
     -+test_expect_success 'git ahead-behind --contains --base=base' '
     -+	git ahead-behind --contains --base=base \
     -+		base left right merge >actual &&
     -+
     -+	cat >expect <<-EOF &&
     -+	base
     -+	EOF
     -+
     -+	test_cmp expect actual
     -+'
     -+
     -+test_expect_success 'git ahead-behind --contains --base=left' '
     -+	git ahead-behind --contains --base=left \
     -+		base left right merge >actual &&
     -+
     -+	cat >expect <<-EOF &&
     -+	base
     -+	left
     -+	EOF
     -+
     -+	test_cmp expect actual
     -+'
     -+
     -+test_expect_success 'git ahead-behind --contains --base=right' '
     -+	git ahead-behind --contains --base=right \
     -+		base left right merge >actual &&
     -+
     -+	cat >expect <<-EOF &&
     -+	base
     -+	right
     -+	EOF
     -+
     -+	test_cmp expect actual
     -+'
     -+
     -+test_expect_success 'git ahead-behind --contains --base=merge' '
     -+	git ahead-behind --contains --base=merge \
     -+		base left right merge >actual &&
     -+
     -+	cat >expect <<-EOF &&
     -+	base
     -+	left
     -+	right
     -+	merge
     -+	EOF
     -+
     -+	test_cmp expect actual
     -+'
     -+
       test_done
      
       ## t/t6600-test-reach.sh ##
     -@@ t/t6600-test-reach.sh: test_expect_success 'ahead-behind:none' '
     - 	run_all_modes git ahead-behind --base=commit-8-4 --stdin
     +@@ t/t6600-test-reach.sh: test_expect_success 'for-each-ref ahead-behind:none' '
     + 		--format="%(refname) %(ahead-behind:commit-8-4)" --stdin
       '
       
     -+test_expect_success 'ahead-behind--contains:all' '
     ++test_expect_success 'for-each-ref merged:linear' '
     ++	cat >input <<-\EOF &&
     ++	refs/heads/commit-1-1
     ++	refs/heads/commit-1-3
     ++	refs/heads/commit-1-5
     ++	refs/heads/commit-1-8
     ++	refs/heads/commit-2-1
     ++	refs/heads/commit-5-1
     ++	refs/heads/commit-9-1
     ++	EOF
     ++	cat >expect <<-\EOF &&
     ++	refs/heads/commit-1-1
     ++	refs/heads/commit-1-3
     ++	refs/heads/commit-1-5
     ++	refs/heads/commit-1-8
     ++	EOF
     ++	run_all_modes git for-each-ref --merged=commit-1-9 \
     ++		--format="%(refname)" --stdin
     ++'
     ++
     ++test_expect_success 'for-each-ref merged:all' '
      +	cat >input <<-\EOF &&
     -+	commit-1-1
     -+	commit-2-4
     -+	commit-4-2
     -+	commit-4-4
     ++	refs/heads/commit-1-1
     ++	refs/heads/commit-2-4
     ++	refs/heads/commit-4-2
     ++	refs/heads/commit-4-4
      +	EOF
      +	cat >expect <<-\EOF &&
     -+	commit-1-1
     -+	commit-2-4
     -+	commit-4-2
     -+	commit-4-4
     ++	refs/heads/commit-1-1
     ++	refs/heads/commit-2-4
     ++	refs/heads/commit-4-2
     ++	refs/heads/commit-4-4
      +	EOF
     -+	run_all_modes git ahead-behind --contains --base=commit-5-5 \
     -+		--stdin --use-bitmap-index
     ++	run_all_modes git for-each-ref --merged=commit-5-5 \
     ++		--format="%(refname)" --stdin
      +'
      +
     -+test_expect_success 'ahead-behind--contains:some' '
     ++test_expect_success 'for-each-ref ahead-behind:some' '
      +	cat >input <<-\EOF &&
     -+	commit-1-1
     -+	commit-5-3
     -+	commit-4-8
     -+	commit-9-9
     ++	refs/heads/commit-1-1
     ++	refs/heads/commit-5-3
     ++	refs/heads/commit-4-8
     ++	refs/heads/commit-9-9
      +	EOF
      +	cat >expect <<-\EOF &&
     -+	commit-1-1
     -+	commit-5-3
     ++	refs/heads/commit-1-1
     ++	refs/heads/commit-5-3
      +	EOF
     -+	run_all_modes git ahead-behind --contains --base=commit-9-6 \
     -+		--stdin --use-bitmap-index
     ++	run_all_modes git for-each-ref --merged=commit-9-6 \
     ++		--format="%(refname)" --stdin
      +'
      +
     -+test_expect_success 'ahead-behind--contains:some, reordered' '
     ++test_expect_success 'for-each-ref merged:some, multibase' '
      +	cat >input <<-\EOF &&
     -+	commit-4-8
     -+	commit-5-3
     -+	commit-9-9
     -+	commit-1-1
     ++	refs/heads/commit-1-1
     ++	refs/heads/commit-5-3
     ++	refs/heads/commit-7-8
     ++	refs/heads/commit-4-8
     ++	refs/heads/commit-9-9
      +	EOF
      +	cat >expect <<-\EOF &&
     -+	commit-5-3
     -+	commit-1-1
     ++	refs/heads/commit-1-1
     ++	refs/heads/commit-4-8
     ++	refs/heads/commit-5-3
      +	EOF
     -+	run_all_modes git ahead-behind --contains --base=commit-9-6 \
     -+		--stdin --use-bitmap-index
     ++	run_all_modes git for-each-ref \
     ++		--merged=commit-5-8 \
     ++		--merged=commit-8-5 \
     ++		--format="%(refname)" \
     ++		--stdin
      +'
      +
     -+test_expect_success 'ahead-behind--contains:none' '
     ++test_expect_success 'for-each-ref merged:none' '
      +	cat >input <<-\EOF &&
     -+	commit-7-5
     -+	commit-4-8
     -+	commit-9-9
     ++	refs/heads/commit-7-5
     ++	refs/heads/commit-4-8
     ++	refs/heads/commit-9-9
      +	EOF
      +	>expect &&
     -+	run_all_modes git ahead-behind --contains --base=commit-8-4 \
     -+		--stdin --use-bitmap-index
     ++	run_all_modes git for-each-ref --merged=commit-8-4 \
     ++		--format="%(refname)" --stdin
      +'
      +
       test_done
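
The early-terminating walk that tips_reachable_from_bases() performs in the
range-diff above can be sketched roughly as follows. This is only an
illustration on a toy graph, not the code from commit-reach.c: the names
toy_commit, toy_min_unfound_generation() and toy_tips_reachable() are
hypothetical, the 'seen' and 'reached' bits stand in for object flags and are
assumed to be clear on entry, and generations are assumed to be positive and
strictly greater than those of all parents.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PARENTS 2

/* Hypothetical stand-in for a commit with a generation number. */
struct toy_commit {
	int generation; /* assumed > 0 and > every parent's generation */
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX_PARENTS];
	unsigned seen : 1;    /* already pushed onto the walk stack */
	unsigned reached : 1; /* tip found reachable from a base */
};

/* Lowest generation among tips not yet reached, or -1 if none remain. */
static int toy_min_unfound_generation(struct toy_commit **tips, size_t tips_nr)
{
	int min = -1;

	for (size_t i = 0; i < tips_nr; i++) {
		if (tips[i]->reached)
			continue;
		if (min < 0 || tips[i]->generation < min)
			min = tips[i]->generation;
	}
	return min;
}

/*
 * Depth-first search from 'bases'. A parent whose generation is below the
 * lowest un-found tip generation cannot reach any remaining tip, so it is
 * never pushed. 'stack' is caller-provided scratch space; one slot per
 * commit in the graph is enough because each commit is pushed at most once.
 * Returns the number of distinct tips marked as reached.
 */
static size_t toy_tips_reachable(struct toy_commit **bases, size_t bases_nr,
				 struct toy_commit **tips, size_t tips_nr,
				 struct toy_commit **stack, size_t stack_cap)
{
	size_t top = 0, found = 0;
	int min_gen = toy_min_unfound_generation(tips, tips_nr);

	if (min_gen < 0)
		return 0;

	for (size_t i = 0; i < bases_nr; i++) {
		if (bases[i]->seen)
			continue;
		bases[i]->seen = 1;
		assert(top < stack_cap);
		stack[top++] = bases[i];
	}

	while (top) {
		struct toy_commit *c = stack[--top];

		/* Does it match any of our tips? */
		for (size_t i = 0; i < tips_nr; i++) {
			if (tips[i] != c || c->reached)
				continue;
			c->reached = 1;
			found++;
			min_gen = toy_min_unfound_generation(tips, tips_nr);
			if (min_gen < 0)
				return found; /* terminate early: all found */
		}

		for (size_t i = 0; i < c->nr_parents; i++) {
			struct toy_commit *p = c->parents[i];

			if (p->seen || p->generation < min_gen)
				continue;
			p->seen = 1;
			assert(top < stack_cap);
			stack[top++] = p;
		}
	}
	return found;
}
```

The sketch rescans the tip list to find the next cutoff, which keeps it short;
the patch instead sorts the tips by generation once and advances an index.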

-- 
gitgitgadget


* [PATCH v2 1/8] for-each-ref: add --stdin option
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
@ 2023-03-10 17:20   ` Derrick Stolee via GitGitGadget
  2023-03-10 18:08     ` Junio C Hamano
                       ` (3 more replies)
  2023-03-10 17:20   ` [PATCH v2 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
                     ` (9 subsequent siblings)
  10 siblings, 4 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-10 17:20 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Jeff King, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a user wishes to input a large list of patterns to 'git
for-each-ref' (likely a long list of exact refs) there are frequently
system limits on the number of command-line arguments.

Add a new --stdin option to instead read the patterns from standard
input. Add tests that check that any unrecognized arguments are
considered an error when --stdin is provided. Also, an empty pattern
list is interpreted as the complete ref set.

When reading from stdin, we populate the filter.name_patterns array
dynamically as opposed to pointing to the 'argv' array directly. This
requires a careful cast while freeing the individual strings,
conditioned on the --stdin option.
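
As a rough illustration of the ownership rule described above, the sketch
below reads one pattern per line into a heap-allocated, NULL-terminated array
and later frees each string before freeing the array. It uses only the C
standard library rather than Git's strbuf and ALLOC_GROW helpers; the names
toy_read_patterns() and toy_free_patterns() are hypothetical, and lines longer
than the fixed buffer are assumed not to occur.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array of heap-allocated strings. */
static void toy_free_patterns(char **patterns)
{
	if (!patterns)
		return;
	for (size_t i = 0; patterns[i]; i++)
		free(patterns[i]);
	free(patterns);
}

/*
 * Read one pattern per line from 'in'. The returned array always keeps a
 * spare slot so that the terminating NULL lands at index 'nr', directly
 * after the last pattern. Returns NULL on allocation failure.
 */
static char **toy_read_patterns(FILE *in, size_t *nr_out)
{
	size_t nr = 0, alloc = 16;
	char **patterns = calloc(alloc, sizeof(*patterns));
	char buf[4096]; /* assumption: no input line exceeds this size */

	if (!patterns)
		return NULL;

	while (fgets(buf, sizeof(buf), in)) {
		size_t len = strcspn(buf, "\n");
		char *copy = malloc(len + 1);

		/* Need room for this pattern and the terminator after it. */
		if (copy && nr + 2 > alloc) {
			char **grown = realloc(patterns,
					       2 * alloc * sizeof(*patterns));
			if (grown) {
				patterns = grown;
				alloc *= 2;
			} else {
				free(copy);
				copy = NULL;
			}
		}
		if (!copy) {
			patterns[nr] = NULL;
			toy_free_patterns(patterns);
			return NULL;
		}
		memcpy(copy, buf, len);
		copy[len] = '\0';
		patterns[nr++] = copy;
	}

	assert(nr < alloc);
	patterns[nr] = NULL; /* terminator sits at 'nr', not 'nr + 1' */
	if (nr_out)
		*nr_out = nr;
	return patterns;
}
```

An empty input yields an array whose first element is NULL, which matches the
"empty pattern list" case mentioned above.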

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-for-each-ref.txt |  7 +++++-
 builtin/for-each-ref.c             | 29 ++++++++++++++++++++++-
 t/t6300-for-each-ref.sh            | 37 ++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 6da899c6296..ccdc2911bb9 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -9,7 +9,8 @@ SYNOPSIS
 --------
 [verse]
 'git for-each-ref' [--count=<count>] [--shell|--perl|--python|--tcl]
-		   [(--sort=<key>)...] [--format=<format>] [<pattern>...]
+		   [(--sort=<key>)...] [--format=<format>]
+		   [ --stdin | <pattern>... ]
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
@@ -32,6 +33,10 @@ OPTIONS
 	literally, in the latter case matching completely or from the
 	beginning up to a slash.
 
+--stdin::
+	If `--stdin` is supplied, then the list of patterns is read from
+	standard input instead of from the argument list.
+
 --count=<count>::
 	By default the command shows all refs that match
 	`<pattern>`.  This option makes it stop after showing
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 6f62f40d126..e005a7ef3ce 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -25,6 +25,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct ref_format format = REF_FORMAT_INIT;
 	struct strbuf output = STRBUF_INIT;
 	struct strbuf err = STRBUF_INIT;
+	int from_stdin = 0;
 
 	struct option opts[] = {
 		OPT_BIT('s', "shell", &format.quote_style,
@@ -49,6 +50,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_CONTAINS(&filter.with_commit, N_("print only refs which contain the commit")),
 		OPT_NO_CONTAINS(&filter.no_commit, N_("print only refs which don't contain the commit")),
 		OPT_BOOL(0, "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
+		OPT_BOOL(0, "stdin", &from_stdin, N_("read reference patterns from stdin")),
 		OPT_END(),
 	};
 
@@ -75,7 +77,27 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
 	filter.ignore_case = icase;
 
-	filter.name_patterns = argv;
+	if (from_stdin) {
+		struct strbuf line = STRBUF_INIT;
+		size_t nr = 0, alloc = 16;
+
+		if (argv[0])
+			die(_("unknown arguments supplied with --stdin"));
+
+		CALLOC_ARRAY(filter.name_patterns, alloc);
+
+		while (strbuf_getline(&line, stdin) != EOF) {
+			ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
+			filter.name_patterns[nr++] = strbuf_detach(&line, NULL);
+		}
+
+		/* Add a terminating NULL string. */
+		ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
+		filter.name_patterns[nr] = NULL;
+	} else {
+		filter.name_patterns = argv;
+	}
+
 	filter.match_as_path = 1;
 	filter_refs(&array, &filter, FILTER_REFS_ALL);
 	ref_array_sort(sorting, &array);
@@ -97,5 +119,10 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	free_commit_list(filter.with_commit);
 	free_commit_list(filter.no_commit);
 	ref_sorting_release(sorting);
+	if (from_stdin) {
+		for (size_t i = 0; filter.name_patterns[i]; i++)
+			free((char *)filter.name_patterns[i]);
+		free(filter.name_patterns);
+	}
 	return 0;
 }
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index c466fd989f1..a58053a54c5 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1464,4 +1464,41 @@ sig_crlf="$(printf "%s" "$sig" | append_cr; echo dummy)"
 sig_crlf=${sig_crlf%dummy}
 test_atom refs/tags/fake-sig-crlf contents:signature "$sig_crlf"
 
+test_expect_success 'git for-each-ref --stdin: empty' '
+	>in &&
+	git for-each-ref --format="%(refname)" --stdin <in >actual &&
+	git for-each-ref --format="%(refname)" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git for-each-ref --stdin: fails if extra args' '
+	>in &&
+	test_must_fail git for-each-ref --format="%(refname)" \
+		--stdin refs/heads/extra <in 2>err &&
+	grep "unknown arguments supplied with --stdin" err
+'
+
+test_expect_success 'git for-each-ref --stdin: matches' '
+	cat >in <<-EOF &&
+	refs/tags/multi*
+	refs/heads/amb*
+	EOF
+
+	cat >expect <<-EOF &&
+	refs/heads/ambiguous
+	refs/tags/multi-ref1-100000-user1
+	refs/tags/multi-ref1-100000-user2
+	refs/tags/multi-ref1-200000-user1
+	refs/tags/multi-ref1-200000-user2
+	refs/tags/multi-ref2-100000-user1
+	refs/tags/multi-ref2-100000-user2
+	refs/tags/multi-ref2-200000-user1
+	refs/tags/multi-ref2-200000-user2
+	refs/tags/multiline
+	EOF
+
+	git for-each-ref --format="%(refname)" --stdin <in >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v2 2/8] for-each-ref: explicitly test no matches
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
  2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
@ 2023-03-10 17:20   ` Derrick Stolee via GitGitGadget
  2023-03-10 17:20   ` [PATCH v2 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-10 17:20 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Jeff King, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The for-each-ref builtin can take a list of ref patterns, but if none
match, it still succeeds (but with no output). Add an explicit test that
demonstrates that behavior.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t6300-for-each-ref.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index a58053a54c5..6614469d2d6 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1501,4 +1501,17 @@ test_expect_success 'git for-each-ref --stdin: matches' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git for-each-ref with non-existing refs' '
+	cat >in <<-EOF &&
+	refs/heads/this-ref-does-not-exist
+	refs/tags/bogus
+	EOF
+
+	git for-each-ref --format="%(refname)" --stdin <in >actual &&
+	test_must_be_empty actual &&
+
+	xargs git for-each-ref --format="%(refname)" <in >actual &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v2 3/8] commit-graph: combine generation computations
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
  2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
  2023-03-10 17:20   ` [PATCH v2 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
@ 2023-03-10 17:20   ` Derrick Stolee via GitGitGadget
  2023-03-10 17:20   ` [PATCH v2 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-10 17:20 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Jeff King, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

This patch extracts the common code used to compute topological levels
and corrected commit dates into a common routine,
compute_reachable_generation_numbers_1().

This new routine dispatches to call the necessary functions to get and
set the generation number for a given commit through a vtable (the
compute_generation_info struct).
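
To make the dispatch concrete, here is a small sketch of the same idea on a
toy graph: a generic walk that only talks to the storage through get/set
callbacks and an opaque data pointer. It is not the code from commit-graph.c;
toy_commit, toy_generation_info and toy_compute_generations() are hypothetical
names, and the callbacks here simply read and write a caller-owned array of
topological levels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PARENTS 2
#define TOY_GEN_ZERO 0 /* sentinel: generation not computed yet */

struct toy_commit {
	size_t id; /* index into the caller's side table */
	size_t nr_parents;
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

/* The "vtable": how to load and store a generation for a commit. */
struct toy_generation_info {
	uint64_t (*get_generation)(struct toy_commit *c, void *data);
	void (*set_generation)(struct toy_commit *c, uint64_t gen, void *data);
	void *data;
};

/*
 * Iterative depth-first walk: a commit gets its generation only after all
 * of its parents have one. 'stack' is caller-provided scratch space; one
 * slot per commit suffices because the stack always holds a single
 * ancestry path.
 */
static void toy_compute_generations(struct toy_generation_info *info,
				    struct toy_commit **commits, size_t nr,
				    struct toy_commit **stack, size_t stack_cap)
{
	for (size_t i = 0; i < nr; i++) {
		size_t top = 0;

		if (info->get_generation(commits[i], info->data) != TOY_GEN_ZERO)
			continue;

		assert(top < stack_cap);
		stack[top++] = commits[i];
		while (top) {
			struct toy_commit *cur = stack[top - 1];
			uint64_t max_gen = 0;
			int all_parents_computed = 1;

			for (size_t p = 0; p < cur->nr_parents; p++) {
				struct toy_commit *parent = cur->parents[p];
				uint64_t gen = info->get_generation(parent, info->data);

				if (gen == TOY_GEN_ZERO) {
					all_parents_computed = 0;
					assert(top < stack_cap);
					stack[top++] = parent;
					break;
				}
				if (gen > max_gen)
					max_gen = gen;
			}

			if (all_parents_computed) {
				top--;
				/* topological level: one more than the max parent */
				info->set_generation(cur, max_gen + 1, info->data);
			}
		}
	}
}

/* Example callbacks backed by a plain array indexed by commit id. */
static uint64_t toy_get_level(struct toy_commit *c, void *data)
{
	uint64_t *levels = data;
	return levels[c->id];
}

static void toy_set_level(struct toy_commit *c, uint64_t gen, void *data)
{
	uint64_t *levels = data;
	levels[c->id] = gen;
}
```

Swapping in a different pair of callbacks changes where the values live
without touching the walk itself, which is the point of the indirection.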

Computing the generation number itself is done in
compute_generation_from_max(), which dispatches its implementation based
on the generation version requested and issues a BUG() for unrecognized
generation versions.
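
The arithmetic for the two versions can be illustrated with a standalone
sketch. The function name toy_generation_from_max() is hypothetical, plain
uint64_t stands in for timestamp_t, and the cap TOY_V1_MAX is an assumed
value used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cap on topological levels, for illustration only. */
#define TOY_V1_MAX ((uint64_t)0x3FFFFFFF)

/*
 * Given the largest generation among a commit's parents, return the
 * commit's own generation for the requested version.
 */
static uint64_t toy_generation_from_max(uint64_t commit_date,
					uint64_t max_parent_gen,
					int version)
{
	switch (version) {
	case 1: /* topological level: one above the parents, capped */
		if (max_parent_gen > TOY_V1_MAX - 1)
			max_parent_gen = TOY_V1_MAX - 1;
		return max_parent_gen + 1;

	case 2: /* corrected commit date: own date, or one above the parents */
		if (commit_date && commit_date > max_parent_gen)
			max_parent_gen = commit_date - 1;
		return max_parent_gen + 1;

	default:
		assert(!"unimplemented generation version");
		return 0;
	}
}
```

For example, a commit dated 1000 whose parents top out at 1500 gets 1501 in
version 2, which preserves the rule that a commit's value exceeds that of
every parent even when clocks are skewed.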

This patch cleans up the two places that currently compute topological
levels and corrected commit dates by reducing the amount of duplicated
code. It also makes it possible to introduce a function which
dynamically computes those values for commits that aren't stored in a
commit-graph, which will be required for the forthcoming ahead-behind
rewrite.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c | 171 +++++++++++++++++++++++++++++++------------------
 1 file changed, 107 insertions(+), 64 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index c11b59f28b3..deccf984a0d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1446,24 +1446,53 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
-static void compute_topological_levels(struct write_commit_graph_context *ctx)
+struct compute_generation_info {
+	struct repository *r;
+	struct packed_commit_list *commits;
+	struct progress *progress;
+	int progress_cnt;
+
+	timestamp_t (*get_generation)(struct commit *c, void *data);
+	void (*set_generation)(struct commit *c, timestamp_t gen, void *data);
+	void *data;
+};
+
+static timestamp_t compute_generation_from_max(struct commit *c,
+					       timestamp_t max_gen,
+					       int generation_version)
+{
+	switch (generation_version) {
+	case 1: /* topological levels */
+		if (max_gen > GENERATION_NUMBER_V1_MAX - 1)
+			max_gen = GENERATION_NUMBER_V1_MAX - 1;
+		return max_gen + 1;
+
+	case 2: /* corrected commit date */
+		if (c->date && c->date > max_gen)
+			max_gen = c->date - 1;
+		return max_gen + 1;
+
+	default:
+		BUG("attempting unimplemented version");
+	}
+}
+
+static void compute_reachable_generation_numbers_1(
+			struct compute_generation_info *info,
+			int generation_version)
 {
 	int i;
 	struct commit_list *list = NULL;
 
-	if (ctx->report_progress)
-		ctx->progress = start_delayed_progress(
-					_("Computing commit graph topological levels"),
-					ctx->commits.nr);
-	for (i = 0; i < ctx->commits.nr; i++) {
-		struct commit *c = ctx->commits.list[i];
-		uint32_t level;
+	for (i = 0; i < info->commits->nr; i++) {
+		struct commit *c = info->commits->list[i];
+		timestamp_t gen;
+		repo_parse_commit(info->r, c);
+		gen = info->get_generation(c, info->data);
 
-		repo_parse_commit(ctx->r, c);
-		level = *topo_level_slab_at(ctx->topo_levels, c);
+		display_progress(info->progress, info->progress_cnt + 1);
 
-		display_progress(ctx->progress, i + 1);
-		if (level != GENERATION_NUMBER_ZERO)
+		if (gen != GENERATION_NUMBER_ZERO && gen != GENERATION_NUMBER_INFINITY)
 			continue;
 
 		commit_list_insert(c, &list);
@@ -1471,38 +1500,91 @@ static void compute_topological_levels(struct write_commit_graph_context *ctx)
 			struct commit *current = list->item;
 			struct commit_list *parent;
 			int all_parents_computed = 1;
-			uint32_t max_level = 0;
+			timestamp_t max_gen = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
-				repo_parse_commit(ctx->r, parent->item);
-				level = *topo_level_slab_at(ctx->topo_levels, parent->item);
+				repo_parse_commit(info->r, parent->item);
+				gen = info->get_generation(parent->item, info->data);
 
-				if (level == GENERATION_NUMBER_ZERO) {
+				if (gen == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
 				}
 
-				if (level > max_level)
-					max_level = level;
+				if (gen > max_gen)
+					max_gen = gen;
 			}
 
 			if (all_parents_computed) {
 				pop_commit(&list);
-
-				if (max_level > GENERATION_NUMBER_V1_MAX - 1)
-					max_level = GENERATION_NUMBER_V1_MAX - 1;
-				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
+				gen = compute_generation_from_max(
+						current, max_gen,
+						generation_version);
+				info->set_generation(current, gen, info->data);
 			}
 		}
 	}
+}
+
+static timestamp_t get_topo_level(struct commit *c, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	return *topo_level_slab_at(ctx->topo_levels, c);
+}
+
+static void set_topo_level(struct commit *c, timestamp_t t, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	*topo_level_slab_at(ctx->topo_levels, c) = (uint32_t)t;
+	display_progress(ctx->progress, ctx->progress_cnt + 1);
+}
+
+static void compute_topological_levels(struct write_commit_graph_context *ctx)
+{
+	struct compute_generation_info info = {
+		.r = ctx->r,
+		.progress = ctx->progress,
+		.commits = &ctx->commits,
+		.get_generation = get_topo_level,
+		.set_generation = set_topo_level,
+		.data = ctx,
+	};
+
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+					_("Computing commit graph topological levels"),
+					ctx->commits.nr);
+
+	compute_reachable_generation_numbers_1(&info, 1);
+
 	stop_progress(&ctx->progress);
 }
 
+static timestamp_t get_generation_from_graph_data(struct commit *c, void *data)
+{
+	return commit_graph_data_at(c)->generation;
+}
+
+static void set_generation_v2(struct commit *c, timestamp_t t, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	struct commit_graph_data *g = commit_graph_data_at(c);
+	g->generation = (uint32_t)t;
+	display_progress(ctx->progress, ctx->progress_cnt + 1);
+}
+
 static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 {
 	int i;
-	struct commit_list *list = NULL;
+	struct compute_generation_info info = {
+		.r = ctx->r,
+		.progress = ctx->progress,
+		.commits = &ctx->commits,
+		.get_generation = get_generation_from_graph_data,
+		.set_generation = set_generation_v2,
+		.data = ctx,
+	};
 
 	if (ctx->report_progress)
 		ctx->progress = start_delayed_progress(
@@ -1517,47 +1599,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 		}
 	}
 
-	for (i = 0; i < ctx->commits.nr; i++) {
-		struct commit *c = ctx->commits.list[i];
-		timestamp_t corrected_commit_date;
-
-		repo_parse_commit(ctx->r, c);
-		corrected_commit_date = commit_graph_data_at(c)->generation;
-
-		display_progress(ctx->progress, i + 1);
-		if (corrected_commit_date != GENERATION_NUMBER_ZERO)
-			continue;
-
-		commit_list_insert(c, &list);
-		while (list) {
-			struct commit *current = list->item;
-			struct commit_list *parent;
-			int all_parents_computed = 1;
-			timestamp_t max_corrected_commit_date = 0;
-
-			for (parent = current->parents; parent; parent = parent->next) {
-				repo_parse_commit(ctx->r, parent->item);
-				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
-
-				if (corrected_commit_date == GENERATION_NUMBER_ZERO) {
-					all_parents_computed = 0;
-					commit_list_insert(parent->item, &list);
-					break;
-				}
-
-				if (corrected_commit_date > max_corrected_commit_date)
-					max_corrected_commit_date = corrected_commit_date;
-			}
-
-			if (all_parents_computed) {
-				pop_commit(&list);
-
-				if (current->date && current->date > max_corrected_commit_date)
-					max_corrected_commit_date = current->date - 1;
-				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
-			}
-		}
-	}
+	compute_reachable_generation_numbers_1(&info, 2);
 
 	for (i = 0; i < ctx->commits.nr; i++) {
 		struct commit *c = ctx->commits.list[i];
@@ -1565,6 +1607,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX)
 			ctx->num_generation_data_overflows++;
 	}
+
 	stop_progress(&ctx->progress);
 }
 
-- 
gitgitgadget



* [PATCH v2 4/8] commit-graph: return generation from memory
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2023-03-10 17:20   ` [PATCH v2 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
@ 2023-03-10 17:20   ` Derrick Stolee via GitGitGadget
  2023-03-10 17:21   ` [PATCH v2 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-10 17:20 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Jeff King, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The commit_graph_generation() method used to report a value of
GENERATION_NUMBER_INFINITY if the commit_graph_data_slab had an instance
for the given commit but the graph_pos indicated the commit was not in
the commit-graph file.

Instead, trust the 'generation' member if the commit has a value in the
slab _and_ the 'generation' member is non-zero. Otherwise, treat it as
GENERATION_NUMBER_INFINITY.

This only makes a difference for a very old case for the commit-graph:
the very first Git release to write commit-graph files wrote zeroes in
the topological level positions. If we are parsing a commit-graph with
all zeroes, those commits will now appear to have
GENERATION_NUMBER_INFINITY (as if they were not parsed from the
commit-graph).

I attempted several variations to work around the need for providing an
uninitialized 'generation' member, but this was the best one I found. It
does require a change to a verification test in t5318 because it reports
a different error than the one about non-zero generation numbers.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c          | 8 +++-----
 t/t5318-commit-graph.sh | 2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index deccf984a0d..b4da4e05067 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -116,12 +116,10 @@ timestamp_t commit_graph_generation(const struct commit *c)
 	struct commit_graph_data *data =
 		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
 
-	if (!data)
-		return GENERATION_NUMBER_INFINITY;
-	else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
-		return GENERATION_NUMBER_INFINITY;
+	if (data && data->generation)
+		return data->generation;
 
-	return data->generation;
+	return GENERATION_NUMBER_INFINITY;
 }
 
 static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 049c5fc8ead..b6e12115786 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -630,7 +630,7 @@ test_expect_success 'detect incorrect generation number' '
 
 test_expect_success 'detect incorrect generation number' '
 	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
-		"non-zero generation number"
+		"commit-graph generation for commit"
 '
 
 test_expect_success 'detect incorrect commit date' '
-- 
gitgitgadget



* [PATCH v2 5/8] commit-graph: introduce `ensure_generations_valid()`
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2023-03-10 17:20   ` [PATCH v2 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
@ 2023-03-10 17:21   ` Taylor Blau via GitGitGadget
  2023-03-10 17:21   ` [PATCH v2 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 90+ messages in thread
From: Taylor Blau via GitGitGadget @ 2023-03-10 17:21 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Jeff King, Derrick Stolee, Taylor Blau

From: Taylor Blau <me@ttaylorr.com>

Use the just-introduced compute_reachable_generation_numbers_1() to
implement a function which dynamically computes topological levels (or
corrected commit dates) for out-of-graph commits.

This will be useful for the ahead-behind algorithm we are about to
introduce, which needs accurate topological levels on _all_ commits
reachable from the tips in order to avoid over-counting.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c | 29 +++++++++++++++++++++++++++++
 commit-graph.h |  7 +++++++
 2 files changed, 36 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index b4da4e05067..7311d62a110 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1609,6 +1609,35 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
+static void set_generation_in_graph_data(struct commit *c, timestamp_t t,
+					 void *data)
+{
+	commit_graph_data_at(c)->generation = t;
+}
+
+/*
+ * After this method, all commits reachable from those in the given
+ * list will have non-zero, non-infinite generation numbers.
+ */
+void ensure_generations_valid(struct commit **commits, size_t nr)
+{
+	struct repository *r = the_repository;
+	int generation_version = get_configured_generation_version(r);
+	struct packed_commit_list list = {
+		.list = commits,
+		.alloc = nr,
+		.nr = nr,
+	};
+	struct compute_generation_info info = {
+		.r = r,
+		.commits = &list,
+		.get_generation = get_generation_from_graph_data,
+		.set_generation = set_generation_in_graph_data,
+	};
+
+	compute_reachable_generation_numbers_1(&info, generation_version);
+}
+
 static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx)
 {
 	trace2_data_intmax("commit-graph", ctx->r, "filter-computed",
diff --git a/commit-graph.h b/commit-graph.h
index 37faee6b66d..a529c62b518 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -190,4 +190,11 @@ struct commit_graph_data {
  */
 timestamp_t commit_graph_generation(const struct commit *);
 uint32_t commit_graph_position(const struct commit *);
+
+/*
+ * After this method, all commits reachable from those in the given
+ * list will have non-zero, non-infinite generation numbers.
+ */
+void ensure_generations_valid(struct commit **commits, size_t nr);
+
 #endif
-- 
gitgitgadget



* [PATCH v2 6/8] commit-reach: implement ahead_behind() logic
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2023-03-10 17:21   ` [PATCH v2 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
@ 2023-03-10 17:21   ` Derrick Stolee via GitGitGadget
  2023-03-15 13:50     ` Ævar Arnfjörð Bjarmason
  2023-03-10 17:21   ` [PATCH v2 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-10 17:21 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Jeff King, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Fully implement the commit-counting logic required to determine
ahead/behind counts for a batch of commit pairs. This is a new library
method within commit-reach.h. This method will be linked to the
for-each-ref builtin in the next change.

The interface for ahead_behind() uses two arrays. The first array of
commits contains the list of all starting points for the walk. This
includes all tip commits _and_ base commits. The second array, using the
new ahead_behind_count struct, indicates which commits from that initial
array form the base/tip pair for the ahead/behind count it will store.

This implementation of ahead_behind() allows multiple bases, if desired.
Even with multiple bases, there is only one commit walk used for
counting the ahead/behind values, saving time when the base/tip ranges
overlap significantly.

This interface for ahead_behind() also makes it very easy to call
ensure_generations_valid() on the entire array of bases and tips. This
call is necessary because it is critical that the walk that counts
ahead/behind values never walks a commit more than once. Without
generation numbers on every commit, there is a possibility that a
commit date skew could cause the walk to revisit a commit and then
double-count it. For this reason, it is strongly recommended that 'git
ahead-behind' is only run in a repository with a commit-graph file that
covers most of the reachable commits, storing precomputed generation
numbers. If no commit-graph exists, this walk will be much slower as it
must walk all reachable commits in ensure_generations_valid() before
performing the counting logic.

It is possible to detect if generation numbers are available at run time
and redirect the implementation to another algorithm that does not
require this property. However, that implementation requires a commit
walk per base/tip pair _and_ can be slower due to the commit date
heuristics required. Such an implementation could be considered in the
future if there is a reason to include it, but most Git hosts should
already be generating a commit-graph file as part of repository
maintenance. Most Git clients should also be generating commit-graph
files as part of background maintenance or automatic GCs.

Now, let's discuss the ahead/behind counting algorithm.

Each commit in the input commit list is associated with a bit position
indicating "the ith commit can reach this commit". Each of these commits
is associated with a bitmap with its position flipped on and then
placed in a queue for walking commit history. We walk commits by popping
the commit with maximum generation number out of the queue, guaranteeing
that we will never walk a child of that commit in any future steps.

As we walk, we load the bitmap for the current commit and perform two
main steps. The _second_ step examines each parent of the current commit
and adds the current commit's bitmap bits to each parent's bitmap. (We
create a new bitmap for the parent if this is our first time seeing that
parent.) After adding the bits to the parent's bitmap, the parent is
added to the walk queue. Due to this passing of bits to parents, the
current commit has a guarantee that the ith bit is enabled on its bitmap
if and only if the ith commit can reach the current commit.

The first step of the walk is to examine the bitmask on the current
commit and decide which ranges the commit is in or not. Due to the "bit
pushing" in the second step, we have a guarantee that the ith bit of the
current commit's bitmap is on if and only if the ith starting commit can
reach it. For each ahead_behind_count struct, check the base_index and
tip_index to see if those bits are enabled on the current bitmap. If
exactly one bit is enabled, then increment the corresponding 'ahead' or
'behind' count.  This increment is the reason we _absolutely need_ to
walk commits at most once.

The only subtle thing to do with this walk is to check to see if a
parent has all bits on in its bitmap, in which case it becomes "stale"
and is marked with the STALE bit. This allows queue_has_nonstale() to be
the terminating condition of the walk, which greatly reduces the number
of commits walked if all of the commits are nearby in history. It avoids
walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup() to add commits to the
priority queue without adding them multiple times. This uses the PARENT2
flag. Thus, we must clear both the STALE and PARENT2 bits of all
commits, in case ahead_behind() is called multiple times in the same
process.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-reach.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++
 commit-reach.h | 30 ++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/commit-reach.c b/commit-reach.c
index 2e33c599a82..338ca8084b2 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -8,6 +8,7 @@
 #include "revision.h"
 #include "tag.h"
 #include "commit-reach.h"
+#include "ewah/ewok.h"
 
 /* Remember to update object flag allocation in object.h */
 #define PARENT1		(1u<<16)
@@ -941,3 +942,98 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 
 	return found_commits;
 }
+
+define_commit_slab(bit_arrays, struct bitmap *);
+static struct bit_arrays bit_arrays;
+
+static void insert_no_dup(struct prio_queue *queue, struct commit *c)
+{
+	if (c->object.flags & PARENT2)
+		return;
+	prio_queue_put(queue, c);
+	c->object.flags |= PARENT2;
+}
+
+static struct bitmap *init_bit_array(struct commit *c, int width)
+{
+	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
+	if (!*bitmap)
+		*bitmap = bitmap_word_alloc(width);
+	return *bitmap;
+}
+
+static void free_bit_array(struct commit *c)
+{
+	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
+	if (!*bitmap)
+		return;
+	bitmap_free(*bitmap);
+	*bitmap = NULL;
+}
+
+void ahead_behind(struct commit **commits, size_t commits_nr,
+		  struct ahead_behind_count *counts, size_t counts_nr)
+{
+	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
+	size_t i;
+
+	if (!commits_nr || !counts_nr)
+		return;
+
+	for (i = 0; i < counts_nr; i++) {
+		counts[i].ahead = 0;
+		counts[i].behind = 0;
+	}
+
+	ensure_generations_valid(commits, commits_nr);
+
+	init_bit_arrays(&bit_arrays);
+
+	for (i = 0; i < commits_nr; i++) {
+		struct commit *c = commits[i];
+		struct bitmap *bitmap = init_bit_array(c, width);
+
+		bitmap_set(bitmap, i);
+		insert_no_dup(&queue, c);
+	}
+
+	while (queue_has_nonstale(&queue)) {
+		struct commit *c = prio_queue_get(&queue);
+		struct commit_list *p;
+		struct bitmap *bitmap_c = init_bit_array(c, width);
+
+		for (i = 0; i < counts_nr; i++) {
+			int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
+			int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index);
+
+			if (reach_from_tip ^ reach_from_base) {
+				if (reach_from_base)
+					counts[i].behind++;
+				else
+					counts[i].ahead++;
+			}
+		}
+
+		for (p = c->parents; p; p = p->next) {
+			struct bitmap *bitmap_p;
+
+			parse_commit(p->item);
+
+			bitmap_p = init_bit_array(p->item, width);
+			bitmap_or(bitmap_p, bitmap_c);
+
+			if (bitmap_popcount(bitmap_p) == commits_nr)
+				p->item->object.flags |= STALE;
+
+			insert_no_dup(&queue, p->item);
+		}
+
+		free_bit_array(c);
+	}
+
+	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
+	repo_clear_commit_marks(the_repository, PARENT2 | STALE);
+	clear_bit_arrays(&bit_arrays);
+	clear_prio_queue(&queue);
+}
diff --git a/commit-reach.h b/commit-reach.h
index 148b56fea50..f871b5dcce9 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -104,4 +104,34 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 					 struct commit **to, int nr_to,
 					 unsigned int reachable_flag);
 
+struct ahead_behind_count {
+	/**
+	 * As input, the *_index members indicate which positions in
+	 * the 'tips' array correspond to the tip and base of this
+	 * comparison.
+	 */
+	size_t tip_index;
+	size_t base_index;
+
+	/**
+	 * These values store the computed counts for each side of the
+	 * symmetric difference:
+	 *
+	 * 'ahead' stores the number of commits reachable from the tip
+	 * and not reachable from the base.
+	 *
+	 * 'behind' stores the number of commits reachable from the base
+	 * and not reachable from the tip.
+	 */
+	unsigned int ahead;
+	unsigned int behind;
+};
+
+/*
+ * Given an array of commits and an array of ahead_behind_count pairs,
+ * compute the ahead/behind counts for each pair.
+ */
+void ahead_behind(struct commit **commits, size_t commits_nr,
+		  struct ahead_behind_count *counts, size_t counts_nr);
+
 #endif
-- 
gitgitgadget



* [PATCH v2 7/8] for-each-ref: add ahead-behind format atom
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2023-03-10 17:21   ` [PATCH v2 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
@ 2023-03-10 17:21   ` Derrick Stolee via GitGitGadget
  2023-03-10 19:09     ` Junio C Hamano
  2023-03-15 13:57     ` Ævar Arnfjörð Bjarmason
  2023-03-10 17:21   ` [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  10 siblings, 2 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-10 17:21 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Jeff King, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change implemented the ahead_behind() method, including an
algorithm to compute the ahead/behind values for a number of commit tips
relative to a number of commit bases. Now, integrate that algorithm as
part of 'git for-each-ref' hidden behind a new format atom,
ahead-behind. This naturally extends to 'git branch' and 'git tag'
builtins, as well.

This format allows specifying multiple bases, if so desired, and all
matching references are compared against all of those bases. For this
reason, failing to read a reference provided from these atoms results in
an error.

In order to translate the ahead_behind() method information to the
format output code in ref-filter.c, we must populate arrays of
ahead_behind_count structs. In struct ref_array, we store the full array
that will be passed to ahead_behind(). In struct ref_array_item, we
store an array of pointers that point to the relevant items within the
full array. In this way, we can pull all relevant ahead/behind values
directly when formatting output for a specific item. It also ensures the
lifetime of the ahead_behind_count structs matches the time that the
array is being used.

Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as
it has an interesting repository shape. In particular, its merging
strategy and its use of different commit-graphs would demonstrate over-
counting if the ahead_behind() method did not already account for that
possibility.

Also add tests for the specific for-each-ref, branch, and tag builtins.
In the case of 'git tag', there are interesting cases that happen when
some of the selected tips are not commits. This requires careful logic
around commits_nr in the second loop of filter_ahead_behind(). Also, the
test in t7004 is carefully located to avoid being dependent on the GPG
prereq. It also avoids using the test_commit helper, as that will add
ticks to the time and disrupt the expected timestamps in later tag
tests.

Also add performance tests in a new p1500-graph-walks.sh script. This
will be useful for more uses in the future, but for now compare the
ahead-behind counting algorithm in 'git for-each-ref' to the naive
implementation by running 'git rev-list --count' processes for each
input.

For the Git source code repository, the improvement is already obvious:

Test                                            this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref   0.07(0.07+0.00)
1500.3: ahead-behind counts: git branch         0.07(0.06+0.00)
1500.4: ahead-behind counts: git tag            0.07(0.06+0.00)
1500.5: ahead-behind counts: git rev-list       1.32(1.04+0.27)

But the standard performance benchmark is the Linux kernel repository,
which demonstrates a significant improvement:

Test                                            this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref   0.27(0.24+0.02)
1500.3: ahead-behind counts: git branch         0.27(0.24+0.03)
1500.4: ahead-behind counts: git tag            0.28(0.27+0.01)
1500.5: ahead-behind counts: git rev-list       4.57(4.03+0.54)

The 'git rev-list' test exists in this change as a demonstration, but it
will be removed in the next change to avoid wasting time on this
comparison.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-for-each-ref.txt |  5 ++
 builtin/branch.c                   |  1 +
 builtin/for-each-ref.c             |  3 ++
 builtin/tag.c                      |  1 +
 ref-filter.c                       | 70 ++++++++++++++++++++++++
 ref-filter.h                       | 25 ++++++++-
 t/perf/p1500-graph-walks.sh        | 45 ++++++++++++++++
 t/t3203-branch-output.sh           | 14 +++++
 t/t6301-for-each-ref-errors.sh     | 12 +++++
 t/t6600-test-reach.sh              | 86 ++++++++++++++++++++++++++++++
 t/t7004-tag.sh                     | 28 ++++++++++
 11 files changed, 289 insertions(+), 1 deletion(-)
 create mode 100755 t/perf/p1500-graph-walks.sh

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index ccdc2911bb9..d5c3cda4228 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -222,6 +222,11 @@ worktreepath::
 	out, if it is checked out in any linked worktree. Empty string
 	otherwise.
 
+ahead-behind:<ref>::
+	Two integers, separated by a space, demonstrating the number of
+	commits ahead and behind, respectively, when comparing the output
+	ref to the `<ref>` specified in the format.
+
 In addition to the above, for commit and tag objects, the header
 field names (`tree`, `parent`, `object`, `type`, and `tag`) can
 be used to specify the value in the header field.
diff --git a/builtin/branch.c b/builtin/branch.c
index f63fd45edb9..d46ca6147e3 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -448,6 +448,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	if (verify_ref_format(format))
 		die(_("unable to parse format string"));
 
+	filter_ahead_behind(format, &array);
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++) {
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index e005a7ef3ce..08945ad6802 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -5,6 +5,7 @@
 #include "object.h"
 #include "parse-options.h"
 #include "ref-filter.h"
+#include "commit-reach.h"
 
 static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [<options>] [<pattern>]"),
@@ -100,6 +101,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	filter.match_as_path = 1;
 	filter_refs(&array, &filter, FILTER_REFS_ALL);
+	filter_ahead_behind(&format, &array);
+
 	ref_array_sort(sorting, &array);
 
 	if (!maxcount || array.nr < maxcount)
diff --git a/builtin/tag.c b/builtin/tag.c
index d428c45dc8d..4f203a4ad21 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -66,6 +66,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		die(_("unable to parse format string"));
 	filter->with_commit_tag_algo = 1;
 	filter_refs(&array, filter, FILTER_REFS_TAGS);
+	filter_ahead_behind(format, &array);
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++) {
diff --git a/ref-filter.c b/ref-filter.c
index f8203c6b052..896bf703f59 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -158,6 +158,7 @@ enum atom_type {
 	ATOM_THEN,
 	ATOM_ELSE,
 	ATOM_REST,
+	ATOM_AHEADBEHIND,
 };
 
 /*
@@ -586,6 +587,16 @@ static int rest_atom_parser(struct ref_format *format, struct used_atom *atom,
 	return 0;
 }
 
+static int ahead_behind_atom_parser(struct ref_format *format, struct used_atom *atom,
+				    const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		return strbuf_addf_ret(err, -1, _("expected format: %%(ahead-behind:<ref>)"));
+
+	string_list_append(&format->bases, arg);
+	return 0;
+}
+
 static int head_atom_parser(struct ref_format *format, struct used_atom *atom,
 			    const char *arg, struct strbuf *err)
 {
@@ -645,6 +656,7 @@ static struct {
 	[ATOM_THEN] = { "then", SOURCE_NONE },
 	[ATOM_ELSE] = { "else", SOURCE_NONE },
 	[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
+	[ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser },
 	/*
 	 * Please update $__git_ref_fieldlist in git-completion.bash
 	 * when you add new atoms
@@ -1848,6 +1860,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 	struct object *obj;
 	int i;
 	struct object_info empty = OBJECT_INFO_INIT;
+	int ahead_behind_atoms = 0;
 
 	CALLOC_ARRAY(ref->value, used_atom_cnt);
 
@@ -1978,6 +1991,16 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 			else
 				v->s = xstrdup("");
 			continue;
+		} else if (atom_type == ATOM_AHEADBEHIND) {
+			if (ref->counts) {
+				const struct ahead_behind_count *count;
+				count = ref->counts[ahead_behind_atoms++];
+				v->s = xstrfmt("%d %d", count->ahead, count->behind);
+			} else {
+				/* Not a commit. */
+				v->s = xstrdup("");
+			}
+			continue;
 		} else
 			continue;
 
@@ -2328,6 +2351,7 @@ static void free_array_item(struct ref_array_item *item)
 			free((char *)item->value[i].s);
 		free(item->value);
 	}
+	free(item->counts);
 	free(item);
 }
 
@@ -2356,6 +2380,8 @@ void ref_array_clear(struct ref_array *array)
 		free_worktrees(ref_to_worktree_map.worktrees);
 		ref_to_worktree_map.worktrees = NULL;
 	}
+
+	FREE_AND_NULL(array->counts);
 }
 
 #define EXCLUDE_REACHED 0
@@ -2418,6 +2444,50 @@ static void reach_filter(struct ref_array *array,
 	free(to_clear);
 }
 
+void filter_ahead_behind(struct ref_format *format,
+			 struct ref_array *array)
+{
+	struct commit **commits;
+	size_t commits_nr = format->bases.nr + array->nr;
+
+	if (!format->bases.nr || !array->nr)
+		return;
+
+	ALLOC_ARRAY(commits, commits_nr);
+	for (size_t i = 0; i < format->bases.nr; i++) {
+		const char *name = format->bases.items[i].string;
+		commits[i] = lookup_commit_reference_by_name(name);
+		if (!commits[i])
+			die("failed to find '%s'", name);
+	}
+
+	ALLOC_ARRAY(array->counts, st_mult(format->bases.nr, array->nr));
+
+	commits_nr = format->bases.nr;
+	array->counts_nr = 0;
+	for (size_t i = 0; i < array->nr; i++) {
+		const char *name = array->items[i]->refname;
+		commits[commits_nr] = lookup_commit_reference_by_name(name);
+
+		if (!commits[commits_nr])
+			continue;
+
+		CALLOC_ARRAY(array->items[i]->counts, format->bases.nr);
+		for (size_t j = 0; j < format->bases.nr; j++) {
+			struct ahead_behind_count *count;
+			count = &array->counts[array->counts_nr++];
+			count->tip_index = commits_nr;
+			count->base_index = j;
+
+			array->items[i]->counts[j] = count;
+		}
+		commits_nr++;
+	}
+
+	ahead_behind(commits, commits_nr, array->counts, array->counts_nr);
+	free(commits);
+}
+
 /*
  * API for filtering a set of refs. Based on the type of refs the user
  * has requested, we iterate through those refs and apply filters
diff --git a/ref-filter.h b/ref-filter.h
index aa0eea4ecf5..7e8bff3864e 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -5,6 +5,7 @@
 #include "refs.h"
 #include "commit.h"
 #include "parse-options.h"
+#include "string-list.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -24,6 +25,7 @@
 
 struct atom_value;
 struct ref_sorting;
+struct ahead_behind_count;
 
 enum ref_sorting_order {
 	REF_SORTING_REVERSE = 1<<0,
@@ -40,6 +42,8 @@ struct ref_array_item {
 	const char *symref;
 	struct commit *commit;
 	struct atom_value *value;
+	struct ahead_behind_count **counts;
+
 	char refname[FLEX_ARRAY];
 };
 
@@ -47,6 +51,9 @@ struct ref_array {
 	int nr, alloc;
 	struct ref_array_item **items;
 	struct rev_info *revs;
+
+	struct ahead_behind_count *counts;
+	size_t counts_nr;
 };
 
 struct ref_filter {
@@ -80,9 +87,15 @@ struct ref_format {
 
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
+
+	/* List of bases for ahead-behind counts. */
+	struct string_list bases;
 };
 
-#define REF_FORMAT_INIT { .use_color = -1 }
+#define REF_FORMAT_INIT {             \
+	.use_color = -1,              \
+	.bases = STRING_LIST_INIT_DUP, \
+}
 
 /*  Macros for checking --merged and --no-merged options */
 #define _OPT_MERGED_NO_MERGED(option, filter, h) \
@@ -143,4 +156,14 @@ struct ref_array_item *ref_array_push(struct ref_array *array,
 				      const char *refname,
 				      const struct object_id *oid);
 
+/*
+ * If the provided format includes ahead-behind atoms, then compute the
+ * ahead-behind values for the array of filtered references. Must be
+ * called after filter_refs() but before outputting the formatted refs.
+ *
+ * If this is not called, then any ahead-behind atoms will be blank.
+ */
+void filter_ahead_behind(struct ref_format *format,
+			 struct ref_array *array);
+
 #endif /*  REF_FILTER_H  */
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
new file mode 100755
index 00000000000..439a448c2e6
--- /dev/null
+++ b/t/perf/p1500-graph-walks.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+test_description='Commit walk performance tests'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
+	sort -r allrefs | head -n 50 >refs &&
+	for ref in $(cat refs)
+	do
+		git branch -f ref-$ref $ref &&
+		echo ref-$ref ||
+		return 1
+	done >branches &&
+	for ref in $(cat refs)
+	do
+		git tag -f tag-$ref $ref &&
+		echo tag-$ref ||
+		return 1
+	done >tags &&
+	git commit-graph write --reachable
+'
+
+test_perf 'ahead-behind counts: git for-each-ref' '
+	git for-each-ref --format="%(ahead-behind:HEAD)" --stdin <refs
+'
+
+test_perf 'ahead-behind counts: git branch' '
+	xargs git branch -l --format="%(ahead-behind:HEAD)" <branches
+'
+
+test_perf 'ahead-behind counts: git tag' '
+	xargs git tag -l --format="%(ahead-behind:HEAD)" <tags
+'
+
+test_perf 'ahead-behind counts: git rev-list' '
+	for r in $(cat refs)
+	do
+		git rev-list --count "HEAD..$r" || return 1
+	done
+'
+
+test_done
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index d34d77f8934..1c0f7ea24e7 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -337,6 +337,20 @@ test_expect_success 'git branch --format option' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git branch --format with ahead-behind' '
+	cat >expect <<-\EOF &&
+	(HEAD detached from fromtag) 0 0
+	refs/heads/ambiguous 0 0
+	refs/heads/branch-one 1 0
+	refs/heads/branch-two 0 0
+	refs/heads/main 1 0
+	refs/heads/ref-to-branch 1 0
+	refs/heads/ref-to-remote 1 0
+	EOF
+	git branch --format="%(refname) %(ahead-behind:HEAD)" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git branch with --format=%(rest) must fail' '
 	test_must_fail git branch --format="%(rest)" >actual
 '
diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
index bfda1f46ad2..7db1fc4d7b3 100755
--- a/t/t6301-for-each-ref-errors.sh
+++ b/t/t6301-for-each-ref-errors.sh
@@ -54,4 +54,16 @@ test_expect_success 'Missing objects are reported correctly' '
 	test_must_be_empty brief-err
 '
 
+test_expect_success 'ahead-behind requires an argument' '
+	test_must_fail git for-each-ref \
+		--format="%(ahead-behind)" 2>err &&
+	grep "expected format: %(ahead-behind:<ref>)" err
+'
+
+test_expect_success 'missing ahead-behind base' '
+	test_must_fail git for-each-ref \
+		--format="%(ahead-behind:refs/heads/missing)" 2>err &&
+	grep "failed to find '\''refs/heads/missing'\''" err
+'
+
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 338a9c46a24..0cb50797ef7 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -443,4 +443,90 @@ test_expect_success 'get_reachable_subset:none' '
 	test_all_modes get_reachable_subset
 '
 
+test_expect_success 'for-each-ref ahead-behind:linear' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 8
+	refs/heads/commit-1-3 0 6
+	refs/heads/commit-1-5 0 4
+	refs/heads/commit-1-8 0 1
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-1-9)" --stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:all' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 24
+	refs/heads/commit-2-4 0 17
+	refs/heads/commit-4-2 0 17
+	refs/heads/commit-4-4 0 9
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-5-5)" --stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:some' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 53
+	refs/heads/commit-4-8 8 30
+	refs/heads/commit-5-3 0 39
+	refs/heads/commit-9-9 27 0
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-9-6)" --stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:some, multibase' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-7-8
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 53 0 53
+	refs/heads/commit-4-8 8 30 0 22
+	refs/heads/commit-5-3 0 39 0 39
+	refs/heads/commit-7-8 14 12 8 6
+	refs/heads/commit-9-9 27 0 27 0
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-9-6) %(ahead-behind:commit-6-9)" \
+		--stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:none' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-7-5
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-4-8 16 16
+	refs/heads/commit-7-5 7 4
+	refs/heads/commit-9-9 49 0
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-8-4)" --stdin
+'
+
 test_done
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 9aa1660651b..d099e707efd 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -792,6 +792,34 @@ test_expect_success 'annotations for blobs are empty' '
 	test_cmp expect actual
 '
 
+# Run this before doing any signing, so the test has the same results
+# regardless of the GPG prereq.
+test_expect_success 'git tag --format with ahead-behind' '
+	test_when_finished git reset --hard tag-one-line &&
+	git commit --allow-empty -m "left" &&
+	git tag -a -m left tag-left &&
+	git reset --hard HEAD~1 &&
+	git commit --allow-empty -m "right" &&
+	git tag -a -m left tag-right &&
+
+	# Use " !" at the end to demonstrate whitepsace
+	# around empty ahead-behind token for tag-blob.
+	cat >expect <<-EOF &&
+	refs/tags/tag-blob  !
+	refs/tags/tag-left 1 1 !
+	refs/tags/tag-lines 0 1 !
+	refs/tags/tag-one-line 0 1 !
+	refs/tags/tag-right 0 0 !
+	refs/tags/tag-zero-lines 0 1 !
+	EOF
+	git tag -l --format="%(refname) %(ahead-behind:HEAD) !" >actual 2>err &&
+	grep "refs/tags/tag" actual >actual.focus &&
+	test_cmp expect actual.focus &&
+
+	# Error reported for tags that point to non-commits.
+	grep "error: object [0-9a-f]* is a blob, not a commit" err
+'
+
 # trying to verify annotated non-signed tags:
 
 test_expect_success GPG \
-- 
gitgitgadget



* [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases()
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2023-03-10 17:21   ` [PATCH v2 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
@ 2023-03-10 17:21   ` Derrick Stolee via GitGitGadget
  2023-03-15 14:13     ` Ævar Arnfjörð Bjarmason
  2023-03-10 19:16   ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Junio C Hamano
                     ` (2 subsequent siblings)
  10 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-10 17:21 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, Jeff King, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use
the ref-filter machinery to select references or branches (respectively)
that are reachable from a set of commits presented by one or more
--merged arguments. This happens within reach_filter(), which uses the
revision-walk machinery to walk history in a standard way.

However, the commit-reach.c file is full of custom searches that are
more efficient, especially for reachability queries that can terminate
early when reachability is discovered. Add a new
tips_reachable_from_bases() method to commit-reach.c and call it from
within reach_filter() in ref-filter.c. This affects both 'git branch'
and 'git for-each-ref' as tested in p1500-graph-walks.sh.

For the Linux kernel repository, we take an already-fast algorithm and
make it even faster:

Test                                            HEAD~1  HEAD
-------------------------------------------------------------------
1500.5: contains: git for-each-ref --merged     0.13    0.02 -84.6%
1500.6: contains: git branch --merged           0.14    0.02 -85.7%
1500.7: contains: git tag --merged              0.15    0.03 -80.0%

(Note that we remove the iterative 'git rev-list' test from p1500
because it no longer makes sense as a comparison to 'git for-each-ref'
and would just waste time running it for these comparisons.)

The algorithm is implemented in commit-reach.c in the method
tips_reachable_from_bases(). This method takes a commit_list of bases and
an array of tips, and adds the given 'mark' to the flags of each tip that
is reachable from one of the bases.

Like other reachability queries in commit-reach.c, the fastest way to
search for "can A reach B?" is to do a depth-first search up to the
generation number of B, preferring to explore first parents before later
parents. While we must walk all reachable commits up to that generation
number when the answer is "no", the depth-first search can answer "yes"
much faster than other approaches in most cases.

This search becomes trickier when there are multiple targets for the
depth-first search. The commits with lower generation number are more
likely to be within the history of the start commit, but we don't want
to waste time searching commits of low generation number if the commit
target with lowest generation number has already been found.

The trick here is to take the input commits and sort them by generation
number in ascending order. Track the index within this order as
min_generation_index. When we find a commit, if its index in the list is
equal to min_generation_index, then we can increase the generation
number boundary of our search to the next-lowest value in the list.

With this mechanism, the number of commits to search is minimized with
respect to the depth-first search heuristic. We will walk all commits up
to the minimum generation number of a commit that is _not_ reachable
from the start, but we will walk only the necessary portion of the
depth-first search for the reachable commits of lower generation.
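
As an illustration of the mechanism described above, here is a minimal
sketch on a toy graph. Everything in it (struct toy_commit, toy_graph,
toy_tips_reachable()) is hypothetical and only mirrors the shape of the
patch; it uses plain array indexes instead of 'struct commit' and is not
the code in commit-reach.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARENTS 2

/* Hypothetical toy commit; parents are graph indexes, -1 when absent. */
struct toy_commit {
	int parents[MAX_PARENTS];
	unsigned int generation; /* 1 + largest parent generation */
};

/* Toy history: 0 <- 1 <- 2 <- 3 <- 4, plus an unmerged branch 1 <- 5 <- 6. */
static const struct toy_commit toy_graph[] = {
	{ { -1, -1 }, 1 }, { { 0, -1 }, 2 }, { { 1, -1 }, 3 }, { { 2, -1 }, 4 },
	{ { 3, -1 }, 5 }, { { 1, -1 }, 3 }, { { 5, -1 }, 4 },
};

struct tip_and_index {
	int commit;   /* graph index of the tip */
	size_t index; /* position in the caller's tips[] */
	unsigned int generation;
};

static int compare_by_generation(const void *va, const void *vb)
{
	const struct tip_and_index *a = va, *b = vb;

	return (a->generation > b->generation) - (a->generation < b->generation);
}

/*
 * Set found[i] to 1 for each tips[i] reachable from 'base'. Returns the
 * number of commits pushed onto the stack, to make the cutoff visible.
 */
static size_t toy_tips_reachable(const struct toy_commit *graph, size_t graph_nr,
				 int base, const int *tips, size_t tips_nr,
				 unsigned char *found)
{
	struct tip_and_index *sorted = calloc(tips_nr, sizeof(*sorted));
	unsigned char *seen = calloc(graph_nr, 1);
	int *stack = calloc(graph_nr, sizeof(*stack));
	size_t stack_nr = 0, visited = 0, min_index = 0, i;
	unsigned int min_generation;

	assert(sorted && seen && stack && tips_nr > 0);
	assert(base >= 0 && (size_t)base < graph_nr);
	memset(found, 0, tips_nr);

	for (i = 0; i < tips_nr; i++) {
		sorted[i].commit = tips[i];
		sorted[i].index = i;
		sorted[i].generation = graph[tips[i]].generation;
	}
	/* Sort with generation number ascending. */
	qsort(sorted, tips_nr, sizeof(*sorted), compare_by_generation);
	min_generation = sorted[0].generation;

	seen[base] = 1;
	stack[stack_nr++] = base;
	visited++;

	while (stack_nr) {
		int c = stack[stack_nr - 1], pushed = 0;
		size_t j;

		/* Does the commit on top of the stack match a tip? */
		for (j = min_index; j < tips_nr; j++) {
			if (graph[c].generation < sorted[j].generation)
				break;
			if (sorted[j].commit != c)
				continue;
			found[sorted[j].index] = 1;
			if (j == min_index) {
				/* Raise the boundary to the next un-found tip. */
				size_t k = j + 1;
				while (k < tips_nr && found[sorted[k].index])
					k++;
				if (k >= tips_nr)
					goto done; /* all tips found */
				min_index = k;
				min_generation = sorted[k].generation;
			}
		}

		/* Descend into the first unexplored parent above the boundary. */
		for (i = 0; i < MAX_PARENTS; i++) {
			int p = graph[c].parents[i];
			if (p < 0 || seen[p] || graph[p].generation < min_generation)
				continue;
			seen[p] = 1;
			stack[stack_nr++] = p;
			visited++;
			pushed = 1;
			break;
		}
		if (!pushed)
			stack_nr--;
	}
done:
	free(sorted);
	free(seen);
	free(stack);
	return visited;
}
```

With base 4 and tips {1, 6, 3} in this toy history, the walk marks
commits 1 and 3, leaves 6 unmarked, and pushes only four commits: once
commit 1 is found the boundary rises to the generation of the remaining
un-found tip, so the root commit is never visited.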

Add extra tests for this behavior in t6600-test-reach.sh as the
interesting data shape of that repository can sometimes demonstrate
corner case bugs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-reach.c              | 114 ++++++++++++++++++++++++++++++++++++
 commit-reach.h              |   8 +++
 ref-filter.c                |  19 +-----
 t/perf/p1500-graph-walks.sh |  15 +++--
 t/t6600-test-reach.sh       |  83 ++++++++++++++++++++++++++
 5 files changed, 218 insertions(+), 21 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 338ca8084b2..f6c4a3c93c7 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1037,3 +1037,117 @@ void ahead_behind(struct commit **commits, size_t commits_nr,
 	clear_bit_arrays(&bit_arrays);
 	clear_prio_queue(&queue);
 }
+
+struct commit_and_index {
+	struct commit *commit;
+	unsigned int index;
+	timestamp_t generation;
+};
+
+static int compare_commit_and_index_by_generation(const void *va, const void *vb)
+{
+	const struct commit_and_index *a = (const struct commit_and_index *)va;
+	const struct commit_and_index *b = (const struct commit_and_index *)vb;
+
+	if (a->generation > b->generation)
+		return 1;
+	if (a->generation < b->generation)
+		return -1;
+	return 0;
+}
+
+void tips_reachable_from_bases(struct commit_list *bases,
+			       struct commit **tips, size_t tips_nr,
+			       int mark)
+{
+	size_t i;
+	struct commit_and_index *commits;
+	unsigned int min_generation_index = 0;
+	timestamp_t min_generation;
+	struct commit_list *stack = NULL;
+
+	if (!bases || !tips || !tips_nr)
+		return;
+
+	/*
+	 * Do a depth-first search starting at 'bases' to search for the
+	 * tips. Stop at the lowest (un-found) generation number. When
+	 * finding the lowest commit, increase the minimum generation
+	 * number to the next lowest (un-found) generation number.
+	 */
+
+	CALLOC_ARRAY(commits, tips_nr);
+
+	for (i = 0; i < tips_nr; i++) {
+		commits[i].commit = tips[i];
+		commits[i].index = i;
+		commits[i].generation = commit_graph_generation(tips[i]);
+	}
+
+	/* Sort with generation number ascending. */
+	QSORT(commits, tips_nr, compare_commit_and_index_by_generation);
+	min_generation = commits[0].generation;
+
+	while (bases) {
+		parse_commit(bases->item);
+		commit_list_insert(bases->item, &stack);
+		bases = bases->next;
+	}
+
+	while (stack) {
+		unsigned int j;
+		int explored_all_parents = 1;
+		struct commit_list *p;
+		struct commit *c = stack->item;
+		timestamp_t c_gen = commit_graph_generation(c);
+
+		/* Does it match any of our tips? */
+		for (j = min_generation_index; j < tips_nr; j++) {
+			if (c_gen < commits[j].generation)
+				break;
+
+			if (commits[j].commit == c) {
+				tips[commits[j].index]->object.flags |= mark;
+
+				if (j == min_generation_index) {
+					unsigned int k = j + 1;
+					while (k < tips_nr &&
+					       (tips[commits[k].index]->object.flags & mark))
+						k++;
+
+					/* Terminate early if all found. */
+					if (k >= tips_nr)
+						goto done;
+
+					min_generation_index = k;
+					min_generation = commits[k].generation;
+				}
+			}
+		}
+
+		for (p = c->parents; p; p = p->next) {
+			parse_commit(p->item);
+
+			/* Have we already explored this parent? */
+			if (p->item->object.flags & SEEN)
+				continue;
+
+			/* Is it below the current minimum generation? */
+			if (commit_graph_generation(p->item) < min_generation)
+				continue;
+
+			/* Ok, we will explore from here on. */
+			p->item->object.flags |= SEEN;
+			explored_all_parents = 0;
+			commit_list_insert(p->item, &stack);
+			break;
+		}
+
+		if (explored_all_parents)
+			pop_commit(&stack);
+	}
+
+done:
+	free(commits);
+	repo_clear_commit_marks(the_repository, SEEN);
+}
diff --git a/commit-reach.h b/commit-reach.h
index f871b5dcce9..14043ed8562 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -134,4 +134,12 @@ struct ahead_behind_count {
 void ahead_behind(struct commit **commits, size_t commits_nr,
 		  struct ahead_behind_count *counts, size_t counts_nr);
 
+/*
+ * For all tip commits, add 'mark' to their flags if and only if they
+ * are reachable from one of the commits in 'bases'.
+ */
+void tips_reachable_from_bases(struct commit_list *bases,
+			       struct commit **tips, size_t tips_nr,
+			       int mark);
+
 #endif
diff --git a/ref-filter.c b/ref-filter.c
index 896bf703f59..ece77d7e8ba 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2390,33 +2390,21 @@ static void reach_filter(struct ref_array *array,
 			 struct commit_list *check_reachable,
 			 int include_reached)
 {
-	struct rev_info revs;
 	int i, old_nr;
 	struct commit **to_clear;
-	struct commit_list *cr;
 
 	if (!check_reachable)
 		return;
 
 	CALLOC_ARRAY(to_clear, array->nr);
-
-	repo_init_revisions(the_repository, &revs, NULL);
-
 	for (i = 0; i < array->nr; i++) {
 		struct ref_array_item *item = array->items[i];
-		add_pending_object(&revs, &item->commit->object, item->refname);
 		to_clear[i] = item->commit;
 	}
 
-	for (cr = check_reachable; cr; cr = cr->next) {
-		struct commit *merge_commit = cr->item;
-		merge_commit->object.flags |= UNINTERESTING;
-		add_pending_object(&revs, &merge_commit->object, "");
-	}
-
-	revs.limited = 1;
-	if (prepare_revision_walk(&revs))
-		die(_("revision walk setup failed"));
+	tips_reachable_from_bases(check_reachable,
+				  to_clear, array->nr,
+				  UNINTERESTING);
 
 	old_nr = array->nr;
 	array->nr = 0;
@@ -2440,7 +2428,6 @@ static void reach_filter(struct ref_array *array,
 		clear_commit_marks(merge_commit, ALL_REV_FLAGS);
 	}
 
-	release_revisions(&revs);
 	free(to_clear);
 }
 
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index 439a448c2e6..e14e7620cce 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -35,11 +35,16 @@ test_perf 'ahead-behind counts: git tag' '
 	xargs git tag -l --format="%(ahead-behind:HEAD)" <tags
 '
 
-test_perf 'ahead-behind counts: git rev-list' '
-	for r in $(cat refs)
-	do
-		git rev-list --count "HEAD..$r" || return 1
-	done
+test_perf 'contains: git for-each-ref --merged' '
+	git for-each-ref --merged=HEAD --stdin <refs
+'
+
+test_perf 'contains: git branch --merged' '
+	xargs git branch --merged=HEAD <branches
+'
+
+test_perf 'contains: git tag --merged' '
+	xargs git tag --merged=HEAD <tags
 '
 
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 0cb50797ef7..b330945f497 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -529,4 +529,87 @@ test_expect_success 'for-each-ref ahead-behind:none' '
 		--format="%(refname) %(ahead-behind:commit-8-4)" --stdin
 '
 
+test_expect_success 'for-each-ref merged:linear' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	refs/heads/commit-2-1
+	refs/heads/commit-5-1
+	refs/heads/commit-9-1
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	EOF
+	run_all_modes git for-each-ref --merged=commit-1-9 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:all' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	run_all_modes git for-each-ref --merged=commit-5-5 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:some' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	EOF
+	run_all_modes git for-each-ref --merged=commit-9-6 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:some, multibase' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-7-8
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-4-8
+	refs/heads/commit-5-3
+	EOF
+	run_all_modes git for-each-ref \
+		--merged=commit-5-8 \
+		--merged=commit-8-5 \
+		--format="%(refname)" \
+		--stdin
+'
+
+test_expect_success 'for-each-ref merged:none' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-7-5
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	>expect &&
+	run_all_modes git for-each-ref --merged=commit-8-4 \
+		--format="%(refname)" --stdin
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH v2 1/8] for-each-ref: add --stdin option
  2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
@ 2023-03-10 18:08     ` Junio C Hamano
  2023-03-13 10:31     ` Phillip Wood
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2023-03-10 18:08 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, vdye, Jeff King, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> When reading from stdin, we populate the filter.name_patterns array
> dynamically as opposed to pointing to the 'argv' array directly. This
> requires a careful cast while freeing the individual strings,
> conditioned on the --stdin option.

Indeed.  Thanks for carefully describing the concerns you had.

Looking good.  I'll read on.


* Re: [PATCH v2 7/8] for-each-ref: add ahead-behind format atom
  2023-03-10 17:21   ` [PATCH v2 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
@ 2023-03-10 19:09     ` Junio C Hamano
  2023-03-15 13:57     ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2023-03-10 19:09 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, vdye, Jeff King, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Test                                            this tree
> ---------------------------------------------------------------
> 1500.2: ahead-behind counts: git for-each-ref   0.27(0.24+0.02)
> 1500.3: ahead-behind counts: git branch         0.27(0.24+0.03)
> 1500.4: ahead-behind counts: git tag            0.28(0.27+0.01)
> 1500.5: ahead-behind counts: git rev-list       4.57(4.03+0.54)
>
> The 'git rev-list' test exists in this change as a demonstration, but it
> will be removed in the next change to avoid wasting time on this
> comparison.

Nice.

> +ahead-behind:<ref>::
> +	Two integers, separated by a space, demonstrating the number of
> +	commits ahead and behind, respectively, when comparing the output
> +	ref to the `<ref>` specified in the format.

Don't we take any commit-ish, not necessarily a ref?  In the context
of for-each-ref documentation, I am afraid that the readers assume
that the word refers to a ref and %(ahead-behind:ea6e93913b) and the
like are forbidden, which is not what you wanted when you used
lookup_commit_reference_by_name() in the implementation.
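
To make the two numbers concrete, here is a minimal sketch that computes
them straight from the definition (size of each side of the symmetric
difference of the reachable sets) on a toy history. The names
toy_commit, toy_history and toy_ahead_behind() are hypothetical, the
sketch is limited to 64 commits, and it is not the ahead_behind()
algorithm from the series. Note that the base is just a commit here, not
necessarily something a ref points at.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARENTS 2

/* Hypothetical toy commit; parents are graph indexes, -1 when absent. */
struct toy_commit {
	int parents[MAX_PARENTS];
};

/* 0 <- 1 <- 2 and 1 <- 3 <- 4; commit 5 merges 2 and 4. */
static const struct toy_commit toy_history[] = {
	{ { -1, -1 } }, { { 0, -1 } }, { { 1, -1 } },
	{ { 1, -1 } }, { { 3, -1 } }, { { 2, 4 } },
};

/* Bitmask of every commit reachable from 'start', including itself. */
static uint64_t toy_reachable(const struct toy_commit *graph, size_t nr, int start)
{
	uint64_t seen;
	int stack[64];
	size_t stack_nr = 0;

	assert(nr <= 64 && start >= 0 && (size_t)start < nr);
	seen = UINT64_C(1) << start;
	stack[stack_nr++] = start;
	while (stack_nr) {
		int c = stack[--stack_nr];
		for (int i = 0; i < MAX_PARENTS; i++) {
			int p = graph[c].parents[i];
			if (p < 0 || (seen & (UINT64_C(1) << p)))
				continue;
			seen |= UINT64_C(1) << p; /* each commit is pushed once */
			stack[stack_nr++] = p;
		}
	}
	return seen;
}

static unsigned int toy_popcount(uint64_t x)
{
	unsigned int n = 0;

	for (; x; x &= x - 1)
		n++;
	return n;
}

struct toy_counts {
	unsigned int ahead;  /* reachable from tip but not from base */
	unsigned int behind; /* reachable from base but not from tip */
};

static struct toy_counts toy_ahead_behind(const struct toy_commit *graph,
					  size_t nr, int tip, int base)
{
	uint64_t from_tip = toy_reachable(graph, nr, tip);
	uint64_t from_base = toy_reachable(graph, nr, base);
	struct toy_counts counts;

	counts.ahead = toy_popcount(from_tip & ~from_base);
	counts.behind = toy_popcount(from_base & ~from_tip);
	return counts;
}
```

In this toy history, commit 4 compared against base 2 would print as
"2 1", and commit 1 compared against the merge commit 5 as "0 4".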

> +	# Use " !" at the end to demonstrate whitepsace

psace.

> +	# around empty ahead-behind token for tag-blob.

;-)

> +	cat >expect <<-EOF &&
> +	refs/tags/tag-blob  !
> +	refs/tags/tag-left 1 1 !
> +	refs/tags/tag-lines 0 1 !
> +	refs/tags/tag-one-line 0 1 !
> +	refs/tags/tag-right 0 0 !
> +	refs/tags/tag-zero-lines 0 1 !
> +	EOF
> +	git tag -l --format="%(refname) %(ahead-behind:HEAD) !" >actual 2>err &&
> +	grep "refs/tags/tag" actual >actual.focus &&
> +	test_cmp expect actual.focus &&
> +
> +	# Error reported for tags that point to non-commits.
> +	grep "error: object [0-9a-f]* is a blob, not a commit" err
> +'
> +
>  # trying to verify annotated non-signed tags:
>  
>  test_expect_success GPG \


* Re: [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2023-03-10 17:21   ` [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
@ 2023-03-10 19:16   ` Junio C Hamano
  2023-03-10 19:25     ` Derrick Stolee
  2023-03-15 13:22   ` Ævar Arnfjörð Bjarmason
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  10 siblings, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2023-03-10 19:16 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, vdye, Jeff King, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> I was
> initially concerned about the overhead of 'git for-each-ref' and its
> generality and sorting, but I was not able to measure any important
> difference between this implementation and our internal 'git ahead-behind'
> implementation.

That certainly is nice to know.

> However, for our specific uses, we like to batch a list of exact references
> that could be very long. We introduce a new --stdin option here.
>
> To keep things close to the v1 outline, I replaced the existing patches with
> closely-related ones, when possible.
>
> Patch 1 adds the --stdin option to 'git for-each-ref'. (This is similar to
> the boilerplate patch from v1.)
>
> Patch 2 adds a test to explicitly check that 'git for-each-ref' will still
> succeed when all input refs are missing. (This is similar to the
> --ignore-missing patch from v1.)

Sensible.

> Patches 3-5 introduce a new method: ensure_generations_valid(). Patch 3 does
> some refactoring of the existing generation number computations to make it
> more generic, and patch 4 updates the definition of
> commit_graph_generation() slightly, making way for patch 5 to implement the
> method. With an existing commit-graph file, the commits that are not present
> in the file are considered as having generation number "infinity". This is
> useful for most of our reachability queries to this point, since those
> commits are "above" the ones tracked by the commit-graph. When these commits
> are low in number, then there is very little performance cost and zero
> correctness cost. (These patches match v1 exactly.)
>
> However, we will see that the ahead/behind computation requires accurate
> generation numbers to avoid overcounting. Thus, ensure_generations_valid()
> is a way to specify a list of commits that need generation numbers computed
> before continuing. It's a no-op if all of those commits are in the
> commit-graph file. It's expensive if the commit-graph doesn't exist.

Reasonable.
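
For readers following along, a minimal sketch of the "compute only what
is missing" idea, using topological levels (one plus the largest parent
level) as the generation number. This is an assumption made for
illustration: the generation numbers stored in a commit-graph file may be
defined differently, and toy_commit, TOY_GENERATION_UNKNOWN and
toy_ensure_generation() are hypothetical names, not the
ensure_generations_valid() from the series.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 2
#define TOY_GENERATION_UNKNOWN 0u /* hypothetical "not computed yet" marker */

/* Hypothetical toy commit; parents are graph indexes, -1 when absent. */
struct toy_commit {
	int parents[MAX_PARENTS];
	unsigned int generation;
};

/*
 * Fill in the generation of 'start' and of every ancestor whose value is
 * still unknown. Returns immediately when 'start' is already known.
 * Limited to 64 commits so the stack can live on the C stack.
 */
static void toy_ensure_generation(struct toy_commit *graph, size_t nr, int start)
{
	int stack[64];
	size_t stack_nr = 0;

	assert(nr <= 64 && start >= 0 && (size_t)start < nr);
	if (graph[start].generation != TOY_GENERATION_UNKNOWN)
		return;

	stack[stack_nr++] = start;
	while (stack_nr) {
		int c = stack[stack_nr - 1];
		unsigned int max_parent = 0;
		int pending = 0;

		for (int i = 0; i < MAX_PARENTS; i++) {
			int p = graph[c].parents[i];
			if (p < 0)
				continue;
			if (graph[p].generation == TOY_GENERATION_UNKNOWN) {
				/* Compute this parent first, then revisit c. */
				stack[stack_nr++] = p;
				pending = 1;
				break;
			}
			if (graph[p].generation > max_parent)
				max_parent = graph[p].generation;
		}
		if (pending)
			continue;

		/* All parents are known: roots get 1, others max + 1. */
		graph[c].generation = max_parent + 1;
		stack_nr--;
	}
}
```

The stack only ever holds one path through the history, so commits whose
values are already known cost nothing beyond the lookup.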

> However, '%(ahead-behind:)' computations are likely to be slow no matter
> what without a commit-graph, so assuming an existing commit-graph file is
> reasonable. If we find sufficient desire to have an implementation that does
> not have this requirement, we could create a second implementation and
> toggle to it when generation_numbers_enabled() returns false.

At that point, it might make sense to find a way to make the work
ensure_generations_valid() had to spend cycles on not to go to
waste.  Something like "ah, you do not have commit-graph at all, so
let's try to create one if you can write into the repository" at the
beginning of the function, or something?  Just thinking aloud.

> Patch 6 implements the ahead-behind algorithm, but it is not connected to a
> builtin. It's a long commit message, so hopefully it explains the algorithm
> sufficiently. (The difference from v1 is that it no longer integrates with a
> builtin and there are no new tests. It also uses 'unsigned int' and is
> correctly co-authored by Taylor.)

Nice.

> Patch 7 integrates the ahead-behind algorithm with the ref-filter code,
> including parsing the "ahead-behind" token. This finally adds tests that
> check both ahead_behind() and ensure_generations_valid() via
> t6600-test-reach.sh. (This patch is essentially completely new in v2.)
>
> Patch 8 implements the tips_reachable_from_base() method, and uses it within
> the ref-filter code to speed up 'git for-each-ref --merged' and 'git branch
> --merged'. (The interface is slightly different than v1, due to the needs of
> the new caller.)

Very nice.

Having read all the patches, I am very impressed and pleased, but
are we losing anything by having the feature inside for-each-ref
compared to a new command ahead-behind?  As far as I can tell, the
new "for-each-ref --stdin" would still want to match refs and work
only on refs, but there shouldn't be any reason for ahead-behind
computation to limit to tips that are at the tip of a ref, so that
may be one downside in this updated design.  For the intended use
case of "let's find which branches are stale", that downside does
not matter in practice, but for other use cases people will think
of in the future, the limitation might matter (at which time we can
easily resurrect the other subcommand, using the internal machinery
we have here, so it is not a huge deal, I presume).

Thanks.



* Re: [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option
  2023-03-10 19:16   ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Junio C Hamano
@ 2023-03-10 19:25     ` Derrick Stolee
  2023-03-15 17:31       ` Jeff King
  0 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee @ 2023-03-10 19:25 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, me, vdye, Jeff King

On 3/10/2023 2:16 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>> However, '%(ahead-behind:)' computations are likely to be slow no matter
>> what without a commit-graph, so assuming an existing commit-graph file is
>> reasonable. If we find sufficient desire to have an implementation that does
>> not have this requirement, we could create a second implementation and
>> toggle to it when generation_numbers_enabled() returns false.
> 
> At that point, it might make sense to find a way to make the work
> ensure_generations_valid() had to spend cycles on not to go to
> waste.  Something like "ah, you do not have commit-graph at all, so
> let's try to create one if you can write into the repository" at the
> beginning of the function, or something?  Just thinking aloud.

This is a reasonable idea for helping users get out of a slow
situation. Let's see how often this is a problem to see if it
is worth doing that extra work in another series.

> Having read all the patches, I am very impressed and pleased, but
> are we losing anything by having the feature inside for-each-ref
> compared to a new command ahead-behind?  As far as I can tell, the
> new "for-each-ref --stdin" would still want to match refs and work
> only on refs, but there shouldn't be any reason for ahead-behind
> computation to limit to tips that are at the tip of a ref, so that
> may be one downside in this updated design.  For the intended use
> case of "let's find which branches are stale", that downside does
> not matter in practice, but for other use cases people will think
> of in the future, the limitation might matter (at which time we can
> easily resurrect the other subcommand, using the internal machinery
> we have here, so it is not a huge deal, I presume).

I think the for-each-ref implementation solves the use case we
had in mind. I'll double-check to see if we ever use
exact commit IDs instead of reference names, but I think these
callers are rarely interested in an exact commit ID but instead
want the latest version of refs.

The idea of using committish tips definitely changes the
functionality boundary. You are right that we can introduce a
new builtin easily if that is necessary. Even without the
ahead-behind builtin, we are succeeding in reducing the diff
between our fork and the core project.

Thanks,
-Stolee


* Re: [PATCH v2 1/8] for-each-ref: add --stdin option
  2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
  2023-03-10 18:08     ` Junio C Hamano
@ 2023-03-13 10:31     ` Phillip Wood
  2023-03-13 13:33       ` Derrick Stolee
  2023-03-15 13:37     ` Ævar Arnfjörð Bjarmason
  2023-03-15 17:49     ` Jeff King
  3 siblings, 1 reply; 90+ messages in thread
From: Phillip Wood @ 2023-03-13 10:31 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, vdye, Jeff King, Derrick Stolee

Hi Stolee

On 10/03/2023 17:20, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> When a user wishes to input a large list of patterns to 'git
> for-each-ref' (likely a long list of exact refs) there are frequently
> system limits on the number of command-line arguments.
> 
> Add a new --stdin option to instead read the patterns from standard
> input. Add tests that check that any unrecognized arguments are
> considered an error when --stdin is provided. Also, an empty pattern
> list is interpreted as the complete ref set.
> 
> When reading from stdin, we populate the filter.name_patterns array
> dynamically as opposed to pointing to the 'argv' array directly. This
> requires a careful cast while freeing the individual strings,
> conditioned on the --stdin option.

I think what you've got here is fine, but if you wanted you could 
simplify it by using an strvec. Something like

	struct strvec vec = STRVEC_INIT;

	...

	if (from_stdin) {
		struct strbuf line = STRBUF_INIT;

		if (argv[0])
			die(_("unknown arguments supplied with --stdin"));

		while (strbuf_getline(&line, stdin) != EOF)
			strvec_push(&vec, line.buf);

		strbuf_release(&line);
		filter.name_patterns = vec.v;
	} else {
		filter.name_patterns = argv;
	}

	...

	strvec_clear(&vec);

gets rid of the manual memory management with ALLOC_GROW() and the need 
to cast filter.name_patterns when free()ing. It is not immediately 
obvious from the name but struct strvec keeps the array NULL terminated.
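
To illustrate the NULL-termination point, here is a minimal standalone
sketch. It is not Git's strvec code: the names 'str_vec', 'str_vec_push'
and 'str_vec_clear' are hypothetical stand-ins, and aborting on allocation
failure is an assumption made only to keep the example short. The idea is
that every push reserves one extra slot and rewrites the trailing NULL, so
the array can be handed to anything that expects an argv-style list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a NULL-terminated string vector. */
struct str_vec {
	const char **v; /* NULL-terminated once at least one item is pushed */
	size_t nr;      /* number of strings, not counting the NULL */
	size_t alloc;   /* allocated slots, including the one for the NULL */
};

/* Append a copy of 'value' and keep v[nr] == NULL afterwards. */
static void str_vec_push(struct str_vec *vec, const char *value)
{
	char *copy;

	/* Need room for the new item plus the terminating NULL. */
	if (vec->nr + 2 > vec->alloc) {
		size_t alloc = vec->alloc ? vec->alloc * 2 : 8;
		const char **v = realloc(vec->v, alloc * sizeof(*v));

		if (!v)
			abort(); /* assumption: no graceful OOM handling */
		vec->v = v;
		vec->alloc = alloc;
	}

	copy = malloc(strlen(value) + 1);
	if (!copy)
		abort();
	strcpy(copy, value);

	vec->v[vec->nr++] = copy;
	vec->v[vec->nr] = NULL;
	assert(vec->nr < vec->alloc);
}

/* Free every string and the array itself; no cast juggling for callers. */
static void str_vec_clear(struct str_vec *vec)
{
	for (size_t i = 0; i < vec->nr; i++)
		free((char *)vec->v[i]);
	free(vec->v);
	vec->v = NULL;
	vec->nr = vec->alloc = 0;
}
```

A caller would zero-initialize the struct, push one pattern per input
line, pass 'vec.v' where a NULL-terminated 'const char **' is expected,
and call 'str_vec_clear()' once at the end.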

Best Wishes

Phillip

> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>   Documentation/git-for-each-ref.txt |  7 +++++-
>   builtin/for-each-ref.c             | 29 ++++++++++++++++++++++-
>   t/t6300-for-each-ref.sh            | 37 ++++++++++++++++++++++++++++++
>   3 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
> index 6da899c6296..ccdc2911bb9 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -9,7 +9,8 @@ SYNOPSIS
>   --------
>   [verse]
>   'git for-each-ref' [--count=<count>] [--shell|--perl|--python|--tcl]
> -		   [(--sort=<key>)...] [--format=<format>] [<pattern>...]
> +		   [(--sort=<key>)...] [--format=<format>]
> +		   [ --stdin | <pattern>... ]
>   		   [--points-at=<object>]
>   		   [--merged[=<object>]] [--no-merged[=<object>]]
>   		   [--contains[=<object>]] [--no-contains[=<object>]]
> @@ -32,6 +33,10 @@ OPTIONS
>   	literally, in the latter case matching completely or from the
>   	beginning up to a slash.
>   
> +--stdin::
> +	If `--stdin` is supplied, then the list of patterns is read from
> +	standard input instead of from the argument list.
> +
>   --count=<count>::
>   	By default the command shows all refs that match
>   	`<pattern>`.  This option makes it stop after showing
> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index 6f62f40d126..e005a7ef3ce 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -25,6 +25,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>   	struct ref_format format = REF_FORMAT_INIT;
>   	struct strbuf output = STRBUF_INIT;
>   	struct strbuf err = STRBUF_INIT;
> +	int from_stdin = 0;
>   
>   	struct option opts[] = {
>   		OPT_BIT('s', "shell", &format.quote_style,
> @@ -49,6 +50,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>   		OPT_CONTAINS(&filter.with_commit, N_("print only refs which contain the commit")),
>   		OPT_NO_CONTAINS(&filter.no_commit, N_("print only refs which don't contain the commit")),
>   		OPT_BOOL(0, "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
> +		OPT_BOOL(0, "stdin", &from_stdin, N_("read reference patterns from stdin")),
>   		OPT_END(),
>   	};
>   
> @@ -75,7 +77,27 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>   	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
>   	filter.ignore_case = icase;
>   
> -	filter.name_patterns = argv;
> +	if (from_stdin) {
> +		struct strbuf line = STRBUF_INIT;
> +		size_t nr = 0, alloc = 16;
> +
> +		if (argv[0])
> +			die(_("unknown arguments supplied with --stdin"));
> +
> +		CALLOC_ARRAY(filter.name_patterns, alloc);
> +
> +		while (strbuf_getline(&line, stdin) != EOF) {
> +			ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
> +			filter.name_patterns[nr++] = strbuf_detach(&line, NULL);
> +		}
> +
> +		/* Add a terminating NULL string. */
> +		ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
> +		filter.name_patterns[nr + 1] = NULL;
> +	} else {
> +		filter.name_patterns = argv;
> +	}
> +
>   	filter.match_as_path = 1;
>   	filter_refs(&array, &filter, FILTER_REFS_ALL);
>   	ref_array_sort(sorting, &array);
> @@ -97,5 +119,10 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>   	free_commit_list(filter.with_commit);
>   	free_commit_list(filter.no_commit);
>   	ref_sorting_release(sorting);
> +	if (from_stdin) {
> +		for (size_t i = 0; filter.name_patterns[i]; i++)
> +			free((char *)filter.name_patterns[i]);
> +		free(filter.name_patterns);
> +	}
>   	return 0;
>   }
> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index c466fd989f1..a58053a54c5 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -1464,4 +1464,41 @@ sig_crlf="$(printf "%s" "$sig" | append_cr; echo dummy)"
>   sig_crlf=${sig_crlf%dummy}
>   test_atom refs/tags/fake-sig-crlf contents:signature "$sig_crlf"
>   
> +test_expect_success 'git for-each-ref --stdin: empty' '
> +	>in &&
> +	git for-each-ref --format="%(refname)" --stdin <in >actual &&
> +	git for-each-ref --format="%(refname)" >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'git for-each-ref --stdin: fails if extra args' '
> +	>in &&
> +	test_must_fail git for-each-ref --format="%(refname)" \
> +		--stdin refs/heads/extra <in 2>err &&
> +	grep "unknown arguments supplied with --stdin" err
> +'
> +
> +test_expect_success 'git for-each-ref --stdin: matches' '
> +	cat >in <<-EOF &&
> +	refs/tags/multi*
> +	refs/heads/amb*
> +	EOF
> +
> +	cat >expect <<-EOF &&
> +	refs/heads/ambiguous
> +	refs/tags/multi-ref1-100000-user1
> +	refs/tags/multi-ref1-100000-user2
> +	refs/tags/multi-ref1-200000-user1
> +	refs/tags/multi-ref1-200000-user2
> +	refs/tags/multi-ref2-100000-user1
> +	refs/tags/multi-ref2-100000-user2
> +	refs/tags/multi-ref2-200000-user1
> +	refs/tags/multi-ref2-200000-user2
> +	refs/tags/multiline
> +	EOF
> +
> +	git for-each-ref --format="%(refname)" --stdin <in >actual &&
> +	test_cmp expect actual
> +'
> +
>   test_done


* Re: [PATCH v2 1/8] for-each-ref: add --stdin option
  2023-03-13 10:31     ` Phillip Wood
@ 2023-03-13 13:33       ` Derrick Stolee
  2023-03-13 21:10         ` Taylor Blau
  0 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee @ 2023-03-13 13:33 UTC (permalink / raw)
  To: phillip.wood, Derrick Stolee via GitGitGadget, git
  Cc: gitster, me, vdye, Jeff King

On 3/13/2023 6:31 AM, Phillip Wood wrote:
> Hi Stolee
> 
> On 10/03/2023 17:20, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> When a user wishes to input a large list of patterns to 'git
>> for-each-ref' (likely a long list of exact refs) there are frequently
>> system limits on the number of command-line arguments.
>>
>> Add a new --stdin option to instead read the patterns from standard
>> input. Add tests that check that any unrecognized arguments are
>> considered an error when --stdin is provided. Also, an empty pattern
>> list is interpreted as the complete ref set.
>>
>> When reading from stdin, we populate the filter.name_patterns array
>> dynamically as opposed to pointing to the 'argv' array directly. This
>> requires a careful cast while freeing the individual strings,
>> conditioned on the --stdin option.
> 
> I think what you've got here is fine, but if you wanted you could simplify it by using an strvec. Something like
> 
>     struct strvec vec = STRVEC_INIT;
> 
>     ...
> 
>     if (from_stdin) {
>         struct strbuf line = STRBUF_INIT;
> 
>         if (argv[0])
>             die(_("unknown arguments supplied with --stdin"));
> 
>         while (strbuf_getline(&line, stdin) != EOF)
>             strvec_push(&vec, line.buf);
> 
>         strbuf_release(&line);
>         filter.name_patterns = vec.v;
>     } else {
>         filter.name_patterns = argv;
>     }
> 
>     ...
> 
>     strvec_clear(&vec);
> 
> gets rid of the manual memory management with ALLOC_GROW() and the need to cast filter.name_patterns when free()ing. It is not immediately obvious from the name but struct strvec keeps the array NULL terminated.

Thanks, Phillip. I like your version a lot and will use
it in the next version.

-Stolee


* Re: [PATCH v2 1/8] for-each-ref: add --stdin option
  2023-03-13 13:33       ` Derrick Stolee
@ 2023-03-13 21:10         ` Taylor Blau
  0 siblings, 0 replies; 90+ messages in thread
From: Taylor Blau @ 2023-03-13 21:10 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: phillip.wood, Derrick Stolee via GitGitGadget, git, gitster, vdye,
	Jeff King

On Mon, Mar 13, 2023 at 09:33:29AM -0400, Derrick Stolee wrote:
> Thanks, Phillip. I like your version a lot and will use
> it in the next version.

Ditto. Thanks, both.

Thanks,
Taylor


* Re: [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2023-03-10 19:16   ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Junio C Hamano
@ 2023-03-15 13:22   ` Ævar Arnfjörð Bjarmason
  2023-03-15 13:54     ` Derrick Stolee
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  10 siblings, 1 reply; 90+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-03-15 13:22 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King, Derrick Stolee


On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:

> At $DAYJOB, we have used a custom 'ahead-behind' builtin in our fork of Git
> for lots of reasons. The main goal of the builtin is to compare multiple
> references against a common base reference. The comparison is the number of
> commits that are in each side of the symmetric difference of their reachable
> sets. A commit C is "ahead" of a commit B by the number of commits in B..C
> (reachable from C but not reachable from B). Similarly, the commit C is
> "behind" the commit B by the number of commits in C..B (reachable from B but
> not reachable from C).
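
As a worked illustration of the ahead/behind definition quoted above, the
following is a minimal standalone sketch. It is not the algorithm from
this series, which walks the history once with a priority queue and
per-commit bit arrays; here each side's reachable set is simply computed
separately and compared. The names 'toy_commit', 'reachable_set' and
'toy_ahead_behind' are hypothetical, and the sketch assumes at most 64
commits, at most two parents each, and parent indices that are valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARENTS 2
#define NO_PARENT (-1)

/* Hypothetical toy commit: parents are indices into the same array. */
struct toy_commit {
	int parents[MAX_PARENTS];
};

struct toy_counts {
	unsigned int ahead;
	unsigned int behind;
};

/* Bitmask of all commits reachable from 'tip', including 'tip' itself. */
static uint64_t reachable_set(const struct toy_commit *commits, int tip)
{
	uint64_t seen;
	int stack[64];
	int top = 0;

	assert(tip >= 0 && tip < 64);
	seen = UINT64_C(1) << tip;
	stack[top++] = tip;

	while (top > 0) {
		int c = stack[--top];

		for (int i = 0; i < MAX_PARENTS; i++) {
			int p = commits[c].parents[i];

			/* Mark on push so each commit is stacked at most once. */
			if (p != NO_PARENT && !(seen & (UINT64_C(1) << p))) {
				seen |= UINT64_C(1) << p;
				stack[top++] = p;
			}
		}
	}
	return seen;
}

/* Count set bits by clearing the lowest one until none remain. */
static unsigned int popcount64(uint64_t x)
{
	unsigned int n = 0;

	while (x) {
		x &= x - 1;
		n++;
	}
	return n;
}

/* ahead = |base..tip|, behind = |tip..base| */
static struct toy_counts toy_ahead_behind(const struct toy_commit *commits,
					  int tip, int base)
{
	uint64_t from_tip = reachable_set(commits, tip);
	uint64_t from_base = reachable_set(commits, base);
	struct toy_counts counts = {
		.ahead = popcount64(from_tip & ~from_base),
		.behind = popcount64(from_base & ~from_tip),
	};

	return counts;
}
```

For a history 0 <- 1 <- 2 with a side branch 1 <- 3 <- 4, comparing tip 4
against base 2 gives two commits ahead (3 and 4) and one behind (2).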

I have a local change to get rid of the various "the_repository" macros,
which a merge of this in "seen" conflicted with (semantically).

The below patch on top of "seen" will fix it, could you please squash it
in in the appropriate places?

Aside from a desire to get rid of "the_repository" macros this also
makes your own code consistent, i.e. you use repo_clear_commit_marks(),
but then use parse_commit() instead of the repo_parse_commit() in the
same function.

It also makes the new API more future-proof, I don't think we should be
adding new code that implicitly uses "the_repository" to our libraries
if we can help it, much better to pass it down, even if all the current
users are built-in that end up using "the_repository".

diff --git a/builtin/branch.c b/builtin/branch.c
index 21526d9883a..7c7dba839cf 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -448,7 +448,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	if (verify_ref_format(format))
 		die(_("unable to parse format string"));
 
-	filter_ahead_behind(format, &array);
+	filter_ahead_behind(the_repository, format, &array);
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++) {
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 1cdf8eb5a6b..e097f44e226 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -102,7 +102,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	filter.match_as_path = 1;
 	filter_refs(&array, &filter, FILTER_REFS_ALL);
-	filter_ahead_behind(&format, &array);
+	filter_ahead_behind(the_repository, &format, &array);
 
 	ref_array_sort(sorting, &array);
 
diff --git a/builtin/tag.c b/builtin/tag.c
index 8652d5edd47..7e2f686600a 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -67,7 +67,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		die(_("unable to parse format string"));
 	filter->with_commit_tag_algo = 1;
 	filter_refs(&array, filter, FILTER_REFS_TAGS);
-	filter_ahead_behind(format, &array);
+	filter_ahead_behind(the_repository, format, &array);
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++) {
diff --git a/commit-reach.c b/commit-reach.c
index 0abd43801fc..4a2216b8ae0 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -977,8 +977,9 @@ static void free_bit_array(struct commit *c)
 	*bitmap = NULL;
 }
 
-void ahead_behind(struct commit **commits, size_t commits_nr,
-		  struct ahead_behind_count *counts, size_t counts_nr)
+void ahead_behind(struct repository *r, struct commit **commits,
+		  size_t commits_nr, struct ahead_behind_count *counts,
+		  size_t counts_nr)
 {
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
@@ -1024,7 +1025,7 @@ void ahead_behind(struct commit **commits, size_t commits_nr,
 		for (p = c->parents; p; p = p->next) {
 			struct bitmap *bitmap_p;
 
-			parse_commit(p->item);
+			repo_parse_commit(r, p->item);
 
 			bitmap_p = init_bit_array(p->item, width);
 			bitmap_or(bitmap_p, bitmap_c);
@@ -1039,7 +1040,7 @@ void ahead_behind(struct commit **commits, size_t commits_nr,
 	}
 
 	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
-	repo_clear_commit_marks(the_repository, PARENT2 | STALE);
+	repo_clear_commit_marks(r, PARENT2 | STALE);
 	clear_bit_arrays(&bit_arrays);
 	clear_prio_queue(&queue);
 }
diff --git a/commit-reach.h b/commit-reach.h
index f871b5dcce9..2269fab8261 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -131,7 +131,8 @@ struct ahead_behind_count {
  * Given an array of commits and an array of ahead_behind_count pairs,
  * compute the ahead/behind counts for each pair.
  */
-void ahead_behind(struct commit **commits, size_t commits_nr,
-		  struct ahead_behind_count *counts, size_t counts_nr);
+void ahead_behind(struct repository *r, struct commit **commits,
+		  size_t commits_nr, struct ahead_behind_count *counts,
+		  size_t counts_nr);
 
 #endif
diff --git a/ref-filter.c b/ref-filter.c
index 4125ec3fd3a..cdc054beede 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2462,7 +2462,7 @@ static void reach_filter(struct ref_array *array,
 	free(to_clear);
 }
 
-void filter_ahead_behind(struct ref_format *format,
+void filter_ahead_behind(struct repository *r, struct ref_format *format,
 			 struct ref_array *array)
 {
 	struct commit **commits;
@@ -2502,7 +2502,7 @@ void filter_ahead_behind(struct ref_format *format,
 		commits_nr++;
 	}
 
-	ahead_behind(commits, commits_nr, array->counts, array->counts_nr);
+	ahead_behind(r, commits, commits_nr, array->counts, array->counts_nr);
 	free(commits);
 }
 
diff --git a/ref-filter.h b/ref-filter.h
index 7e8bff3864e..1a757b49233 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -163,7 +163,7 @@ struct ref_array_item *ref_array_push(struct ref_array *array,
  *
  * If this is not called, then any ahead-behind atoms will be blank.
  */
-void filter_ahead_behind(struct ref_format *format,
+void filter_ahead_behind(struct repository *r, struct ref_format *format,
 			 struct ref_array *array);
 
 #endif /*  REF_FILTER_H  */


* Re: [PATCH v2 1/8] for-each-ref: add --stdin option
  2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
  2023-03-10 18:08     ` Junio C Hamano
  2023-03-13 10:31     ` Phillip Wood
@ 2023-03-15 13:37     ` Ævar Arnfjörð Bjarmason
  2023-03-15 17:17       ` Jeff King
  2023-03-15 17:49     ` Jeff King
  3 siblings, 1 reply; 90+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-03-15 13:37 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King, Derrick Stolee


On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When a user wishes to input a large list of patterns to 'git
> for-each-ref' (likely a long list of exact refs) there are frequently
> system limits on the number of command-line arguments.

Okay, and the current API assumes you just assign "argv" to this, but...

> When reading from stdin, we populate the filter.name_patterns array
> dynamically as opposed to pointing to the 'argv' array directly. This
> requires a careful cast while freeing the individual strings,
> conditioned on the --stdin option.

...sounds potentially nasty...

> @@ -75,7 +77,27 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
>  	filter.ignore_case = icase;
>  
> -	filter.name_patterns = argv;
> +	if (from_stdin) {
> +		struct strbuf line = STRBUF_INIT;
> +		size_t nr = 0, alloc = 16;
> +
> +		if (argv[0])
> +			die(_("unknown arguments supplied with --stdin"));
> +
> +		CALLOC_ARRAY(filter.name_patterns, alloc);
> +
> +		while (strbuf_getline(&line, stdin) != EOF) {
> +			ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
> +			filter.name_patterns[nr++] = strbuf_detach(&line, NULL);
> +		}
> +
> +		/* Add a terminating NULL string. */
> +		ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
> +		filter.name_patterns[nr + 1] = NULL;
> +	} else {
> +		filter.name_patterns = argv;
> +	}
> +
>  	filter.match_as_path = 1;
>  	filter_refs(&array, &filter, FILTER_REFS_ALL);
>  	ref_array_sort(sorting, &array);
> @@ -97,5 +119,10 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  	free_commit_list(filter.with_commit);
>  	free_commit_list(filter.no_commit);
>  	ref_sorting_release(sorting);
> +	if (from_stdin) {
> +		for (size_t i = 0; filter.name_patterns[i]; i++)
> +			free((char *)filter.name_patterns[i]);
> +		free(filter.name_patterns);
> +	}
>  	return 0;
>  }

Why do we need to seemingly re-invent a "struct strvec" here? I tried to
simplify this on top of this (well, "seen"), and we can get rid of all
of this manual memory management & trailing NULL juggling as a result:
	
	diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
	index cf5ba6ffc12..13b75eff28c 100644
	--- a/builtin/for-each-ref.c
	+++ b/builtin/for-each-ref.c
	@@ -7,6 +7,7 @@
	 #include "parse-options.h"
	 #include "ref-filter.h"
	 #include "commit-reach.h"
	+#include "strvec.h"
	 
	 static char const * const for_each_ref_usage[] = {
	 	N_("git for-each-ref [<options>] [<pattern>]"),
	@@ -27,6 +28,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
	 	struct ref_format format = REF_FORMAT_INIT;
	 	struct strbuf output = STRBUF_INIT;
	 	struct strbuf err = STRBUF_INIT;
	+	struct strvec stdin_pat = STRVEC_INIT;
	 	int from_stdin = 0;
	 
	 	struct option opts[] = {
	@@ -81,21 +83,13 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
	 
	 	if (from_stdin) {
	 		struct strbuf line = STRBUF_INIT;
	-		size_t nr = 0, alloc = 16;
	 
	 		if (argv[0])
	 			die(_("unknown arguments supplied with --stdin"));
	 
	-		CALLOC_ARRAY(filter.name_patterns, alloc);
	-
	-		while (strbuf_getline(&line, stdin) != EOF) {
	-			ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
	-			filter.name_patterns[nr++] = strbuf_detach(&line, NULL);
	-		}
	-
	-		/* Add a terminating NULL string. */
	-		ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
	-		filter.name_patterns[nr + 1] = NULL;
	+		while (strbuf_getline(&line, stdin) != EOF)
	+			strvec_push(&stdin_pat, line.buf);
	+		filter.name_patterns = stdin_pat.v;
	 	} else {
	 		filter.name_patterns = argv;
	 	}
	@@ -123,10 +117,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
	 	free_commit_list(filter.with_commit);
	 	free_commit_list(filter.no_commit);
	 	ref_sorting_release(sorting);
	-	if (from_stdin) {
	-		for (size_t i = 0; filter.name_patterns[i]; i++)
	-			free(filter.name_patterns[i]);
	-		free(filter.name_patterns);
	-	}
	+	strvec_clear(&stdin_pat);
	 	return 0;
	 }

It *is* an extra copy though, as your implementation re-uses the strbuf
we already allocated.

But presumably that's trivial in this case, and if we care I think we
should resurrect something like [1] instead, i.e. we could just teach
the strvec API to have a strvec_push_nodup(). But I doubt that in this
case it'll matter.
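
To make the trade-off concrete, here is a minimal standalone sketch of
what a copying push and a "nodup" push could look like. This is not Git's
strvec API, and no strvec_push_nodup() is assumed to exist; the names
'owned_strings', 'owned_strings_push' and 'owned_strings_push_nodup' are
hypothetical. The nodup variant stores the caller's heap pointer as-is and
takes over responsibility for freeing it, which saves one allocation and
one copy per line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical vector that owns every string it holds. */
struct owned_strings {
	char **v;     /* NULL-terminated once non-empty */
	size_t nr;    /* number of strings, not counting the NULL */
	size_t alloc; /* allocated slots */
};

/* Make room for one more item plus the terminating NULL. */
static void owned_strings_grow(struct owned_strings *s)
{
	if (s->nr + 2 > s->alloc) {
		size_t alloc = s->alloc ? s->alloc * 2 : 8;
		char **v = realloc(s->v, alloc * sizeof(*v));

		if (!v)
			abort(); /* assumption: no graceful OOM handling */
		s->v = v;
		s->alloc = alloc;
	}
}

/* Take ownership of a heap-allocated 'value'; nothing is copied. */
static void owned_strings_push_nodup(struct owned_strings *s, char *value)
{
	assert(value);
	owned_strings_grow(s);
	s->v[s->nr++] = value;
	s->v[s->nr] = NULL;
}

/* Copying variant: the caller keeps ownership of 'value'. */
static void owned_strings_push(struct owned_strings *s, const char *value)
{
	char *copy = malloc(strlen(value) + 1);

	if (!copy)
		abort();
	strcpy(copy, value);
	owned_strings_push_nodup(s, copy);
}

/* Free all strings, whichever way they were pushed. */
static void owned_strings_clear(struct owned_strings *s)
{
	for (size_t i = 0; i < s->nr; i++)
		free(s->v[i]);
	free(s->v);
	s->v = NULL;
	s->nr = s->alloc = 0;
}
```

With the nodup variant, a line buffer detached from a reader could be
handed over directly; the cost is that callers must never pass a string
they do not own, such as a literal or an argv entry.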

In any case, if you don't want to take this as-is, please fix this so
that we're not reaching into the "filter.name_patterns" and casting its
"const char" to "char".

If we're going to add a hack here, that API should instead know how to
free its own resources (so we could clean up the free_commit_list() calls
seen in the context), and we could carry some "my argv needs free-ing" flag.

But none of that seems needed in this case, this is just another case
where we can pretend that we have a "normal" argv, and then clean up our
own strvec, no?

1. https://lore.kernel.org/git/65a620b08ef359e29d678497f1b529e3ce6477b1.1673475190.git.gitgitgadget@gmail.com/


* Re: [PATCH v2 6/8] commit-reach: implement ahead_behind() logic
  2023-03-10 17:21   ` [PATCH v2 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
@ 2023-03-15 13:50     ` Ævar Arnfjörð Bjarmason
  2023-03-15 16:03       ` Junio C Hamano
  0 siblings, 1 reply; 90+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-03-15 13:50 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King, Derrick Stolee


On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]
> +void ahead_behind(struct commit **commits, size_t commits_nr,
> +		  struct ahead_behind_count *counts, size_t counts_nr)
> +{
> +	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };

I see we have some of this already in this file, so maybe we should
leave it for a subsequent cleanup, but as DEVOPTS=extra-all notes here
we're using a positional initializer.

We could instead be more explicit, and do:

	{ .compare = compare_commits_by_gen_then_commit_date }

But as noted, maybe for a subsequent cleanup...
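
For illustration, a minimal standalone sketch of the designated form. The
'toy_queue' struct is a hypothetical stand-in that only loosely resembles
a priority queue header; it is not Git's struct prio_queue. The positional
form is shown in a comment because it silently depends on 'compare' being
the first member, while the designated form keeps working if members are
reordered. In both forms, members that are not named are zero-initialized.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_compare_fn)(const void *a, const void *b, void *cb_data);

/* Hypothetical queue header; only the initialization matters here. */
struct toy_queue {
	toy_compare_fn compare;
	void *cb_data;
	size_t alloc, nr;
};

/* Three-way comparison of two ints; 'cb_data' is unused. */
static int compare_ints(const void *a, const void *b, void *cb_data)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	(void)cb_data;
	return (x > y) - (x < y);
}

/*
 * Positional form, tied to the member order:
 *     struct toy_queue q = { compare_ints };
 * Designated form, used below, names the member explicitly.
 */
static struct toy_queue toy_queue_init(void)
{
	struct toy_queue q = { .compare = compare_ints };

	return q;
}

/* Dispatch through the stored comparison callback. */
static int toy_queue_compare(const struct toy_queue *q,
			     const void *a, const void *b)
{
	assert(q->compare);
	return q->compare(a, b, q->cb_data);
}
```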

> +	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
> +	size_t i;

Nit: Consider dropping this "size_t i" line, and instead...

> +
> +	if (!commits_nr || !counts_nr)
> +		return;
> +
> +	for (i = 0; i < counts_nr; i++) {

...just make this "for (size_t i = 0; ...", ditto the 2x ones below.

> +struct ahead_behind_count {
> +	/**
> +	 * As input, the *_index members indicate which positions in
> +	 * the 'tips' array correspond to the tip and base of this
> +	 * comparison.
> +	 */
> +	size_t tip_index;
> +	size_t base_index;
> +
> +	/**
> +	 * These values store the computed counts for each side of the
> +	 * symmetric difference:
> +	 *
> +	 * 'ahead' stores the number of commits reachable from the tip
> +	 * and not reachable from the base.
> +	 *
> +	 * 'behind' stores the number of commits reachable from the base
> +	 * and not reachable from the tip.
> +	 */
> +	unsigned int ahead;
> +	unsigned int behind;

Even though this is the tip of the iceberg in terms of our codebase
overall, can't we just use "size_t" for counts in new APIs?

Trying to squash this into the end-state seems to work:
	
	diff --git a/commit-reach.h b/commit-reach.h
	index 2269fab8261..108651213d9 100644
	--- a/commit-reach.h
	+++ b/commit-reach.h
	@@ -123,8 +123,8 @@ struct ahead_behind_count {
	 	 * 'behind' stores the number of commits reachable from the base
	 	 * and not reachable from the tip.
	 	 */
	-	unsigned int ahead;
	-	unsigned int behind;
	+	size_t ahead;
	+	size_t behind;
	 };
	 
	 /*
	diff --git a/ref-filter.c b/ref-filter.c
	index cdc054beede..b328db696bf 100644
	--- a/ref-filter.c
	+++ b/ref-filter.c
	@@ -2013,7 +2013,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
	 			if (ref->counts) {
	 				const struct ahead_behind_count *count;
	 				count = ref->counts[ahead_behind_atoms++];
	-				v->s = xstrfmt("%d %d", count->ahead, count->behind);
	+				v->s = xstrfmt("%"PRIuMAX" " "%"PRIuMAX, (uintmax_t)count->ahead, (uintmax_t)count->behind);
	 			} else {
	 				/* Not a commit. */
	 				v->s = xstrdup("");



* Re: [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option
  2023-03-15 13:22   ` Ævar Arnfjörð Bjarmason
@ 2023-03-15 13:54     ` Derrick Stolee
  0 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2023-03-15 13:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King

On 3/15/2023 9:22 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:
> 
>> At $DAYJOB, we have used a custom 'ahead-behind' builtin in our fork of Git
>> for lots of reasons. The main goal of the builtin is to compare multiple
>> references against a common base reference. The comparison is the number of
>> commits that are in each side of the symmetric difference of their reachable
>> sets. A commit C is "ahead" of a commit B by the number of commits in B..C
>> (reachable from C but not reachable from B). Similarly, the commit C is
>> "behind" the commit B by the number of commits in C..B (reachable from B but
>> not reachable from C).
> 
> I have a local change to get rid of the various "the_repository" macros,
> which a merge of this in "seen" conflicted with (semantically).

Thanks for doing that important refactoring.
 
> The below patch on top of "seen" will fix it, could you please squash it
> in in the appropriate places?

Got it. Thanks. v3 will arrive later today with those changes and
the recommended strvec changes.

Thanks,
-Stolee


* Re: [PATCH v2 7/8] for-each-ref: add ahead-behind format atom
  2023-03-10 17:21   ` [PATCH v2 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
  2023-03-10 19:09     ` Junio C Hamano
@ 2023-03-15 13:57     ` Ævar Arnfjörð Bjarmason
  2023-03-15 16:01       ` Junio C Hamano
  2023-03-15 16:11       ` Derrick Stolee
  1 sibling, 2 replies; 90+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-03-15 13:57 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King, Derrick Stolee


On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]
> +ahead-behind:<ref>::
> +	Two integers, separated by a space, demonstrating the number of
> +	commits ahead and behind, respectively, when comparing the output
> +	ref to the `<ref>` specified in the format.
> +

As a potential (expert) user who hasn't read the code yet I'd think the
"<ref>" here would be the same as "update-ref", but glancing ahead
at your tests it seems that it does ref matching, so "refs/heads/master"
and "master" are both accepted?

Since nothing else uses "<ref>" here I think we should clearly define
the matching rules somehow, or maybe we do, and I missed it.

Is there a reason we couldn't use the same "<pattern>" as for-each-ref's
top-level accepts, with the limitation that if it matches more than one
we'll die?

Later you have e.g. ahead-behind:HEAD, but do we have test coverage for
e.g. the edge cases where a refs/heads/HEAD exists?

> @@ -645,6 +656,7 @@ static struct {
>  	[ATOM_THEN] = { "then", SOURCE_NONE },
>  	[ATOM_ELSE] = { "else", SOURCE_NONE },
>  	[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
> +	[ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser },
>  	/*
>  	 * Please update $__git_ref_fieldlist in git-completion.bash
>  	 * when you add new atoms
> @@ -1848,6 +1860,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
>  	struct object *obj;
>  	int i;
>  	struct object_info empty = OBJECT_INFO_INIT;
> +	int ahead_behind_atoms = 0;
>  
>  	CALLOC_ARRAY(ref->value, used_atom_cnt);
>  
> @@ -1978,6 +1991,16 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
>  			else
>  				v->s = xstrdup("");
>  			continue;
> +		} else if (atom_type == ATOM_AHEADBEHIND) {
> +			if (ref->counts) {
> +				const struct ahead_behind_count *count;
> +				count = ref->counts[ahead_behind_atoms++];
> +				v->s = xstrfmt("%d %d", count->ahead, count->behind);
> +			} else {
> +				/* Not a commit. */
> +				v->s = xstrdup("");
> +			}
> +			continue;
>  		} else
>  			continue;

Hrm, so despite my earlier suggestion of using "size_t" it seems we
really are limited to "int" in the end, as our "used_atom_cnt" is an
"int".

But anyway, better to implement that limitation here, so we only need to
fix ref-filter.c to move beyond "int".

>  
> @@ -2328,6 +2351,7 @@ static void free_array_item(struct ref_array_item *item)
>  			free((char *)item->value[i].s);
>  		free(item->value);
>  	}
> +	free(item->counts);
>  	free(item);
>  }
>  
> @@ -2356,6 +2380,8 @@ void ref_array_clear(struct ref_array *array)
>  		free_worktrees(ref_to_worktree_map.worktrees);
>  		ref_to_worktree_map.worktrees = NULL;
>  	}
> +
> +	FREE_AND_NULL(array->counts);
>  }
>  

Follows the existing pattern, so good, but FWIW I think we could do away
with all this "and NULL", it looks like the only users are built-ins
which never look at this data again, but then we should probably rename
it to ref_array_release() or something...

>  #define EXCLUDE_REACHED 0
> @@ -2418,6 +2444,50 @@ static void reach_filter(struct ref_array *array,
>  	free(to_clear);
>  }
>  
> +void filter_ahead_behind(struct ref_format *format,
> +			 struct ref_array *array)
> +{
> +	struct commit **commits;
> +	size_t commits_nr = format->bases.nr + array->nr;
> +
> +	if (!format->bases.nr || !array->nr)
> +		return;
> +
> +	ALLOC_ARRAY(commits, commits_nr);
> +	for (size_t i = 0; i < format->bases.nr; i++) {

Earlier I suggested using this "size_t" in a "for", which is used here,
good, newer code than the other commit, presumably...

> +		const char *name = format->bases.items[i].string;
> +		commits[i] = lookup_commit_reference_by_name(name);
> +		if (!commits[i])
> +			die("failed to find '%s'", name);
> +	}
> +
> +	ALLOC_ARRAY(array->counts, st_mult(format->bases.nr, array->nr));
> +
> +	commits_nr = format->bases.nr;
> +	array->counts_nr = 0;

Not being very familiar with ref-filter.c, it seems odd that the API is
taking pains to clear things elsewhere, but we need to set "counts_nr"
to 0 here before an iteration.

If I comment this assignment out all the tests pass, is this redundant,
or left here for some future potential API use?

> diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
> new file mode 100755
> index 00000000000..439a448c2e6
> --- /dev/null
> +++ b/t/perf/p1500-graph-walks.sh
> @@ -0,0 +1,45 @@
> +#!/bin/sh
> +
> +test_description='Commit walk performance tests'
> +. ./perf-lib.sh
> +
> +test_perf_large_repo
> +
> +test_expect_success 'setup' '
> +	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
> +	sort -r allrefs | head -n 50 >refs &&

Some of the point of test_perf_large_repo is being able to point the
test to an arbitrary sized repo, why "head -n 50" here, instead of just
doing that filtering when preparing the test repo?

> +test_expect_success 'ahead-behind requires an argument' '
> +	test_must_fail git for-each-ref \
> +		--format="%(ahead-behind)" 2>err &&
> +	grep "expected format: %(ahead-behind:<ref>)" err
> +'
> +
> +test_expect_success 'missing ahead-behind base' '
> +	test_must_fail git for-each-ref \
> +		--format="%(ahead-behind:refs/heads/missing)" 2>err &&
> +	grep "failed to find '\''refs/heads/missing'\''" err
> +'
> +

Is this grep instead of test_cmp for brevity, or because we'll catch
this late and spew out other output as well?

I'd think it would be worth testing that we only emit an error. Even if
you don't want a full test_cmp we could check the line count too to
assert that...

> +# Run this before doing any signing, so the test has the same results
> +# regardless of the GPG prereq.
> +test_expect_success 'git tag --format with ahead-behind' '
> +	test_when_finished git reset --hard tag-one-line &&
> +	git commit --allow-empty -m "left" &&
> +	git tag -a -m left tag-left &&
> +	git reset --hard HEAD~1 &&
> +	git commit --allow-empty -m "right" &&
> +	git tag -a -m left tag-right &&

Do we really need this --allow-empty instead of just using "test_commit"?
I.e. is being TREESAME here important?

> +
> +	# Use " !" at the end to demonstrate whitespace
> +	# around empty ahead-behind token for tag-blob.
> +	cat >expect <<-EOF &&
> +	refs/tags/tag-blob  !
> +	refs/tags/tag-left 1 1 !
> +	refs/tags/tag-lines 0 1 !
> +	refs/tags/tag-one-line 0 1 !
> +	refs/tags/tag-right 0 0 !
> +	refs/tags/tag-zero-lines 0 1 !
> +	EOF
> +	git tag -l --format="%(refname) %(ahead-behind:HEAD) !" >actual 2>err &&
> +	grep "refs/tags/tag" actual >actual.focus &&
> +	test_cmp expect actual.focus &&
> +
> +	# Error reported for tags that point to non-commits.
> +	grep "error: object [0-9a-f]* is a blob, not a commit" err

Maybe, but at a glance it doesn't seem so, but maybe I'm missing something...


* Re: [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases()
  2023-03-10 17:21   ` [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
@ 2023-03-15 14:13     ` Ævar Arnfjörð Bjarmason
  2023-03-15 16:17       ` Derrick Stolee
  0 siblings, 1 reply; 90+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-03-15 14:13 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King, Derrick Stolee


On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>

> +{
> +	size_t i;

Ditto the decl suggestion in an earlier commit, i.e...

> +	struct commit_and_index *commits;
> +	unsigned int min_generation_index = 0;
> +	timestamp_t min_generation;
> +	struct commit_list *stack = NULL;
> +
> +	if (!bases || !tips || !tips_nr)
> +		return;
> +
> +	/*
> +	 * Do a depth-first search starting at 'bases' to search for the
> +	 * tips. Stop at the lowest (un-found) generation number. When
> +	 * finding the lowest commit, increase the minimum generation
> +	 * number to the next lowest (un-found) generation number.
> +	 */
> +
> +	CALLOC_ARRAY(commits, tips_nr);
> +
> +	for (i = 0; i < tips_nr; i++) {

...move this here?

> +		commits[i].commit = tips[i];
> +		commits[i].index = i;
> +		commits[i].generation = commit_graph_generation(tips[i]);
> +	}
> +
> +	/* Sort with generation number ascending. */
> +	QSORT(commits, tips_nr, compare_commit_and_index_by_generation);
> +	min_generation = commits[0].generation;
> +
> +	while (bases) {
> +		parse_commit(bases->item);
> +		commit_list_insert(bases->item, &stack);
> +		bases = bases->next;
> +	}
> +
> +	while (stack) {
> +		unsigned int j;

...ditto...

> +		int explored_all_parents = 1;
> +		struct commit_list *p;
> +		struct commit *c = stack->item;
> +		timestamp_t c_gen = commit_graph_generation(c);
> +
> +		/* Does it match any of our tips? */
> +		for (j = min_generation_index; j < tips_nr; j++) {

...to here...

> +			if (c_gen < commits[j].generation)
> +				break;
> +
> +			if (commits[j].commit == c) {
> +				tips[commits[j].index]->object.flags |= mark;
> +
> +				if (j == min_generation_index) {
> +					unsigned int k = j + 1;
> +					while (k < tips_nr &&
> +					       (tips[commits[k].index]->object.flags & mark))
> +						k++;
> +
> +					/* Terminate early if all found. */
> +					if (k >= tips_nr)
> +						goto done;
> +
> +					min_generation_index = k;
> +					min_generation = commits[k].generation;
> +				}
> +			}
> +		}
> +
> +		for (p = c->parents; p; p = p->next) {
> +			parse_commit(p->item);
> +
> +			/* Have we already explored this parent? */
> +			if (p->item->object.flags & SEEN)
> +				continue;
> +
> +			/* Is it below the current minimum generation? */
> +			if (commit_graph_generation(p->item) < min_generation)
> +				continue;
> +
> +			/* Ok, we will explore from here on. */
> +			p->item->object.flags |= SEEN;
> +			explored_all_parents = 0;
> +			commit_list_insert(p->item, &stack);
> +			break;
> +		}
> +
> +		if (explored_all_parents)
> +			pop_commit(&stack);
> +	}
> +
> +done:
> +	free(commits);
> +	repo_clear_commit_marks(the_repository, SEEN);

I didn't see this in my earlier suggestion for passing "struct
repository", but I think we should do the same here, i.e. have this
function take a "r" argument.

> [...]
> @@ -2390,33 +2390,21 @@ static void reach_filter(struct ref_array *array,
>  			 struct commit_list *check_reachable,
>  			 int include_reached)
>  {
> -	struct rev_info revs;
>  	int i, old_nr;
>  	struct commit **to_clear;
> -	struct commit_list *cr;
>  
>  	if (!check_reachable)
>  		return;
>  
>  	CALLOC_ARRAY(to_clear, array->nr);
> -
> -	repo_init_revisions(the_repository, &revs, NULL);
> -
>  	for (i = 0; i < array->nr; i++) {
>  		struct ref_array_item *item = array->items[i];
> -		add_pending_object(&revs, &item->commit->object, item->refname);
>  		to_clear[i] = item->commit;
>  	}
>  
> -	for (cr = check_reachable; cr; cr = cr->next) {
> -		struct commit *merge_commit = cr->item;
> -		merge_commit->object.flags |= UNINTERESTING;
> -		add_pending_object(&revs, &merge_commit->object, "");
> -	}
> -
> -	revs.limited = 1;
> -	if (prepare_revision_walk(&revs))
> -		die(_("revision walk setup failed"));
> +	tips_reachable_from_bases(check_reachable,
> +				  to_clear, array->nr,
> +				  UNINTERESTING);

I.e. it's not ideal, but we had the_repository in this function before
(should probably have passed it from further up, but whatever), so we
could pass that to the new tips_reachable_from_bases() still.

> -test_perf 'ahead-behind counts: git rev-list' '
> -	for r in $(cat refs)
> -	do
> -		git rev-list --count "HEAD..$r" || return 1
> -	done

Why does this change require deleting the old perf test? Your commit 7/8
notes this test, but here we're deleting it, let's keep it and instead
note if the results changed, or stayed the same?

More generally, your commit message says:

> Add extra tests for this behavior in t6600-test-reach.sh as the
> interesting data shape of that repository can sometimes demonstrate
> corner case bugs.

And here for a supposed optimization commit you're adding new tests, but
when I try them with the C code at 7/8 they pass.

So it seems we should add them earlier, and this is a pure-optimization
commit, but one that's a bit confused about what goes where? :)


* Re: [PATCH v2 7/8] for-each-ref: add ahead-behind format atom
  2023-03-15 13:57     ` Ævar Arnfjörð Bjarmason
@ 2023-03-15 16:01       ` Junio C Hamano
  2023-03-15 16:12         ` Derrick Stolee
  2023-03-15 16:11       ` Derrick Stolee
  1 sibling, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2023-03-15 16:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, Jeff King,
	Derrick Stolee

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:
>
>> From: Derrick Stolee <derrickstolee@github.com>
>> [...]
>> +ahead-behind:<ref>::
>> +	Two integers, separated by a space, demonstrating the number of
>> +	commits ahead and behind, respectively, when comparing the output
>> +	ref to the `<ref>` specified in the format.
>> +
>
> As a potential (expert) user who hasn't read the code yet I'd think the
> "<ref>" here would be the same as "update-ref", but glancing ahead
> at your tests it seems that it does ref matching, so "refs/heads/master"
> and "master" are both accepted?
>
> Since nothing else uses "<ref>" here I think we should clearly define
> the matching rules somehow, or maybe we do, and I missed it.

I vaguely recall noticing this in the previous round, but doesn't
this only require a commit-ish, not even a ref?  It is parsed with
lookup_commit_reference_by_name().





* Re: [PATCH v2 6/8] commit-reach: implement ahead_behind() logic
  2023-03-15 13:50     ` Ævar Arnfjörð Bjarmason
@ 2023-03-15 16:03       ` Junio C Hamano
  2023-03-15 16:13         ` Derrick Stolee
  0 siblings, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2023-03-15 16:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, Jeff King,
	Derrick Stolee

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> +	/**
>> +	 * These values store the computed counts for each side of the
>> +	 * symmetric difference:
>> +	 *
>> +	 * 'ahead' stores the number of commits reachable from the tip
>> +	 * and not reachable from the base.
>> +	 *
>> +	 * 'behind' stores the number of commits reachable from the base
>> +	 * and not reachable from the tip.
>> +	 */
>> +	unsigned int ahead;
>> +	unsigned int behind;
>
> Even though this is the tip of the iceberg in terms of our codebase
> overall, can't we just use "size_t" for counts in new APIs?

I personally do not see a point in becoming so dogmatic.  Plain
(possibly) 32-bit integers have their places in the code.


* Re: [PATCH v2 7/8] for-each-ref: add ahead-behind format atom
  2023-03-15 13:57     ` Ævar Arnfjörð Bjarmason
  2023-03-15 16:01       ` Junio C Hamano
@ 2023-03-15 16:11       ` Derrick Stolee
  1 sibling, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2023-03-15 16:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King

On 3/15/2023 9:57 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>

>> +test_description='Commit walk performance tests'
>> +. ./perf-lib.sh
>> +
>> +test_perf_large_repo
>> +
>> +test_expect_success 'setup' '
>> +	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
>> +	sort -r allrefs | head -n 50 >refs &&
> 
> Some of the point of test_perf_large_repo is being able to point the
> test to an arbitrary sized repo, why "head -n 50" here, instead of just
> doing that filtering when preparing the test repo?

I think it's too much work to expect that the tester removes
all but a small number of refs for testing here. Using all
refs on a repo with many refs would be too slow to be helpful.

This is especially important when running the entire perf
suite on a repo where a large number of refs is _desired_
for some of the other tests.
 
>> +test_expect_success 'ahead-behind requires an argument' '
>> +	test_must_fail git for-each-ref \
>> +		--format="%(ahead-behind)" 2>err &&
>> +	grep "expected format: %(ahead-behind:<ref>)" err
>> +'
>> +
>> +test_expect_success 'missing ahead-behind base' '
>> +	test_must_fail git for-each-ref \
>> +		--format="%(ahead-behind:refs/heads/missing)" 2>err &&
>> +	grep "failed to find '\''refs/heads/missing'\''" err
>> +'
>> +
> 
> Is this grep instead of test_cmp for brevity, or because we'll catch
> this late and spew out other output as well?
> 
> I'd think it would be worth testing that we only emit an error. Even if
> you don't want a full test_cmp we could check the line count too to
> assert that...

A full test_cmp is a little more annoying to write, but
is a stronger test, so sure.

>> +# Run this before doing any signing, so the test has the same results
>> +# regardless of the GPG prereq.
>> +test_expect_success 'git tag --format with ahead-behind' '
>> +	test_when_finished git reset --hard tag-one-line &&
>> +	git commit --allow-empty -m "left" &&
>> +	git tag -a -m left tag-left &&
>> +	git reset --hard HEAD~1 &&
>> +	git commit --allow-empty -m "right" &&
>> +	git tag -a -m left tag-right &&
> 
> Do we really need this --allow-empty instead of just using "test_commit"?
> I.e. is being TREESAME here important?

You missed this in the commit message:

>> [...] Also, the
>> test in t7004 is carefully located to avoid being dependent on the GPG
>> prereq. It also avoids using the test_commit helper, as that will add
>> ticks to the time and disrupt the expected timestampes in later tag
>> tests.

(And I see the "timestampes" typo now.)

Thanks,
-Stolee


* Re: [PATCH v2 7/8] for-each-ref: add ahead-behind format atom
  2023-03-15 16:01       ` Junio C Hamano
@ 2023-03-15 16:12         ` Derrick Stolee
  0 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2023-03-15 16:12 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, Jeff King

On 3/15/2023 12:01 PM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
>> On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:
>>
>>> From: Derrick Stolee <derrickstolee@github.com>
>>> [...]
>>> +ahead-behind:<ref>::
>>> +	Two integers, separated by a space, demonstrating the number of
>>> +	commits ahead and behind, respectively, when comparing the output
>>> +	ref to the `<ref>` specified in the format.
>>> +
>>
>> As a potential (expert) user who hasn't read the code yet I'd think the
>> "<ref>" here would be the same as "update-ref", but glancing ahead
>> at your tests it seems that it does ref matching, so "refs/heads/master"
>> and "master" are both accepted?
>>
>> Since nothing else uses "<ref>" here I think we should clearly define
>> the matching rules somehow, or maybe we do, and I missed it.
> 
> I vaguely recall noticing this in the previous round, but doesn't
> this only require a commit-ish, not even a ref?  It is parsed with
> lookup_commit_reference_by_name().

You noticed it in this round, but I haven't sent v3 yet. I have
this in my local copy:

ahead-behind:<committish>::
	Two integers, separated by a space, demonstrating the number of
	commits ahead and behind, respectively, when comparing the output
	ref to the `<committish>` specified in the format.

Thanks,
-Stolee


* Re: [PATCH v2 6/8] commit-reach: implement ahead_behind() logic
  2023-03-15 16:03       ` Junio C Hamano
@ 2023-03-15 16:13         ` Derrick Stolee
  0 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2023-03-15 16:13 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, Jeff King

On 3/15/2023 12:03 PM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
>>> +	/**
>>> +	 * These values store the computed counts for each side of the
>>> +	 * symmetric difference:
>>> +	 *
>>> +	 * 'ahead' stores the number of commits reachable from the tip
>>> +	 * and not reachable from the base.
>>> +	 *
>>> +	 * 'behind' stores the number of commits reachable from the base
>>> +	 * and not reachable from the tip.
>>> +	 */
>>> +	unsigned int ahead;
>>> +	unsigned int behind;
>>
>> Even though this is the tip of the iceberg in terms of our codebase
>> overall, can't we just use "size_t" for counts in new APIs?
> 
> I personally do not see a point in becoming so dogmatic.  Plain
> (possibly) 32-bit integers have their places in the code.

In particular, we have 32-bit limits on the commit-graph due
to it being unreasonable to have billions of commits in a
repository.

Thanks,
-Stolee


* Re: [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases()
  2023-03-15 14:13     ` Ævar Arnfjörð Bjarmason
@ 2023-03-15 16:17       ` Derrick Stolee
  2023-03-15 16:18         ` Derrick Stolee
  0 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee @ 2023-03-15 16:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King

On 3/15/2023 10:13 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>

(omitting the_repository stuff which will be reflected in v3)

>> -test_perf 'ahead-behind counts: git rev-list' '
>> -	for r in $(cat refs)
>> -	do
>> -		git rev-list --count "HEAD..$r" || return 1
>> -	done
> 
> Why does this change require deleting the old perf test? Your commit 7/8
> notes this test, but here we're deleting it, let's keep it and instead
> note if the results changed, or stayed the same?
>
> More generally, your commit message says:
> 
>> Add extra tests for this behavior in t6600-test-reach.sh as the
>> interesting data shape of that repository can sometimes demonstrate
>> corner case bugs.

This note is about t6600-test-reach.sh, not p1500-graph-walks.sh.

Not only does the previous message note this perf test, it has this to say:

  The 'git rev-list' test exists in this change as a demonstration, but it
  will be removed in the next change to avoid wasting time on this
  comparison.
 
> And here for a supposed optimization commit you're adding new tests, but
> when I try them with the C code at 7/8 they pass.
> 
> So it seems we should add them earlier, and this is a pure-optimization
> commit, but one that's a bit confused about what goes where? :)

I can make it more clear why we are removing it in this commit.

Thanks,
-Stolee


* Re: [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases()
  2023-03-15 16:17       ` Derrick Stolee
@ 2023-03-15 16:18         ` Derrick Stolee
  0 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2023-03-15 16:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King

On 3/15/2023 12:17 PM, Derrick Stolee wrote:
> On 3/15/2023 10:13 AM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, Mar 10 2023, Derrick Stolee via GitGitGadget wrote:
>>
>>> From: Derrick Stolee <derrickstolee@github.com>

> Not only does the previous message note this perf test, it has this to say:
> 
>   The 'git rev-list' test exists in this change as a demonstration, but it
>   will be removed in the next change to avoid wasting time on this
>   comparison.
>  
>> And here for a supposed optimization commit you're adding new tests, but
>> when I try them with the C code at 7/8 they pass.
>>
>> So it seems we should add them earlier, and this is a pure-optimization
>> commit, but one that's a bit confused about what goes where? :)
> 
> I can make it more clear why we are removing it in this commit.

Oh wait, I did:

>> (Note that we remove the iterative 'git rev-list' test from p1500
>> because it no longer makes sense as a comparison to 'git for-each-ref'
>> and would just waste time running it for these comparisons.)

Thanks,
-Stolee


* Re: [PATCH v2 1/8] for-each-ref: add --stdin option
  2023-03-15 13:37     ` Ævar Arnfjörð Bjarmason
@ 2023-03-15 17:17       ` Jeff King
  0 siblings, 0 replies; 90+ messages in thread
From: Jeff King @ 2023-03-15 17:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee via GitGitGadget, git, gitster, me, vdye,
	Derrick Stolee

On Wed, Mar 15, 2023 at 02:37:39PM +0100, Ævar Arnfjörð Bjarmason wrote:

> 	-		CALLOC_ARRAY(filter.name_patterns, alloc);
> 	-
> 	-		while (strbuf_getline(&line, stdin) != EOF) {
> 	-			ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
> 	-			filter.name_patterns[nr++] = strbuf_detach(&line, NULL);
> 	-		}
> 	-
> 	-		/* Add a terminating NULL string. */
> 	-		ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
> 	-		filter.name_patterns[nr + 1] = NULL;
> 	+		while (strbuf_getline(&line, stdin) != EOF)
> 	+			strvec_push(&stdin_pat, line.buf);
> 	+		filter.name_patterns = stdin_pat.v;
> 	 	} else {
> 	 		filter.name_patterns = argv;
> 	 	}
> 	@@ -123,10 +117,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> 	 	free_commit_list(filter.with_commit);
> 	 	free_commit_list(filter.no_commit);
> 	 	ref_sorting_release(sorting);
> 	-	if (from_stdin) {
> 	-		for (size_t i = 0; filter.name_patterns[i]; i++)
> 	-			free(filter.name_patterns[i]);
> 	-		free(filter.name_patterns);
> 	-	}
> 	+	strvec_clear(&stdin_pat);
> 	 	return 0;
> 	 }
> 
> It *is* an extra copy though, as your implementation re-uses the strbuf
> we already allocated.

At first I thought you meant "extra allocation" here. But you really do
mean an extra copy of the bytes.

The number of allocations is the same either way. In the original, we
detach the strbuf in each iteration of the loop as it becomes the final
entry in the array, but then have to allocate a new strbuf for the next
iteration. With a strvec, we can reuse the same strbuf over and over,
but make a new allocation when we add it to the strvec.

So yes, we end up with an extra memcpy() of the bytes. But the flip side
is that the final allocations we store in the strvec are correctly
sized, without the extra slop that the strbuf added while reading.
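
The tradeoff described above can be sketched with a toy buffer type.
This is only an illustration, not git's strbuf or strvec code: the
toy_buf struct, its growth policy, and the toy_buf_* helper names are
all assumptions made up for this sketch. It shows that "detach" hands
over the over-allocated buffer, while "copy" costs a memcpy() but ends
up with an exactly sized allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable buffer standing in for a strbuf (hypothetical, not git's API). */
struct toy_buf {
	char *buf;
	size_t len;
	size_t alloc;
};

/* Append 's', growing geometrically so 'alloc' usually exceeds 'len + 1'. */
static void toy_buf_add(struct toy_buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		size_t want = (b->alloc + 16) * 3 / 2;

		if (want < b->len + n + 1)
			want = b->len + n + 1;
		b->buf = realloc(b->buf, want);
		assert(b->buf);
		b->alloc = want;
	}
	memcpy(b->buf + b->len, s, n + 1);
	b->len += n;
}

/* "Detach": hand over the buffer, slop included; the next line needs a new one. */
static char *toy_buf_detach(struct toy_buf *b, size_t *alloc_out)
{
	char *ret = b->buf;

	*alloc_out = b->alloc;
	b->buf = NULL;
	b->len = b->alloc = 0;
	return ret;
}

/* "Copy": allocate exactly len + 1 bytes and memcpy; the buffer is reused. */
static char *toy_buf_copy(struct toy_buf *b, size_t *alloc_out)
{
	char *ret;

	assert(b->buf);
	ret = malloc(b->len + 1);
	assert(ret);
	memcpy(ret, b->buf, b->len + 1);
	*alloc_out = b->len + 1;
	b->len = 0;		/* keep b->buf around for the next line */
	b->buf[0] = '\0';
	return ret;
}
```

Either way there is one allocation per stored line; the difference is
only whether the stored allocation carries the growth slop or whether
the bytes get copied once more.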

> But presumably that's trivial in this case, and if we care I think we
> should resurrect something like [1] instead, i.e. we could just teach
> the strvec API to have a strvec_push_nodup(). But I doubt that in this
> case it'll matter.

Yeah, I'd agree it is not important either way in this case. But I
wanted to think it through above, just because it's not clear to me that
even in a tight loop, the "allocate buffer and then attach to the
strvec" approach would be the better tradeoff.

I guess it would make sense to wait for a case where it _does_ matter
and then we could experiment with the two approaches. ;)

-Peff


* Re: [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option
  2023-03-10 19:25     ` Derrick Stolee
@ 2023-03-15 17:31       ` Jeff King
  2023-03-15 17:44         ` Derrick Stolee
  2023-03-15 19:34         ` Junio C Hamano
  0 siblings, 2 replies; 90+ messages in thread
From: Jeff King @ 2023-03-15 17:31 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git, me, vdye

On Fri, Mar 10, 2023 at 02:25:52PM -0500, Derrick Stolee wrote:

> > Having read all the patches, I am very impressed and pleased, but
> > are we losing anything by having the feature inside for-each-ref
> > compared to a new command ahead-behind?  As far as I can tell, the
> > new "for-each-ref --stdin" would still want to match refs and work
> > only on refs, but there shouldn't be any reason for ahead-behind
> > computation to limit to tips that are at the tip of a ref, so that
> > may be one downside in this updated design.  For the intended use
> > case of "let's find which branches are stale", that downside does
> > not matter in practice, but for other use cases people will think
> > of in the future, the limitation might matter (at which time we can
> > easily resurrect the other subcommand, using the internal machinery
> > we have here, so it is not a huge deal, I presume).
> 
> I think the for-each-ref implementation solves the use case we
> had in mind, I think. I'll double-check to see if we ever use
> exact commit IDs instead of reference names, but I think these
> callers are rarely interested in an exact commit ID but instead
> want the latest version of refs.

One thing I'd worry about here are race conditions.

If you have a porcelain-ish view (and I'd count "showing a web page" as
a porcelain view) that requires several commands to compute, it's
possible for there to be simultaneous ref updates between your commands.
If each command is given a refname, then the results may not be
consistent.

E.g., imagine resolving "main" to 1234abcd in step one, then somebody
updates it to 5678cdef, then you run "for-each-ref" to compute
ahead/behind, and now you show an inconsistent result: you say that
"main" points to 1234abcd, but show the wrong ahead/behind information.

Showing 1234abcd at all is out-of-date, of course, but the real problem
is the lack of atomicity. Most porcelain scripts deal with this by
resolving the refs immediately, assuming object ids are immutable (which
they are modulo games like refs/replace), and then working with them.

I don't know if this is how your current application-level code calling
ahead-behind works, or if it just accepts the possibility of a race (or
maybe the call is not presented along with other information so it's
sort-of atomic on its own). Presumably your double-checking will find
out. :)

I do otherwise like exposing this as an option of for-each-ref, as that
is the way I'd expect most normal client users to want to get at the
information. And if this is step 1 and that's good enough for now, and
we have a path forward to later expose it for general commits, that's OK
with me.

-Peff


* Re: [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option
  2023-03-15 17:31       ` Jeff King
@ 2023-03-15 17:44         ` Derrick Stolee
  2023-03-15 19:34         ` Junio C Hamano
  1 sibling, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2023-03-15 17:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git, me, vdye

On 3/15/2023 1:31 PM, Jeff King wrote:
> On Fri, Mar 10, 2023 at 02:25:52PM -0500, Derrick Stolee wrote:
> 
>>> Having read all the patches, I am very impressed and pleased, but
>>> are we losing anything by having the feature inside for-each-ref
>>> compared to a new command ahead-behind?  As far as I can tell, the
>>> new "for-each-ref --stdin" would still want to match refs and work
>>> only on refs, but there shouldn't be any reason for ahead-behind
>>> computation to limit to tips that are at the tip of a ref, so that
>>> may be one downside in this updated design.  For the intended use
>>> case of "let's find which branches are stale", that downside does
>>> not matter in practice, but for other use cases people will think
>>> of in the future, the limitation might matter (at which time we can
>>> easily resurrect the other subcommand, using the internal machinery
>>> we have here, so it is not a huge deal, I presume).
>>
>> I think the for-each-ref implementation solves the use case we
>> had in mind, I think. I'll double-check to see if we ever use
>> exact commit IDs instead of reference names, but I think these
>> callers are rarely interested in an exact commit ID but instead
>> want the latest version of refs.
> 
> One thing I'd worry about here are race conditions.
> 
> If you have a porcelain-ish view (and I'd count "showing a web page" as
> a porcelain view) that requires several commands to compute, it's
> possible for there to be simultaneous ref updates between your commands.
> If each command is given a refname, then the results may not be
> consistent.

> I don't know if this is how your current application-level code calling
> ahead-behind works, or if it just accepts the possibility of a race (or
> maybe the call is not presented along with other information so it's
> sort-of atomic on its own). Presumably your double-checking will find
> out. :)

I completely agree on both of these points.

The major lift in this series is that the two commit walk algorithms
are being contributed to the core project in a way that makes it easy
to modify our 'git ahead-behind' builtin to use the "new" internals
without any UX change. Actually porting the application layer to use
'git for-each-ref' instead would be a second step, where I'd plan to
do this deep dive. From what I understand, though, these race conditions
do exist already, but they are minor relative to the cost of doing a
lookup of all the ref values and then calling this backend.
 
> I do otherwise like exposing this as an option of for-each-ref, as that
> is the way I'd expect most normal client users to want to get at the
> information. And if this is step 1 and that's good enough for now, and
> we have a path forward to later expose it for general commits, that's OK
> with me.

And if we truly need more general committish inputs for tips (HEAD~10,
too), then a new builtin could be built on top. Modeling it after
for-each-ref (for-each-commit?) would be a good start to make the
behavior as similar as possible. Doing that in full generality might
require strange updates to ref-filter.[c|h], but we can cross that
bridge when we come to it.

Thanks,
-Stolee


* [PATCH v3 0/8] ref-filter: ahead/behind counting, faster --merged option
  2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                     ` (9 preceding siblings ...)
  2023-03-15 13:22   ` Ævar Arnfjörð Bjarmason
@ 2023-03-15 17:45   ` Derrick Stolee via GitGitGadget
  2023-03-15 17:45     ` [PATCH v3 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
                       ` (8 more replies)
  10 siblings, 9 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-15 17:45 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

At $DAYJOB, we have used a custom 'ahead-behind' builtin in our fork of Git
for lots of reasons. The main goal of the builtin is to compare multiple
references against a common base reference. The comparison is the number of
commits that are in each side of the symmetric difference of their reachable
sets. A commit C is "ahead" of a commit B by the number of commits in B..C
(reachable from C but not reachable from B). Similarly, the commit C is
"behind" the commit B by the number of commits in C..B (reachable from B but
not reachable from C).
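
To make the definition concrete, here is a minimal sketch on a toy
history. It is not the algorithm from this series: it simply computes
both reachable sets as bitmasks and counts each side of the symmetric
difference. The toy_commit struct, the 32-commit limit, the example
graph, and the helper names are all assumptions made up for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_COMMITS 32
#define MAX_PARENTS 2

/* Toy commit: up to two parents, given as indexes; -1 means "no parent". */
struct toy_commit {
	int parents[MAX_PARENTS];
};

/*
 * Example history (parents point left):
 *
 *   0 -- 1 -- 2 -- 3      (B = 3)
 *         \
 *          4 -- 5         (C = 5)
 */
static const struct toy_commit example_graph[] = {
	{ { -1, -1 } },	/* 0: root */
	{ {  0, -1 } },	/* 1 */
	{ {  1, -1 } },	/* 2 */
	{ {  2, -1 } },	/* 3 */
	{ {  1, -1 } },	/* 4 */
	{ {  4, -1 } },	/* 5 */
};

/* Return a bitmask of every commit reachable from 'tip', including 'tip'. */
static uint32_t reachable_set(const struct toy_commit *graph, int tip)
{
	uint32_t seen = 0;
	int stack[MAX_COMMITS];
	int top = 0;

	assert(tip >= 0 && tip < MAX_COMMITS);
	seen |= UINT32_C(1) << tip;
	stack[top++] = tip;
	while (top) {
		int c = stack[--top];

		for (int i = 0; i < MAX_PARENTS; i++) {
			int p = graph[c].parents[i];

			if (p < 0 || (seen & (UINT32_C(1) << p)))
				continue;
			seen |= UINT32_C(1) << p;	/* each commit is pushed once */
			stack[top++] = p;
		}
	}
	return seen;
}

/* Count the set bits in 'v'. */
static unsigned int count_bits(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* 'ahead' counts base..tip, 'behind' counts tip..base. */
static void count_ahead_behind(const struct toy_commit *graph,
			       int base, int tip,
			       unsigned int *ahead, unsigned int *behind)
{
	uint32_t from_base = reachable_set(graph, base);
	uint32_t from_tip = reachable_set(graph, tip);

	*ahead = count_bits(from_tip & ~from_base);
	*behind = count_bits(from_base & ~from_tip);
}
```

With this graph, count_ahead_behind(example_graph, 3, 5, &ahead, &behind)
reports 2 ahead (commits 4 and 5) and 2 behind (commits 2 and 3), which
is what the two 'git rev-list --count' invocations would report on a
history of the same shape.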

These numbers can be computed by 'git rev-list --count B..C' and 'git
rev-list --count C..B', but there are common needs that benefit from having
the checks being done in the same process:

 1. Our "branches" page lists ahead/behind counts for each listed branch as
    compared to the repo's default branch. This can be done with a single
    'git ahead-behind' process.
 2. When a branch is updated, a background job checks if any pull requests
    that target that branch should be closed because their branches were
    merged implicitly by that update. These queries can be batched into 'git
    ahead-behind' calls.

In that second example, we don't need the full ahead/behind counts (although
it is sufficient to look for branches that are "zero commits ahead", meaning
they are reachable from the base), and instead reachability is the critical
piece.

This series contributes the custom algorithms we used for our 'git
ahead-behind' builtin, but as extensions to 'git for-each-ref':

 * Add a new "%(ahead-behind:)" format token to for-each-ref which allows
   outputting the ahead/behind values in the format string for a matching
   ref.
 * Add a new algorithm that speeds up the 'git for-each-ref --merged='
   option. This also applies to the 'git branch --merged=' option.

The idea to use 'git for-each-ref' instead of creating a new builtin is from
Junio, and simplifies this series significantly compared to v1. I was
initially concerned about the overhead of 'git for-each-ref' and its
generality and sorting, but I was not able to measure any important
difference between this implementation and our internal 'git ahead-behind'
implementation. In particular, when a pattern is given to 'git for-each-ref'
that looks like an exact ref, it navigates directly to the ref instead of
scanning all references for matches.

However, for our specific uses, we like to batch a list of exact references
that could be very long. We introduce a new --stdin option here.
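
The reason a NULL-terminated vector is a convenient container for such a list
can be sketched in plain C. This is only an illustration under assumptions: the
'line_vec' type and its helpers are hypothetical stand-ins and not Git's strvec
API, and the fixed 4096-byte line buffer is a simplification (longer lines
would be split).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable, NULL-terminated string vector. */
struct line_vec {
	char **v;     /* NULL-terminated whenever it is non-NULL */
	size_t nr;    /* number of strings stored */
	size_t alloc; /* number of slots allocated in v */
};

/* Append a copy of 's' and keep the terminating NULL in place. */
static void line_vec_push(struct line_vec *vec, const char *s)
{
	size_t len;
	char *copy;

	assert(vec && s);
	len = strlen(s);
	if (vec->nr + 2 > vec->alloc) {
		size_t alloc = vec->alloc ? 2 * vec->alloc : 8;
		char **v = realloc(vec->v, alloc * sizeof(*v));
		if (!v)
			abort();
		vec->v = v;
		vec->alloc = alloc;
	}
	copy = malloc(len + 1);
	if (!copy)
		abort();
	memcpy(copy, s, len + 1);
	vec->v[vec->nr++] = copy;
	vec->v[vec->nr] = NULL;
}

/* Read one pattern per line from 'in', dropping the trailing newline. */
static void read_patterns(FILE *in, struct line_vec *vec)
{
	char buf[4096];

	while (fgets(buf, sizeof(buf), in)) {
		buf[strcspn(buf, "\n")] = '\0';
		line_vec_push(vec, buf);
	}
}

/* Release every string and the vector itself. */
static void line_vec_clear(struct line_vec *vec)
{
	for (size_t i = 0; i < vec->nr; i++)
		free(vec->v[i]);
	free(vec->v);
	vec->v = NULL;
	vec->nr = vec->alloc = 0;
}
```

Because the array ends in NULL, a consumer that already iterates over an
'argv'-style list can walk it unchanged, and cleanup is a single call.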

To keep things close to the v1 outline, I replaced the existing patches with
closely-related ones, when possible.

Patch 1 adds the --stdin option to 'git for-each-ref'. (This is similar to
the boilerplate patch from v1.)

Patch 2 adds a test to explicitly check that 'git for-each-ref' will still
succeed when all input refs are missing. (This is similar to the
--ignore-missing patch from v1.)

Patches 3-5 introduce a new method: ensure_generations_valid(). Patch 3 does
some refactoring of the existing generation number computations to make it
more generic, and patch 4 updates the definition of
commit_graph_generation() slightly, making way for patch 5 to implement the
method. With an existing commit-graph file, the commits that are not present
in the file are considered as having generation number "infinity". This is
useful for most of our reachability queries to this point, since those
commits are "above" the ones tracked by the commit-graph. When these commits
are low in number, then there is very little performance cost and zero
correctness cost. (These patches match v1 exactly.)

However, we will see that the ahead/behind computation requires accurate
generation numbers to avoid overcounting. Thus, ensure_generations_valid()
is a way to specify a list of commits that need generation numbers computed
before continuing. It's a no-op if all of those commits are in the
commit-graph file. It's expensive if the commit-graph doesn't exist.
However, '%(ahead-behind:)' computations are likely to be slow no matter
what without a commit-graph, so assuming an existing commit-graph file is
reasonable. If we find sufficient desire to have an implementation that does
not have this requirement, we could create a second implementation and
toggle to it when generation_numbers_enabled() returns false.
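
To make the two kinds of generation number concrete, here is a minimal sketch
that computes them for a toy history. It assumes commits are stored so that
every parent has a smaller index than its children, and it leaves out the
clamping to a maximum value that topological levels need in practice. The
'toy_commit' struct and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PARENTS 2

struct toy_commit {
	int64_t date;                 /* committer timestamp */
	int parents[TOY_MAX_PARENTS]; /* parent indices, -1 if absent */
};

/*
 * Fill gen[0..nr) for the given commits.
 *
 * version 1 (topological level):    1 + max(parent generations)
 * version 2 (corrected commit date): the commit's own date if it is larger
 *                                    than every parent's value, otherwise
 *                                    1 + max(parent generations)
 *
 * Assumption: parents always appear before their children in 'commits'.
 */
static void toy_compute_generations(const struct toy_commit *commits,
				    size_t nr, int version, int64_t *gen)
{
	for (size_t i = 0; i < nr; i++) {
		int64_t max_gen = 0;

		for (int k = 0; k < TOY_MAX_PARENTS; k++) {
			int p = commits[i].parents[k];

			if (p < 0)
				continue;
			assert((size_t)p < i);
			if (gen[p] > max_gen)
				max_gen = gen[p];
		}

		if (version == 2 && commits[i].date > max_gen)
			gen[i] = commits[i].date;
		else
			gen[i] = max_gen + 1;
	}
}
```

In both versions a commit's value is strictly larger than the value of each of
its parents, which is the property the reachability walks rely on. A commit
with a skewed clock (a date older than its parent's) still gets a larger value
than that parent in version 2.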

Patch 6 implements the ahead-behind algorithm, but it is not connected to a
builtin. It's a long commit message, so hopefully it explains the algorithm
sufficiently. (The difference from v1 is that it no longer integrates with a
builtin and there are no new tests. It also uses 'unsigned int' and is
correctly co-authored by Taylor.)
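
The core idea of a single walk that serves many tip/base pairs can be sketched
as follows. This is a simplified illustration, not the patch itself: it
assumes a toy graph in which parents always have smaller indices than their
children (standing in for ordering by generation number), it uses one 32-bit
mask per commit instead of a bitmap, and it does not stop early once every
remaining commit is reachable from all starting points. The struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WALK_MAX_COMMITS 32
#define WALK_MAX_PARENTS 2

struct walk_graph {
	size_t nr;
	int parents[WALK_MAX_COMMITS][WALK_MAX_PARENTS]; /* -1 if absent */
};

struct walk_count {
	size_t tip_index;  /* position of the tip in starts[] */
	size_t base_index; /* position of the base in starts[] */
	unsigned int ahead;
	unsigned int behind;
};

/*
 * Bit i of reach[c] records that commit c is reachable from starts[i].
 * Commits are visited children-first, so a commit's mask is complete by
 * the time it is inspected and pushed down to its parents.
 */
static void toy_ahead_behind(const struct walk_graph *g,
			     const int *starts, size_t starts_nr,
			     struct walk_count *counts, size_t counts_nr)
{
	uint32_t reach[WALK_MAX_COMMITS] = { 0 };

	assert(g->nr <= WALK_MAX_COMMITS && starts_nr <= 32);

	for (size_t i = 0; i < counts_nr; i++) {
		counts[i].ahead = 0;
		counts[i].behind = 0;
	}
	for (size_t i = 0; i < starts_nr; i++)
		reach[starts[i]] |= (uint32_t)1 << i;

	for (size_t c = g->nr; c-- > 0; ) {
		if (!reach[c])
			continue; /* not reachable from any starting point */

		for (size_t i = 0; i < counts_nr; i++) {
			int from_tip = !!(reach[c] & ((uint32_t)1 << counts[i].tip_index));
			int from_base = !!(reach[c] & ((uint32_t)1 << counts[i].base_index));

			if (from_tip ^ from_base) {
				if (from_base)
					counts[i].behind++;
				else
					counts[i].ahead++;
			}
		}

		for (int k = 0; k < WALK_MAX_PARENTS; k++) {
			int p = g->parents[c][k];

			if (p >= 0)
				reach[p] |= reach[c];
		}
	}
}
```

Each commit is visited once no matter how many pairs are requested, which is
where the batching pays off compared with one walk per pair. The children-first
order matters: if a commit were counted before all of its descendants had
contributed their bits, it could be counted as "ahead" or "behind" when it is
actually reachable from both sides.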

Patch 7 integrates the ahead-behind algorithm with the ref-filter code,
including parsing the "ahead-behind" token. This finally adds tests that
check both ahead_behind() and ensure_generations_valid() via
t6600-test-reach.sh. (This patch is essentially completely new in v2.)

Patch 8 implements the tips_reachable_from_bases() method, and uses it within
the ref-filter code to speed up 'git for-each-ref --merged' and 'git branch
--merged'. (The interface is slightly different than v1, due to the needs of
the new caller.)
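
A rough sketch of the reachability check behind that method is a depth-first
walk from the bases that never descends below the smallest generation number
among the tips. The version below is an illustration under assumptions: the
'reach_graph' struct, the precomputed 'gen' array, and the function name are
hypothetical, and it omits refinements such as sorting the tips by generation
or stopping once every tip has been found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REACH_MAX_COMMITS 32
#define REACH_MAX_PARENTS 2

struct reach_graph {
	size_t nr;
	int parents[REACH_MAX_COMMITS][REACH_MAX_PARENTS]; /* -1 if absent */
	int64_t gen[REACH_MAX_COMMITS]; /* strictly larger than each parent's */
};

/*
 * Set reachable[i] to 1 if tips[i] is reachable from any commit in
 * bases[], and to 0 otherwise.
 */
static void toy_tips_reachable_from_bases(const struct reach_graph *g,
					  const int *bases, size_t bases_nr,
					  const int *tips, size_t tips_nr,
					  char *reachable)
{
	int stack[REACH_MAX_COMMITS];
	char seen[REACH_MAX_COMMITS] = { 0 };
	size_t top = 0;
	int64_t min_gen;

	assert(g->nr <= REACH_MAX_COMMITS);

	for (size_t i = 0; i < tips_nr; i++)
		reachable[i] = 0;
	if (!tips_nr)
		return;

	/* No commit below this generation can be a tip or reach one. */
	min_gen = g->gen[tips[0]];
	for (size_t i = 1; i < tips_nr; i++)
		if (g->gen[tips[i]] < min_gen)
			min_gen = g->gen[tips[i]];

	for (size_t i = 0; i < bases_nr; i++) {
		int b = bases[i];

		if (seen[b] || g->gen[b] < min_gen)
			continue;
		seen[b] = 1;
		stack[top++] = b;
	}

	while (top) {
		int c = stack[--top];

		/* Does it match any of the tips? */
		for (size_t i = 0; i < tips_nr; i++)
			if (tips[i] == c)
				reachable[i] = 1;

		for (int k = 0; k < REACH_MAX_PARENTS; k++) {
			int p = g->parents[c][k];

			if (p < 0 || seen[p] || g->gen[p] < min_gen)
				continue;
			seen[p] = 1; /* each commit is pushed at most once */
			stack[top++] = p;
		}
	}
}
```

The cutoff is safe because every ancestor of a commit has a strictly smaller
generation number, so a commit below the minimum can neither be a tip nor lead
to one.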


Updates in v3
=============

 * The APIs are modified to take a 'struct repository *' and use it in place
   of the_repository.
 * The --stdin option in 'git for-each-ref' now uses strvec instead of an
   ad-hoc array.

Thanks, -Stolee

Derrick Stolee (7):
  for-each-ref: add --stdin option
  for-each-ref: explicitly test no matches
  commit-graph: combine generation computations
  commit-graph: return generation from memory
  commit-reach: implement ahead_behind() logic
  for-each-ref: add ahead-behind format atom
  commit-reach: add tips_reachable_from_bases()

Taylor Blau (1):
  commit-graph: introduce `ensure_generations_valid()`

 Documentation/git-for-each-ref.txt |  12 +-
 builtin/branch.c                   |   1 +
 builtin/for-each-ref.c             |  24 +++-
 builtin/tag.c                      |   1 +
 commit-graph.c                     | 208 ++++++++++++++++++----------
 commit-graph.h                     |   8 ++
 commit-reach.c                     | 209 +++++++++++++++++++++++++++++
 commit-reach.h                     |  40 ++++++
 ref-filter.c                       |  91 ++++++++++---
 ref-filter.h                       |  26 +++-
 t/perf/p1500-graph-walks.sh        |  50 +++++++
 t/t3203-branch-output.sh           |  14 ++
 t/t5318-commit-graph.sh            |   2 +-
 t/t6300-for-each-ref.sh            |  50 +++++++
 t/t6301-for-each-ref-errors.sh     |  14 ++
 t/t6600-test-reach.sh              | 169 +++++++++++++++++++++++
 t/t7004-tag.sh                     |  28 ++++
 17 files changed, 858 insertions(+), 89 deletions(-)
 create mode 100755 t/perf/p1500-graph-walks.sh


base-commit: 725f57037d81e24eacfda6e59a19c60c0b4c8062
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1489%2Fderrickstolee%2Fstolee%2Fupstream-ahead-behind-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1489/derrickstolee/stolee/upstream-ahead-behind-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1489

Range-diff vs v2:

 1:  a1d9e0f6ff6 ! 1:  f9e80e233f1 for-each-ref: add --stdin option
     @@ Commit message
          list is interpreted as the complete ref set.
      
          When reading from stdin, we populate the filter.name_patterns array
     -    dynamically as opposed to pointing to the 'argv' array directly. This
     -    requires a careful cast while freeing the individual strings,
     -    conditioned on the --stdin option.
     +    dynamically as opposed to pointing to the 'argv' array directly. This is
     +    simple when using a strvec, as it is NULL-terminated in the same way. We
     +    then free the memory directly from the strvec.
      
     +    Helped-by: Phillip Wood <phillip.wood123@gmail.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## Documentation/git-for-each-ref.txt ##
     @@ Documentation/git-for-each-ref.txt: OPTIONS
       	`<pattern>`.  This option makes it stop after showing
      
       ## builtin/for-each-ref.c ##
     +@@
     + #include "object.h"
     + #include "parse-options.h"
     + #include "ref-filter.h"
     ++#include "strvec.h"
     + 
     + static char const * const for_each_ref_usage[] = {
     + 	N_("git for-each-ref [<options>] [<pattern>]"),
      @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
       	struct ref_format format = REF_FORMAT_INIT;
       	struct strbuf output = STRBUF_INIT;
       	struct strbuf err = STRBUF_INIT;
      +	int from_stdin = 0;
     ++	struct strvec vec = STRVEC_INIT;
       
       	struct option opts[] = {
       		OPT_BIT('s', "shell", &format.quote_style,
     @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const
      -	filter.name_patterns = argv;
      +	if (from_stdin) {
      +		struct strbuf line = STRBUF_INIT;
     -+		size_t nr = 0, alloc = 16;
      +
      +		if (argv[0])
      +			die(_("unknown arguments supplied with --stdin"));
      +
     -+		CALLOC_ARRAY(filter.name_patterns, alloc);
     -+
     -+		while (strbuf_getline(&line, stdin) != EOF) {
     -+			ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
     -+			filter.name_patterns[nr++] = strbuf_detach(&line, NULL);
     -+		}
     ++		while (strbuf_getline(&line, stdin) != EOF)
     ++			strvec_push(&vec, line.buf);
      +
     -+		/* Add a terminating NULL string. */
     -+		ALLOC_GROW(filter.name_patterns, nr + 1, alloc);
     -+		filter.name_patterns[nr + 1] = NULL;
     ++		/* vec.v is NULL-terminated, just like 'argv'. */
     ++		filter.name_patterns = vec.v;
      +	} else {
      +		filter.name_patterns = argv;
      +	}
     @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const
       	free_commit_list(filter.with_commit);
       	free_commit_list(filter.no_commit);
       	ref_sorting_release(sorting);
     -+	if (from_stdin) {
     -+		for (size_t i = 0; filter.name_patterns[i]; i++)
     -+			free((char *)filter.name_patterns[i]);
     -+		free(filter.name_patterns);
     -+	}
     ++	strvec_clear(&vec);
       	return 0;
       }
      
 2:  2f162a2f39f = 2:  f56d6a64d24 for-each-ref: explicitly test no matches
 3:  db28e82d2a6 = 3:  3b15e9df770 commit-graph: combine generation computations
 4:  3cf33801443 = 4:  abd3e7a67be commit-graph: return generation from memory
 5:  34dffd836b1 ! 5:  e197bddcace commit-graph: introduce `ensure_generations_valid()`
     @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph
      + * After this method, all commits reachable from those in the given
      + * list will have non-zero, non-infinite generation numbers.
      + */
     -+void ensure_generations_valid(struct commit **commits, size_t nr)
     ++void ensure_generations_valid(struct repository *r,
     ++			      struct commit **commits, size_t nr)
      +{
     -+	struct repository *r = the_repository;
      +	int generation_version = get_configured_generation_version(r);
      +	struct packed_commit_list list = {
      +		.list = commits,
     @@ commit-graph.h: struct commit_graph_data {
      + * After this method, all commits reachable from those in the given
      + * list will have non-zero, non-infinite generation numbers.
      + */
     -+void ensure_generations_valid(struct commit **commits, size_t nr);
     ++void ensure_generations_valid(struct repository *r,
     ++			      struct commit **commits, size_t nr);
      +
       #endif
 6:  9831c23eadb ! 6:  0fb3913810b commit-reach: implement ahead_behind() logic
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +	*bitmap = NULL;
      +}
      +
     -+void ahead_behind(struct commit **commits, size_t commits_nr,
     ++void ahead_behind(struct repository *r,
     ++		  struct commit **commits, size_t commits_nr,
      +		  struct ahead_behind_count *counts, size_t counts_nr)
      +{
     -+	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
     ++	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
      +	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
     -+	size_t i;
      +
      +	if (!commits_nr || !counts_nr)
      +		return;
      +
     -+	for (i = 0; i < counts_nr; i++) {
     ++	for (size_t i = 0; i < counts_nr; i++) {
      +		counts[i].ahead = 0;
      +		counts[i].behind = 0;
      +	}
      +
     -+	ensure_generations_valid(commits, commits_nr);
     ++	ensure_generations_valid(r, commits, commits_nr);
      +
      +	init_bit_arrays(&bit_arrays);
      +
     -+	for (i = 0; i < commits_nr; i++) {
     ++	for (size_t i = 0; i < commits_nr; i++) {
      +		struct commit *c = commits[i];
      +		struct bitmap *bitmap = init_bit_array(c, width);
      +
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +		struct commit_list *p;
      +		struct bitmap *bitmap_c = init_bit_array(c, width);
      +
     -+		for (i = 0; i < counts_nr; i++) {
     ++		for (size_t i = 0; i < counts_nr; i++) {
      +			int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
      +			int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index);
      +
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +		for (p = c->parents; p; p = p->next) {
      +			struct bitmap *bitmap_p;
      +
     -+			parse_commit(p->item);
     ++			repo_parse_commit(r, p->item);
      +
      +			bitmap_p = init_bit_array(p->item, width);
      +			bitmap_or(bitmap_p, bitmap_c);
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +	}
      +
      +	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
     -+	repo_clear_commit_marks(the_repository, PARENT2 | STALE);
     ++	repo_clear_commit_marks(r, PARENT2 | STALE);
      +	clear_bit_arrays(&bit_arrays);
      +	clear_prio_queue(&queue);
      +}
     @@ commit-reach.h: struct commit_list *get_reachable_subset(struct commit **from, i
      + * Given an array of commits and an array of ahead_behind_count pairs,
      + * compute the ahead/behind counts for each pair.
      + */
     -+void ahead_behind(struct commit **commits, size_t commits_nr,
     ++void ahead_behind(struct repository *r,
     ++		  struct commit **commits, size_t commits_nr,
      +		  struct ahead_behind_count *counts, size_t counts_nr);
      +
       #endif
 7:  82dd6f44a33 ! 7:  59cf6759e60 for-each-ref: add ahead-behind format atom
     @@ Commit message
          around commits_nr in the second loop of filter_ahead_behind(). Also, the
          test in t7004 is carefully located to avoid being dependent on the GPG
          prereq. It also avoids using the test_commit helper, as that will add
     -    ticks to the time and disrupt the expected timestampes in later tag
     +    ticks to the time and disrupt the expected timestamps in later tag
          tests.
      
          Also add performance tests in a new p1300-graph-walks.sh script. This
     @@ Documentation/git-for-each-ref.txt: worktreepath::
       	out, if it is checked out in any linked worktree. Empty string
       	otherwise.
       
     -+ahead-behind:<ref>::
     ++ahead-behind:<committish>::
      +	Two integers, separated by a space, demonstrating the number of
      +	commits ahead and behind, respectively, when comparing the output
     -+	ref to the `<ref>` specified in the format.
     ++	ref to the `<committish>` specified in the format.
      +
       In addition to the above, for commit and tag objects, the header
       field names (`tree`, `parent`, `object`, `type`, and `tag`) can
     @@ builtin/branch.c: static void print_ref_list(struct ref_filter *filter, struct r
       	if (verify_ref_format(format))
       		die(_("unable to parse format string"));
       
     -+	filter_ahead_behind(format, &array);
     ++	filter_ahead_behind(the_repository, format, &array);
       	ref_array_sort(sorting, &array);
       
       	for (i = 0; i < array.nr; i++) {
      
       ## builtin/for-each-ref.c ##
      @@
     - #include "object.h"
       #include "parse-options.h"
       #include "ref-filter.h"
     + #include "strvec.h"
      +#include "commit-reach.h"
       
       static char const * const for_each_ref_usage[] = {
     @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const
       
       	filter.match_as_path = 1;
       	filter_refs(&array, &filter, FILTER_REFS_ALL);
     -+	filter_ahead_behind(&format, &array);
     ++	filter_ahead_behind(the_repository, &format, &array);
      +
       	ref_array_sort(sorting, &array);
       
     @@ builtin/tag.c: static int list_tags(struct ref_filter *filter, struct ref_sortin
       		die(_("unable to parse format string"));
       	filter->with_commit_tag_algo = 1;
       	filter_refs(&array, filter, FILTER_REFS_TAGS);
     -+	filter_ahead_behind(format, &array);
     ++	filter_ahead_behind(the_repository, format, &array);
       	ref_array_sort(sorting, &array);
       
       	for (i = 0; i < array.nr; i++) {
     @@ ref-filter.c: static void reach_filter(struct ref_array *array,
       	free(to_clear);
       }
       
     -+void filter_ahead_behind(struct ref_format *format,
     ++void filter_ahead_behind(struct repository *r,
     ++			 struct ref_format *format,
      +			 struct ref_array *array)
      +{
      +	struct commit **commits;
     @@ ref-filter.c: static void reach_filter(struct ref_array *array,
      +		commits_nr++;
      +	}
      +
     -+	ahead_behind(commits, commits_nr, array->counts, array->counts_nr);
     ++	ahead_behind(r, commits, commits_nr, array->counts, array->counts_nr);
      +	free(commits);
      +}
      +
     @@ ref-filter.h: struct ref_array_item *ref_array_push(struct ref_array *array,
      + *
      + * If this is not called, then any ahead-behind atoms will be blank.
      + */
     -+void filter_ahead_behind(struct ref_format *format,
     ++void filter_ahead_behind(struct repository *r,
     ++			 struct ref_format *format,
      +			 struct ref_array *array);
      +
       #endif /*  REF_FILTER_H  */
     @@ t/t6301-for-each-ref-errors.sh: test_expect_success 'Missing objects are reporte
      +test_expect_success 'ahead-behind requires an argument' '
      +	test_must_fail git for-each-ref \
      +		--format="%(ahead-behind)" 2>err &&
     -+	grep "expected format: %(ahead-behind:<ref>)" err
     ++	echo "fatal: expected format: %(ahead-behind:<ref>)" >expect &&
     ++	test_cmp expect err
      +'
      +
      +test_expect_success 'missing ahead-behind base' '
      +	test_must_fail git for-each-ref \
      +		--format="%(ahead-behind:refs/heads/missing)" 2>err &&
     -+	grep "failed to find '\''refs/heads/missing'\''" err
     ++	echo "fatal: failed to find '\''refs/heads/missing'\''" >expect &&
     ++	test_cmp expect err
      +'
      +
       test_done
     @@ t/t7004-tag.sh: test_expect_success 'annotations for blobs are empty' '
      +	git commit --allow-empty -m "right" &&
      +	git tag -a -m left tag-right &&
      +
     -+	# Use " !" at the end to demonstrate whitepsace
     ++	# Use " !" at the end to demonstrate whitespace
      +	# around empty ahead-behind token for tag-blob.
      +	cat >expect <<-EOF &&
      +	refs/tags/tag-blob  !
 8:  f3fb6833bd7 ! 8:  7476a39331e commit-reach: add tips_reachable_from_bases()
     @@ Commit message
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## commit-reach.c ##
     -@@ commit-reach.c: void ahead_behind(struct commit **commits, size_t commits_nr,
     +@@ commit-reach.c: void ahead_behind(struct repository *r,
       	clear_bit_arrays(&bit_arrays);
       	clear_prio_queue(&queue);
       }
     @@ commit-reach.c: void ahead_behind(struct commit **commits, size_t commits_nr,
      +	return 0;
      +}
      +
     -+void tips_reachable_from_bases(struct commit_list *bases,
     ++void tips_reachable_from_bases(struct repository *r,
     ++			       struct commit_list *bases,
      +			       struct commit **tips, size_t tips_nr,
      +			       int mark)
      +{
     -+	size_t i;
      +	struct commit_and_index *commits;
     -+	unsigned int min_generation_index = 0;
     ++	size_t min_generation_index = 0;
      +	timestamp_t min_generation;
      +	struct commit_list *stack = NULL;
      +
     @@ commit-reach.c: void ahead_behind(struct commit **commits, size_t commits_nr,
      +
      +	CALLOC_ARRAY(commits, tips_nr);
      +
     -+	for (i = 0; i < tips_nr; i++) {
     ++	for (size_t i = 0; i < tips_nr; i++) {
      +		commits[i].commit = tips[i];
      +		commits[i].index = i;
      +		commits[i].generation = commit_graph_generation(tips[i]);
     @@ commit-reach.c: void ahead_behind(struct commit **commits, size_t commits_nr,
      +	min_generation = commits[0].generation;
      +
      +	while (bases) {
     -+		parse_commit(bases->item);
     ++		repo_parse_commit(r, bases->item);
      +		commit_list_insert(bases->item, &stack);
      +		bases = bases->next;
      +	}
      +
      +	while (stack) {
     -+		unsigned int j;
      +		int explored_all_parents = 1;
      +		struct commit_list *p;
      +		struct commit *c = stack->item;
      +		timestamp_t c_gen = commit_graph_generation(c);
      +
      +		/* Does it match any of our tips? */
     -+		for (j = min_generation_index; j < tips_nr; j++) {
     ++		for (size_t j = min_generation_index; j < tips_nr; j++) {
      +			if (c_gen < commits[j].generation)
      +				break;
      +
     @@ commit-reach.c: void ahead_behind(struct commit **commits, size_t commits_nr,
      +		}
      +
      +		for (p = c->parents; p; p = p->next) {
     -+			parse_commit(p->item);
     ++			repo_parse_commit(r, p->item);
      +
      +			/* Have we already explored this parent? */
      +			if (p->item->object.flags & SEEN)
     @@ commit-reach.c: void ahead_behind(struct commit **commits, size_t commits_nr,
      +
      +done:
      +	free(commits);
     -+	repo_clear_commit_marks(the_repository, SEEN);
     ++	repo_clear_commit_marks(r, SEEN);
      +}
      
       ## commit-reach.h ##
     -@@ commit-reach.h: struct ahead_behind_count {
     - void ahead_behind(struct commit **commits, size_t commits_nr,
     +@@ commit-reach.h: void ahead_behind(struct repository *r,
     + 		  struct commit **commits, size_t commits_nr,
       		  struct ahead_behind_count *counts, size_t counts_nr);
       
      +/*
      + * For all tip commits, add 'mark' to their flags if and only if they
      + * are reachable from one of the commits in 'bases'.
      + */
     -+void tips_reachable_from_bases(struct commit_list *bases,
     ++void tips_reachable_from_bases(struct repository *r,
     ++			       struct commit_list *bases,
      +			       struct commit **tips, size_t tips_nr,
      +			       int mark);
      +
     @@ ref-filter.c: static void reach_filter(struct ref_array *array,
      -	revs.limited = 1;
      -	if (prepare_revision_walk(&revs))
      -		die(_("revision walk setup failed"));
     -+	tips_reachable_from_bases(check_reachable,
     ++	tips_reachable_from_bases(the_repository,
     ++				  check_reachable,
      +				  to_clear, array->nr,
      +				  UNINTERESTING);
       

-- 
gitgitgadget


* [PATCH v3 1/8] for-each-ref: add --stdin option
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
@ 2023-03-15 17:45     ` Derrick Stolee via GitGitGadget
  2023-03-15 18:06       ` Jeff King
  2023-03-15 22:41       ` Jonathan Tan
  2023-03-15 17:45     ` [PATCH v3 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
                       ` (7 subsequent siblings)
  8 siblings, 2 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-15 17:45 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a user wishes to input a large list of patterns to 'git
for-each-ref' (likely a long list of exact refs) there are frequently
system limits on the number of command-line arguments.

Add a new --stdin option to instead read the patterns from standard
input. Add tests that check that any unrecognized arguments are
considered an error when --stdin is provided. Also, an empty pattern
list is interpreted as the complete ref set.

When reading from stdin, we populate the filter.name_patterns array
dynamically as opposed to pointing to the 'argv' array directly. This is
simple when using a strvec, as it is NULL-terminated in the same way. We
then free the memory directly from the strvec.

Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-for-each-ref.txt |  7 +++++-
 builtin/for-each-ref.c             | 21 ++++++++++++++++-
 t/t6300-for-each-ref.sh            | 37 ++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 6da899c6296..ccdc2911bb9 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -9,7 +9,8 @@ SYNOPSIS
 --------
 [verse]
 'git for-each-ref' [--count=<count>] [--shell|--perl|--python|--tcl]
-		   [(--sort=<key>)...] [--format=<format>] [<pattern>...]
+		   [(--sort=<key>)...] [--format=<format>]
+		   [ --stdin | <pattern>... ]
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
@@ -32,6 +33,10 @@ OPTIONS
 	literally, in the latter case matching completely or from the
 	beginning up to a slash.
 
+--stdin::
+	If `--stdin` is supplied, then the list of patterns is read from
+	standard input instead of from the argument list.
+
 --count=<count>::
 	By default the command shows all refs that match
 	`<pattern>`.  This option makes it stop after showing
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 6f62f40d126..4c5f2324793 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -5,6 +5,7 @@
 #include "object.h"
 #include "parse-options.h"
 #include "ref-filter.h"
+#include "strvec.h"
 
 static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [<options>] [<pattern>]"),
@@ -25,6 +26,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct ref_format format = REF_FORMAT_INIT;
 	struct strbuf output = STRBUF_INIT;
 	struct strbuf err = STRBUF_INIT;
+	int from_stdin = 0;
+	struct strvec vec = STRVEC_INIT;
 
 	struct option opts[] = {
 		OPT_BIT('s', "shell", &format.quote_style,
@@ -49,6 +52,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_CONTAINS(&filter.with_commit, N_("print only refs which contain the commit")),
 		OPT_NO_CONTAINS(&filter.no_commit, N_("print only refs which don't contain the commit")),
 		OPT_BOOL(0, "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
+		OPT_BOOL(0, "stdin", &from_stdin, N_("read reference patterns from stdin")),
 		OPT_END(),
 	};
 
@@ -75,7 +79,21 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
 	filter.ignore_case = icase;
 
-	filter.name_patterns = argv;
+	if (from_stdin) {
+		struct strbuf line = STRBUF_INIT;
+
+		if (argv[0])
+			die(_("unknown arguments supplied with --stdin"));
+
+		while (strbuf_getline(&line, stdin) != EOF)
+			strvec_push(&vec, line.buf);
+
+		/* vec.v is NULL-terminated, just like 'argv'. */
+		filter.name_patterns = vec.v;
+	} else {
+		filter.name_patterns = argv;
+	}
+
 	filter.match_as_path = 1;
 	filter_refs(&array, &filter, FILTER_REFS_ALL);
 	ref_array_sort(sorting, &array);
@@ -97,5 +115,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	free_commit_list(filter.with_commit);
 	free_commit_list(filter.no_commit);
 	ref_sorting_release(sorting);
+	strvec_clear(&vec);
 	return 0;
 }
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index c466fd989f1..a58053a54c5 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1464,4 +1464,41 @@ sig_crlf="$(printf "%s" "$sig" | append_cr; echo dummy)"
 sig_crlf=${sig_crlf%dummy}
 test_atom refs/tags/fake-sig-crlf contents:signature "$sig_crlf"
 
+test_expect_success 'git for-each-ref --stdin: empty' '
+	>in &&
+	git for-each-ref --format="%(refname)" --stdin <in >actual &&
+	git for-each-ref --format="%(refname)" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git for-each-ref --stdin: fails if extra args' '
+	>in &&
+	test_must_fail git for-each-ref --format="%(refname)" \
+		--stdin refs/heads/extra <in 2>err &&
+	grep "unknown arguments supplied with --stdin" err
+'
+
+test_expect_success 'git for-each-ref --stdin: matches' '
+	cat >in <<-EOF &&
+	refs/tags/multi*
+	refs/heads/amb*
+	EOF
+
+	cat >expect <<-EOF &&
+	refs/heads/ambiguous
+	refs/tags/multi-ref1-100000-user1
+	refs/tags/multi-ref1-100000-user2
+	refs/tags/multi-ref1-200000-user1
+	refs/tags/multi-ref1-200000-user2
+	refs/tags/multi-ref2-100000-user1
+	refs/tags/multi-ref2-100000-user2
+	refs/tags/multi-ref2-200000-user1
+	refs/tags/multi-ref2-200000-user2
+	refs/tags/multiline
+	EOF
+
+	git for-each-ref --format="%(refname)" --stdin <in >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v3 2/8] for-each-ref: explicitly test no matches
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2023-03-15 17:45     ` [PATCH v3 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
@ 2023-03-15 17:45     ` Derrick Stolee via GitGitGadget
  2023-03-15 17:45     ` [PATCH v3 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
                       ` (6 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-15 17:45 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The for-each-ref builtin can take a list of ref patterns, but if none
match, it still succeeds (but with no output). Add an explicit test that
demonstrates that behavior.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t6300-for-each-ref.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index a58053a54c5..6614469d2d6 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1501,4 +1501,17 @@ test_expect_success 'git for-each-ref --stdin: matches' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git for-each-ref with non-existing refs' '
+	cat >in <<-EOF &&
+	refs/heads/this-ref-does-not-exist
+	refs/tags/bogus
+	EOF
+
+	git for-each-ref --format="%(refname)" --stdin <in >actual &&
+	test_must_be_empty actual &&
+
+	xargs git for-each-ref --format="%(refname)" <in >actual &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v3 3/8] commit-graph: combine generation computations
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
  2023-03-15 17:45     ` [PATCH v3 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
  2023-03-15 17:45     ` [PATCH v3 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
@ 2023-03-15 17:45     ` Derrick Stolee via GitGitGadget
  2023-03-15 22:49       ` Jonathan Tan
  2023-03-15 17:45     ` [PATCH v3 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
                       ` (5 subsequent siblings)
  8 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-15 17:45 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

This patch extracts the common code used to compute topological levels
and corrected committer dates into a common routine,
compute_reachable_generation_numbers_1().

This new routine dispatches to call the necessary functions to get and
set the generation number for a given commit through a vtable (the
compute_generation_info struct).

Computing the generation number itself is done in
compute_generation_from_max(), which dispatches its implementation based
on the generation version requested and issues a BUG() for unrecognized
generation versions.

This patch cleans up the two places that currently compute topological
levels and corrected commit dates by reducing the amount of duplicated
code. It also makes it possible to introduce a function which
dynamically computes those values for commits that aren't stored in a
commit-graph, which will be required for the forthcoming ahead-behind
rewrite.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c | 171 +++++++++++++++++++++++++++++++------------------
 1 file changed, 107 insertions(+), 64 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index c11b59f28b3..deccf984a0d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1446,24 +1446,53 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
-static void compute_topological_levels(struct write_commit_graph_context *ctx)
+struct compute_generation_info {
+	struct repository *r;
+	struct packed_commit_list *commits;
+	struct progress *progress;
+	int progress_cnt;
+
+	timestamp_t (*get_generation)(struct commit *c, void *data);
+	void (*set_generation)(struct commit *c, timestamp_t gen, void *data);
+	void *data;
+};
+
+static timestamp_t compute_generation_from_max(struct commit *c,
+					       timestamp_t max_gen,
+					       int generation_version)
+{
+	switch (generation_version) {
+	case 1: /* topological levels */
+		if (max_gen > GENERATION_NUMBER_V1_MAX - 1)
+			max_gen = GENERATION_NUMBER_V1_MAX - 1;
+		return max_gen + 1;
+
+	case 2: /* corrected commit date */
+		if (c->date && c->date > max_gen)
+			max_gen = c->date - 1;
+		return max_gen + 1;
+
+	default:
+		BUG("attempting unimplemented version");
+	}
+}
+
+static void compute_reachable_generation_numbers_1(
+			struct compute_generation_info *info,
+			int generation_version)
 {
 	int i;
 	struct commit_list *list = NULL;
 
-	if (ctx->report_progress)
-		ctx->progress = start_delayed_progress(
-					_("Computing commit graph topological levels"),
-					ctx->commits.nr);
-	for (i = 0; i < ctx->commits.nr; i++) {
-		struct commit *c = ctx->commits.list[i];
-		uint32_t level;
+	for (i = 0; i < info->commits->nr; i++) {
+		struct commit *c = info->commits->list[i];
+		timestamp_t gen;
+		repo_parse_commit(info->r, c);
+		gen = info->get_generation(c, info->data);
 
-		repo_parse_commit(ctx->r, c);
-		level = *topo_level_slab_at(ctx->topo_levels, c);
+		display_progress(info->progress, info->progress_cnt + 1);
 
-		display_progress(ctx->progress, i + 1);
-		if (level != GENERATION_NUMBER_ZERO)
+		if (gen != GENERATION_NUMBER_ZERO && gen != GENERATION_NUMBER_INFINITY)
 			continue;
 
 		commit_list_insert(c, &list);
@@ -1471,38 +1500,91 @@ static void compute_topological_levels(struct write_commit_graph_context *ctx)
 			struct commit *current = list->item;
 			struct commit_list *parent;
 			int all_parents_computed = 1;
-			uint32_t max_level = 0;
+			timestamp_t max_gen = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
-				repo_parse_commit(ctx->r, parent->item);
-				level = *topo_level_slab_at(ctx->topo_levels, parent->item);
+				repo_parse_commit(info->r, parent->item);
+				gen = info->get_generation(parent->item, info->data);
 
-				if (level == GENERATION_NUMBER_ZERO) {
+				if (gen == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
 				}
 
-				if (level > max_level)
-					max_level = level;
+				if (gen > max_gen)
+					max_gen = gen;
 			}
 
 			if (all_parents_computed) {
 				pop_commit(&list);
-
-				if (max_level > GENERATION_NUMBER_V1_MAX - 1)
-					max_level = GENERATION_NUMBER_V1_MAX - 1;
-				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
+				gen = compute_generation_from_max(
+						current, max_gen,
+						generation_version);
+				info->set_generation(current, gen, info->data);
 			}
 		}
 	}
+}
+
+static timestamp_t get_topo_level(struct commit *c, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	return *topo_level_slab_at(ctx->topo_levels, c);
+}
+
+static void set_topo_level(struct commit *c, timestamp_t t, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	*topo_level_slab_at(ctx->topo_levels, c) = (uint32_t)t;
+	display_progress(ctx->progress, ctx->progress_cnt + 1);
+}
+
+static void compute_topological_levels(struct write_commit_graph_context *ctx)
+{
+	struct compute_generation_info info = {
+		.r = ctx->r,
+		.progress = ctx->progress,
+		.commits = &ctx->commits,
+		.get_generation = get_topo_level,
+		.set_generation = set_topo_level,
+		.data = ctx,
+	};
+
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+					_("Computing commit graph topological levels"),
+					ctx->commits.nr);
+
+	compute_reachable_generation_numbers_1(&info, 1);
+
 	stop_progress(&ctx->progress);
 }
 
+static timestamp_t get_generation_from_graph_data(struct commit *c, void *data)
+{
+	return commit_graph_data_at(c)->generation;
+}
+
+static void set_generation_v2(struct commit *c, timestamp_t t, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	struct commit_graph_data *g = commit_graph_data_at(c);
+	g->generation = t;
+	display_progress(ctx->progress, ctx->progress_cnt + 1);
+}
+
 static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 {
 	int i;
-	struct commit_list *list = NULL;
+	struct compute_generation_info info = {
+		.r = ctx->r,
+		.progress = ctx->progress,
+		.commits = &ctx->commits,
+		.get_generation = get_generation_from_graph_data,
+		.set_generation = set_generation_v2,
+		.data = ctx,
+	};
 
 	if (ctx->report_progress)
 		ctx->progress = start_delayed_progress(
@@ -1517,47 +1599,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 		}
 	}
 
-	for (i = 0; i < ctx->commits.nr; i++) {
-		struct commit *c = ctx->commits.list[i];
-		timestamp_t corrected_commit_date;
-
-		repo_parse_commit(ctx->r, c);
-		corrected_commit_date = commit_graph_data_at(c)->generation;
-
-		display_progress(ctx->progress, i + 1);
-		if (corrected_commit_date != GENERATION_NUMBER_ZERO)
-			continue;
-
-		commit_list_insert(c, &list);
-		while (list) {
-			struct commit *current = list->item;
-			struct commit_list *parent;
-			int all_parents_computed = 1;
-			timestamp_t max_corrected_commit_date = 0;
-
-			for (parent = current->parents; parent; parent = parent->next) {
-				repo_parse_commit(ctx->r, parent->item);
-				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
-
-				if (corrected_commit_date == GENERATION_NUMBER_ZERO) {
-					all_parents_computed = 0;
-					commit_list_insert(parent->item, &list);
-					break;
-				}
-
-				if (corrected_commit_date > max_corrected_commit_date)
-					max_corrected_commit_date = corrected_commit_date;
-			}
-
-			if (all_parents_computed) {
-				pop_commit(&list);
-
-				if (current->date && current->date > max_corrected_commit_date)
-					max_corrected_commit_date = current->date - 1;
-				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
-			}
-		}
-	}
+	compute_reachable_generation_numbers_1(&info, 2);
 
 	for (i = 0; i < ctx->commits.nr; i++) {
 		struct commit *c = ctx->commits.list[i];
@@ -1565,6 +1607,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX)
 			ctx->num_generation_data_overflows++;
 	}
+
 	stop_progress(&ctx->progress);
 }
 
-- 
gitgitgadget


* [PATCH v3 4/8] commit-graph: return generation from memory
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2023-03-15 17:45     ` [PATCH v3 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
@ 2023-03-15 17:45     ` Derrick Stolee via GitGitGadget
  2023-03-15 22:58       ` Jonathan Tan
  2023-03-15 17:45     ` [PATCH v3 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
                       ` (4 subsequent siblings)
  8 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-15 17:45 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The commit_graph_generation() method used to report a value of
GENERATION_NUMBER_INFINITY if the commit_graph_data_slab had an instance
for the given commit but the graph_pos indicated the commit was not in
the commit-graph file.

Instead, trust the 'generation' member if the commit has a value in the
slab _and_ the 'generation' member is non-zero. Otherwise, treat it as
GENERATION_NUMBER_INFINITY.

This only makes a difference for a very old case for the commit-graph:
the very first Git release to write commit-graph files wrote zeroes in
the topological level positions. If we are parsing a commit-graph with
all zeroes, those commits will now appear to have
GENERATION_NUMBER_INFINITY (as if they were not parsed from the
commit-graph).
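
The new lookup rule can be sketched in isolation as follows. This is a
toy model, not the git function: struct toy_graph_data, the
TOY_GENERATION_INFINITY sentinel, and toy_check_examples() are
assumptions for illustration, and a NULL pointer stands in for "no slab
entry for this commit".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel; git defines its own GENERATION_NUMBER_INFINITY. */
#define TOY_GENERATION_INFINITY UINT64_MAX

struct toy_graph_data {
	uint64_t generation;	/* 0 means "never computed" */
};

/* Trust the stored value only if the entry exists and is non-zero. */
static uint64_t toy_commit_generation(const struct toy_graph_data *data)
{
	if (data && data->generation)
		return data->generation;
	return TOY_GENERATION_INFINITY;
}

/* The three cases the rule distinguishes. */
static void toy_check_examples(void)
{
	struct toy_graph_data zero = { 0 };
	struct toy_graph_data seven = { 7 };

	assert(toy_commit_generation(NULL) == TOY_GENERATION_INFINITY);
	assert(toy_commit_generation(&zero) == TOY_GENERATION_INFINITY);
	assert(toy_commit_generation(&seven) == 7);
}
```

Note that a stored zero is indistinguishable from "no entry" under this
rule, which is exactly the behavior change for the old all-zero
commit-graph files described above.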

I attempted several variations to work around the need for providing an
uninitialized 'generation' member, but this was the best one I found. It
does require a change to a verification test in t5318 because it reports
a different error than the one about non-zero generation numbers.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c          | 8 +++-----
 t/t5318-commit-graph.sh | 2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index deccf984a0d..b4da4e05067 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -116,12 +116,10 @@ timestamp_t commit_graph_generation(const struct commit *c)
 	struct commit_graph_data *data =
 		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
 
-	if (!data)
-		return GENERATION_NUMBER_INFINITY;
-	else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
-		return GENERATION_NUMBER_INFINITY;
+	if (data && data->generation)
+		return data->generation;
 
-	return data->generation;
+	return GENERATION_NUMBER_INFINITY;
 }
 
 static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 049c5fc8ead..b6e12115786 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -630,7 +630,7 @@ test_expect_success 'detect incorrect generation number' '
 
 test_expect_success 'detect incorrect generation number' '
 	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
-		"non-zero generation number"
+		"commit-graph generation for commit"
 '
 
 test_expect_success 'detect incorrect commit date' '
-- 
gitgitgadget


* [PATCH v3 5/8] commit-graph: introduce `ensure_generations_valid()`
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (3 preceding siblings ...)
  2023-03-15 17:45     ` [PATCH v3 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
@ 2023-03-15 17:45     ` Taylor Blau via GitGitGadget
  2023-03-15 17:45     ` [PATCH v3 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Taylor Blau via GitGitGadget @ 2023-03-15 17:45 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Taylor Blau

From: Taylor Blau <me@ttaylorr.com>

Use the just-introduced compute_reachable_generation_numbers_1() to
implement a function which dynamically computes topological levels (or
corrected commit dates) for out-of-graph commits.

This will be useful for the ahead-behind algorithm we are about to
introduce, which needs accurate topological levels on _all_ commits
reachable from the tips in order to avoid over-counting.
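
To make the "compute on demand" idea concrete, here is a small sketch of
the stack-based walk on a toy DAG. It mirrors the loop structure of
compute_reachable_generation_numbers_1() but is not that function: the
toy_commit layout (at most two parents, addressed by array index), the
fixed stack size, and the restriction to topological levels are
assumptions made to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 16

struct toy_commit {
	int parents[2];		/* indices into the array, -1 for "none" */
	uint32_t level;		/* 0 means "not yet computed" */
};

/*
 * After this call, every commit reachable from tips[] has a non-zero
 * topological level. Commits not reachable from tips[] are untouched.
 */
static void toy_ensure_levels(struct toy_commit *commits, int nr,
			      const int *tips, int tips_nr)
{
	/* The stack holds a parent chain, so it never exceeds nr entries. */
	int stack[TOY_MAX_COMMITS];

	assert(nr <= TOY_MAX_COMMITS);

	for (int t = 0; t < tips_nr; t++) {
		int top = 0;

		if (commits[tips[t]].level)
			continue;	/* already known, nothing to walk */

		stack[top++] = tips[t];
		while (top) {
			struct toy_commit *cur = &commits[stack[top - 1]];
			uint32_t max_level = 0;
			int all_parents_computed = 1;

			for (int p = 0; p < 2; p++) {
				int parent = cur->parents[p];

				if (parent < 0)
					continue;
				if (!commits[parent].level) {
					/* Compute the parent first, then retry. */
					all_parents_computed = 0;
					stack[top++] = parent;
					break;
				}
				if (commits[parent].level > max_level)
					max_level = commits[parent].level;
			}

			if (all_parents_computed) {
				top--;
				cur->level = max_level + 1;
			}
		}
	}
}
```

The explicit stack avoids recursion on deep histories. Because already
computed levels short-circuit the walk, the cost is proportional to the
number of commits that were missing a value.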

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c | 29 +++++++++++++++++++++++++++++
 commit-graph.h |  8 ++++++++
 2 files changed, 37 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index b4da4e05067..0df8d27afc8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1609,6 +1609,35 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
+static void set_generation_in_graph_data(struct commit *c, timestamp_t t,
+					 void *data)
+{
+	commit_graph_data_at(c)->generation = t;
+}
+
+/*
+ * After this method, all commits reachable from those in the given
+ * list will have non-zero, non-infinite generation numbers.
+ */
+void ensure_generations_valid(struct repository *r,
+			      struct commit **commits, size_t nr)
+{
+	int generation_version = get_configured_generation_version(r);
+	struct packed_commit_list list = {
+		.list = commits,
+		.alloc = nr,
+		.nr = nr,
+	};
+	struct compute_generation_info info = {
+		.r = r,
+		.commits = &list,
+		.get_generation = get_generation_from_graph_data,
+		.set_generation = set_generation_in_graph_data,
+	};
+
+	compute_reachable_generation_numbers_1(&info, generation_version);
+}
+
 static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx)
 {
 	trace2_data_intmax("commit-graph", ctx->r, "filter-computed",
diff --git a/commit-graph.h b/commit-graph.h
index 37faee6b66d..73e182ab2d0 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -190,4 +190,12 @@ struct commit_graph_data {
  */
 timestamp_t commit_graph_generation(const struct commit *);
 uint32_t commit_graph_position(const struct commit *);
+
+/*
+ * After this method, all commits reachable from those in the given
+ * list will have non-zero, non-infinite generation numbers.
+ */
+void ensure_generations_valid(struct repository *r,
+			      struct commit **commits, size_t nr);
+
 #endif
-- 
gitgitgadget


* [PATCH v3 6/8] commit-reach: implement ahead_behind() logic
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (4 preceding siblings ...)
  2023-03-15 17:45     ` [PATCH v3 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
@ 2023-03-15 17:45     ` Derrick Stolee via GitGitGadget
  2023-03-15 23:28       ` Jonathan Tan
  2023-03-15 17:45     ` [PATCH v3 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  8 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-15 17:45 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Fully implement the commit-counting logic required to determine
ahead/behind counts for a batch of commit pairs. This is a new library
method within commit-reach.h. This method will be linked to the
for-each-ref builtin in the next change.

The interface for ahead_behind() uses two arrays. The first array of
commits contains the list of all starting points for the walk. This
includes all tip commits _and_ base commits. The second array, using the
new ahead_behind_count struct, indicates which commits from that initial
array form the base/tip pair for the ahead/behind count it will store.

This implementation of ahead_behind() allows multiple bases, if desired.
Even with multiple bases, there is only one commit walk used for
counting the ahead/behind values, saving time when the base/tip ranges
overlap significantly.

This interface for ahead_behind() also makes it very easy to call
ensure_generations_valid() on the entire array of bases and tips. This
call is necessary because it is critical that the walk that counts
ahead/behind values never walks a commit more than once. Without
generation numbers on every commit, there is a possibility that a
commit date skew could cause the walk to revisit a commit and then
double-count it. For this reason, it is strongly recommended that
ahead_behind() is only used in a repository with a commit-graph file that
covers most of the reachable commits, storing precomputed generation
numbers. If no commit-graph exists, this walk will be much slower as it
must walk all reachable commits in ensure_generations_valid() before
performing the counting logic.

It is possible to detect if generation numbers are available at run time
and redirect the implementation to another algorithm that does not
require this property. However, that implementation requires a commit
walk per base/tip pair _and_ can be slower due to the commit date
heuristics required. Such an implementation could be considered in the
future if there is a reason to include it, but most Git hosts should
already be generating a commit-graph file as part of repository
maintenance. Most Git clients should also be generating commit-graph
files as part of background maintenance or automatic GCs.

Now, let's discuss the ahead/behind counting algorithm.

Each commit in the input commit list is associated with a bit position
indicating "the ith commit can reach this commit". Each of these commits
is associated with a bitmap with its position flipped on and then
placed in a queue for walking commit history. We walk commits by popping
the commit with maximum generation number out of the queue, guaranteeing
that we will never walk a child of that commit in any future steps.

As we walk, we load the bitmap for the current commit and perform two
main steps. The _second_ step examines each parent of the current commit
and adds the current commit's bitmap bits to each parent's bitmap. (We
create a new bitmap for the parent if this is our first time seeing that
parent.) After adding the bits to the parent's bitmap, the parent is
added to the walk queue. Due to this passing of bits to parents, the
current commit has a guarantee that the ith bit is enabled on its bitmap
if and only if the ith commit can reach the current commit.

The first step of the walk is to examine the bitmask on the current
commit and decide which ranges the commit is in or not. Due to the "bit
pushing" in the second step, we have a guarantee that the ith bit of the
current commit's bitmap is on if and only if the ith starting commit can
reach it. For each ahead_behind_count struct, check the base_index and
tip_index to see if those bits are enabled on the current bitmap. If
exactly one bit is enabled, then increment the corresponding 'ahead' or
'behind' count.  This increment is the reason we _absolutely need_ to
walk commits at most once.
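
The two steps can be sketched on a toy DAG as follows. This is an
illustration under simplifying assumptions, not the implementation in
this patch: commits are array indices sorted so that every parent has a
smaller index than its children (so iterating from the highest index
down plays the role of popping the maximum generation number from the
priority queue), each bitmap is a single uint32_t, and the early
termination discussed below is left out. All toy_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 32

struct toy_commit {
	int parents[2];		/* indices into the array, -1 for "none" */
};

struct toy_count {
	size_t tip_index;	/* positions in starts[], not commit ids */
	size_t base_index;
	unsigned int ahead;
	unsigned int behind;
};

static void toy_ahead_behind(const struct toy_commit *commits, int nr,
			     const int *starts, size_t starts_nr,
			     struct toy_count *counts, size_t counts_nr)
{
	/* bits[c] has bit i set iff starts[i] can reach commit c. */
	uint32_t bits[TOY_MAX_COMMITS] = { 0 };

	assert(nr <= TOY_MAX_COMMITS && starts_nr <= 32);

	for (size_t i = 0; i < counts_nr; i++)
		counts[i].ahead = counts[i].behind = 0;

	for (size_t i = 0; i < starts_nr; i++)
		bits[starts[i]] |= UINT32_C(1) << i;

	/* Children are always visited before their parents. */
	for (int c = nr - 1; c >= 0; c--) {
		if (!bits[c])
			continue;	/* not reachable from any start */

		/* First step: classify this commit for every pair. */
		for (size_t i = 0; i < counts_nr; i++) {
			int from_tip = !!(bits[c] & (UINT32_C(1) << counts[i].tip_index));
			int from_base = !!(bits[c] & (UINT32_C(1) << counts[i].base_index));

			if (from_tip ^ from_base) {
				if (from_base)
					counts[i].behind++;
				else
					counts[i].ahead++;
			}
		}

		/* Second step: push this commit's bits to its parents. */
		for (int p = 0; p < 2; p++)
			if (commits[c].parents[p] >= 0)
				bits[commits[c].parents[p]] |= bits[c];
	}
}
```

Each commit is visited exactly once, after all of its descendants, so
its bitmap is complete by the time it is counted. That ordering is what
the generation numbers guarantee in the real walk.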

The only subtle thing to do with this walk is to check to see if a
parent has all bits on in its bitmap, in which case it becomes "stale"
and is marked with the STALE bit. This allows queue_has_nonstale() to be
the terminating condition of the walk, which greatly reduces the number
of commits walked if all of the commits are nearby in history. It avoids
walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup() to add commits to the
priority queue without adding them multiple times. This uses the PARENT2
flag. Thus, we must clear both the STALE and PARENT2 bits of all
commits, in case ahead_behind() is called multiple times in the same
process.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-reach.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++
 commit-reach.h | 31 ++++++++++++++++
 2 files changed, 127 insertions(+)

diff --git a/commit-reach.c b/commit-reach.c
index 2e33c599a82..1e5a1c37fb7 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -8,6 +8,7 @@
 #include "revision.h"
 #include "tag.h"
 #include "commit-reach.h"
+#include "ewah/ewok.h"
 
 /* Remember to update object flag allocation in object.h */
 #define PARENT1		(1u<<16)
@@ -941,3 +942,98 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 
 	return found_commits;
 }
+
+define_commit_slab(bit_arrays, struct bitmap *);
+static struct bit_arrays bit_arrays;
+
+static void insert_no_dup(struct prio_queue *queue, struct commit *c)
+{
+	if (c->object.flags & PARENT2)
+		return;
+	prio_queue_put(queue, c);
+	c->object.flags |= PARENT2;
+}
+
+static struct bitmap *init_bit_array(struct commit *c, int width)
+{
+	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
+	if (!*bitmap)
+		*bitmap = bitmap_word_alloc(width);
+	return *bitmap;
+}
+
+static void free_bit_array(struct commit *c)
+{
+	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
+	if (!*bitmap)
+		return;
+	bitmap_free(*bitmap);
+	*bitmap = NULL;
+}
+
+void ahead_behind(struct repository *r,
+		  struct commit **commits, size_t commits_nr,
+		  struct ahead_behind_count *counts, size_t counts_nr)
+{
+	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
+	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
+
+	if (!commits_nr || !counts_nr)
+		return;
+
+	for (size_t i = 0; i < counts_nr; i++) {
+		counts[i].ahead = 0;
+		counts[i].behind = 0;
+	}
+
+	ensure_generations_valid(r, commits, commits_nr);
+
+	init_bit_arrays(&bit_arrays);
+
+	for (size_t i = 0; i < commits_nr; i++) {
+		struct commit *c = commits[i];
+		struct bitmap *bitmap = init_bit_array(c, width);
+
+		bitmap_set(bitmap, i);
+		insert_no_dup(&queue, c);
+	}
+
+	while (queue_has_nonstale(&queue)) {
+		struct commit *c = prio_queue_get(&queue);
+		struct commit_list *p;
+		struct bitmap *bitmap_c = init_bit_array(c, width);
+
+		for (size_t i = 0; i < counts_nr; i++) {
+			int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
+			int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index);
+
+			if (reach_from_tip ^ reach_from_base) {
+				if (reach_from_base)
+					counts[i].behind++;
+				else
+					counts[i].ahead++;
+			}
+		}
+
+		for (p = c->parents; p; p = p->next) {
+			struct bitmap *bitmap_p;
+
+			repo_parse_commit(r, p->item);
+
+			bitmap_p = init_bit_array(p->item, width);
+			bitmap_or(bitmap_p, bitmap_c);
+
+			if (bitmap_popcount(bitmap_p) == commits_nr)
+				p->item->object.flags |= STALE;
+
+			insert_no_dup(&queue, p->item);
+		}
+
+		free_bit_array(c);
+	}
+
+	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
+	repo_clear_commit_marks(r, PARENT2 | STALE);
+	clear_bit_arrays(&bit_arrays);
+	clear_prio_queue(&queue);
+}
diff --git a/commit-reach.h b/commit-reach.h
index 148b56fea50..f708c46e523 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -104,4 +104,35 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 					 struct commit **to, int nr_to,
 					 unsigned int reachable_flag);
 
+struct ahead_behind_count {
+	/**
+	 * As input, the *_index members indicate which positions in
+	 * the 'commits' array correspond to the tip and base of this
+	 * comparison.
+	 */
+	size_t tip_index;
+	size_t base_index;
+
+	/**
+	 * These values store the computed counts for each side of the
+	 * symmetric difference:
+	 *
+	 * 'ahead' stores the number of commits reachable from the tip
+	 * and not reachable from the base.
+	 *
+	 * 'behind' stores the number of commits reachable from the base
+	 * and not reachable from the tip.
+	 */
+	unsigned int ahead;
+	unsigned int behind;
+};
+
+/*
+ * Given an array of commits and an array of ahead_behind_count pairs,
+ * compute the ahead/behind counts for each pair.
+ */
+void ahead_behind(struct repository *r,
+		  struct commit **commits, size_t commits_nr,
+		  struct ahead_behind_count *counts, size_t counts_nr);
+
 #endif
-- 
gitgitgadget


* [PATCH v3 7/8] for-each-ref: add ahead-behind format atom
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (5 preceding siblings ...)
  2023-03-15 17:45     ` [PATCH v3 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
@ 2023-03-15 17:45     ` Derrick Stolee via GitGitGadget
  2023-03-15 17:45     ` [PATCH v3 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-15 17:45 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change implemented the ahead_behind() method, including an
algorithm to compute the ahead/behind values for a number of commit tips
relative to a number of commit bases. Now, integrate that algorithm as
part of 'git for-each-ref' hidden behind a new format atom,
ahead-behind. This naturally extends to 'git branch' and 'git tag'
builtins, as well.

This format allows specifying multiple bases, if so desired, and all
matching references are compared against all of those bases. For this
reason, failing to read a reference provided from these atoms results in
an error.

In order to translate the ahead_behind() method information to the
format output code in ref-filter.c, we must populate arrays of
ahead_behind_count structs. In struct ref_array, we store the full array
that will be passed to ahead_behind(). In struct ref_array_item, we
store an array of pointers that point to the relevant items within the
full array. In this way, we can pull all relevant ahead/behind values
directly when formatting output for a specific item. It also ensures the
lifetime of the ahead_behind_count structs matches the time that the
array is being used.
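
The ownership arrangement might look roughly like the sketch below. The
toy_* types and functions are hypothetical stand-ins for struct
ref_array and struct ref_array_item, and the index layout (bases first,
then one slot per item) is an assumption for the example. The point is
only that one flat array owns the structs while each item holds
borrowed pointers into it.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_count {
	size_t tip_index, base_index;
	unsigned int ahead, behind;
};

struct toy_item {
	struct toy_count **counts;	/* borrowed: one pointer per base */
};

struct toy_array {
	struct toy_item *items;
	size_t nr;
	struct toy_count *counts;	/* owns every toy_count */
	size_t counts_nr;
};

static void toy_release_counts(struct toy_array *array)
{
	for (size_t i = 0; i < array->nr; i++) {
		free(array->items[i].counts);
		array->items[i].counts = NULL;
	}
	free(array->counts);
	array->counts = NULL;
	array->counts_nr = 0;
}

/* Returns 0 on success, -1 if an allocation fails. */
static int toy_prepare_counts(struct toy_array *array, size_t bases_nr)
{
	assert(bases_nr > 0 && array->nr > 0);

	for (size_t i = 0; i < array->nr; i++)
		array->items[i].counts = NULL;
	array->counts_nr = 0;
	array->counts = calloc(bases_nr * array->nr, sizeof(*array->counts));
	if (!array->counts)
		return -1;

	for (size_t i = 0; i < array->nr; i++) {
		struct toy_item *item = &array->items[i];

		item->counts = calloc(bases_nr, sizeof(*item->counts));
		if (!item->counts) {
			toy_release_counts(array);
			return -1;
		}
		for (size_t j = 0; j < bases_nr; j++) {
			struct toy_count *count =
				&array->counts[array->counts_nr++];

			/* Assumed layout: bases first, then one slot per item. */
			count->base_index = j;
			count->tip_index = bases_nr + i;
			item->counts[j] = count;
		}
	}
	return 0;
}
```

Because the flat array is sized up front and never reallocated, the
per-item pointers stay valid until the whole array is released.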

Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as
it has an interesting repository shape. In particular, its merging
strategy and its use of different commit-graphs would demonstrate over-
counting if the ahead_behind() method did not already account for that
possibility.

Also add tests for the specific for-each-ref, branch, and tag builtins.
In the case of 'git tag', there are interesting cases that happen when
some of the selected tips are not commits. This requires careful logic
around commits_nr in the second loop of filter_ahead_behind(). Also, the
test in t7004 is carefully located to avoid being dependent on the GPG
prereq. It also avoids using the test_commit helper, as that will add
ticks to the time and disrupt the expected timestamps in later tag
tests.

Also add performance tests in a new p1500-graph-walks.sh script. This
will be useful for more uses in the future, but for now compare the
ahead-behind counting algorithm in 'git for-each-ref' to the naive
implementation by running 'git rev-list --count' processes for each
input.

For the Git source code repository, the improvement is already obvious:

Test                                            this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref   0.07(0.07+0.00)
1500.3: ahead-behind counts: git branch         0.07(0.06+0.00)
1500.4: ahead-behind counts: git tag            0.07(0.06+0.00)
1500.5: ahead-behind counts: git rev-list       1.32(1.04+0.27)

But the standard performance benchmark is the Linux kernel repository,
which demonstrates a significant improvement:

Test                                            this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref   0.27(0.24+0.02)
1500.3: ahead-behind counts: git branch         0.27(0.24+0.03)
1500.4: ahead-behind counts: git tag            0.28(0.27+0.01)
1500.5: ahead-behind counts: git rev-list       4.57(4.03+0.54)

The 'git rev-list' test exists in this change as a demonstration, but it
will be removed in the next change to avoid wasting time on this
comparison.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-for-each-ref.txt |  5 ++
 builtin/branch.c                   |  1 +
 builtin/for-each-ref.c             |  3 ++
 builtin/tag.c                      |  1 +
 ref-filter.c                       | 71 ++++++++++++++++++++++++
 ref-filter.h                       | 26 ++++++++-
 t/perf/p1500-graph-walks.sh        | 45 ++++++++++++++++
 t/t3203-branch-output.sh           | 14 +++++
 t/t6301-for-each-ref-errors.sh     | 14 +++++
 t/t6600-test-reach.sh              | 86 ++++++++++++++++++++++++++++++
 t/t7004-tag.sh                     | 28 ++++++++++
 11 files changed, 293 insertions(+), 1 deletion(-)
 create mode 100755 t/perf/p1500-graph-walks.sh

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index ccdc2911bb9..0713e49b499 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -222,6 +222,11 @@ worktreepath::
 	out, if it is checked out in any linked worktree. Empty string
 	otherwise.
 
+ahead-behind:<committish>::
+	Two integers, separated by a space, demonstrating the number of
+	commits ahead and behind, respectively, when comparing the output
+	ref to the `<committish>` specified in the format.
+
 In addition to the above, for commit and tag objects, the header
 field names (`tree`, `parent`, `object`, `type`, and `tag`) can
 be used to specify the value in the header field.
diff --git a/builtin/branch.c b/builtin/branch.c
index f63fd45edb9..0554d7cebb3 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -448,6 +448,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	if (verify_ref_format(format))
 		die(_("unable to parse format string"));
 
+	filter_ahead_behind(the_repository, format, &array);
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++) {
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 4c5f2324793..bb559c343ed 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -6,6 +6,7 @@
 #include "parse-options.h"
 #include "ref-filter.h"
 #include "strvec.h"
+#include "commit-reach.h"
 
 static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [<options>] [<pattern>]"),
@@ -96,6 +97,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	filter.match_as_path = 1;
 	filter_refs(&array, &filter, FILTER_REFS_ALL);
+	filter_ahead_behind(the_repository, &format, &array);
+
 	ref_array_sort(sorting, &array);
 
 	if (!maxcount || array.nr < maxcount)
diff --git a/builtin/tag.c b/builtin/tag.c
index d428c45dc8d..1b3f49d7b4c 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -66,6 +66,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		die(_("unable to parse format string"));
 	filter->with_commit_tag_algo = 1;
 	filter_refs(&array, filter, FILTER_REFS_TAGS);
+	filter_ahead_behind(the_repository, format, &array);
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++) {
diff --git a/ref-filter.c b/ref-filter.c
index f8203c6b052..5a94fea7981 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -158,6 +158,7 @@ enum atom_type {
 	ATOM_THEN,
 	ATOM_ELSE,
 	ATOM_REST,
+	ATOM_AHEADBEHIND,
 };
 
 /*
@@ -586,6 +587,16 @@ static int rest_atom_parser(struct ref_format *format, struct used_atom *atom,
 	return 0;
 }
 
+static int ahead_behind_atom_parser(struct ref_format *format, struct used_atom *atom,
+				    const char *arg, struct strbuf *err)
+{
+	if (!arg)
+		return strbuf_addf_ret(err, -1, _("expected format: %%(ahead-behind:<ref>)"));
+
+	string_list_append(&format->bases, arg);
+	return 0;
+}
+
 static int head_atom_parser(struct ref_format *format, struct used_atom *atom,
 			    const char *arg, struct strbuf *err)
 {
@@ -645,6 +656,7 @@ static struct {
 	[ATOM_THEN] = { "then", SOURCE_NONE },
 	[ATOM_ELSE] = { "else", SOURCE_NONE },
 	[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
+	[ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser },
 	/*
 	 * Please update $__git_ref_fieldlist in git-completion.bash
 	 * when you add new atoms
@@ -1848,6 +1860,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 	struct object *obj;
 	int i;
 	struct object_info empty = OBJECT_INFO_INIT;
+	int ahead_behind_atoms = 0;
 
 	CALLOC_ARRAY(ref->value, used_atom_cnt);
 
@@ -1978,6 +1991,16 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 			else
 				v->s = xstrdup("");
 			continue;
+		} else if (atom_type == ATOM_AHEADBEHIND) {
+			if (ref->counts) {
+				const struct ahead_behind_count *count;
+				count = ref->counts[ahead_behind_atoms++];
+				v->s = xstrfmt("%d %d", count->ahead, count->behind);
+			} else {
+				/* Not a commit. */
+				v->s = xstrdup("");
+			}
+			continue;
 		} else
 			continue;
 
@@ -2328,6 +2351,7 @@ static void free_array_item(struct ref_array_item *item)
 			free((char *)item->value[i].s);
 		free(item->value);
 	}
+	free(item->counts);
 	free(item);
 }
 
@@ -2356,6 +2380,8 @@ void ref_array_clear(struct ref_array *array)
 		free_worktrees(ref_to_worktree_map.worktrees);
 		ref_to_worktree_map.worktrees = NULL;
 	}
+
+	FREE_AND_NULL(array->counts);
 }
 
 #define EXCLUDE_REACHED 0
@@ -2418,6 +2444,51 @@ static void reach_filter(struct ref_array *array,
 	free(to_clear);
 }
 
+void filter_ahead_behind(struct repository *r,
+			 struct ref_format *format,
+			 struct ref_array *array)
+{
+	struct commit **commits;
+	size_t commits_nr = format->bases.nr + array->nr;
+
+	if (!format->bases.nr || !array->nr)
+		return;
+
+	ALLOC_ARRAY(commits, commits_nr);
+	for (size_t i = 0; i < format->bases.nr; i++) {
+		const char *name = format->bases.items[i].string;
+		commits[i] = lookup_commit_reference_by_name(name);
+		if (!commits[i])
+			die("failed to find '%s'", name);
+	}
+
+	ALLOC_ARRAY(array->counts, st_mult(format->bases.nr, array->nr));
+
+	commits_nr = format->bases.nr;
+	array->counts_nr = 0;
+	for (size_t i = 0; i < array->nr; i++) {
+		const char *name = array->items[i]->refname;
+		commits[commits_nr] = lookup_commit_reference_by_name(name);
+
+		if (!commits[commits_nr])
+			continue;
+
+		CALLOC_ARRAY(array->items[i]->counts, format->bases.nr);
+		for (size_t j = 0; j < format->bases.nr; j++) {
+			struct ahead_behind_count *count;
+			count = &array->counts[array->counts_nr++];
+			count->tip_index = commits_nr;
+			count->base_index = j;
+
+			array->items[i]->counts[j] = count;
+		}
+		commits_nr++;
+	}
+
+	ahead_behind(r, commits, commits_nr, array->counts, array->counts_nr);
+	free(commits);
+}
+
 /*
  * API for filtering a set of refs. Based on the type of refs the user
  * has requested, we iterate through those refs and apply filters
diff --git a/ref-filter.h b/ref-filter.h
index aa0eea4ecf5..c9a11495177 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -5,6 +5,7 @@
 #include "refs.h"
 #include "commit.h"
 #include "parse-options.h"
+#include "string-list.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -24,6 +25,7 @@
 
 struct atom_value;
 struct ref_sorting;
+struct ahead_behind_count;
 
 enum ref_sorting_order {
 	REF_SORTING_REVERSE = 1<<0,
@@ -40,6 +42,8 @@ struct ref_array_item {
 	const char *symref;
 	struct commit *commit;
 	struct atom_value *value;
+	struct ahead_behind_count **counts;
+
 	char refname[FLEX_ARRAY];
 };
 
@@ -47,6 +51,9 @@ struct ref_array {
 	int nr, alloc;
 	struct ref_array_item **items;
 	struct rev_info *revs;
+
+	struct ahead_behind_count *counts;
+	size_t counts_nr;
 };
 
 struct ref_filter {
@@ -80,9 +87,15 @@ struct ref_format {
 
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
+
+	/* List of bases for ahead-behind counts. */
+	struct string_list bases;
 };
 
-#define REF_FORMAT_INIT { .use_color = -1 }
+#define REF_FORMAT_INIT {             \
+	.use_color = -1,              \
+	.bases = STRING_LIST_INIT_DUP, \
+}
 
 /*  Macros for checking --merged and --no-merged options */
 #define _OPT_MERGED_NO_MERGED(option, filter, h) \
@@ -143,4 +156,15 @@ struct ref_array_item *ref_array_push(struct ref_array *array,
 				      const char *refname,
 				      const struct object_id *oid);
 
+/*
+ * If the provided format includes ahead-behind atoms, then compute the
+ * ahead-behind values for the array of filtered references. Must be
+ * called after filter_refs() but before outputting the formatted refs.
+ *
+ * If this is not called, then any ahead-behind atoms will be blank.
+ */
+void filter_ahead_behind(struct repository *r,
+			 struct ref_format *format,
+			 struct ref_array *array);
+
 #endif /*  REF_FILTER_H  */
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
new file mode 100755
index 00000000000..439a448c2e6
--- /dev/null
+++ b/t/perf/p1500-graph-walks.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+test_description='Commit walk performance tests'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
+	sort -r allrefs | head -n 50 >refs &&
+	for ref in $(cat refs)
+	do
+		git branch -f ref-$ref $ref &&
+		echo ref-$ref ||
+		return 1
+	done >branches &&
+	for ref in $(cat refs)
+	do
+		git tag -f tag-$ref $ref &&
+		echo tag-$ref ||
+		return 1
+	done >tags &&
+	git commit-graph write --reachable
+'
+
+test_perf 'ahead-behind counts: git for-each-ref' '
+	git for-each-ref --format="%(ahead-behind:HEAD)" --stdin <refs
+'
+
+test_perf 'ahead-behind counts: git branch' '
+	xargs git branch -l --format="%(ahead-behind:HEAD)" <branches
+'
+
+test_perf 'ahead-behind counts: git tag' '
+	xargs git tag -l --format="%(ahead-behind:HEAD)" <tags
+'
+
+test_perf 'ahead-behind counts: git rev-list' '
+	for r in $(cat refs)
+	do
+		git rev-list --count "HEAD..$r" || return 1
+	done
+'
+
+test_done
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index d34d77f8934..1c0f7ea24e7 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -337,6 +337,20 @@ test_expect_success 'git branch --format option' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git branch --format with ahead-behind' '
+	cat >expect <<-\EOF &&
+	(HEAD detached from fromtag) 0 0
+	refs/heads/ambiguous 0 0
+	refs/heads/branch-one 1 0
+	refs/heads/branch-two 0 0
+	refs/heads/main 1 0
+	refs/heads/ref-to-branch 1 0
+	refs/heads/ref-to-remote 1 0
+	EOF
+	git branch --format="%(refname) %(ahead-behind:HEAD)" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git branch with --format=%(rest) must fail' '
 	test_must_fail git branch --format="%(rest)" >actual
 '
diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
index bfda1f46ad2..47c2c183c30 100755
--- a/t/t6301-for-each-ref-errors.sh
+++ b/t/t6301-for-each-ref-errors.sh
@@ -54,4 +54,18 @@ test_expect_success 'Missing objects are reported correctly' '
 	test_must_be_empty brief-err
 '
 
+test_expect_success 'ahead-behind requires an argument' '
+	test_must_fail git for-each-ref \
+		--format="%(ahead-behind)" 2>err &&
+	echo "fatal: expected format: %(ahead-behind:<ref>)" >expect &&
+	test_cmp expect err
+'
+
+test_expect_success 'missing ahead-behind base' '
+	test_must_fail git for-each-ref \
+		--format="%(ahead-behind:refs/heads/missing)" 2>err &&
+	echo "fatal: failed to find '\''refs/heads/missing'\''" >expect &&
+	test_cmp expect err
+'
+
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 338a9c46a24..0cb50797ef7 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -443,4 +443,90 @@ test_expect_success 'get_reachable_subset:none' '
 	test_all_modes get_reachable_subset
 '
 
+test_expect_success 'for-each-ref ahead-behind:linear' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 8
+	refs/heads/commit-1-3 0 6
+	refs/heads/commit-1-5 0 4
+	refs/heads/commit-1-8 0 1
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-1-9)" --stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:all' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 24
+	refs/heads/commit-2-4 0 17
+	refs/heads/commit-4-2 0 17
+	refs/heads/commit-4-4 0 9
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-5-5)" --stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:some' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 53
+	refs/heads/commit-4-8 8 30
+	refs/heads/commit-5-3 0 39
+	refs/heads/commit-9-9 27 0
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-9-6)" --stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:some, multibase' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-7-8
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 53 0 53
+	refs/heads/commit-4-8 8 30 0 22
+	refs/heads/commit-5-3 0 39 0 39
+	refs/heads/commit-7-8 14 12 8 6
+	refs/heads/commit-9-9 27 0 27 0
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-9-6) %(ahead-behind:commit-6-9)" \
+		--stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:none' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-7-5
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-4-8 16 16
+	refs/heads/commit-7-5 7 4
+	refs/heads/commit-9-9 49 0
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-8-4)" --stdin
+'
+
 test_done
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 9aa1660651b..04a4b44183d 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -792,6 +792,34 @@ test_expect_success 'annotations for blobs are empty' '
 	test_cmp expect actual
 '
 
+# Run this before doing any signing, so the test has the same results
+# regardless of the GPG prereq.
+test_expect_success 'git tag --format with ahead-behind' '
+	test_when_finished git reset --hard tag-one-line &&
+	git commit --allow-empty -m "left" &&
+	git tag -a -m left tag-left &&
+	git reset --hard HEAD~1 &&
+	git commit --allow-empty -m "right" &&
+	git tag -a -m left tag-right &&
+
+	# Use " !" at the end to demonstrate whitespace
+	# around empty ahead-behind token for tag-blob.
+	cat >expect <<-EOF &&
+	refs/tags/tag-blob  !
+	refs/tags/tag-left 1 1 !
+	refs/tags/tag-lines 0 1 !
+	refs/tags/tag-one-line 0 1 !
+	refs/tags/tag-right 0 0 !
+	refs/tags/tag-zero-lines 0 1 !
+	EOF
+	git tag -l --format="%(refname) %(ahead-behind:HEAD) !" >actual 2>err &&
+	grep "refs/tags/tag" actual >actual.focus &&
+	test_cmp expect actual.focus &&
+
+	# Error reported for tags that point to non-commits.
+	grep "error: object [0-9a-f]* is a blob, not a commit" err
+'
+
 # trying to verify annotated non-signed tags:
 
 test_expect_success GPG \
-- 
gitgitgadget



* [PATCH v3 8/8] commit-reach: add tips_reachable_from_bases()
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (6 preceding siblings ...)
  2023-03-15 17:45     ` [PATCH v3 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
@ 2023-03-15 17:45     ` Derrick Stolee via GitGitGadget
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-15 17:45 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use
the ref-filter machinery to select references or branches (respectively)
that are reachable from a set of commits presented by one or more
--merged arguments. This happens within reach_filter(), which uses the
revision-walk machinery to walk history in a standard way.

However, the commit-reach.c file is full of custom searches that are
more efficient, especially for reachability queries that can terminate
early when reachability is discovered. Add a new
tips_reachable_from_bases() method to commit-reach.c and call it from
within reach_filter() in ref-filter.c. This affects both 'git branch'
and 'git for-each-ref' as tested in p1500-graph-walks.sh.

For the Linux kernel repository, we take an already-fast algorithm and
make it even faster:

Test                                            HEAD~1  HEAD
-------------------------------------------------------------------
1500.5: contains: git for-each-ref --merged     0.13    0.02 -84.6%
1500.6: contains: git branch --merged           0.14    0.02 -85.7%
1500.7: contains: git tag --merged              0.15    0.03 -80.0%

(Note that we remove the iterative 'git rev-list' test from p1500
because it no longer makes sense as a comparison to 'git for-each-ref'
and would just waste time running it for these comparisons.)

The algorithm is implemented in commit-reach.c in the method
tips_reachable_from_bases(). This method takes an array of tip commits
and adds the given 'mark' to the flags of each tip that is reachable
from at least one of the base commits.

Like other reachability queries in commit-reach.c, the fastest way to
search for "can A reach B?" is to do a depth-first search up to the
generation number of B, preferring to explore first parents before later
parents. While we must walk all reachable commits up to that generation
number when the answer is "no", the depth-first search can answer "yes"
much faster than other approaches in most cases.

This search becomes trickier when there are multiple targets for the
depth-first search. The commits with lower generation number are more
likely to be within the history of the start commit, but we don't want
to waste time searching commits of low generation number if the commit
target with lowest generation number has already been found.

The trick here is to take the input commits and sort them by generation
number in ascending order. Track the index within this order as
min_generation_index. When we find a commit, if its index in the list is
equal to min_generation_index, then we can increase the generation
number boundary of our search to the next-lowest value in the list.

With this mechanism, the number of commits to search is minimized with
respect to the depth-first search heuristic. We will walk all commits up
to the minimum generation number of a commit that is _not_ reachable
from the start, but we will walk only the necessary portion of the
depth-first search for the reachable commits of lower generation.
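
The search described above can be sketched outside of Git. The block
below is only an illustration under stated assumptions: 'toy_commit',
'toy_tips_reachable()' and the TOY_* limits are hypothetical names, the
generation number is stored directly in each node rather than read from
a commit-graph, only a single base is supported, and the 'seen' flags
are never cleared, so each toy graph can be searched once.

```c
#include <stdlib.h>

#define TOY_MAX_PARENTS 2
#define TOY_MAX_STACK 64	/* assumption: at most 64 commits in the toy graph */

/* Hypothetical stand-in for 'struct commit'; not Git's data structure. */
struct toy_commit {
	unsigned int generation;	/* 1 + largest generation among parents */
	int seen;			/* already pushed onto the DFS stack */
	int reached;			/* plays the role of 'mark' */
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

static int toy_cmp_generation(const void *va, const void *vb)
{
	const struct toy_commit *a = *(struct toy_commit *const *)va;
	const struct toy_commit *b = *(struct toy_commit *const *)vb;

	return (a->generation > b->generation) - (a->generation < b->generation);
}

/*
 * Set 'reached' on every tip reachable from 'base'. Sorts 'tips' in
 * place and returns the number of commits pushed onto the stack.
 */
static size_t toy_tips_reachable(struct toy_commit *base,
				 struct toy_commit **tips, size_t tips_nr)
{
	struct toy_commit *stack[TOY_MAX_STACK];
	size_t depth = 0, min_index = 0, visited = 0;

	if (!base || !tips_nr)
		return 0;

	/* Sort with generation number ascending. */
	qsort(tips, tips_nr, sizeof(*tips), toy_cmp_generation);

	base->seen = 1;
	stack[depth++] = base;
	visited++;

	while (depth) {
		struct toy_commit *c = stack[depth - 1];
		int pushed = 0;

		/* Does the commit on top of the stack match a tip? */
		for (size_t j = min_index; j < tips_nr; j++) {
			if (c->generation < tips[j]->generation)
				break;
			if (tips[j] == c)
				c->reached = 1;
		}

		/* Raise the boundary past tips that are already found. */
		while (min_index < tips_nr && tips[min_index]->reached)
			min_index++;
		if (min_index == tips_nr)
			return visited;	/* all tips found: stop early */

		/* Descend into the first unexplored parent above the cutoff. */
		for (int i = 0; i < TOY_MAX_PARENTS && !pushed; i++) {
			struct toy_commit *p = c->parents[i];

			if (!p || p->seen)
				continue;
			if (p->generation < tips[min_index]->generation)
				continue;
			if (depth == TOY_MAX_STACK)
				abort();
			p->seen = 1;
			stack[depth++] = p;
			visited++;
			pushed = 1;
		}

		if (!pushed)
			depth--;
	}
	return visited;
}
```

In this sketch, a parent whose generation is below that of the lowest
un-found tip is never pushed, which is the pruning that keeps the walk
short when the answer for the remaining tips is "no".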

Add extra tests for this behavior in t6600-test-reach.sh as the
interesting data shape of that repository can sometimes demonstrate
corner case bugs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-reach.c              | 113 ++++++++++++++++++++++++++++++++++++
 commit-reach.h              |   9 +++
 ref-filter.c                |  20 ++-----
 t/perf/p1500-graph-walks.sh |  15 +++--
 t/t6600-test-reach.sh       |  83 ++++++++++++++++++++++++++
 5 files changed, 219 insertions(+), 21 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 1e5a1c37fb7..463e9e8fd0e 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1037,3 +1037,116 @@ void ahead_behind(struct repository *r,
 	clear_bit_arrays(&bit_arrays);
 	clear_prio_queue(&queue);
 }
+
+struct commit_and_index {
+	struct commit *commit;
+	unsigned int index;
+	timestamp_t generation;
+};
+
+static int compare_commit_and_index_by_generation(const void *va, const void *vb)
+{
+	const struct commit_and_index *a = (const struct commit_and_index *)va;
+	const struct commit_and_index *b = (const struct commit_and_index *)vb;
+
+	if (a->generation > b->generation)
+		return 1;
+	if (a->generation < b->generation)
+		return -1;
+	return 0;
+}
+
+void tips_reachable_from_bases(struct repository *r,
+			       struct commit_list *bases,
+			       struct commit **tips, size_t tips_nr,
+			       int mark)
+{
+	struct commit_and_index *commits;
+	size_t min_generation_index = 0;
+	timestamp_t min_generation;
+	struct commit_list *stack = NULL;
+
+	if (!bases || !tips || !tips_nr)
+		return;
+
+	/*
+	 * Do a depth-first search starting at 'bases' to search for the
+	 * tips. Stop at the lowest (un-found) generation number. When
+	 * finding the lowest commit, increase the minimum generation
+	 * number to the next lowest (un-found) generation number.
+	 */
+
+	CALLOC_ARRAY(commits, tips_nr);
+
+	for (size_t i = 0; i < tips_nr; i++) {
+		commits[i].commit = tips[i];
+		commits[i].index = i;
+		commits[i].generation = commit_graph_generation(tips[i]);
+	}
+
+	/* Sort with generation number ascending. */
+	QSORT(commits, tips_nr, compare_commit_and_index_by_generation);
+	min_generation = commits[0].generation;
+
+	while (bases) {
+		repo_parse_commit(r, bases->item);
+		commit_list_insert(bases->item, &stack);
+		bases = bases->next;
+	}
+
+	while (stack) {
+		int explored_all_parents = 1;
+		struct commit_list *p;
+		struct commit *c = stack->item;
+		timestamp_t c_gen = commit_graph_generation(c);
+
+		/* Does it match any of our tips? */
+		for (size_t j = min_generation_index; j < tips_nr; j++) {
+			if (c_gen < commits[j].generation)
+				break;
+
+			if (commits[j].commit == c) {
+				tips[commits[j].index]->object.flags |= mark;
+
+				if (j == min_generation_index) {
+					unsigned int k = j + 1;
+					while (k < tips_nr &&
+					       (tips[commits[k].index]->object.flags & mark))
+						k++;
+
+					/* Terminate early if all found. */
+					if (k >= tips_nr)
+						goto done;
+
+					min_generation_index = k;
+					min_generation = commits[k].generation;
+				}
+			}
+		}
+
+		for (p = c->parents; p; p = p->next) {
+			repo_parse_commit(r, p->item);
+
+			/* Have we already explored this parent? */
+			if (p->item->object.flags & SEEN)
+				continue;
+
+			/* Is it below the current minimum generation? */
+			if (commit_graph_generation(p->item) < min_generation)
+				continue;
+
+			/* Ok, we will explore from here on. */
+			p->item->object.flags |= SEEN;
+			explored_all_parents = 0;
+			commit_list_insert(p->item, &stack);
+			break;
+		}
+
+		if (explored_all_parents)
+			pop_commit(&stack);
+	}
+
+done:
+	free(commits);
+	repo_clear_commit_marks(r, SEEN);
+}
diff --git a/commit-reach.h b/commit-reach.h
index f708c46e523..d6321ae700e 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -135,4 +135,13 @@ void ahead_behind(struct repository *r,
 		  struct commit **commits, size_t commits_nr,
 		  struct ahead_behind_count *counts, size_t counts_nr);
 
+/*
+ * For all tip commits, add 'mark' to their flags if and only if they
+ * are reachable from one of the commits in 'bases'.
+ */
+void tips_reachable_from_bases(struct repository *r,
+			       struct commit_list *bases,
+			       struct commit **tips, size_t tips_nr,
+			       int mark);
+
 #endif
diff --git a/ref-filter.c b/ref-filter.c
index 5a94fea7981..c20010a6e94 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2390,33 +2390,22 @@ static void reach_filter(struct ref_array *array,
 			 struct commit_list *check_reachable,
 			 int include_reached)
 {
-	struct rev_info revs;
 	int i, old_nr;
 	struct commit **to_clear;
-	struct commit_list *cr;
 
 	if (!check_reachable)
 		return;
 
 	CALLOC_ARRAY(to_clear, array->nr);
-
-	repo_init_revisions(the_repository, &revs, NULL);
-
 	for (i = 0; i < array->nr; i++) {
 		struct ref_array_item *item = array->items[i];
-		add_pending_object(&revs, &item->commit->object, item->refname);
 		to_clear[i] = item->commit;
 	}
 
-	for (cr = check_reachable; cr; cr = cr->next) {
-		struct commit *merge_commit = cr->item;
-		merge_commit->object.flags |= UNINTERESTING;
-		add_pending_object(&revs, &merge_commit->object, "");
-	}
-
-	revs.limited = 1;
-	if (prepare_revision_walk(&revs))
-		die(_("revision walk setup failed"));
+	tips_reachable_from_bases(the_repository,
+				  check_reachable,
+				  to_clear, array->nr,
+				  UNINTERESTING);
 
 	old_nr = array->nr;
 	array->nr = 0;
@@ -2440,7 +2429,6 @@ static void reach_filter(struct ref_array *array,
 		clear_commit_marks(merge_commit, ALL_REV_FLAGS);
 	}
 
-	release_revisions(&revs);
 	free(to_clear);
 }
 
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index 439a448c2e6..e14e7620cce 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -35,11 +35,16 @@ test_perf 'ahead-behind counts: git tag' '
 	xargs git tag -l --format="%(ahead-behind:HEAD)" <tags
 '
 
-test_perf 'ahead-behind counts: git rev-list' '
-	for r in $(cat refs)
-	do
-		git rev-list --count "HEAD..$r" || return 1
-	done
+test_perf 'contains: git for-each-ref --merged' '
+	git for-each-ref --merged=HEAD --stdin <refs
+'
+
+test_perf 'contains: git branch --merged' '
+	xargs git branch --merged=HEAD <branches
+'
+
+test_perf 'contains: git tag --merged' '
+	xargs git tag --merged=HEAD <tags
 '
 
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 0cb50797ef7..b330945f497 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -529,4 +529,87 @@ test_expect_success 'for-each-ref ahead-behind:none' '
 		--format="%(refname) %(ahead-behind:commit-8-4)" --stdin
 '
 
+test_expect_success 'for-each-ref merged:linear' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	refs/heads/commit-2-1
+	refs/heads/commit-5-1
+	refs/heads/commit-9-1
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	EOF
+	run_all_modes git for-each-ref --merged=commit-1-9 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:all' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	run_all_modes git for-each-ref --merged=commit-5-5 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:some' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	EOF
+	run_all_modes git for-each-ref --merged=commit-9-6 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:some, multibase' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-7-8
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-4-8
+	refs/heads/commit-5-3
+	EOF
+	run_all_modes git for-each-ref \
+		--merged=commit-5-8 \
+		--merged=commit-8-5 \
+		--format="%(refname)" \
+		--stdin
+'
+
+test_expect_success 'for-each-ref merged:none' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-7-5
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	>expect &&
+	run_all_modes git for-each-ref --merged=commit-8-4 \
+		--format="%(refname)" --stdin
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH v2 1/8] for-each-ref: add --stdin option
  2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
                       ` (2 preceding siblings ...)
  2023-03-15 13:37     ` Ævar Arnfjörð Bjarmason
@ 2023-03-15 17:49     ` Jeff King
  2023-03-15 19:24       ` Junio C Hamano
  3 siblings, 1 reply; 90+ messages in thread
From: Jeff King @ 2023-03-15 17:49 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, me, vdye, Derrick Stolee

On Fri, Mar 10, 2023 at 05:20:56PM +0000, Derrick Stolee via GitGitGadget wrote:

> When a user wishes to input a large list of patterns to 'git
> for-each-ref' (likely a long list of exact refs) there are frequently
> system limits on the number of command-line arguments.
> 
> Add a new --stdin option to instead read the patterns from standard
> input. Add tests that check that any unrecognized arguments are
> considered an error when --stdin is provided. Also, an empty pattern
> list is interpreted as the complete ref set.
> 
> When reading from stdin, we populate the filter.name_patterns array
> dynamically as opposed to pointing to the 'argv' array directly. This
> requires a careful cast while freeing the individual strings,
> conditioned on the --stdin option.

This is a nice feature to have, but I suspect like other pattern
features in Git (e.g., pathspecs), the matching is linear, and thus
pre-expanding the set of refs you're interested in becomes accidentally
quadratic.

And that seems to be the case here. If I have N refs and feed the whole
set as patterns via --stdin:

-- >8 --
for i in 4000 8000 16000 32000; do
  rm -rf repo
  git init -q repo
  (
    cd repo
    git commit --allow-empty -qm foo
    perl -e '
      my ($oid, $n) = @ARGV;
      print "create refs/heads/branch$_ $oid\n" for (1..$n);
    ' $(git rev-parse HEAD) $i |
    git update-ref --stdin
    git for-each-ref --format='%(refname)' >refs
    echo -n "$i: "
    command time -f %U \
      git.compile for-each-ref --stdin <refs 2>&1 >/dev/null
  )
done
-- 8< --

then the result quadruples for every doubling of the refs.

  4000: 0.32
  8000: 1.33
  16000: 5.10
  32000: 20.90

That may or may not be a show-stopper for your use case, and if not,
I don't think it's something we need to address immediately. But we may
want some kind of "literal" mode, that takes in a list of refs rather
than a list of patterns, and does a sorted-merge with the list of
available refs (or uses a hash table, I guess, but for-each-ref also
tries to avoid even being linear in the total number of refs, so you'd
still want to find the lowest/highest to bound the iteration).
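
As a rough illustration of the sorted-merge idea (not a proposal for
the actual ref-filter code), the sketch below matches a list of literal
names against an already-sorted list of available refs in one pass. The
function name 'literal_ref_matches()' and its signature are
hypothetical, and it assumes both lists are plain C strings compared
with strcmp().

```c
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Hypothetical helper: 'available' must already be sorted in strcmp()
 * order; 'wanted' is sorted in place. Matching names are written to
 * 'out' (capacity of at least wanted_nr) and the count is returned.
 */
static size_t literal_ref_matches(const char **available, size_t available_nr,
				  const char **wanted, size_t wanted_nr,
				  const char **out)
{
	size_t i = 0, j = 0, n = 0;

	qsort(wanted, wanted_nr, sizeof(*wanted), cmp_str);

	/* Advance whichever side is behind; emit when both agree. */
	while (i < available_nr && j < wanted_nr) {
		int cmp = strcmp(available[i], wanted[j]);

		if (cmp < 0) {
			i++;
		} else if (cmp > 0) {
			j++;
		} else {
			out[n++] = available[i];
			i++;
			j++;
		}
	}
	return n;
}
```

This costs one sort of the input plus a single pass over both lists,
instead of testing every ref against every pattern. It still walks the
available refs linearly, so bounding the iteration by the lowest and
highest requested names would remain a separate step.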

-Peff


* Re: [PATCH v3 1/8] for-each-ref: add --stdin option
  2023-03-15 17:45     ` [PATCH v3 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
@ 2023-03-15 18:06       ` Jeff King
  2023-03-15 19:14         ` Junio C Hamano
  2023-03-15 22:41       ` Jonathan Tan
  1 sibling, 1 reply; 90+ messages in thread
From: Jeff King @ 2023-03-15 18:06 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

On Wed, Mar 15, 2023 at 05:45:36PM +0000, Derrick Stolee via GitGitGadget wrote:

> @@ -75,7 +79,21 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
>  	filter.ignore_case = icase;
>  
> -	filter.name_patterns = argv;
> +	if (from_stdin) {
> +		struct strbuf line = STRBUF_INIT;
> +
> +		if (argv[0])
> +			die(_("unknown arguments supplied with --stdin"));
> +
> +		while (strbuf_getline(&line, stdin) != EOF)
> +			strvec_push(&vec, line.buf);
> +
> +		/* vec.v is NULL-terminated, just like 'argv'. */
> +		filter.name_patterns = vec.v;
> +	} else {
> +		filter.name_patterns = argv;
> +	}

Now that you aren't detaching the "line" strbuf in each iteration of the
loop, it needs to eventually be cleaned up. strbuf_getline() will
_reset() it, which is good, but at the end we'd need a strbuf_release()
or it will leak.
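
The same ownership pattern can be shown with POSIX getline(), used here
only as an analogy for strbuf_getline(): the line buffer is reused on
every iteration, each stored entry is a copy, and the buffer itself has
to be freed once after the loop. The helper name 'read_patterns()' is
hypothetical and nothing below is Git's strbuf or strvec API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: read one pattern per line from 'in' into a
 * NULL-terminated array of copies. Returns NULL when no line was read.
 */
static char **read_patterns(FILE *in, size_t *nr)
{
	char *line = NULL;	/* reused buffer, like the 'line' strbuf */
	size_t cap = 0, alloc = 0;
	ssize_t len;
	char **vec = NULL;

	*nr = 0;
	while ((len = getline(&line, &cap, in)) != -1) {
		if (len && line[len - 1] == '\n')
			line[len - 1] = '\0';

		/* Keep room for the new entry and the NULL terminator. */
		if (*nr + 2 > alloc) {
			alloc = alloc ? 2 * alloc : 8;
			vec = realloc(vec, alloc * sizeof(*vec));
			if (!vec)
				abort();
		}

		/* Store a copy; 'line' is overwritten by the next getline(). */
		vec[*nr] = strdup(line);
		if (!vec[*nr])
			abort();
		vec[++*nr] = NULL;
	}

	/* Without this, the reused buffer leaks, as with strbuf_release(). */
	free(line);
	return vec;
}
```

The caller owns the returned array and each string in it, and frees
them separately from the line buffer.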

-Peff


* Re: [PATCH v3 1/8] for-each-ref: add --stdin option
  2023-03-15 18:06       ` Jeff King
@ 2023-03-15 19:14         ` Junio C Hamano
  0 siblings, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2023-03-15 19:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

Jeff King <peff@peff.net> writes:

> On Wed, Mar 15, 2023 at 05:45:36PM +0000, Derrick Stolee via GitGitGadget wrote:
>
>> @@ -75,7 +79,21 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>>  	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
>>  	filter.ignore_case = icase;
>>  
>> -	filter.name_patterns = argv;
>> +	if (from_stdin) {
>> +		struct strbuf line = STRBUF_INIT;
>> +
>> +		if (argv[0])
>> +			die(_("unknown arguments supplied with --stdin"));
>> +
>> +		while (strbuf_getline(&line, stdin) != EOF)
>> +			strvec_push(&vec, line.buf);
>> +
>> +		/* vec.v is NULL-terminated, just like 'argv'. */
>> +		filter.name_patterns = vec.v;
>> +	} else {
>> +		filter.name_patterns = argv;
>> +	}
>
> Now that you aren't detaching the "line" strbuf in each iteration of the
> loop, it needs to eventually be cleaned up. strbuf_getline() will
> _reset() it, which is good, but at the end we'd need a strbuf_release()
> or it will leak.

Nicely spotted.

Thanks.


* Re: [PATCH v2 1/8] for-each-ref: add --stdin option
  2023-03-15 17:49     ` Jeff King
@ 2023-03-15 19:24       ` Junio C Hamano
  2023-03-15 19:44         ` Jeff King
  0 siblings, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2023-03-15 19:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee via GitGitGadget, git, me, vdye, Derrick Stolee

Jeff King <peff@peff.net> writes:

> ... we may
> want some kind of "literal" mode, that takes in a list of refs rather
> than a list of patterns, and does a sorted-merge with the list of
> available refs (or uses a hash table, I guess, but for-each-ref also
> tries to avoid even being linear in the total number of refs, so you'd
> still want to find the lowest/highest to bound the iteration).

Exactly.

I actually was wondering if "literal" mode can just take a list of
<things>, and when a <thing> is not a refname, use it as if it
were. I.e. %(refname) would parrot it, while %(refname:short) would
not shorten and still parrot it, if the <thing> is 73876f4861c, but
something like %(subject) would still work.

For that, I suspect ref-filter.c::filter_refs() would need to learn
a different kind of filter->kind that iterates over the literal
"refs" that was fed from the standard input, instead of calling
for_each_fullref_in() for the given hierarchy, but the new iterator
should be able to reuse the ref_filter_handler() for the heavy
lifting.




* Re: [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option
  2023-03-15 17:31       ` Jeff King
  2023-03-15 17:44         ` Derrick Stolee
@ 2023-03-15 19:34         ` Junio C Hamano
  1 sibling, 0 replies; 90+ messages in thread
From: Junio C Hamano @ 2023-03-15 19:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, git, me, vdye

Jeff King <peff@peff.net> writes:

> E.g., imagine resolving "main" to 1234abcd in step one, then somebody
> updates it to 5678cdef, then you run "for-each-ref" to compute
> ahead/behind, and now you show an inconsistent result: you say that
> "main" points to 1234abcd, but show the wrong ahead/behind information.
>
> Showing 1234abcd at all is out-of-date, of course, but the real problem
> is the lack of atomicity. Most porcelain scripts deal with this by
> resolving the refs immediately, assuming object ids are immutable (which
> they are modulo games like refs/replace), and then working with them.

A really paranoid caller can use %(ahead-behind-detail:refs/heads/main)
and get a report on refs/heads/topic, something that conveys

    refs/heads/topic (at 67f9f40d) is ahead by 2 commits and behind
    by 4 commits relative to refs/heads/main (at d7c3a768).

in a machine readable form.  And when the "ahead by 2 commits"
disappears, we know 67f9f40d is merged to main sometime before
d7c3a768.  Then it can say "update-ref -d refs/heads/topic 67f9f40d"
to avoid racing with simultaneous updaters.



* Re: [PATCH v2 1/8] for-each-ref: add --stdin option
  2023-03-15 19:24       ` Junio C Hamano
@ 2023-03-15 19:44         ` Jeff King
  0 siblings, 0 replies; 90+ messages in thread
From: Jeff King @ 2023-03-15 19:44 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, Derrick Stolee

On Wed, Mar 15, 2023 at 12:24:18PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > ... we may
> > want some kind of "literal" mode, that takes in a list of refs rather
> > than a list of patterns, and does a sorted-merge with the list of
> > available refs (or uses a hash table, I guess, but for-each-ref also
> > tries to avoid even being linear in the total number of refs, so you'd
> > still want to find the lowest/highest to bound the iteration).
> 
> Exactly.
> 
> I actually was wondering if "literal" mode can just take a list of
> <things>, and when a <thing> is not a refname, use it as if it
> were. I.e. %(refname) would parrot it, while %(refname:short) would
> not shorten and still parrot it, if the <thing> is 73876f4861c, but
> something like %(subject) would still work.

Yeah, I think that would nicely solve the quadratic issue _and_ the
"we are stuck using only ref tips" issue. I like it.

> For that, I suspect ref-filter.c::filter_refs() would need to learn
> a different kind of filter->kind that iterates over the literal
> "refs" that was fed from the standard input, instead of calling
> for_each_fullref_in() for the given hierarchy, but the new iterator
> should be able to reuse the ref_filter_handler() for the heavy
> lifting.

Yeah, that sounds about right from my recollection of the code. I
suspect there may be other sharp edges (e.g., asking for %(upstream)
isn't meaningful for a non-ref). But softening those is actually
something I think we want to do in the long run, as it helps with the
long-term goal of sharing pretty-printing code between ref-filter,
cat-file, and pretty.c.

-Peff


* Re: [PATCH v3 1/8] for-each-ref: add --stdin option
  2023-03-15 17:45     ` [PATCH v3 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
  2023-03-15 18:06       ` Jeff King
@ 2023-03-15 22:41       ` Jonathan Tan
  1 sibling, 0 replies; 90+ messages in thread
From: Jonathan Tan @ 2023-03-15 22:41 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Jonathan Tan, git, gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -75,7 +79,21 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
>  	filter.ignore_case = icase;
>  
> -	filter.name_patterns = argv;
> +	if (from_stdin) {
> +		struct strbuf line = STRBUF_INIT;
> +
> +		if (argv[0])
> +			die(_("unknown arguments supplied with --stdin"));

As a reference point, both fetch and send-pack accept input from stdin
too, and use them to augment what's provided on the CLI (and not only
accept them when nothing's provided on the CLI).

Having said that, it's also reasonable to start by prohibiting CLI
arguments when --stdin is given, and we can always relax this later
if needed.


* Re: [PATCH v3 3/8] commit-graph: combine generation computations
  2023-03-15 17:45     ` [PATCH v3 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
@ 2023-03-15 22:49       ` Jonathan Tan
  2023-03-17 18:30         ` Derrick Stolee
  0 siblings, 1 reply; 90+ messages in thread
From: Jonathan Tan @ 2023-03-15 22:49 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Jonathan Tan, git, gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +static void compute_reachable_generation_numbers_1(
> +			struct compute_generation_info *info,
> +			int generation_version)
>  {
>  	int i;
>  	struct commit_list *list = NULL;
>  
> -	if (ctx->report_progress)
> -		ctx->progress = start_delayed_progress(
> -					_("Computing commit graph topological levels"),
> -					ctx->commits.nr);
> -	for (i = 0; i < ctx->commits.nr; i++) {
> -		struct commit *c = ctx->commits.list[i];
> -		uint32_t level;
> +	for (i = 0; i < info->commits->nr; i++) {
> +		struct commit *c = info->commits->list[i];
> +		timestamp_t gen;
> +		repo_parse_commit(info->r, c);
> +		gen = info->get_generation(c, info->data);
>  
> -		repo_parse_commit(ctx->r, c);
> -		level = *topo_level_slab_at(ctx->topo_levels, c);
> +		display_progress(info->progress, info->progress_cnt + 1);
>  
> -		display_progress(ctx->progress, i + 1);
> -		if (level != GENERATION_NUMBER_ZERO)
> +		if (gen != GENERATION_NUMBER_ZERO && gen != GENERATION_NUMBER_INFINITY)
>  			continue;
>  
>  		commit_list_insert(c, &list);

So this replaces a call to display_progress with another...

>  			if (all_parents_computed) {
>  				pop_commit(&list);
> -
> -				if (max_level > GENERATION_NUMBER_V1_MAX - 1)
> -					max_level = GENERATION_NUMBER_V1_MAX - 1;
> -				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
> +				gen = compute_generation_from_max(
> +						current, max_gen,
> +						generation_version);
> +				info->set_generation(current, gen, info->data);
>  			}

...here is where set_generation is called...

> +static void set_topo_level(struct commit *c, timestamp_t t, void *data)
> +{
> +	struct write_commit_graph_context *ctx = data;
> +	*topo_level_slab_at(ctx->topo_levels, c) = (uint32_t)t;
> +	display_progress(ctx->progress, ctx->progress_cnt + 1);
> +}

...is this display_progress() redundant? (set_topo_level() is one of the
possibilities that set_generation could be assigned to.) There already
seems to be one at the top. Further supporting my query is the fact that
in the hunk containing set_generation, there is no progress report on
the LHS of the diff.

> +static void set_generation_v2(struct commit *c, timestamp_t t, void *data)
> +{
> +	struct write_commit_graph_context *ctx = data;
> +	struct commit_graph_data *g = commit_graph_data_at(c);
> +	g->generation = (uint32_t)t;
> +	display_progress(ctx->progress, ctx->progress_cnt + 1);
> +}

Likewise for this function.

Everything else up to and including this patch looks good.


* Re: [PATCH v3 4/8] commit-graph: return generation from memory
  2023-03-15 17:45     ` [PATCH v3 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
@ 2023-03-15 22:58       ` Jonathan Tan
  0 siblings, 0 replies; 90+ messages in thread
From: Jonathan Tan @ 2023-03-15 22:58 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Jonathan Tan, git, gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The commit_graph_generation() method used to report a value of
> GENERATION_NUMBER_INFINITY if the commit_graph_data_slab had an instance
> for the given commit but the graph_pos indicated the commit was not in
> the commit-graph file.
> 
> Instead, trust the 'generation' member if the commit has a value in the
> slab _and_ the 'generation' member is non-zero. Otherwise, treat it as
> GENERATION_NUMBER_INFINITY.

I would replace "Instead" with "However, a future commit intends to
compute and use commit generation numbers even for commits that are
not in the commit-graph file (and thus have no graph_pos). Therefore,
we need a new criterion for deciding if a generation number can be
trusted:" (or something to that effect).

> This only makes a difference for a very old case for the commit-graph:
> the very first Git release to write commit-graph files wrote zeroes in
> the topological level positions. If we are parsing a commit-graph with
> all zeroes, those commits will now appear to have
> GENERATION_NUMBER_INFINITY (as if they were not parsed from the
> commit-graph).
> 
> I attempted several variations to work around the need for providing an
> uninitialized 'generation' member, but this was the best one I found. It
> does require a change to a verification test in t5318 because it reports
> a different error than the one about non-zero generation numbers.

Thanks for investigating, and I think the method in this patch would
work. As you have stated, this only affects the commit-graph files that
once upon a time were written with no generation numbers, and this patch
makes those behave as if there were no generation numbers in the first
place (which is exactly what happened).
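
As a rough illustration of the criterion discussed above (a sketch, not
the code from the patch): the struct, the function names, and the
constant values below are all made up for this example. The only idea
taken from the commit message is that a stored generation is trusted
when a slab entry exists and its value is non-zero, and is otherwise
treated as "infinity".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values for this sketch; Git defines its own constants. */
#define TOY_GENERATION_ZERO     UINT64_C(0)
#define TOY_GENERATION_INFINITY UINT64_MAX

/* Hypothetical per-commit slab entry; NULL means "no entry for this commit". */
struct toy_graph_data {
	uint64_t generation;
};

/* Store a computed generation number; zero stays reserved for "unset". */
static void toy_set_generation(struct toy_graph_data *data, uint64_t gen)
{
	assert(data && gen != TOY_GENERATION_ZERO);
	data->generation = gen;
}

/* Trust the stored value only if an entry exists and it is non-zero. */
static uint64_t toy_generation(const struct toy_graph_data *data)
{
	if (data && data->generation != TOY_GENERATION_ZERO)
		return data->generation;
	return TOY_GENERATION_INFINITY;
}
```

With this shape, an entry that was parsed from an old all-zero
commit-graph and an entry that does not exist at all are
indistinguishable to the caller, which matches the behavior described
in the commit message.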



* Re: [PATCH v3 6/8] commit-reach: implement ahead_behind() logic
  2023-03-15 17:45     ` [PATCH v3 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
@ 2023-03-15 23:28       ` Jonathan Tan
  2023-03-17 18:44         ` Derrick Stolee
  0 siblings, 1 reply; 90+ messages in thread
From: Jonathan Tan @ 2023-03-15 23:28 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Jonathan Tan, git, gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

First of all, thanks to Taylor and Stolee for this algorithm and code
- it is straightforwardly written and looks correct to me. I have some
commit message and code comment suggestions that if taken, would have
helped me on my first reading, but these are subjective so feel free
to ignore them if you think they would add unnecessary detail (I did
understand the algorithm in the end, after all).

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> The second array, using the
> new ahead_behind_count struct, indicates which commits from that initial
> array form the base/tip pair for the ahead/behind count it will store.

I would have preferred: The second array contains base/tip pairs
designating pairs of commits for which ahead/behind counts need to be
computed, each pair being a pair of indexes into the first array.

> Each commit in the input commit list is associated with a bit position
> indicating "the ith commit can reach this commit". Each of these commits
> is associated with a bitmap with its position flipped on and then
> placed in a queue for walking commit history. 

"this commit" is not necessarily a commit in the input commit list (it
is actually the commit that we're currently at in our iteration) and I
think that the association of bitmaps with commits in the input commit
list could be more clearly described. So I would have preferred: Each
commit in the priority queue is associated with a bitmap of width N
(N being the count of commits in the first array), in which a bit is
set iff the commit can be reached by the corresponding commit in the
first array. This is different from packfile or MIDX bitmaps in that
a commit's bitmap stores what can reach it, not what it can reach.
The priority queue is initialized with N commits, each commit being
associated with a bitmap in which a single bit is set (indicating that
the commit can be reached by itself).
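
To make the "what can reach it" direction concrete, here is a toy
sketch of the bitmap propagation (not the code from the patch). It
makes several simplifying assumptions that are not in the series: the
graph is a small array in which every parent has a lower index than
its child, so walking indexes from high to low stands in for the
priority queue ordered by generation number; a single 32-bit word
stands in for the bitmap; and the struct and function names are
invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 32
#define TOY_MAX_PARENTS 2

/* Hypothetical toy commit: parent indexes into the same array, -1 for none. */
struct toy_commit {
	int parents[TOY_MAX_PARENTS];
};

/* A base/tip pair, each an index into the 'inputs' array. */
struct toy_count {
	size_t base_index, tip_index;
	unsigned ahead, behind;
};

static void toy_ahead_behind_walk(const struct toy_commit *graph, int graph_nr,
				  const int *inputs, size_t inputs_nr,
				  struct toy_count *counts, size_t counts_nr)
{
	/* reached_by[c] has bit i set iff inputs[i] can reach commit c. */
	uint32_t reached_by[TOY_MAX_COMMITS] = { 0 };

	assert(graph_nr <= TOY_MAX_COMMITS && inputs_nr <= 32);

	/* Each input commit starts with only its own bit set. */
	for (size_t i = 0; i < inputs_nr; i++)
		reached_by[inputs[i]] |= UINT32_C(1) << i;

	for (size_t j = 0; j < counts_nr; j++)
		counts[j].ahead = counts[j].behind = 0;

	/* Children are visited before parents (stand-in for the queue). */
	for (int c = graph_nr - 1; c >= 0; c--) {
		uint32_t bits = reached_by[c];

		if (!bits)
			continue; /* not reachable from any input */

		for (size_t j = 0; j < counts_nr; j++) {
			int by_base = !!(bits & (UINT32_C(1) << counts[j].base_index));
			int by_tip = !!(bits & (UINT32_C(1) << counts[j].tip_index));

			if (by_tip && !by_base)
				counts[j].ahead++;
			else if (by_base && !by_tip)
				counts[j].behind++;
		}

		/* Whatever reaches this commit also reaches its parents. */
		for (int k = 0; k < TOY_MAX_PARENTS; k++) {
			int p = graph[c].parents[k];
			if (p < 0)
				continue;
			assert(p < c);
			reached_by[p] |= bits;
		}
	}
}
```

A commit whose word has every input bit set contributes to no pair, which
is the situation the STALE flag in the patch is used to detect.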

> +void ahead_behind(struct repository *r,
> +		  struct commit **commits, size_t commits_nr,
> +		  struct ahead_behind_count *counts, size_t counts_nr)
> +{
> +	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
> +	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;

As we discussed in our Review Club, DIV_ROUND_UP can be used for this.
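
For readers unfamiliar with the idiom, a minimal sketch follows. The
TOY_ macros are local stand-ins defined for this example, and the
64-bit word size is an assumption of the sketch, not something taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for this sketch; Git has its own definitions. */
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define TOY_BITS_IN_WORD (sizeof(uint64_t) * 8)

/* Number of 64-bit words needed to hold one bit per input commit. */
static size_t toy_bitmap_words(size_t commits_nr)
{
	/* Guard against wraparound in the (n + d - 1) step. */
	assert(commits_nr <= SIZE_MAX - TOY_BITS_IN_WORD);
	return TOY_DIV_ROUND_UP(commits_nr, TOY_BITS_IN_WORD);
}
```

This computes the same value as the open-coded expression in the quoted
hunk; the macro only names the intent.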

(For those reading who do not know what Review Club is, search the
archives and/or look out for future announcements!)

> +			if (bitmap_popcount(bitmap_p) == commits_nr)
> +				p->item->object.flags |= STALE;

Might be worth adding a comment above the STALE line: this parent commit
and all its ancestors can be reached by every commit in the commits
list and thus can never be "ahead" or "behind" in any pair; mark this
STALE so that, as an optimization, we can stop iteration if only STALE
commits remain (since further iteration would never change any "ahead"
or "behind" value).



* Re: [PATCH v3 3/8] commit-graph: combine generation computations
  2023-03-15 22:49       ` Jonathan Tan
@ 2023-03-17 18:30         ` Derrick Stolee
  0 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2023-03-17 18:30 UTC (permalink / raw)
  To: Jonathan Tan, Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason

On 3/15/2023 6:49 PM, Jonathan Tan wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> +static void compute_reachable_generation_numbers_1(
>> +			struct compute_generation_info *info,
>> +			int generation_version)
>>  {
>>  	int i;
>>  	struct commit_list *list = NULL;
>>  
>> -	if (ctx->report_progress)
>> -		ctx->progress = start_delayed_progress(
>> -					_("Computing commit graph topological levels"),
>> -					ctx->commits.nr);
>> -	for (i = 0; i < ctx->commits.nr; i++) {
>> -		struct commit *c = ctx->commits.list[i];
>> -		uint32_t level;
>> +	for (i = 0; i < info->commits->nr; i++) {
>> +		struct commit *c = info->commits->list[i];
>> +		timestamp_t gen;
>> +		repo_parse_commit(info->r, c);
>> +		gen = info->get_generation(c, info->data);
>>  
>> -		repo_parse_commit(ctx->r, c);
>> -		level = *topo_level_slab_at(ctx->topo_levels, c);
>> +		display_progress(info->progress, info->progress_cnt + 1);
>>  
>> -		display_progress(ctx->progress, i + 1);
>> -		if (level != GENERATION_NUMBER_ZERO)
>> +		if (gen != GENERATION_NUMBER_ZERO && gen != GENERATION_NUMBER_INFINITY)
>>  			continue;
>>  
>>  		commit_list_insert(c, &list);
> 
> So this replaces a call to display_progress with another...
> 
>>  			if (all_parents_computed) {
>>  				pop_commit(&list);
>> -
>> -				if (max_level > GENERATION_NUMBER_V1_MAX - 1)
>> -					max_level = GENERATION_NUMBER_V1_MAX - 1;
>> -				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
>> +				gen = compute_generation_from_max(
>> +						current, max_gen,
>> +						generation_version);
>> +				info->set_generation(current, gen, info->data);
>>  			}
> 
> ...here is where set_generation is called...
> 
>> +static void set_topo_level(struct commit *c, timestamp_t t, void *data)
>> +{
>> +	struct write_commit_graph_context *ctx = data;
>> +	*topo_level_slab_at(ctx->topo_levels, c) = (uint32_t)t;
>> +	display_progress(ctx->progress, ctx->progress_cnt + 1);
>> +}
> 
> ...is this display_progress() redundant? (set_topo_level() is one of the
> possibilities that set_generation could be assigned to.) There already
> seems to be one at the top. Further supporting my query is the fact that
> in the hunk containing set_generation, there is no progress report on
> the LHS of the diff.

It turns out the progress is a bit redundant here, but not entirely in
the case of ensure_generations_valid() (if progress was enabled).

Let's break down the iteration, which has nested loops:

 1. for all commits in the initial list.
   2. perform DFS until generation can be computed. (while loop)

When writing a commit-graph file, that initial list is _every commit
in the commit-graph_, so having a display_progress() in the for loop
is sufficient to get the exact number.

In the case of ensure_generations_valid(), the number of assignments
in the while loop can be much larger than the initial input list.

However, ensure_generations_valid() does not use progress _and_ even
if it did, it would make sense to signal progress based on the number
of tips that need to be computed. I'll remove these progress counts
inside the mutators.
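
The nested iteration described above can be sketched on a toy graph as
follows. This is an illustration only, not the code in commit-graph.c:
the struct, the names, and the fixed-size stack are assumptions of the
sketch, and it computes plain topological levels (one more than the
maximum level among parents, with zero meaning "not yet computed").

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 32
#define TOY_MAX_PARENTS 2
#define TOY_LEVEL_UNSET 0

/* Hypothetical toy commit: parent indexes into the same array, -1 for none. */
struct toy_commit {
	int parents[TOY_MAX_PARENTS];
	uint32_t level;
};

/* Returns how many levels were assigned, which may exceed tips_nr. */
static unsigned toy_compute_levels(struct toy_commit *graph, int graph_nr,
				   const int *tips, int tips_nr)
{
	int stack[TOY_MAX_COMMITS];
	unsigned assigned = 0;

	assert(graph_nr <= TOY_MAX_COMMITS);

	/* Outer loop: one step per commit in the initial list. */
	for (int i = 0; i < tips_nr; i++) {
		int top = 0;

		if (graph[tips[i]].level != TOY_LEVEL_UNSET)
			continue;
		stack[top++] = tips[i];

		/* Inner loop: DFS until the tip's level can be computed. */
		while (top) {
			int cur = stack[top - 1];
			uint32_t max_level = 0;
			int all_parents_computed = 1;

			for (int k = 0; k < TOY_MAX_PARENTS; k++) {
				int p = graph[cur].parents[k];
				if (p < 0)
					continue;
				if (graph[p].level == TOY_LEVEL_UNSET) {
					/* Descend into the parent first. */
					all_parents_computed = 0;
					stack[top++] = p;
					break;
				}
				if (graph[p].level > max_level)
					max_level = graph[p].level;
			}

			if (all_parents_computed) {
				top--;
				graph[cur].level = max_level + 1;
				assigned++;
			}
		}
	}
	return assigned;
}
```

When the initial list is every commit in the graph, the outer loop
count equals the number of assignments. When it is a single tip, as in
the ensure_generations_valid() case, the inner loop can assign many
more levels than there are entries in the list.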

Thanks,
-Stolee


* Re: [PATCH v3 6/8] commit-reach: implement ahead_behind() logic
  2023-03-15 23:28       ` Jonathan Tan
@ 2023-03-17 18:44         ` Derrick Stolee
  0 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2023-03-17 18:44 UTC (permalink / raw)
  To: Jonathan Tan, Derrick Stolee via GitGitGadget
  Cc: git, gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason

On 3/15/2023 7:28 PM, Jonathan Tan wrote:
> First of all, thanks to Taylor and Stolee for this algorithm and code
> - it is straightforwardly written and looks correct to me. I have some
> commit message and code comment suggestions that if taken, would have
> helped me on my first reading, but these are subjective so feel free
> to ignore them if you think they would add unnecessary detail (I did
> understand the algorithm in the end, after all).

I appreciate your comments here. I've done some reworking of the
message based on what you say here, as well as the verbal feedback from
review club.

>> +void ahead_behind(struct repository *r,
>> +		  struct commit **commits, size_t commits_nr,
>> +		  struct ahead_behind_count *counts, size_t counts_nr)
>> +{
>> +	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
>> +	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
> 
> As we discussed in our Review Club, DIV_ROUND_UP can be used for this.

Got it!

>> +			if (bitmap_popcount(bitmap_p) == commits_nr)
>> +				p->item->object.flags |= STALE;
> 
> Might be worth adding a comment above the STALE line: this parent commit
> and all its ancestors can be reached by every commit in the commits
> list and thus can never be "ahead" or "behind" in any pair; mark this
> STALE so that, as an optimization, we can stop iteration if only STALE
> commits remain (since further iteration would never change any "ahead"
> or "behind" value).

This is a helpful thing to point out, so a comment is appropriate.

Overall, maybe algorithms like this should have more inline comments
than we typically expect in the Git codebase. We want to make sure that
these things are readable in the future, hopefully without digging too
far in the history to find the lengthy commit message about it.

I'll delay sending v4 until giving a little time to hear back on this
point. My default is to not add the comments, but I'd be happy to, if
we think this is an appropriate time to deviate from the standard.

Thanks,
-Stolee


* [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option
  2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
                       ` (7 preceding siblings ...)
  2023-03-15 17:45     ` [PATCH v3 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
@ 2023-03-20 11:26     ` Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 1/9] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
                         ` (8 more replies)
  8 siblings, 9 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-20 11:26 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Jonathan Tan,
	Derrick Stolee

At $DAYJOB, we have used a custom 'ahead-behind' builtin in our fork of Git
for lots of reasons. The main goal of the builtin is to compare multiple
references against a common base reference. The comparison is the number of
commits that are in each side of the symmetric difference of their reachable
sets. A commit C is "ahead" of a commit B by the number of commits in B..C
(reachable from C but not reachable from B). Similarly, the commit C is
"behind" the commit B by the number of commits in C..B (reachable from B but
not reachable from C).
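
As a small worked illustration of these definitions (a sketch on a toy
graph, not how the series computes the counts): the code below builds
the reachable set of each commit as a bitmask and counts the two sides
of the symmetric difference. The struct, the names, and the 32-commit
limit are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_COMMITS 32
#define TOY_MAX_PARENTS 2

/* Hypothetical toy commit: parent indexes into the same array, -1 for none. */
struct toy_commit {
	int parents[TOY_MAX_PARENTS];
};

/* Bitmask of every commit reachable from 'tip', including 'tip' itself. */
static uint32_t toy_reachable_set(const struct toy_commit *graph, int nr, int tip)
{
	uint32_t seen = 0;
	int stack[TOY_MAX_COMMITS];
	int top = 0;

	assert(nr <= TOY_MAX_COMMITS && tip >= 0 && tip < nr);
	stack[top++] = tip;
	seen |= UINT32_C(1) << tip;

	while (top) {
		int cur = stack[--top];
		for (int k = 0; k < TOY_MAX_PARENTS; k++) {
			int p = graph[cur].parents[k];
			if (p < 0 || (seen & (UINT32_C(1) << p)))
				continue;
			/* Mark on push so each commit is pushed at most once. */
			seen |= UINT32_C(1) << p;
			stack[top++] = p;
		}
	}
	return seen;
}

static unsigned toy_popcount(uint32_t x)
{
	unsigned n = 0;
	for (; x; x &= x - 1)
		n++;
	return n;
}

/* ahead = |B..C|, behind = |C..B| for base B and tip C. */
static void toy_ahead_behind(const struct toy_commit *graph, int nr,
			     int base, int tip,
			     unsigned *ahead, unsigned *behind)
{
	uint32_t from_base = toy_reachable_set(graph, nr, base);
	uint32_t from_tip = toy_reachable_set(graph, nr, tip);

	*ahead = toy_popcount(from_tip & ~from_base);
	*behind = toy_popcount(from_base & ~from_tip);
}
```

For a history where B and C both descend from a shared commit, with one
commit only on B's side and two only on C's side, this reports C as
ahead by 2 and behind by 1.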

These numbers can be computed by 'git rev-list --count B..C' and 'git
rev-list --count C..B', but there are common needs that benefit from doing
the checks in a single process:

 1. Our "branches" page lists ahead/behind counts for each listed branch as
    compared to the repo's default branch. This can be done with a single
    'git ahead-behind' process.
 2. When a branch is updated, a background job checks if any pull requests
    that target that branch should be closed because their branches were
    merged implicitly by that update. These queries can be batched into 'git
    ahead-behind' calls.

In that second example, we don't need the full ahead/behind counts (although
it is sufficient to look for branches that are "zero commits ahead", meaning
they are reachable from the base), and instead reachability is the critical
piece.

This series contributes the custom algorithms we used for our 'git
ahead-behind' builtin, but as extensions to 'git for-each-ref':

 * Add a new "%(ahead-behind:<base>)" format token to for-each-ref which
   allows outputting the ahead/behind values in the format string for a
   matching ref.
 * Add a new algorithm that speeds up the 'git for-each-ref --merged='
   option. This also applies to the 'git branch --merged=' option.

The idea to use 'git for-each-ref' instead of creating a new builtin is from
Junio, and simplifies this series significantly compared to v1. I was
initially concerned about the overhead of 'git for-each-ref' and its
generality and sorting, but I was not able to measure any important
difference between this implementation and our internal 'git ahead-behind'
implementation. In particular, when a pattern is given to 'git for-each-ref'
that looks like an exact ref, it navigates directly to the ref instead of
scanning all references for matches.

However, for our specific uses, we like to batch a list of exact references
that could be very long. We introduce a new --stdin option here.

To keep things close to the v1 outline, I replaced the existing patches with
closely-related ones, when possible.

Patch 1 adds the --stdin option to 'git for-each-ref'. (This is similar to
the boilerplate patch from v1.)

Patch 2 adds a test to explicitly check that 'git for-each-ref' will still
succeed when all input refs are missing. (This is similar to the
--ignore-missing patch from v1.)

Patches 3-6 introduce a new method: ensure_generations_valid(). Patch 3
refactors the existing compute_topological_levels() to make it more generic
while Patch 4 ports compute_generation_numbers() onto that generic base.
Patch 5 updates the definition of commit_graph_generation() slightly, making
way for patch 6 to implement the in-memory computation of generation
numbers. With an existing commit-graph file, the commits that are not
present in the file are considered as having generation number "infinity".
This is useful for most of our reachability queries to this point, since
those commits are "above" the ones tracked by the commit-graph. When these
commits are low in number, then there is very little performance cost and
zero correctness cost. (These patches match v1 exactly.)

However, we will see that the ahead/behind computation requires accurate
generation numbers to avoid overcounting. Thus, ensure_generations_valid()
is a way to specify a list of commits that need generation numbers computed
before continuing. It's a no-op if all of those commits are in the
commit-graph file. It's expensive if the commit-graph doesn't exist.
However, '%(ahead-behind:)' computations are likely to be slow no matter
what without a commit-graph, so assuming an existing commit-graph file is
reasonable. If we find sufficient desire to have an implementation that does
not have this requirement, we could create a second implementation and
toggle to it when generation_numbers_enabled() returns false.

Patch 7 implements the ahead-behind algorithm, but it is not connected to a
builtin. It's a long commit message, so hopefully it explains the algorithm
sufficiently. (The difference from v1 is that it no longer integrates with a
builtin and there are no new tests. It also uses 'unsigned int' and is
correctly co-authored by Taylor.)

Patch 8 integrates the ahead-behind algorithm with the ref-filter code,
including parsing the "ahead-behind" token. This finally adds tests that
check both ahead_behind() and ensure_generations_valid() via
t6600-test-reach.sh. (This patch is essentially completely new in v2.)

Patch 9 implements the tips_reachable_from_bases() method, and uses it within
the ref-filter code to speed up 'git for-each-ref --merged' and 'git branch
--merged'. (The interface is slightly different than v1, due to the needs of
the new caller.)


Updates in v4
=============

 * A string leak in Patch 1 is remedied.
 * v3's patch 3 is split into v4's patches 3 and 4. Patch 3 now only
   refactors compute_topological_levels(), leaving Patch 4 to only port
   compute_generation_numbers() to the new
   compute_reachable_generation_numbers().
 * Other edits in Patch 3:
   * compute_reachable_generation_numbers() drops the "_1" in its name.
   * Redundant progress counters are removed.
   * The progress struct is passed down into 'struct
     compute_generation_info' and not just the 'struct
     write_commit_graph_context'
 * Patch 6 adds context around why we care about in-memory generation
   number.
 * Patch 7 has several changes:
   * Includes extra context around the input data to ahead_behind() as well
     as some rewrite of the algorithm description.
   * Uses DIV_ROUND_UP() to initialize the bitmap width.
   * Renames init_bit_array() to get_bit_array() to make it clear that it
     will not regenerate the bitmap if it already exists.
   * Adds a new comment before assigning the STALE bit.
 * Patch 8 changes the expected format to output <committish> and checks the
   given base during the token parsing instead of delaying.


Updates in v3
=============

 * The APIs are modified to take a 'struct repository *' and use them
   appropriately.
 * The --stdin option in 'git for-each-ref' now uses strvec instead of an
   ad-hoc array.

Thanks, -Stolee

Derrick Stolee (8):
  for-each-ref: add --stdin option
  for-each-ref: explicitly test no matches
  commit-graph: refactor compute_topological_levels()
  commit-graph: simplify compute_generation_numbers()
  commit-graph: return generation from memory
  commit-reach: implement ahead_behind() logic
  for-each-ref: add ahead-behind format atom
  commit-reach: add tips_reachable_from_bases()

Taylor Blau (1):
  commit-graph: introduce `ensure_generations_valid()`

 Documentation/git-for-each-ref.txt |  12 +-
 builtin/branch.c                   |   1 +
 builtin/for-each-ref.c             |  26 +++-
 builtin/tag.c                      |   1 +
 commit-graph.c                     | 207 +++++++++++++++++----------
 commit-graph.h                     |   8 ++
 commit-reach.c                     | 216 +++++++++++++++++++++++++++++
 commit-reach.h                     |  40 ++++++
 ref-filter.c                       |  93 ++++++++++---
 ref-filter.h                       |  26 +++-
 t/perf/p1500-graph-walks.sh        |  50 +++++++
 t/t3203-branch-output.sh           |  14 ++
 t/t5318-commit-graph.sh            |   2 +-
 t/t6300-for-each-ref.sh            |  50 +++++++
 t/t6301-for-each-ref-errors.sh     |  14 ++
 t/t6600-test-reach.sh              | 169 ++++++++++++++++++++++
 t/t7004-tag.sh                     |  28 ++++
 17 files changed, 866 insertions(+), 91 deletions(-)
 create mode 100755 t/perf/p1500-graph-walks.sh


base-commit: 725f57037d81e24eacfda6e59a19c60c0b4c8062
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1489%2Fderrickstolee%2Fstolee%2Fupstream-ahead-behind-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1489/derrickstolee/stolee/upstream-ahead-behind-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1489

Range-diff vs v3:

  1:  f9e80e233f1 !  1:  27d94077aa9 for-each-ref: add --stdin option
     @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc, const char **argv, const
      +		while (strbuf_getline(&line, stdin) != EOF)
      +			strvec_push(&vec, line.buf);
      +
     ++		strbuf_release(&line);
     ++
      +		/* vec.v is NULL-terminated, just like 'argv'. */
      +		filter.name_patterns = vec.v;
      +	} else {
  2:  f56d6a64d24 =  2:  1e3d499431a for-each-ref: explicitly test no matches
  3:  3b15e9df770 !  3:  79a57f30a85 commit-graph: combine generation computations
     @@ Metadata
      Author: Derrick Stolee <derrickstolee@github.com>
      
       ## Commit message ##
     -    commit-graph: combine generation computations
     +    commit-graph: refactor compute_topological_levels()
      
          This patch extracts the common code used to compute topological levels
          and corrected committer dates into a common routine,
     -    compute_reachable_generation_numbers_1().
     +    compute_reachable_generation_numbers(). For ease of reading, it only
     +    modifies compute_topological_levels() to use this new routine, leaving
     +    compute_generation_numbers() to be modified in the next change.
      
          This new routine dispatches to call the necessary functions to get and
          set the generation number for a given commit through a vtable (the
     @@ Commit message
          Computing the generation number itself is done in
          compute_generation_from_max(), which dispatches its implementation based
          on the generation version requested, or issuing a BUG() for unrecognized
     -    generation versions.
     +    generation versions. This does not use a vtable because the logic
     +    depends only on the generation number version, not where the data is
     +    being loaded from or being stored to. This is a subtle point that will
     +    make more sense in a future change that modifies the in-memory
     +    generation values instead of just preparing values for writing to a
     +    commit-graph file.
      
     -    This patch cleans up the two places that currently compute topological
     -    levels and corrected commit dates by reducing the amount of duplicated
     -    code. It also makes it possible to introduce a function which
     -    dynamically computes those values for commits that aren't stored in a
     -    commit-graph, which will be required for the forthcoming ahead-behind
     -    rewrite.
     +    This change looks like it adds a lot of new code. However, two upcoming
     +    changes will be quite small due to the work being done in this change.
      
     +    Co-authored-by: Taylor Blau <me@ttaylorr.com>
          Signed-off-by: Taylor Blau <me@ttaylorr.com>
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
     @@ commit-graph.c: static void close_reachable(struct write_commit_graph_context *c
      +	}
      +}
      +
     -+static void compute_reachable_generation_numbers_1(
     ++static void compute_reachable_generation_numbers(
      +			struct compute_generation_info *info,
      +			int generation_version)
       {
     @@ commit-graph.c: static void close_reachable(struct write_commit_graph_context *c
      -	for (i = 0; i < ctx->commits.nr; i++) {
      -		struct commit *c = ctx->commits.list[i];
      -		uint32_t level;
     +-
     +-		repo_parse_commit(ctx->r, c);
     +-		level = *topo_level_slab_at(ctx->topo_levels, c);
      +	for (i = 0; i < info->commits->nr; i++) {
      +		struct commit *c = info->commits->list[i];
      +		timestamp_t gen;
      +		repo_parse_commit(info->r, c);
      +		gen = info->get_generation(c, info->data);
     - 
     --		repo_parse_commit(ctx->r, c);
     --		level = *topo_level_slab_at(ctx->topo_levels, c);
      +		display_progress(info->progress, info->progress_cnt + 1);
       
      -		display_progress(ctx->progress, i + 1);
     @@ commit-graph.c: static void compute_topological_levels(struct write_commit_graph
      +{
      +	struct write_commit_graph_context *ctx = data;
      +	*topo_level_slab_at(ctx->topo_levels, c) = (uint32_t)t;
     -+	display_progress(ctx->progress, ctx->progress_cnt + 1);
      +}
      +
      +static void compute_topological_levels(struct write_commit_graph_context *ctx)
      +{
      +	struct compute_generation_info info = {
      +		.r = ctx->r,
     -+		.progress = ctx->progress,
      +		.commits = &ctx->commits,
      +		.get_generation = get_topo_level,
      +		.set_generation = set_topo_level,
     @@ commit-graph.c: static void compute_topological_levels(struct write_commit_graph
      +	};
      +
      +	if (ctx->report_progress)
     -+		ctx->progress = start_delayed_progress(
     ++		info.progress = ctx->progress
     ++			      = start_delayed_progress(
      +					_("Computing commit graph topological levels"),
      +					ctx->commits.nr);
      +
     -+	compute_reachable_generation_numbers_1(&info, 1);
     -+
     - 	stop_progress(&ctx->progress);
     - }
     - 
     -+static timestamp_t get_generation_from_graph_data(struct commit *c, void *data)
     -+{
     -+	return commit_graph_data_at(c)->generation;
     -+}
     -+
     -+static void set_generation_v2(struct commit *c, timestamp_t t, void *data)
     -+{
     -+	struct write_commit_graph_context *ctx = data;
     -+	struct commit_graph_data *g = commit_graph_data_at(c);
     -+	g->generation = (uint32_t)t;
     -+	display_progress(ctx->progress, ctx->progress_cnt + 1);
     -+}
     -+
     - static void compute_generation_numbers(struct write_commit_graph_context *ctx)
     - {
     - 	int i;
     --	struct commit_list *list = NULL;
     -+	struct compute_generation_info info = {
     -+		.r = ctx->r,
     -+		.progress = ctx->progress,
     -+		.commits = &ctx->commits,
     -+		.get_generation = get_generation_from_graph_data,
     -+		.set_generation = set_generation_v2,
     -+		.data = ctx,
     -+	};
     - 
     - 	if (ctx->report_progress)
     - 		ctx->progress = start_delayed_progress(
     -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx)
     - 		}
     - 	}
     - 
     --	for (i = 0; i < ctx->commits.nr; i++) {
     --		struct commit *c = ctx->commits.list[i];
     --		timestamp_t corrected_commit_date;
     --
     --		repo_parse_commit(ctx->r, c);
     --		corrected_commit_date = commit_graph_data_at(c)->generation;
     --
     --		display_progress(ctx->progress, i + 1);
     --		if (corrected_commit_date != GENERATION_NUMBER_ZERO)
     --			continue;
     --
     --		commit_list_insert(c, &list);
     --		while (list) {
     --			struct commit *current = list->item;
     --			struct commit_list *parent;
     --			int all_parents_computed = 1;
     --			timestamp_t max_corrected_commit_date = 0;
     --
     --			for (parent = current->parents; parent; parent = parent->next) {
     --				repo_parse_commit(ctx->r, parent->item);
     --				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
     --
     --				if (corrected_commit_date == GENERATION_NUMBER_ZERO) {
     --					all_parents_computed = 0;
     --					commit_list_insert(parent->item, &list);
     --					break;
     --				}
     --
     --				if (corrected_commit_date > max_corrected_commit_date)
     --					max_corrected_commit_date = corrected_commit_date;
     --			}
     --
     --			if (all_parents_computed) {
     --				pop_commit(&list);
     --
     --				if (current->date && current->date > max_corrected_commit_date)
     --					max_corrected_commit_date = current->date - 1;
     --				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
     --			}
     --		}
     --	}
     -+	compute_reachable_generation_numbers_1(&info, 2);
     - 
     - 	for (i = 0; i < ctx->commits.nr; i++) {
     - 		struct commit *c = ctx->commits.list[i];
     -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx)
     - 		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX)
     - 			ctx->num_generation_data_overflows++;
     - 	}
     ++	compute_reachable_generation_numbers(&info, 1);
      +
       	stop_progress(&ctx->progress);
       }
  -:  ----------- >  4:  3fd6c758129 commit-graph: simplify compute_generation_numbers()
  4:  abd3e7a67be !  5:  fed76f0f08e commit-graph: return generation from memory
     @@ Commit message
          for the given commit but the graph_pos indicated the commit was not in
          the commit-graph file.
      
     +    However, an upcoming change will introduce the ability to set generation
     +    values in-memory without writing the commit-graph file. Thus, we can no
     +    longer trust 'graph_pos' to indicate whether or not the generation
     +    member can be trusted.
     +
          Instead, trust the 'generation' member if the commit has a value in the
          slab _and_ the 'generation' member is non-zero. Otherwise, treat it as
          GENERATION_NUMBER_INFINITY.
  5:  e197bddcace !  6:  17a1fc9b15e commit-graph: introduce `ensure_generations_valid()`
     @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph
      +		.set_generation = set_generation_in_graph_data,
      +	};
      +
     -+	compute_reachable_generation_numbers_1(&info, generation_version);
     ++	compute_reachable_generation_numbers(&info, generation_version);
      +}
      +
       static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx)
  6:  0fb3913810b !  7:  5d937184a0e commit-reach: implement ahead_behind() logic
     @@ Commit message
      
          The interface for ahead_behind() uses two arrays. The first array of
          commits contains the list of all starting points for the walk. This
     -    includes all tip commits _and_ base commits. The second array, using the
     -    new ahead_behind_count struct, indicates which commits from that initial
     -    array form the base/tip pair for the ahead/behind count it will store.
     +    includes all tip commits _and_ base commits. The second array specifies
     +    base/tip pairs by pointing to commits within the first array, by index.
     +    The second array also stores the resulting ahead/behind counts for each
     +    of these pairs.
      
          This implementation of ahead_behind() allows multiple bases, if desired.
          Even with multiple bases, there is only one commit walk used for
     @@ Commit message
      
          Now, let's discuss the ahead/behind counting algorithm.
      
     -    Each commit in the input commit list is associated with a bit position
     -    indicating "the ith commit can reach this commit". Each of these commits
     -    is associated with a bitmap with its position flipped on and then
     -    placed in a queue for walking commit history. We walk commits by popping
     -    the commit with maximum generation number out of the queue, guaranteeing
     -    that we will never walk a child of that commit in any future steps.
     +    The first array of commits are considered the starting commits. The
     +    index within that array will play a critical role.
     +
     +    We create a new commit slab that maps commits to a bitmap. For a given
     +    commit (anywhere in the history), its bitmap stores information relative
     +    to which of the input commits can reach that commit. The ith bit will be
     +    on if the ith commit from the starting list can reach that commit. It is
     +    important to notice that these bitmaps are not the typical "reachability
     +    bitmaps" that are stored in .bitmap files. Instead of signalling which
     +    objects are reachable from the current commit, they instead signal
     +    "which starting commits can reach me?" It is also important to know that
     +    the bitmap is not necessarily "complete" until we walk that commit. We
     +    will perform a commit walk by generation number in such a way that we
     +    can guarantee the bitmap is correct when we visit that commit.
     +
     +    At the beginning of the ahead_behind() method, we initialize the bitmaps
     +    for each of the starting commits. By enabling the ith bit for the ith
     +    starting commit, we signal "the ith commit can reach itself."
     +
     +    We walk commits by popping the commit with maximum generation number out
     +    of the queue, guaranteeing that we will never walk a child of that
     +    commit in any future steps.
      
          As we walk, we load the bitmap for the current commit and perform two
          main steps. The _second_ step examines each parent of the current commit
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +	c->object.flags |= PARENT2;
      +}
      +
     -+static struct bitmap *init_bit_array(struct commit *c, int width)
     ++static struct bitmap *get_bit_array(struct commit *c, int width)
      +{
      +	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
      +	if (!*bitmap)
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +		  struct ahead_behind_count *counts, size_t counts_nr)
      +{
      +	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
     -+	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
     ++	size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD);
      +
      +	if (!commits_nr || !counts_nr)
      +		return;
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +
      +	for (size_t i = 0; i < commits_nr; i++) {
      +		struct commit *c = commits[i];
     -+		struct bitmap *bitmap = init_bit_array(c, width);
     ++		struct bitmap *bitmap = get_bit_array(c, width);
      +
      +		bitmap_set(bitmap, i);
      +		insert_no_dup(&queue, c);
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +	while (queue_has_nonstale(&queue)) {
      +		struct commit *c = prio_queue_get(&queue);
      +		struct commit_list *p;
     -+		struct bitmap *bitmap_c = init_bit_array(c, width);
     ++		struct bitmap *bitmap_c = get_bit_array(c, width);
      +
      +		for (size_t i = 0; i < counts_nr; i++) {
      +			int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
     @@ commit-reach.c: struct commit_list *get_reachable_subset(struct commit **from, i
      +
      +			repo_parse_commit(r, p->item);
      +
     -+			bitmap_p = init_bit_array(p->item, width);
     ++			bitmap_p = get_bit_array(p->item, width);
      +			bitmap_or(bitmap_p, bitmap_c);
      +
     ++			/*
     ++			 * If this parent is reachable from every starting
     ++			 * commit, then none of its ancestors can contribute
     ++			 * to the ahead/behind count. Mark it as STALE, so
     ++			 * we can stop the walk when every commit in the
     ++			 * queue is STALE.
     ++			 */
      +			if (bitmap_popcount(bitmap_p) == commits_nr)
      +				p->item->object.flags |= STALE;
      +
  7:  59cf6759e60 !  8:  b8523e2be0b for-each-ref: add ahead-behind format atom
     @@ ref-filter.c: static int rest_atom_parser(struct ref_format *format, struct used
      +static int ahead_behind_atom_parser(struct ref_format *format, struct used_atom *atom,
      +				    const char *arg, struct strbuf *err)
      +{
     ++	struct string_list_item *item;
     ++
      +	if (!arg)
     -+		return strbuf_addf_ret(err, -1, _("expected format: %%(ahead-behind:<ref>)"));
     ++		return strbuf_addf_ret(err, -1, _("expected format: %%(ahead-behind:<committish>)"));
     ++
     ++	item = string_list_append(&format->bases, arg);
     ++	item->util = lookup_commit_reference_by_name(arg);
     ++	if (!item->util)
     ++		die("failed to find '%s'", arg);
      +
     -+	string_list_append(&format->bases, arg);
      +	return 0;
      +}
      +
     @@ ref-filter.c: static void reach_filter(struct ref_array *array,
      +		return;
      +
      +	ALLOC_ARRAY(commits, commits_nr);
     -+	for (size_t i = 0; i < format->bases.nr; i++) {
     -+		const char *name = format->bases.items[i].string;
     -+		commits[i] = lookup_commit_reference_by_name(name);
     -+		if (!commits[i])
     -+			die("failed to find '%s'", name);
     -+	}
     ++	for (size_t i = 0; i < format->bases.nr; i++)
     ++		commits[i] = format->bases.items[i].util;
      +
      +	ALLOC_ARRAY(array->counts, st_mult(format->bases.nr, array->nr));
      +
     @@ t/t6301-for-each-ref-errors.sh: test_expect_success 'Missing objects are reporte
      +test_expect_success 'ahead-behind requires an argument' '
      +	test_must_fail git for-each-ref \
      +		--format="%(ahead-behind)" 2>err &&
     -+	echo "fatal: expected format: %(ahead-behind:<ref>)" >expect &&
     ++	echo "fatal: expected format: %(ahead-behind:<committish>)" >expect &&
      +	test_cmp expect err
      +'
      +
  8:  7476a39331e =  9:  87fe9676aec commit-reach: add tips_reachable_from_bases()
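
For readers following the ahead/behind description in the range-diff for
patch 7 above, here is a minimal, self-contained sketch of the "which
starting commits can reach me?" bitmaps. It is not the code from
commit-reach.c. The names node, count and ahead_behind_sketch() are
hypothetical, at most 32 starting commits are supported, and the priority
queue ordered by generation number is replaced by the assumption that every
parent has a smaller array index than its children, so walking indices
downward visits each child before its parents. The STALE early exit is
omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature commit: parent indices, -1 when absent. */
struct node {
	int parents[2];
};

/* A base/tip pair, given as indices into the array of starting commits. */
struct count {
	size_t tip_index, base_index;
	unsigned ahead, behind;
};

static void ahead_behind_sketch(const struct node *nodes, size_t nr,
				const size_t *starts, size_t starts_nr,
				struct count *counts, size_t counts_nr)
{
	uint32_t *reach; /* bit i set: starting commit i can reach this node */

	assert(starts_nr <= 32);
	reach = calloc(nr ? nr : 1, sizeof(*reach));
	if (!reach)
		abort();

	/* The ith starting commit can reach itself. */
	for (size_t i = 0; i < starts_nr; i++)
		reach[starts[i]] |= UINT32_C(1) << i;

	/* Children come before parents, so 'bits' is complete on visit. */
	for (size_t n = nr; n-- > 0; ) {
		uint32_t bits = reach[n];

		if (!bits)
			continue; /* not reachable from any starting commit */

		for (size_t i = 0; i < counts_nr; i++) {
			int from_tip = !!(bits & (UINT32_C(1) << counts[i].tip_index));
			int from_base = !!(bits & (UINT32_C(1) << counts[i].base_index));

			if (from_tip && !from_base)
				counts[i].ahead++;
			else if (from_base && !from_tip)
				counts[i].behind++;
		}

		/* Pass reachability information down to each parent. */
		for (int p = 0; p < 2; p++) {
			int parent = nodes[n].parents[p];

			if (parent < 0)
				continue;
			assert((size_t)parent < n);
			reach[parent] |= bits;
		}
	}
	free(reach);
}
```

With a history 0 <- 1 <- 2 (base) and 1 <- 3 <- 4 (tip), passing the base
and the tip as the two starting commits counts commits 3 and 4 as "ahead"
and commit 2 as "behind".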

-- 
gitgitgadget


* [PATCH v4 1/9] for-each-ref: add --stdin option
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
@ 2023-03-20 11:26       ` Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 2/9] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
                         ` (7 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-20 11:26 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Jonathan Tan,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When a user wishes to input a large list of patterns to 'git
for-each-ref' (likely a long list of exact refs) there are frequently
system limits on the number of command-line arguments.

Add a new --stdin option to instead read the patterns from standard
input. Add tests that check that any unrecognized arguments are
considered an error when --stdin is provided. Also, an empty pattern
list is interpreted as the complete ref set.

When reading from stdin, we populate the filter.name_patterns array
dynamically as opposed to pointing to the 'argv' array directly. This is
simple when using a strvec, as it is NULL-terminated in the same way. We
then free the memory directly from the strvec.
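
As a rough illustration of that last point (not the code in this patch), a
NULL-terminated vector can be filled from a stream in plain C as shown
below. The names str_vec, str_vec_reserve(), str_vec_push(),
read_patterns() and str_vec_clear() are hypothetical stand-ins for git's
strvec API, and the fixed-size line buffer is a simplification: lines
longer than the buffer would be split, which strbuf_getline() avoids.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's strvec: a growable, NULL-terminated array. */
struct str_vec {
	char **v;
	size_t nr, alloc;
};

/* Make room for 'extra' more strings plus the terminating NULL. */
static void str_vec_reserve(struct str_vec *vec, size_t extra)
{
	size_t needed = vec->nr + extra + 1;

	if (needed > vec->alloc) {
		size_t new_alloc = vec->alloc ? 2 * vec->alloc : 8;
		char **new_v;

		if (new_alloc < needed)
			new_alloc = needed;
		new_v = realloc(vec->v, new_alloc * sizeof(*new_v));
		if (!new_v)
			abort();
		vec->v = new_v;
		vec->alloc = new_alloc;
	}
	vec->v[vec->nr] = NULL;
}

/* Append a copy of 's' and keep the array NULL-terminated. */
static void str_vec_push(struct str_vec *vec, const char *s)
{
	char *copy = malloc(strlen(s) + 1);

	if (!copy)
		abort();
	strcpy(copy, s);
	str_vec_reserve(vec, 1);
	vec->v[vec->nr++] = copy;
	vec->v[vec->nr] = NULL;
}

/* Read one pattern per line until EOF. */
static void read_patterns(FILE *in, struct str_vec *vec)
{
	char line[4096];

	assert(in && vec);
	str_vec_reserve(vec, 0); /* empty input still yields { NULL } */
	while (fgets(line, sizeof(line), in)) {
		line[strcspn(line, "\n")] = '\0'; /* drop the newline, if any */
		str_vec_push(vec, line);
	}
}

/* Free every string and the array itself. */
static void str_vec_clear(struct str_vec *vec)
{
	for (size_t i = 0; i < vec->nr; i++)
		free(vec->v[i]);
	free(vec->v);
	vec->v = NULL;
	vec->nr = vec->alloc = 0;
}
```

Because the array always ends in NULL, even for empty input, a consumer
that walks an 'argv'-style array can take either pointer unchanged.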

Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-for-each-ref.txt |  7 +++++-
 builtin/for-each-ref.c             | 23 ++++++++++++++++++-
 t/t6300-for-each-ref.sh            | 37 ++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index 6da899c6296..ccdc2911bb9 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -9,7 +9,8 @@ SYNOPSIS
 --------
 [verse]
 'git for-each-ref' [--count=<count>] [--shell|--perl|--python|--tcl]
-		   [(--sort=<key>)...] [--format=<format>] [<pattern>...]
+		   [(--sort=<key>)...] [--format=<format>]
+		   [ --stdin | <pattern>... ]
 		   [--points-at=<object>]
 		   [--merged[=<object>]] [--no-merged[=<object>]]
 		   [--contains[=<object>]] [--no-contains[=<object>]]
@@ -32,6 +33,10 @@ OPTIONS
 	literally, in the latter case matching completely or from the
 	beginning up to a slash.
 
+--stdin::
+	If `--stdin` is supplied, then the list of patterns is read from
+	standard input instead of from the argument list.
+
 --count=<count>::
 	By default the command shows all refs that match
 	`<pattern>`.  This option makes it stop after showing
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 6f62f40d126..9df16cfb854 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -5,6 +5,7 @@
 #include "object.h"
 #include "parse-options.h"
 #include "ref-filter.h"
+#include "strvec.h"
 
 static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [<options>] [<pattern>]"),
@@ -25,6 +26,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	struct ref_format format = REF_FORMAT_INIT;
 	struct strbuf output = STRBUF_INIT;
 	struct strbuf err = STRBUF_INIT;
+	int from_stdin = 0;
+	struct strvec vec = STRVEC_INIT;
 
 	struct option opts[] = {
 		OPT_BIT('s', "shell", &format.quote_style,
@@ -49,6 +52,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 		OPT_CONTAINS(&filter.with_commit, N_("print only refs which contain the commit")),
 		OPT_NO_CONTAINS(&filter.no_commit, N_("print only refs which don't contain the commit")),
 		OPT_BOOL(0, "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
+		OPT_BOOL(0, "stdin", &from_stdin, N_("read reference patterns from stdin")),
 		OPT_END(),
 	};
 
@@ -75,7 +79,23 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
 	filter.ignore_case = icase;
 
-	filter.name_patterns = argv;
+	if (from_stdin) {
+		struct strbuf line = STRBUF_INIT;
+
+		if (argv[0])
+			die(_("unknown arguments supplied with --stdin"));
+
+		while (strbuf_getline(&line, stdin) != EOF)
+			strvec_push(&vec, line.buf);
+
+		strbuf_release(&line);
+
+		/* vec.v is NULL-terminated, just like 'argv'. */
+		filter.name_patterns = vec.v;
+	} else {
+		filter.name_patterns = argv;
+	}
+
 	filter.match_as_path = 1;
 	filter_refs(&array, &filter, FILTER_REFS_ALL);
 	ref_array_sort(sorting, &array);
@@ -97,5 +117,6 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	free_commit_list(filter.with_commit);
 	free_commit_list(filter.no_commit);
 	ref_sorting_release(sorting);
+	strvec_clear(&vec);
 	return 0;
 }
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index c466fd989f1..a58053a54c5 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1464,4 +1464,41 @@ sig_crlf="$(printf "%s" "$sig" | append_cr; echo dummy)"
 sig_crlf=${sig_crlf%dummy}
 test_atom refs/tags/fake-sig-crlf contents:signature "$sig_crlf"
 
+test_expect_success 'git for-each-ref --stdin: empty' '
+	>in &&
+	git for-each-ref --format="%(refname)" --stdin <in >actual &&
+	git for-each-ref --format="%(refname)" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'git for-each-ref --stdin: fails if extra args' '
+	>in &&
+	test_must_fail git for-each-ref --format="%(refname)" \
+		--stdin refs/heads/extra <in 2>err &&
+	grep "unknown arguments supplied with --stdin" err
+'
+
+test_expect_success 'git for-each-ref --stdin: matches' '
+	cat >in <<-EOF &&
+	refs/tags/multi*
+	refs/heads/amb*
+	EOF
+
+	cat >expect <<-EOF &&
+	refs/heads/ambiguous
+	refs/tags/multi-ref1-100000-user1
+	refs/tags/multi-ref1-100000-user2
+	refs/tags/multi-ref1-200000-user1
+	refs/tags/multi-ref1-200000-user2
+	refs/tags/multi-ref2-100000-user1
+	refs/tags/multi-ref2-100000-user2
+	refs/tags/multi-ref2-200000-user1
+	refs/tags/multi-ref2-200000-user2
+	refs/tags/multiline
+	EOF
+
+	git for-each-ref --format="%(refname)" --stdin <in >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v4 2/9] for-each-ref: explicitly test no matches
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 1/9] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
@ 2023-03-20 11:26       ` Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 3/9] commit-graph: refactor compute_topological_levels() Derrick Stolee via GitGitGadget
                         ` (6 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-20 11:26 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Jonathan Tan,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The for-each-ref builtin can take a list of ref patterns, but if none
match, it still succeeds (but with no output). Add an explicit test that
demonstrates that behavior.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t6300-for-each-ref.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index a58053a54c5..6614469d2d6 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -1501,4 +1501,17 @@ test_expect_success 'git for-each-ref --stdin: matches' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git for-each-ref with non-existing refs' '
+	cat >in <<-EOF &&
+	refs/heads/this-ref-does-not-exist
+	refs/tags/bogus
+	EOF
+
+	git for-each-ref --format="%(refname)" --stdin <in >actual &&
+	test_must_be_empty actual &&
+
+	xargs git for-each-ref --format="%(refname)" <in >actual &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
gitgitgadget



* [PATCH v4 3/9] commit-graph: refactor compute_topological_levels()
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 1/9] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 2/9] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
@ 2023-03-20 11:26       ` Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 4/9] commit-graph: simplify compute_generation_numbers() Derrick Stolee via GitGitGadget
                         ` (5 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-20 11:26 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Jonathan Tan,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

This patch extracts the common code used to compute topological levels
and corrected committer dates into a common routine,
compute_reachable_generation_numbers(). For ease of reading, it only
modifies compute_topological_levels() to use this new routine, leaving
compute_generation_numbers() to be modified in the next change.

This new routine dispatches to call the necessary functions to get and
set the generation number for a given commit through a vtable (the
compute_generation_info struct).

Computing the generation number itself is done in
compute_generation_from_max(), which selects its implementation based
on the generation version requested and issues a BUG() for unrecognized
generation versions. This does not use a vtable because the logic
depends only on the generation number version, not where the data is
being loaded from or being stored to. This is a subtle point that will
make more sense in a future change that modifies the in-memory
generation values instead of just preparing values for writing to a
commit-graph file.
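
To make the split between "where values live" and "how values are
computed" concrete, here is a small standalone sketch of the same shape. It
is not the code in this patch: node, gen_info, level_store and
compute_levels() are hypothetical names, commits are reduced to array
indices, and only topological levels (maximum parent level plus one) are
computed. The walk itself follows the same stack-based pattern as the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define GEN_ZERO 0 /* "not computed yet", mirroring GENERATION_NUMBER_ZERO */

/* Hypothetical miniature commit: parent indices, -1 when absent. */
struct node {
	int parents[2];
};

/* The walk only talks to storage through these two callbacks. */
struct gen_info {
	const struct node *nodes;
	size_t nr;
	uint64_t (*get_generation)(size_t i, void *data);
	void (*set_generation)(size_t i, uint64_t gen, void *data);
	void *data;
};

static void compute_levels(struct gen_info *info)
{
	size_t *stack, top;

	assert(info && info->get_generation && info->set_generation);
	/* The stack holds a path of uncomputed ancestors, so nr entries suffice. */
	stack = malloc((info->nr ? info->nr : 1) * sizeof(*stack));
	if (!stack)
		abort();

	for (size_t i = 0; i < info->nr; i++) {
		if (info->get_generation(i, info->data) != GEN_ZERO)
			continue;

		top = 0;
		stack[top++] = i;
		while (top) {
			size_t cur = stack[top - 1];
			uint64_t max_gen = 0;
			int all_parents_computed = 1;

			for (int p = 0; p < 2; p++) {
				int parent = info->nodes[cur].parents[p];
				uint64_t gen;

				if (parent < 0)
					continue;
				gen = info->get_generation((size_t)parent, info->data);
				if (gen == GEN_ZERO) {
					/* Compute this parent first, then revisit 'cur'. */
					all_parents_computed = 0;
					stack[top++] = (size_t)parent;
					break;
				}
				if (gen > max_gen)
					max_gen = gen;
			}

			if (all_parents_computed) {
				top--;
				info->set_generation(cur, max_gen + 1, info->data);
			}
		}
	}
	free(stack);
}

/* One possible storage backend: a flat array of 32-bit levels. */
struct level_store {
	uint32_t *levels;
};

static uint64_t get_level(size_t i, void *data)
{
	struct level_store *s = data;
	return s->levels[i];
}

static void set_level(size_t i, uint64_t gen, void *data)
{
	struct level_store *s = data;
	s->levels[i] = (uint32_t)gen;
}
```

A second backend with different get/set callbacks could reuse
compute_levels() unchanged, which is the property the vtable is meant to
provide.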

This change looks like it adds a lot of new code. However, two upcoming
changes will be quite small due to the work being done in this change.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c | 106 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 83 insertions(+), 23 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index c11b59f28b3..4356c8c1f4b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1446,24 +1446,52 @@ static void close_reachable(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
-static void compute_topological_levels(struct write_commit_graph_context *ctx)
+struct compute_generation_info {
+	struct repository *r;
+	struct packed_commit_list *commits;
+	struct progress *progress;
+	int progress_cnt;
+
+	timestamp_t (*get_generation)(struct commit *c, void *data);
+	void (*set_generation)(struct commit *c, timestamp_t gen, void *data);
+	void *data;
+};
+
+static timestamp_t compute_generation_from_max(struct commit *c,
+					       timestamp_t max_gen,
+					       int generation_version)
+{
+	switch (generation_version) {
+	case 1: /* topological levels */
+		if (max_gen > GENERATION_NUMBER_V1_MAX - 1)
+			max_gen = GENERATION_NUMBER_V1_MAX - 1;
+		return max_gen + 1;
+
+	case 2: /* corrected commit date */
+		if (c->date && c->date > max_gen)
+			max_gen = c->date - 1;
+		return max_gen + 1;
+
+	default:
+		BUG("attempting unimplemented version");
+	}
+}
+
+static void compute_reachable_generation_numbers(
+			struct compute_generation_info *info,
+			int generation_version)
 {
 	int i;
 	struct commit_list *list = NULL;
 
-	if (ctx->report_progress)
-		ctx->progress = start_delayed_progress(
-					_("Computing commit graph topological levels"),
-					ctx->commits.nr);
-	for (i = 0; i < ctx->commits.nr; i++) {
-		struct commit *c = ctx->commits.list[i];
-		uint32_t level;
-
-		repo_parse_commit(ctx->r, c);
-		level = *topo_level_slab_at(ctx->topo_levels, c);
+	for (i = 0; i < info->commits->nr; i++) {
+		struct commit *c = info->commits->list[i];
+		timestamp_t gen;
+		repo_parse_commit(info->r, c);
+		gen = info->get_generation(c, info->data);
+		display_progress(info->progress, info->progress_cnt + 1);
 
-		display_progress(ctx->progress, i + 1);
-		if (level != GENERATION_NUMBER_ZERO)
+		if (gen != GENERATION_NUMBER_ZERO && gen != GENERATION_NUMBER_INFINITY)
 			continue;
 
 		commit_list_insert(c, &list);
@@ -1471,31 +1499,63 @@ static void compute_topological_levels(struct write_commit_graph_context *ctx)
 			struct commit *current = list->item;
 			struct commit_list *parent;
 			int all_parents_computed = 1;
-			uint32_t max_level = 0;
+			timestamp_t max_gen = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
-				repo_parse_commit(ctx->r, parent->item);
-				level = *topo_level_slab_at(ctx->topo_levels, parent->item);
+				repo_parse_commit(info->r, parent->item);
+				gen = info->get_generation(parent->item, info->data);
 
-				if (level == GENERATION_NUMBER_ZERO) {
+				if (gen == GENERATION_NUMBER_ZERO) {
 					all_parents_computed = 0;
 					commit_list_insert(parent->item, &list);
 					break;
 				}
 
-				if (level > max_level)
-					max_level = level;
+				if (gen > max_gen)
+					max_gen = gen;
 			}
 
 			if (all_parents_computed) {
 				pop_commit(&list);
-
-				if (max_level > GENERATION_NUMBER_V1_MAX - 1)
-					max_level = GENERATION_NUMBER_V1_MAX - 1;
-				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
+				gen = compute_generation_from_max(
+						current, max_gen,
+						generation_version);
+				info->set_generation(current, gen, info->data);
 			}
 		}
 	}
+}
+
+static timestamp_t get_topo_level(struct commit *c, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	return *topo_level_slab_at(ctx->topo_levels, c);
+}
+
+static void set_topo_level(struct commit *c, timestamp_t t, void *data)
+{
+	struct write_commit_graph_context *ctx = data;
+	*topo_level_slab_at(ctx->topo_levels, c) = (uint32_t)t;
+}
+
+static void compute_topological_levels(struct write_commit_graph_context *ctx)
+{
+	struct compute_generation_info info = {
+		.r = ctx->r,
+		.commits = &ctx->commits,
+		.get_generation = get_topo_level,
+		.set_generation = set_topo_level,
+		.data = ctx,
+	};
+
+	if (ctx->report_progress)
+		info.progress = ctx->progress
+			      = start_delayed_progress(
+					_("Computing commit graph topological levels"),
+					ctx->commits.nr);
+
+	compute_reachable_generation_numbers(&info, 1);
+
 	stop_progress(&ctx->progress);
 }
 
-- 
gitgitgadget



* [PATCH v4 4/9] commit-graph: simplify compute_generation_numbers()
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                         ` (2 preceding siblings ...)
  2023-03-20 11:26       ` [PATCH v4 3/9] commit-graph: refactor compute_topological_levels() Derrick Stolee via GitGitGadget
@ 2023-03-20 11:26       ` Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 5/9] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-20 11:26 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Jonathan Tan,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change introduced the generic algorithm
compute_reachable_generation_numbers() and used it as the core
functionality of compute_topological_levels(). Now, use it as the core
functionality of compute_generation_numbers().

The main difference here is that we use generation version 2, which is
used to toggle the logic in compute_generation_from_max() for
computing the corrected commit date based on the corrected commit dates
of the parent commits (and the commit date of the current commit). It
also uses different methods for (get|set)_generation in the vtable in
order to store and access the value in the correct places.
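
As a worked illustration of the two versions (a sketch, not the function
from this series), the helper below takes the commit date and the maximum
generation among the parents. The name generation_from_max() and the use
of uint64_t in place of timestamp_t are assumptions made to keep the
example standalone. V1_MAX is meant to mirror GENERATION_NUMBER_V1_MAX.

```c
#include <assert.h>
#include <stdint.h>

#define V1_MAX UINT64_C(0x3FFFFFFF) /* assumed cap for topological levels */

static uint64_t generation_from_max(uint64_t commit_date,
				    uint64_t max_parent_gen,
				    int version)
{
	/* The series reports a BUG() for other versions. */
	assert(version == 1 || version == 2);

	if (version == 1) {
		/* Topological level: one more than the deepest parent, capped. */
		if (max_parent_gen > V1_MAX - 1)
			max_parent_gen = V1_MAX - 1;
		return max_parent_gen + 1;
	}

	/*
	 * Corrected commit date: the commit's own date, unless a parent
	 * already has an equal or larger value, in which case one more
	 * than that parent's value.
	 */
	if (commit_date > max_parent_gen)
		max_parent_gen = commit_date - 1;
	return max_parent_gen + 1;
}
```

For example, a commit dated 120 whose parents have corrected dates 100 and
150 gets 151, while a commit dated 200 with the same parents keeps 200.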

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c | 64 +++++++++++++++++---------------------------------
 1 file changed, 21 insertions(+), 43 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 4356c8c1f4b..d1c98681e88 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1559,13 +1559,31 @@ static void compute_topological_levels(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
+static timestamp_t get_generation_from_graph_data(struct commit *c, void *data)
+{
+	return commit_graph_data_at(c)->generation;
+}
+
+static void set_generation_v2(struct commit *c, timestamp_t t, void *data)
+{
+	struct commit_graph_data *g = commit_graph_data_at(c);
+	g->generation = t;
+}
+
 static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 {
 	int i;
-	struct commit_list *list = NULL;
+	struct compute_generation_info info = {
+		.r = ctx->r,
+		.commits = &ctx->commits,
+		.get_generation = get_generation_from_graph_data,
+		.set_generation = set_generation_v2,
+		.data = ctx,
+	};
 
 	if (ctx->report_progress)
-		ctx->progress = start_delayed_progress(
+		info.progress = ctx->progress
+			      = start_delayed_progress(
 					_("Computing commit graph generation numbers"),
 					ctx->commits.nr);
 
@@ -1577,47 +1595,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 		}
 	}
 
-	for (i = 0; i < ctx->commits.nr; i++) {
-		struct commit *c = ctx->commits.list[i];
-		timestamp_t corrected_commit_date;
-
-		repo_parse_commit(ctx->r, c);
-		corrected_commit_date = commit_graph_data_at(c)->generation;
-
-		display_progress(ctx->progress, i + 1);
-		if (corrected_commit_date != GENERATION_NUMBER_ZERO)
-			continue;
-
-		commit_list_insert(c, &list);
-		while (list) {
-			struct commit *current = list->item;
-			struct commit_list *parent;
-			int all_parents_computed = 1;
-			timestamp_t max_corrected_commit_date = 0;
-
-			for (parent = current->parents; parent; parent = parent->next) {
-				repo_parse_commit(ctx->r, parent->item);
-				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
-
-				if (corrected_commit_date == GENERATION_NUMBER_ZERO) {
-					all_parents_computed = 0;
-					commit_list_insert(parent->item, &list);
-					break;
-				}
-
-				if (corrected_commit_date > max_corrected_commit_date)
-					max_corrected_commit_date = corrected_commit_date;
-			}
-
-			if (all_parents_computed) {
-				pop_commit(&list);
-
-				if (current->date && current->date > max_corrected_commit_date)
-					max_corrected_commit_date = current->date - 1;
-				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
-			}
-		}
-	}
+	compute_reachable_generation_numbers(&info, 2);
 
 	for (i = 0; i < ctx->commits.nr; i++) {
 		struct commit *c = ctx->commits.list[i];
-- 
gitgitgadget



* [PATCH v4 5/9] commit-graph: return generation from memory
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                         ` (3 preceding siblings ...)
  2023-03-20 11:26       ` [PATCH v4 4/9] commit-graph: simplify compute_generation_numbers() Derrick Stolee via GitGitGadget
@ 2023-03-20 11:26       ` Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 6/9] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
                         ` (3 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-20 11:26 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Jonathan Tan,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The commit_graph_generation() method used to report a value of
GENERATION_NUMBER_INFINITY if the commit_graph_data_slab had an instance
for the given commit but the graph_pos indicated the commit was not in
the commit-graph file.

However, an upcoming change will introduce the ability to set generation
values in-memory without writing the commit-graph file. Thus, we can no
longer trust 'graph_pos' to indicate whether or not the generation
member can be trusted.

Instead, trust the 'generation' member if the commit has a value in the
slab _and_ the 'generation' member is non-zero. Otherwise, treat it as
GENERATION_NUMBER_INFINITY.

This only makes a difference for a very old case for the commit-graph:
the very first Git release to write commit-graph files wrote zeroes in
the topological level positions. If we are parsing a commit-graph with
all zeroes, those commits will now appear to have
GENERATION_NUMBER_INFINITY (as if they were not parsed from the
commit-graph).
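
To make the new rule concrete, here is a minimal standalone sketch of
the lookup described above. The toy_* names and the use of UINT64_MAX
as the "infinity" value are assumptions for illustration only; they are
not the definitions from commit-graph.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GENERATION_NUMBER_INFINITY. */
#define TOY_GENERATION_INFINITY UINT64_MAX

/* Hypothetical stand-in for one commit's entry in the data slab. */
struct toy_graph_data {
	uint32_t graph_pos;   /* no longer consulted by the lookup */
	uint64_t generation;  /* 0 means "never computed" */
};

/* 'data' is NULL when the commit has no slab entry at all. */
static uint64_t toy_generation(const struct toy_graph_data *data)
{
	/* Trust the value only if an entry exists and it is non-zero. */
	if (data && data->generation)
		return data->generation;
	return TOY_GENERATION_INFINITY;
}
```

Under this rule an entry whose generation was set only in memory is
reported as-is, whatever its graph_pos says, while an entry holding a
zero is treated the same as a missing entry.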

I attempted several variations to work around the need for providing an
uninitialized 'generation' member, but this was the best one I found. It
does require a change to a verification test in t5318 because it reports
a different error than the one about non-zero generation numbers.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c          | 8 +++-----
 t/t5318-commit-graph.sh | 2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d1c98681e88..63a56483cf6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -116,12 +116,10 @@ timestamp_t commit_graph_generation(const struct commit *c)
 	struct commit_graph_data *data =
 		commit_graph_data_slab_peek(&commit_graph_data_slab, c);
 
-	if (!data)
-		return GENERATION_NUMBER_INFINITY;
-	else if (data->graph_pos == COMMIT_NOT_FROM_GRAPH)
-		return GENERATION_NUMBER_INFINITY;
+	if (data && data->generation)
+		return data->generation;
 
-	return data->generation;
+	return GENERATION_NUMBER_INFINITY;
 }
 
 static struct commit_graph_data *commit_graph_data_at(const struct commit *c)
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 049c5fc8ead..b6e12115786 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -630,7 +630,7 @@ test_expect_success 'detect incorrect generation number' '
 
 test_expect_success 'detect incorrect generation number' '
 	corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\01" \
-		"non-zero generation number"
+		"commit-graph generation for commit"
 '
 
 test_expect_success 'detect incorrect commit date' '
-- 
gitgitgadget



* [PATCH v4 6/9] commit-graph: introduce `ensure_generations_valid()`
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                         ` (4 preceding siblings ...)
  2023-03-20 11:26       ` [PATCH v4 5/9] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
@ 2023-03-20 11:26       ` Taylor Blau via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 7/9] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
                         ` (2 subsequent siblings)
  8 siblings, 0 replies; 90+ messages in thread
From: Taylor Blau via GitGitGadget @ 2023-03-20 11:26 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Jonathan Tan,
	Derrick Stolee, Taylor Blau

From: Taylor Blau <me@ttaylorr.com>

Use the just-introduced compute_reachable_generation_numbers() to
implement a function which dynamically computes topological levels (or
corrected commit dates) for out-of-graph commits.

This will be useful for the ahead-behind algorithm we are about to
introduce, which needs accurate topological levels on _all_ commits
reachable from the tips in order to avoid over-counting.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-graph.c | 29 +++++++++++++++++++++++++++++
 commit-graph.h |  8 ++++++++
 2 files changed, 37 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 63a56483cf6..172e679db19 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1604,6 +1604,35 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 	stop_progress(&ctx->progress);
 }
 
+static void set_generation_in_graph_data(struct commit *c, timestamp_t t,
+					 void *data)
+{
+	commit_graph_data_at(c)->generation = t;
+}
+
+/*
+ * After this method, all commits reachable from those in the given
+ * list will have non-zero, non-infinite generation numbers.
+ */
+void ensure_generations_valid(struct repository *r,
+			      struct commit **commits, size_t nr)
+{
+	int generation_version = get_configured_generation_version(r);
+	struct packed_commit_list list = {
+		.list = commits,
+		.alloc = nr,
+		.nr = nr,
+	};
+	struct compute_generation_info info = {
+		.r = r,
+		.commits = &list,
+		.get_generation = get_generation_from_graph_data,
+		.set_generation = set_generation_in_graph_data,
+	};
+
+	compute_reachable_generation_numbers(&info, generation_version);
+}
+
 static void trace2_bloom_filter_write_statistics(struct write_commit_graph_context *ctx)
 {
 	trace2_data_intmax("commit-graph", ctx->r, "filter-computed",
diff --git a/commit-graph.h b/commit-graph.h
index 37faee6b66d..73e182ab2d0 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -190,4 +190,12 @@ struct commit_graph_data {
  */
 timestamp_t commit_graph_generation(const struct commit *);
 uint32_t commit_graph_position(const struct commit *);
+
+/*
+ * After this method, all commits reachable from those in the given
+ * list will have non-zero, non-infinite generation numbers.
+ */
+void ensure_generations_valid(struct repository *r,
+			      struct commit **commits, size_t nr);
+
 #endif
-- 
gitgitgadget



* [PATCH v4 7/9] commit-reach: implement ahead_behind() logic
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                         ` (5 preceding siblings ...)
  2023-03-20 11:26       ` [PATCH v4 6/9] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
@ 2023-03-20 11:26       ` Derrick Stolee via GitGitGadget
  2023-03-20 20:40         ` Jonathan Tan
  2023-03-20 11:26       ` [PATCH v4 8/9] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 9/9] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
  8 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-20 11:26 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Jonathan Tan,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Fully implement the commit-counting logic required to determine
ahead/behind counts for a batch of commit pairs. This is a new library
method within commit-reach.h. This method will be linked to the
for-each-ref builtin in the next change.

The interface for ahead_behind() uses two arrays. The first array of
commits contains the list of all starting points for the walk. This
includes all tip commits _and_ base commits. The second array specifies
base/tip pairs by pointing to commits within the first array, by index.
The second array also stores the resulting ahead/behind counts for each
of these pairs.

This implementation of ahead_behind() allows multiple bases, if desired.
Even with multiple bases, there is only one commit walk used for
counting the ahead/behind values, saving time when the base/tip ranges
overlap significantly.

This interface for ahead_behind() also makes it very easy to call
ensure_generations_valid() on the entire array of bases and tips. This
call is necessary because it is critical that the walk that counts
ahead/behind values never walks a commit more than once. Without
generation numbers on every commit, there is a possibility that a
commit date skew could cause the walk to revisit a commit and then
double-count it. For this reason, it is strongly recommended that 'git
ahead-behind' is only run in a repository with a commit-graph file that
covers most of the reachable commits, storing precomputed generation
numbers. If no commit-graph exists, this walk will be much slower as it
must walk all reachable commits in ensure_generations_valid() before
performing the counting logic.

It is possible to detect if generation numbers are available at run time
and redirect the implementation to another algorithm that does not
require this property. However, that implementation requires a commit
walk per base/tip pair _and_ can be slower due to the commit date
heuristics required. Such an implementation could be considered in the
future if there is a reason to include it, but most Git hosts should
already be generating a commit-graph file as part of repository
maintenance. Most Git clients should also be generating commit-graph
files as part of background maintenance or automatic GCs.

Now, let's discuss the ahead/behind counting algorithm.

The first array of commits are considered the starting commits. The
index within that array will play a critical role.

We create a new commit slab that maps commits to a bitmap. For a given
commit (anywhere in the history), its bitmap stores information relative
to which of the input commits can reach that commit. The ith bit will be
on if the ith commit from the starting list can reach that commit. It is
important to notice that these bitmaps are not the typical "reachability
bitmaps" that are stored in .bitmap files. Instead of signalling which
objects are reachable from the current commit, they instead signal
"which starting commits can reach me?" It is also important to know that
the bitmap is not necessarily "complete" until we walk that commit. We
will perform a commit walk by generation number in such a way that we
can guarantee the bitmap is correct when we visit that commit.

At the beginning of the ahead_behind() method, we initialize the bitmaps
for each of the starting commits. By enabling the ith bit for the ith
starting commit, we signal "the ith commit can reach itself."

We walk commits by popping the commit with maximum generation number out
of the queue, guaranteeing that we will never walk a child of that
commit in any future steps.

As we walk, we load the bitmap for the current commit and perform two
main steps. The _second_ step examines each parent of the current commit
and adds the current commit's bitmap bits to each parent's bitmap. (We
create a new bitmap for the parent if this is our first time seeing that
parent.) After adding the bits to the parent's bitmap, the parent is
added to the walk queue. Due to this passing of bits to parents, the
current commit has a guarantee that the ith bit is enabled on its bitmap
if and only if the ith commit can reach the current commit.

The first step of the walk is to examine the bitmask on the current
commit and decide which ranges the commit is in or not. Due to the "bit
pushing" in the second step, we have a guarantee that the ith bit of the
current commit's bitmap is on if and only if the ith starting commit can
reach it. For each ahead_behind_count struct, check the base_index and
tip_index to see if those bits are enabled on the current bitmap. If
exactly one bit is enabled, then increment the corresponding 'ahead' or
'behind' count.  This increment is the reason we _absolutely need_ to
walk commits at most once.

The only subtle thing to do with this walk is to check to see if a
parent has all bits on in its bitmap, in which case it becomes "stale"
and is marked with the STALE bit. This allows queue_has_nonstale() to be
the terminating condition of the walk, which greatly reduces the number
of commits walked if all of the commits are nearby in history. It avoids
walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup() to add commits to the
priority queue without adding them multiple times. This uses the PARENT2
flag. Thus, we must clear both the STALE and PARENT2 bits of all
commits, in case ahead_behind() is called multiple times in the same
process.
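
The walk described above can be sketched on a toy graph. This is an
illustration under stated assumptions, not the code in this patch: the
toy_* names are hypothetical, commits are array indexes, a single
64-bit word replaces the per-commit bitmap (so at most 64 commits and
64 starting points), and a linear scan stands in for the priority
queue. Staleness is checked by comparing a commit's bits against the
"all starting points" mask instead of setting a STALE flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PARENTS 2

/* Hypothetical toy commit; parents are node indexes, -1 if unused. */
struct toy_commit {
	int parents[TOY_MAX_PARENTS];
	unsigned generation; /* strictly greater than any parent's */
};

struct toy_count {
	size_t tip_index, base_index; /* positions in starts[] */
	unsigned ahead, behind;
};

static void toy_ahead_behind(const struct toy_commit *nodes, size_t nodes_nr,
			     const size_t *starts, size_t starts_nr,
			     struct toy_count *counts, size_t counts_nr)
{
	uint64_t bits[64] = { 0 };        /* "which starts can reach me?" */
	unsigned char state[64] = { 0 };  /* 0 unseen, 1 queued, 2 walked */
	uint64_t all;

	assert(nodes_nr <= 64 && starts_nr > 0 && starts_nr <= 64);
	all = starts_nr == 64 ? UINT64_MAX : (((uint64_t)1 << starts_nr) - 1);

	for (size_t i = 0; i < counts_nr; i++)
		counts[i].ahead = counts[i].behind = 0;

	/* The ith starting commit can reach itself. */
	for (size_t i = 0; i < starts_nr; i++) {
		bits[starts[i]] |= (uint64_t)1 << i;
		state[starts[i]] = 1;
	}

	for (;;) {
		size_t best = nodes_nr;
		int nonstale = 0;

		/* Pick the queued commit with maximum generation. */
		for (size_t n = 0; n < nodes_nr; n++) {
			if (state[n] != 1)
				continue;
			if (bits[n] != all)
				nonstale = 1;
			if (best == nodes_nr ||
			    nodes[n].generation > nodes[best].generation)
				best = n;
		}
		if (!nonstale)
			break; /* every queued commit is reachable from all */
		state[best] = 2;

		/* First step: count this commit for each base/tip pair. */
		for (size_t i = 0; i < counts_nr; i++) {
			int from_tip = (bits[best] >> counts[i].tip_index) & 1;
			int from_base = (bits[best] >> counts[i].base_index) & 1;

			if (from_tip != from_base) {
				if (from_base)
					counts[i].behind++;
				else
					counts[i].ahead++;
			}
		}

		/* Second step: push bits to parents, queue them once. */
		for (int k = 0; k < TOY_MAX_PARENTS; k++) {
			int p = nodes[best].parents[k];

			if (p < 0)
				continue;
			bits[p] |= bits[best];
			if (!state[p])
				state[p] = 1;
		}
	}
}
```

As a usage example, take a linear history 0 <- 1, with node 2 (the
base) and node 3 both children of 1, node 4 a child of 3, and node 5 a
merge of 2 and 4. With starts = { 2, 4, 5 }, the pair (tip 4, base 2)
comes out as 2 ahead and 1 behind, and the pair (tip 5, base 2) as
3 ahead and 0 behind. The walk stops once only node 1 remains in the
queue, because it carries all three bits.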

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-reach.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++
 commit-reach.h |  31 +++++++++++++++
 2 files changed, 134 insertions(+)

diff --git a/commit-reach.c b/commit-reach.c
index 2e33c599a82..cd990dce16a 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -8,6 +8,7 @@
 #include "revision.h"
 #include "tag.h"
 #include "commit-reach.h"
+#include "ewah/ewok.h"
 
 /* Remember to update object flag allocation in object.h */
 #define PARENT1		(1u<<16)
@@ -941,3 +942,105 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 
 	return found_commits;
 }
+
+define_commit_slab(bit_arrays, struct bitmap *);
+static struct bit_arrays bit_arrays;
+
+static void insert_no_dup(struct prio_queue *queue, struct commit *c)
+{
+	if (c->object.flags & PARENT2)
+		return;
+	prio_queue_put(queue, c);
+	c->object.flags |= PARENT2;
+}
+
+static struct bitmap *get_bit_array(struct commit *c, int width)
+{
+	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
+	if (!*bitmap)
+		*bitmap = bitmap_word_alloc(width);
+	return *bitmap;
+}
+
+static void free_bit_array(struct commit *c)
+{
+	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
+	if (!*bitmap)
+		return;
+	bitmap_free(*bitmap);
+	*bitmap = NULL;
+}
+
+void ahead_behind(struct repository *r,
+		  struct commit **commits, size_t commits_nr,
+		  struct ahead_behind_count *counts, size_t counts_nr)
+{
+	struct prio_queue queue = { .compare = compare_commits_by_gen_then_commit_date };
+	size_t width = DIV_ROUND_UP(commits_nr, BITS_IN_EWORD);
+
+	if (!commits_nr || !counts_nr)
+		return;
+
+	for (size_t i = 0; i < counts_nr; i++) {
+		counts[i].ahead = 0;
+		counts[i].behind = 0;
+	}
+
+	ensure_generations_valid(r, commits, commits_nr);
+
+	init_bit_arrays(&bit_arrays);
+
+	for (size_t i = 0; i < commits_nr; i++) {
+		struct commit *c = commits[i];
+		struct bitmap *bitmap = get_bit_array(c, width);
+
+		bitmap_set(bitmap, i);
+		insert_no_dup(&queue, c);
+	}
+
+	while (queue_has_nonstale(&queue)) {
+		struct commit *c = prio_queue_get(&queue);
+		struct commit_list *p;
+		struct bitmap *bitmap_c = get_bit_array(c, width);
+
+		for (size_t i = 0; i < counts_nr; i++) {
+			int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
+			int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index);
+
+			if (reach_from_tip ^ reach_from_base) {
+				if (reach_from_base)
+					counts[i].behind++;
+				else
+					counts[i].ahead++;
+			}
+		}
+
+		for (p = c->parents; p; p = p->next) {
+			struct bitmap *bitmap_p;
+
+			repo_parse_commit(r, p->item);
+
+			bitmap_p = get_bit_array(p->item, width);
+			bitmap_or(bitmap_p, bitmap_c);
+
+			/*
+			 * If this parent is reachable from every starting
+			 * commit, then none of its ancestors can contribute
+			 * to the ahead/behind count. Mark it as STALE, so
+			 * we can stop the walk when every commit in the
+			 * queue is STALE.
+			 */
+			if (bitmap_popcount(bitmap_p) == commits_nr)
+				p->item->object.flags |= STALE;
+
+			insert_no_dup(&queue, p->item);
+		}
+
+		free_bit_array(c);
+	}
+
+	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
+	repo_clear_commit_marks(r, PARENT2 | STALE);
+	clear_bit_arrays(&bit_arrays);
+	clear_prio_queue(&queue);
+}
diff --git a/commit-reach.h b/commit-reach.h
index 148b56fea50..f708c46e523 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -104,4 +104,35 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 					 struct commit **to, int nr_to,
 					 unsigned int reachable_flag);
 
+struct ahead_behind_count {
+	/**
+	 * As input, the *_index members indicate which positions in
+	 * the 'tips' array correspond to the tip and base of this
+	 * comparison.
+	 */
+	size_t tip_index;
+	size_t base_index;
+
+	/**
+	 * These values store the computed counts for each side of the
+	 * symmetric difference:
+	 *
+	 * 'ahead' stores the number of commits reachable from the tip
+	 * and not reachable from the base.
+	 *
+	 * 'behind' stores the number of commits reachable from the base
+	 * and not reachable from the tip.
+	 */
+	unsigned int ahead;
+	unsigned int behind;
+};
+
+/*
+ * Given an array of commits and an array of ahead_behind_count pairs,
+ * compute the ahead/behind counts for each pair.
+ */
+void ahead_behind(struct repository *r,
+		  struct commit **commits, size_t commits_nr,
+		  struct ahead_behind_count *counts, size_t counts_nr);
+
 #endif
-- 
gitgitgadget



* [PATCH v4 8/9] for-each-ref: add ahead-behind format atom
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                         ` (6 preceding siblings ...)
  2023-03-20 11:26       ` [PATCH v4 7/9] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
@ 2023-03-20 11:26       ` Derrick Stolee via GitGitGadget
  2023-03-20 11:26       ` [PATCH v4 9/9] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-20 11:26 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Jonathan Tan,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The previous change implemented the ahead_behind() method, including an
algorithm to compute the ahead/behind values for a number of commit tips
relative to a number of commit bases. Now, integrate that algorithm as
part of 'git for-each-ref' hidden behind a new format atom,
ahead-behind. This naturally extends to 'git branch' and 'git tag'
builtins, as well.

This format allows specifying multiple bases, if so desired, and all
matching references are compared against all of those bases. For this
reason, failing to read a reference provided from these atoms results in
an error.

In order to translate the ahead_behind() method information to the
format output code in ref-filter.c, we must populate arrays of
ahead_behind_count structs. In struct ref_array, we store the full array
that will be passed to ahead_behind(). In struct ref_array_item, we
store an array of pointers that point to the relevant items within the
full array. In this way, we can pull all relevant ahead/behind values
directly when formatting output for a specific item. It also ensures the
lifetime of the ahead_behind_count structs matches the time that the
array is being used.
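
The ownership layout in the paragraph above can be sketched in
isolation. The toy_* structs and functions below are hypothetical
simplifications, not the ref-filter.h definitions: every ref is assumed
to resolve to a commit (the real code skips refs that do not), and the
size multiplication is not overflow-checked the way st_mult() is.

```c
#include <stdlib.h>

struct toy_count {
	size_t tip_index, base_index;
	unsigned ahead, behind;
};

/* Per ref: one pointer per base, pointing into the flat array. */
struct toy_item {
	struct toy_count **counts;
};

/* The array owns the flat storage for every count struct. */
struct toy_array {
	struct toy_item *items;
	size_t nr;
	struct toy_count *counts;
	size_t counts_nr;
};

/*
 * Lay out bases_nr counts per item. Callers must start with each
 * items[i].counts set to NULL. Returns 0 on success, or -1 on
 * allocation failure, after which toy_release() cleans up.
 */
static int toy_layout(struct toy_array *array, size_t bases_nr)
{
	array->counts_nr = 0;
	array->counts = calloc(bases_nr * array->nr, sizeof(*array->counts));
	if (!array->counts)
		return -1;

	for (size_t i = 0; i < array->nr; i++) {
		struct toy_count **ptrs = calloc(bases_nr, sizeof(*ptrs));

		if (!ptrs)
			return -1;
		array->items[i].counts = ptrs;

		for (size_t j = 0; j < bases_nr; j++) {
			struct toy_count *count;

			count = &array->counts[array->counts_nr++];
			/* Bases occupy the first bases_nr commit slots. */
			count->tip_index = bases_nr + i;
			count->base_index = j;
			ptrs[j] = count;
		}
	}
	return 0;
}

static void toy_release(struct toy_array *array)
{
	/* Items free only their pointer tables, never the structs. */
	for (size_t i = 0; i < array->nr; i++) {
		free(array->items[i].counts);
		array->items[i].counts = NULL;
	}
	free(array->counts);
	array->counts = NULL;
	array->counts_nr = 0;
}
```

With this split, the batch computation fills one contiguous array,
while output formatting for a given ref only follows that ref's own
pointers, one per base in the order the bases were specified.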

Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as
it has an interesting repository shape. In particular, its merging
strategy and its use of different commit-graphs would demonstrate over-
counting if the ahead_behind() method did not already account for that
possibility.

Also add tests for the specific for-each-ref, branch, and tag builtins.
In the case of 'git tag', there are interesting cases that happen when
some of the selected tips are not commits. This requires careful logic
around commits_nr in the second loop of filter_ahead_behind(). Also, the
test in t7004 is carefully located to avoid being dependent on the GPG
prereq. It also avoids using the test_commit helper, as that will add
ticks to the time and disrupt the expected timestamps in later tag
tests.

Also add performance tests in a new p1500-graph-walks.sh script. This
will be useful for more uses in the future, but for now compare the
ahead-behind counting algorithm in 'git for-each-ref' to the naive
implementation by running 'git rev-list --count' processes for each
input.

For the Git source code repository, the improvement is already obvious:

Test                                            this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref   0.07(0.07+0.00)
1500.3: ahead-behind counts: git branch         0.07(0.06+0.00)
1500.4: ahead-behind counts: git tag            0.07(0.06+0.00)
1500.5: ahead-behind counts: git rev-list       1.32(1.04+0.27)

But the standard performance benchmark is the Linux kernel repository,
which demonstrates a significant improvement:

Test                                            this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref   0.27(0.24+0.02)
1500.3: ahead-behind counts: git branch         0.27(0.24+0.03)
1500.4: ahead-behind counts: git tag            0.28(0.27+0.01)
1500.5: ahead-behind counts: git rev-list       4.57(4.03+0.54)

The 'git rev-list' test exists in this change as a demonstration, but it
will be removed in the next change to avoid wasting time on this
comparison.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/git-for-each-ref.txt |  5 ++
 builtin/branch.c                   |  1 +
 builtin/for-each-ref.c             |  3 ++
 builtin/tag.c                      |  1 +
 ref-filter.c                       | 73 +++++++++++++++++++++++++
 ref-filter.h                       | 26 ++++++++-
 t/perf/p1500-graph-walks.sh        | 45 ++++++++++++++++
 t/t3203-branch-output.sh           | 14 +++++
 t/t6301-for-each-ref-errors.sh     | 14 +++++
 t/t6600-test-reach.sh              | 86 ++++++++++++++++++++++++++++++
 t/t7004-tag.sh                     | 28 ++++++++++
 11 files changed, 295 insertions(+), 1 deletion(-)
 create mode 100755 t/perf/p1500-graph-walks.sh

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index ccdc2911bb9..0713e49b499 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -222,6 +222,11 @@ worktreepath::
 	out, if it is checked out in any linked worktree. Empty string
 	otherwise.
 
+ahead-behind:<committish>::
+	Two integers, separated by a space, demonstrating the number of
+	commits ahead and behind, respectively, when comparing the output
+	ref to the `<committish>` specified in the format.
+
 In addition to the above, for commit and tag objects, the header
 field names (`tree`, `parent`, `object`, `type`, and `tag`) can
 be used to specify the value in the header field.
diff --git a/builtin/branch.c b/builtin/branch.c
index f63fd45edb9..0554d7cebb3 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -448,6 +448,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	if (verify_ref_format(format))
 		die(_("unable to parse format string"));
 
+	filter_ahead_behind(the_repository, format, &array);
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++) {
diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 9df16cfb854..6b3d07ef409 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -6,6 +6,7 @@
 #include "parse-options.h"
 #include "ref-filter.h"
 #include "strvec.h"
+#include "commit-reach.h"
 
 static char const * const for_each_ref_usage[] = {
 	N_("git for-each-ref [<options>] [<pattern>]"),
@@ -98,6 +99,8 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 
 	filter.match_as_path = 1;
 	filter_refs(&array, &filter, FILTER_REFS_ALL);
+	filter_ahead_behind(the_repository, &format, &array);
+
 	ref_array_sort(sorting, &array);
 
 	if (!maxcount || array.nr < maxcount)
diff --git a/builtin/tag.c b/builtin/tag.c
index d428c45dc8d..1b3f49d7b4c 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -66,6 +66,7 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 		die(_("unable to parse format string"));
 	filter->with_commit_tag_algo = 1;
 	filter_refs(&array, filter, FILTER_REFS_TAGS);
+	filter_ahead_behind(the_repository, format, &array);
 	ref_array_sort(sorting, &array);
 
 	for (i = 0; i < array.nr; i++) {
diff --git a/ref-filter.c b/ref-filter.c
index f8203c6b052..62135f649ec 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -158,6 +158,7 @@ enum atom_type {
 	ATOM_THEN,
 	ATOM_ELSE,
 	ATOM_REST,
+	ATOM_AHEADBEHIND,
 };
 
 /*
@@ -586,6 +587,22 @@ static int rest_atom_parser(struct ref_format *format, struct used_atom *atom,
 	return 0;
 }
 
+static int ahead_behind_atom_parser(struct ref_format *format, struct used_atom *atom,
+				    const char *arg, struct strbuf *err)
+{
+	struct string_list_item *item;
+
+	if (!arg)
+		return strbuf_addf_ret(err, -1, _("expected format: %%(ahead-behind:<committish>)"));
+
+	item = string_list_append(&format->bases, arg);
+	item->util = lookup_commit_reference_by_name(arg);
+	if (!item->util)
+		die("failed to find '%s'", arg);
+
+	return 0;
+}
+
 static int head_atom_parser(struct ref_format *format, struct used_atom *atom,
 			    const char *arg, struct strbuf *err)
 {
@@ -645,6 +662,7 @@ static struct {
 	[ATOM_THEN] = { "then", SOURCE_NONE },
 	[ATOM_ELSE] = { "else", SOURCE_NONE },
 	[ATOM_REST] = { "rest", SOURCE_NONE, FIELD_STR, rest_atom_parser },
+	[ATOM_AHEADBEHIND] = { "ahead-behind", SOURCE_OTHER, FIELD_STR, ahead_behind_atom_parser },
 	/*
 	 * Please update $__git_ref_fieldlist in git-completion.bash
 	 * when you add new atoms
@@ -1848,6 +1866,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 	struct object *obj;
 	int i;
 	struct object_info empty = OBJECT_INFO_INIT;
+	int ahead_behind_atoms = 0;
 
 	CALLOC_ARRAY(ref->value, used_atom_cnt);
 
@@ -1978,6 +1997,16 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 			else
 				v->s = xstrdup("");
 			continue;
+		} else if (atom_type == ATOM_AHEADBEHIND) {
+			if (ref->counts) {
+				const struct ahead_behind_count *count;
+				count = ref->counts[ahead_behind_atoms++];
+				v->s = xstrfmt("%d %d", count->ahead, count->behind);
+			} else {
+				/* Not a commit. */
+				v->s = xstrdup("");
+			}
+			continue;
 		} else
 			continue;
 
@@ -2328,6 +2357,7 @@ static void free_array_item(struct ref_array_item *item)
 			free((char *)item->value[i].s);
 		free(item->value);
 	}
+	free(item->counts);
 	free(item);
 }
 
@@ -2356,6 +2386,8 @@ void ref_array_clear(struct ref_array *array)
 		free_worktrees(ref_to_worktree_map.worktrees);
 		ref_to_worktree_map.worktrees = NULL;
 	}
+
+	FREE_AND_NULL(array->counts);
 }
 
 #define EXCLUDE_REACHED 0
@@ -2418,6 +2450,47 @@ static void reach_filter(struct ref_array *array,
 	free(to_clear);
 }
 
+void filter_ahead_behind(struct repository *r,
+			 struct ref_format *format,
+			 struct ref_array *array)
+{
+	struct commit **commits;
+	size_t commits_nr = format->bases.nr + array->nr;
+
+	if (!format->bases.nr || !array->nr)
+		return;
+
+	ALLOC_ARRAY(commits, commits_nr);
+	for (size_t i = 0; i < format->bases.nr; i++)
+		commits[i] = format->bases.items[i].util;
+
+	ALLOC_ARRAY(array->counts, st_mult(format->bases.nr, array->nr));
+
+	commits_nr = format->bases.nr;
+	array->counts_nr = 0;
+	for (size_t i = 0; i < array->nr; i++) {
+		const char *name = array->items[i]->refname;
+		commits[commits_nr] = lookup_commit_reference_by_name(name);
+
+		if (!commits[commits_nr])
+			continue;
+
+		CALLOC_ARRAY(array->items[i]->counts, format->bases.nr);
+		for (size_t j = 0; j < format->bases.nr; j++) {
+			struct ahead_behind_count *count;
+			count = &array->counts[array->counts_nr++];
+			count->tip_index = commits_nr;
+			count->base_index = j;
+
+			array->items[i]->counts[j] = count;
+		}
+		commits_nr++;
+	}
+
+	ahead_behind(r, commits, commits_nr, array->counts, array->counts_nr);
+	free(commits);
+}
+
 /*
  * API for filtering a set of refs. Based on the type of refs the user
  * has requested, we iterate through those refs and apply filters
diff --git a/ref-filter.h b/ref-filter.h
index aa0eea4ecf5..c9a11495177 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -5,6 +5,7 @@
 #include "refs.h"
 #include "commit.h"
 #include "parse-options.h"
+#include "string-list.h"
 
 /* Quoting styles */
 #define QUOTE_NONE 0
@@ -24,6 +25,7 @@
 
 struct atom_value;
 struct ref_sorting;
+struct ahead_behind_count;
 
 enum ref_sorting_order {
 	REF_SORTING_REVERSE = 1<<0,
@@ -40,6 +42,8 @@ struct ref_array_item {
 	const char *symref;
 	struct commit *commit;
 	struct atom_value *value;
+	struct ahead_behind_count **counts;
+
 	char refname[FLEX_ARRAY];
 };
 
@@ -47,6 +51,9 @@ struct ref_array {
 	int nr, alloc;
 	struct ref_array_item **items;
 	struct rev_info *revs;
+
+	struct ahead_behind_count *counts;
+	size_t counts_nr;
 };
 
 struct ref_filter {
@@ -80,9 +87,15 @@ struct ref_format {
 
 	/* Internal state to ref-filter */
 	int need_color_reset_at_eol;
+
+	/* List of bases for ahead-behind counts. */
+	struct string_list bases;
 };
 
-#define REF_FORMAT_INIT { .use_color = -1 }
+#define REF_FORMAT_INIT {             \
+	.use_color = -1,              \
+	.bases = STRING_LIST_INIT_DUP, \
+}
 
 /*  Macros for checking --merged and --no-merged options */
 #define _OPT_MERGED_NO_MERGED(option, filter, h) \
@@ -143,4 +156,15 @@ struct ref_array_item *ref_array_push(struct ref_array *array,
 				      const char *refname,
 				      const struct object_id *oid);
 
+/*
+ * If the provided format includes ahead-behind atoms, then compute the
+ * ahead-behind values for the array of filtered references. Must be
+ * called after filter_refs() but before outputting the formatted refs.
+ *
+ * If this is not called, then any ahead-behind atoms will be blank.
+ */
+void filter_ahead_behind(struct repository *r,
+			 struct ref_format *format,
+			 struct ref_array *array);
+
 #endif /*  REF_FILTER_H  */
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
new file mode 100755
index 00000000000..439a448c2e6
--- /dev/null
+++ b/t/perf/p1500-graph-walks.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+test_description='Commit walk performance tests'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
+	sort -r allrefs | head -n 50 >refs &&
+	for ref in $(cat refs)
+	do
+		git branch -f ref-$ref $ref &&
+		echo ref-$ref ||
+		return 1
+	done >branches &&
+	for ref in $(cat refs)
+	do
+		git tag -f tag-$ref $ref &&
+		echo tag-$ref ||
+		return 1
+	done >tags &&
+	git commit-graph write --reachable
+'
+
+test_perf 'ahead-behind counts: git for-each-ref' '
+	git for-each-ref --format="%(ahead-behind:HEAD)" --stdin <refs
+'
+
+test_perf 'ahead-behind counts: git branch' '
+	xargs git branch -l --format="%(ahead-behind:HEAD)" <branches
+'
+
+test_perf 'ahead-behind counts: git tag' '
+	xargs git tag -l --format="%(ahead-behind:HEAD)" <tags
+'
+
+test_perf 'ahead-behind counts: git rev-list' '
+	for r in $(cat refs)
+	do
+		git rev-list --count "HEAD..$r" || return 1
+	done
+'
+
+test_done
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index d34d77f8934..1c0f7ea24e7 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -337,6 +337,20 @@ test_expect_success 'git branch --format option' '
 	test_cmp expect actual
 '
 
+test_expect_success 'git branch --format with ahead-behind' '
+	cat >expect <<-\EOF &&
+	(HEAD detached from fromtag) 0 0
+	refs/heads/ambiguous 0 0
+	refs/heads/branch-one 1 0
+	refs/heads/branch-two 0 0
+	refs/heads/main 1 0
+	refs/heads/ref-to-branch 1 0
+	refs/heads/ref-to-remote 1 0
+	EOF
+	git branch --format="%(refname) %(ahead-behind:HEAD)" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git branch with --format=%(rest) must fail' '
 	test_must_fail git branch --format="%(rest)" >actual
 '
diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
index bfda1f46ad2..2667dd13fe3 100755
--- a/t/t6301-for-each-ref-errors.sh
+++ b/t/t6301-for-each-ref-errors.sh
@@ -54,4 +54,18 @@ test_expect_success 'Missing objects are reported correctly' '
 	test_must_be_empty brief-err
 '
 
+test_expect_success 'ahead-behind requires an argument' '
+	test_must_fail git for-each-ref \
+		--format="%(ahead-behind)" 2>err &&
+	echo "fatal: expected format: %(ahead-behind:<committish>)" >expect &&
+	test_cmp expect err
+'
+
+test_expect_success 'missing ahead-behind base' '
+	test_must_fail git for-each-ref \
+		--format="%(ahead-behind:refs/heads/missing)" 2>err &&
+	echo "fatal: failed to find '\''refs/heads/missing'\''" >expect &&
+	test_cmp expect err
+'
+
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 338a9c46a24..0cb50797ef7 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -443,4 +443,90 @@ test_expect_success 'get_reachable_subset:none' '
 	test_all_modes get_reachable_subset
 '
 
+test_expect_success 'for-each-ref ahead-behind:linear' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 8
+	refs/heads/commit-1-3 0 6
+	refs/heads/commit-1-5 0 4
+	refs/heads/commit-1-8 0 1
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-1-9)" --stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:all' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 24
+	refs/heads/commit-2-4 0 17
+	refs/heads/commit-4-2 0 17
+	refs/heads/commit-4-4 0 9
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-5-5)" --stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:some' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 53
+	refs/heads/commit-4-8 8 30
+	refs/heads/commit-5-3 0 39
+	refs/heads/commit-9-9 27 0
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-9-6)" --stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:some, multibase' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-7-8
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1 0 53 0 53
+	refs/heads/commit-4-8 8 30 0 22
+	refs/heads/commit-5-3 0 39 0 39
+	refs/heads/commit-7-8 14 12 8 6
+	refs/heads/commit-9-9 27 0 27 0
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-9-6) %(ahead-behind:commit-6-9)" \
+		--stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:none' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-7-5
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-4-8 16 16
+	refs/heads/commit-7-5 7 4
+	refs/heads/commit-9-9 49 0
+	EOF
+	run_all_modes git for-each-ref \
+		--format="%(refname) %(ahead-behind:commit-8-4)" --stdin
+'
+
 test_done
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 9aa1660651b..04a4b44183d 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -792,6 +792,34 @@ test_expect_success 'annotations for blobs are empty' '
 	test_cmp expect actual
 '
 
+# Run this before doing any signing, so the test has the same results
+# regardless of the GPG prereq.
+test_expect_success 'git tag --format with ahead-behind' '
+	test_when_finished git reset --hard tag-one-line &&
+	git commit --allow-empty -m "left" &&
+	git tag -a -m left tag-left &&
+	git reset --hard HEAD~1 &&
+	git commit --allow-empty -m "right" &&
+	git tag -a -m right tag-right &&
+
+	# Use " !" at the end to demonstrate whitespace
+	# around empty ahead-behind token for tag-blob.
+	cat >expect <<-EOF &&
+	refs/tags/tag-blob  !
+	refs/tags/tag-left 1 1 !
+	refs/tags/tag-lines 0 1 !
+	refs/tags/tag-one-line 0 1 !
+	refs/tags/tag-right 0 0 !
+	refs/tags/tag-zero-lines 0 1 !
+	EOF
+	git tag -l --format="%(refname) %(ahead-behind:HEAD) !" >actual 2>err &&
+	grep "refs/tags/tag" actual >actual.focus &&
+	test_cmp expect actual.focus &&
+
+	# Error reported for tags that point to non-commits.
+	grep "error: object [0-9a-f]* is a blob, not a commit" err
+'
+
 # trying to verify annotated non-signed tags:
 
 test_expect_success GPG \
-- 
gitgitgadget



* [PATCH v4 9/9] commit-reach: add tips_reachable_from_bases()
  2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
                         ` (7 preceding siblings ...)
  2023-03-20 11:26       ` [PATCH v4 8/9] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
@ 2023-03-20 11:26       ` Derrick Stolee via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2023-03-20 11:26 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Jonathan Tan,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use
the ref-filter machinery to select references or branches (respectively)
that are reachable from a set of commits presented by one or more
--merged arguments. This happens within reach_filter(), which uses the
revision-walk machinery to walk history in a standard way.

However, the commit-reach.c file is full of custom searches that are
more efficient, especially for reachability queries that can terminate
early when reachability is discovered. Add a new
tips_reachable_from_bases() method to commit-reach.c and call it from
within reach_filter() in ref-filter.c. This affects both 'git branch'
and 'git for-each-ref' as tested in p1500-graph-walks.sh.

For the Linux kernel repository, we take an already-fast algorithm and
make it even faster:

Test                                            HEAD~1  HEAD
-------------------------------------------------------------------
1500.5: contains: git for-each-ref --merged     0.13    0.02 -84.6%
1500.6: contains: git branch --merged           0.14    0.02 -85.7%
1500.7: contains: git tag --merged              0.15    0.03 -80.0%

(Note that we remove the iterative 'git rev-list' test from p1500
because it no longer makes sense as a comparison to 'git for-each-ref'
and would just waste time running it for these comparisons.)

The algorithm is implemented in commit-reach.c in the method
tips_reachable_from_bases(). This method takes an array of tip commits
and adds the given 'mark' flag to each tip that is reachable from at
least one of the base commits.

Like other reachability queries in commit-reach.c, the fastest way to
search for "can A reach B?" is to do a depth-first search up to the
generation number of B, preferring to explore first parents before later
parents. While we must walk all reachable commits up to that generation
number when the answer is "no", the depth-first search can answer "yes"
much faster than other approaches in most cases.
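
As an illustration of the single-target case, here is a minimal sketch
over a hypothetical six-commit toy history. It is not the code in
commit-reach.c: the names toy_commit, toy_graph, and can_reach() are
assumptions made for this example, and generation numbers are stored
directly in the toy structure instead of being read from a commit-graph.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 16

/* Hypothetical toy commit: up to two parents, -1 means "no parent". */
struct toy_commit {
	int generation;
	int parents[2];
};

/* Toy history (an assumption): 4 merges 2 and 3; 5 sits on top of 3. */
const struct toy_commit toy_graph[6] = {
	{ 1, { -1, -1 } },	/* 0: root */
	{ 2, {  0, -1 } },	/* 1 */
	{ 3, {  1, -1 } },	/* 2 */
	{ 2, {  0, -1 } },	/* 3: side branch */
	{ 4, {  2,  3 } },	/* 4: merge */
	{ 3, {  3, -1 } },	/* 5: unmerged work */
};

/* Return 1 if 'to' is reachable from 'from', otherwise 0. */
int can_reach(const struct toy_commit *g, int from, int to)
{
	int stack[MAX_NODES], depth = 0;
	char seen[MAX_NODES];

	memset(seen, 0, sizeof(seen));
	seen[from] = 1;
	stack[depth++] = from;

	while (depth) {
		int c = stack[depth - 1];
		int pushed = 0;

		if (c == to)
			return 1;

		/* Descend into the first unexplored parent, if any. */
		for (int i = 0; i < 2 && !pushed; i++) {
			int p = g[c].parents[i];

			if (p < 0 || seen[p])
				continue;
			/* A commit below the target's generation cannot reach it. */
			if (g[p].generation < g[to].generation)
				continue;
			assert(depth < MAX_NODES);
			seen[p] = 1;
			stack[depth++] = p;
			pushed = 1;
		}

		/* All parents explored or cut off: backtrack. */
		if (!pushed)
			depth--;
	}
	return 0;
}
```

With this toy history, can_reach(toy_graph, 4, 1) returns 1 by following
first parents from the merge, while can_reach(toy_graph, 4, 5) returns 0
after visiting only commits 4 and 2, since every other parent falls below
the generation of commit 5.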

This search becomes trickier when there are multiple targets for the
depth-first search. The commits with lower generation number are more
likely to be within the history of the start commit, but we don't want
to waste time searching commits of low generation number if the commit
target with lowest generation number has already been found.

The trick here is to take the input commits and sort them by generation
number in ascending order. Track the index within this order as
min_generation_index. When we find a commit, if its index in the list is
equal to min_generation_index, then we can increase the generation
number boundary of our search to the next-lowest value in the list.

With this mechanism, the number of commits to search is minimized with
respect to the depth-first search heuristic. We will walk all commits up
to the minimum generation number of a commit that is _not_ reachable
from the start, but we will walk only the necessary portion of the
depth-first search for the reachable commits of lower generation.
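
The moving generation cutoff can be sketched on a toy history as follows.
This is a simplified illustration under stated assumptions, not the
patch's implementation: toy_commit, toy_graph, sorted_tip, and
mark_reachable_tips() are hypothetical names, there is a single base
commit, and results go into a caller-provided 'reachable' array instead
of object flags.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 16

/* Hypothetical toy commit: up to two parents, -1 means "no parent". */
struct toy_commit {
	int generation;
	int parents[2];
};

/* Toy history (an assumption): 4 merges 2 and 3; 5 sits on top of 3. */
const struct toy_commit toy_graph[6] = {
	{ 1, { -1, -1 } }, { 2, { 0, -1 } }, { 3, { 1, -1 } },
	{ 2, {  0, -1 } }, { 4, { 2,  3 } }, { 3, { 3, -1 } },
};

struct sorted_tip {
	int commit;
	int generation;
	size_t index;	/* position in the caller's 'tips' array */
};

static int compare_by_generation(const void *va, const void *vb)
{
	const struct sorted_tip *a = va, *b = vb;

	return (a->generation > b->generation) - (a->generation < b->generation);
}

/* Set reachable[i] to 1 if tips[i] is reachable from 'base', else 0. */
void mark_reachable_tips(const struct toy_commit *g, int base,
			 const int *tips, size_t nr, char *reachable)
{
	struct sorted_tip sorted[MAX_NODES];
	int stack[MAX_NODES], depth = 0;
	char seen[MAX_NODES];
	size_t min_index = 0;

	assert(nr <= MAX_NODES);
	if (!nr)
		return;
	for (size_t i = 0; i < nr; i++) {
		sorted[i].commit = tips[i];
		sorted[i].generation = g[tips[i]].generation;
		sorted[i].index = i;
		reachable[i] = 0;
	}
	/* Sort with generation number ascending. */
	qsort(sorted, nr, sizeof(*sorted), compare_by_generation);

	memset(seen, 0, sizeof(seen));
	seen[base] = 1;
	stack[depth++] = base;

	while (depth) {
		int c = stack[depth - 1];
		int pushed = 0;

		/* Tips are sorted, so stop once their generation exceeds c's. */
		for (size_t j = min_index; j < nr; j++) {
			if (g[c].generation < sorted[j].generation)
				break;
			if (sorted[j].commit == c)
				reachable[sorted[j].index] = 1;
		}

		/* Raise the cutoff past every tip that is already found. */
		while (min_index < nr && reachable[sorted[min_index].index])
			min_index++;
		if (min_index == nr)
			break;	/* all tips found: terminate early */

		for (int i = 0; i < 2 && !pushed; i++) {
			int p = g[c].parents[i];

			if (p < 0 || seen[p])
				continue;
			/* Skip commits below the lowest un-found generation. */
			if (g[p].generation < sorted[min_index].generation)
				continue;
			assert(depth < MAX_NODES);
			seen[p] = 1;
			stack[depth++] = p;
			pushed = 1;
		}
		if (!pushed)
			depth--;
	}
}
```

For base commit 4 and tips {5, 1, 3}, this sketch sets 'reachable' to
{0, 1, 1}. The root commit 0 is never visited, because its generation is
already below the lowest un-found tip generation each time the walk
reaches one of its children.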

Add extra tests for this behavior in t6600-test-reach.sh as the
interesting data shape of that repository can sometimes demonstrate
corner case bugs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-reach.c              | 113 ++++++++++++++++++++++++++++++++++++
 commit-reach.h              |   9 +++
 ref-filter.c                |  20 ++-----
 t/perf/p1500-graph-walks.sh |  15 +++--
 t/t6600-test-reach.sh       |  83 ++++++++++++++++++++++++++
 5 files changed, 219 insertions(+), 21 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index cd990dce16a..c1edeb46106 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1044,3 +1044,116 @@ void ahead_behind(struct repository *r,
 	clear_bit_arrays(&bit_arrays);
 	clear_prio_queue(&queue);
 }
+
+struct commit_and_index {
+	struct commit *commit;
+	unsigned int index;
+	timestamp_t generation;
+};
+
+static int compare_commit_and_index_by_generation(const void *va, const void *vb)
+{
+	const struct commit_and_index *a = (const struct commit_and_index *)va;
+	const struct commit_and_index *b = (const struct commit_and_index *)vb;
+
+	if (a->generation > b->generation)
+		return 1;
+	if (a->generation < b->generation)
+		return -1;
+	return 0;
+}
+
+void tips_reachable_from_bases(struct repository *r,
+			       struct commit_list *bases,
+			       struct commit **tips, size_t tips_nr,
+			       int mark)
+{
+	struct commit_and_index *commits;
+	size_t min_generation_index = 0;
+	timestamp_t min_generation;
+	struct commit_list *stack = NULL;
+
+	if (!bases || !tips || !tips_nr)
+		return;
+
+	/*
+	 * Do a depth-first search starting at 'bases' to search for the
+	 * tips. Stop at the lowest (un-found) generation number. When
+	 * finding the lowest commit, increase the minimum generation
+	 * number to the next lowest (un-found) generation number.
+	 */
+
+	CALLOC_ARRAY(commits, tips_nr);
+
+	for (size_t i = 0; i < tips_nr; i++) {
+		commits[i].commit = tips[i];
+		commits[i].index = i;
+		commits[i].generation = commit_graph_generation(tips[i]);
+	}
+
+	/* Sort with generation number ascending. */
+	QSORT(commits, tips_nr, compare_commit_and_index_by_generation);
+	min_generation = commits[0].generation;
+
+	while (bases) {
+		repo_parse_commit(r, bases->item);
+		commit_list_insert(bases->item, &stack);
+		bases = bases->next;
+	}
+
+	while (stack) {
+		int explored_all_parents = 1;
+		struct commit_list *p;
+		struct commit *c = stack->item;
+		timestamp_t c_gen = commit_graph_generation(c);
+
+		/* Does it match any of our tips? */
+		for (size_t j = min_generation_index; j < tips_nr; j++) {
+			if (c_gen < commits[j].generation)
+				break;
+
+			if (commits[j].commit == c) {
+				tips[commits[j].index]->object.flags |= mark;
+
+				if (j == min_generation_index) {
+					unsigned int k = j + 1;
+					while (k < tips_nr &&
+					       (tips[commits[k].index]->object.flags & mark))
+						k++;
+
+					/* Terminate early if all found. */
+					if (k >= tips_nr)
+						goto done;
+
+					min_generation_index = k;
+					min_generation = commits[k].generation;
+				}
+			}
+		}
+
+		for (p = c->parents; p; p = p->next) {
+			repo_parse_commit(r, p->item);
+
+			/* Have we already explored this parent? */
+			if (p->item->object.flags & SEEN)
+				continue;
+
+			/* Is it below the current minimum generation? */
+			if (commit_graph_generation(p->item) < min_generation)
+				continue;
+
+			/* Ok, we will explore from here on. */
+			p->item->object.flags |= SEEN;
+			explored_all_parents = 0;
+			commit_list_insert(p->item, &stack);
+			break;
+		}
+
+		if (explored_all_parents)
+			pop_commit(&stack);
+	}
+
+done:
+	free(commits);
+	repo_clear_commit_marks(r, SEEN);
+}
diff --git a/commit-reach.h b/commit-reach.h
index f708c46e523..d6321ae700e 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -135,4 +135,13 @@ void ahead_behind(struct repository *r,
 		  struct commit **commits, size_t commits_nr,
 		  struct ahead_behind_count *counts, size_t counts_nr);
 
+/*
+ * For all tip commits, add 'mark' to their flags if and only if they
+ * are reachable from one of the commits in 'bases'.
+ */
+void tips_reachable_from_bases(struct repository *r,
+			       struct commit_list *bases,
+			       struct commit **tips, size_t tips_nr,
+			       int mark);
+
 #endif
diff --git a/ref-filter.c b/ref-filter.c
index 62135f649ec..c724ff94113 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2396,33 +2396,22 @@ static void reach_filter(struct ref_array *array,
 			 struct commit_list *check_reachable,
 			 int include_reached)
 {
-	struct rev_info revs;
 	int i, old_nr;
 	struct commit **to_clear;
-	struct commit_list *cr;
 
 	if (!check_reachable)
 		return;
 
 	CALLOC_ARRAY(to_clear, array->nr);
-
-	repo_init_revisions(the_repository, &revs, NULL);
-
 	for (i = 0; i < array->nr; i++) {
 		struct ref_array_item *item = array->items[i];
-		add_pending_object(&revs, &item->commit->object, item->refname);
 		to_clear[i] = item->commit;
 	}
 
-	for (cr = check_reachable; cr; cr = cr->next) {
-		struct commit *merge_commit = cr->item;
-		merge_commit->object.flags |= UNINTERESTING;
-		add_pending_object(&revs, &merge_commit->object, "");
-	}
-
-	revs.limited = 1;
-	if (prepare_revision_walk(&revs))
-		die(_("revision walk setup failed"));
+	tips_reachable_from_bases(the_repository,
+				  check_reachable,
+				  to_clear, array->nr,
+				  UNINTERESTING);
 
 	old_nr = array->nr;
 	array->nr = 0;
@@ -2446,7 +2435,6 @@ static void reach_filter(struct ref_array *array,
 		clear_commit_marks(merge_commit, ALL_REV_FLAGS);
 	}
 
-	release_revisions(&revs);
 	free(to_clear);
 }
 
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index 439a448c2e6..e14e7620cce 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -35,11 +35,16 @@ test_perf 'ahead-behind counts: git tag' '
 	xargs git tag -l --format="%(ahead-behind:HEAD)" <tags
 '
 
-test_perf 'ahead-behind counts: git rev-list' '
-	for r in $(cat refs)
-	do
-		git rev-list --count "HEAD..$r" || return 1
-	done
+test_perf 'contains: git for-each-ref --merged' '
+	git for-each-ref --merged=HEAD --stdin <refs
+'
+
+test_perf 'contains: git branch --merged' '
+	xargs git branch --merged=HEAD <branches
+'
+
+test_perf 'contains: git tag --merged' '
+	xargs git tag --merged=HEAD <tags
 '
 
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 0cb50797ef7..b330945f497 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -529,4 +529,87 @@ test_expect_success 'for-each-ref ahead-behind:none' '
 		--format="%(refname) %(ahead-behind:commit-8-4)" --stdin
 '
 
+test_expect_success 'for-each-ref merged:linear' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	refs/heads/commit-2-1
+	refs/heads/commit-5-1
+	refs/heads/commit-9-1
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	EOF
+	run_all_modes git for-each-ref --merged=commit-1-9 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:all' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	run_all_modes git for-each-ref --merged=commit-5-5 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:some' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	EOF
+	run_all_modes git for-each-ref --merged=commit-9-6 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:some, multibase' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-7-8
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-4-8
+	refs/heads/commit-5-3
+	EOF
+	run_all_modes git for-each-ref \
+		--merged=commit-5-8 \
+		--merged=commit-8-5 \
+		--format="%(refname)" \
+		--stdin
+'
+
+test_expect_success 'for-each-ref merged:none' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-7-5
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	>expect &&
+	run_all_modes git for-each-ref --merged=commit-8-4 \
+		--format="%(refname)" --stdin
+'
+
 test_done
-- 
gitgitgadget


* Re: [PATCH v4 7/9] commit-reach: implement ahead_behind() logic
  2023-03-20 11:26       ` [PATCH v4 7/9] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
@ 2023-03-20 20:40         ` Jonathan Tan
  0 siblings, 0 replies; 90+ messages in thread
From: Jonathan Tan @ 2023-03-20 20:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Jonathan Tan, git, gitster, me, vdye, Jeff King, Phillip Wood,
	Ævar Arnfjörð Bjarmason, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> Fully implement the commit-counting logic required to determine
> ahead/behind counts for a batch of commit pairs. This is a new library
> method within commit-reach.h. This method will be linked to the
> for-each-ref builtin in the next change.

Thanks. I see that all my review comments have been addressed, so up to
and including this patch looks good. I haven't had time to look at the
last 2 patches, but it seems that other reviewers are already looking
at those.

