git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 0/7] rev-parse: implement object type filter
@ 2021-03-01 12:20 Patrick Steinhardt
  2021-03-01 12:20 ` [PATCH 1/7] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
                   ` (9 more replies)
  0 siblings, 10 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-01 12:20 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

Hi,

I recently had a use case where I needed to retrieve all blobs introduced
between two versions which are smaller than 200 bytes, in order to find
all potential candidates for LFS pointers. This is currently done with
`git rev-list --objects --filter=blob:limit=200 <newrev> ^<oldrev>`, but
this is inefficient: the resulting list is far too long, as it can also
include tags, commits and trees.

To be able to more efficiently answer this query, I've implemented
multiple things:

- A new object type filter `--filter=object:type=<type>` for
  git-rev-list(1), which is implemented both for normal graph walks and
  for the packfile bitmap index.

- Given that the above use case requires two filters (the object type
  and blob size filters), bitmap filters were extended to support
  combined filters.

- git-rev-list(1) doesn't filter user-provided objects and always prints
  them. I don't want the listed commits though, only the potential LFS
  blobs they reference. So I've added a new flag `--filter-provided`
  which marks all provided objects as not-user-provided such that they
  get filtered the same as all the other objects.

Altogether, this ends up with the following queries, both of which have
been executed in a well-packed linux.git repository:

    # Previous query which uses object names as a heuristic to filter
    # non-blob objects, which bars us from using bitmap indices because
    # they cannot print paths.
    $ time git rev-list --objects --filter=blob:limit=200 \
        --object-names --all | sed -r '/^.{,41}$/d' | wc -l
    4502300

    real 1m23.872s
    user 1m30.076s
    sys  0m6.002s

    # New query.
    $ time git rev-list --objects --filter-provided \
        --filter=object:type=blob --filter=blob:limit=200 \
        --use-bitmap-index --all | wc -l
    22585

    real 0m19.216s
    user 0m16.768s
    sys  0m2.450s

So with the new optimized query, we can both significantly reduce the
list of candidate LFS pointers and execution time.

Patrick

Patrick Steinhardt (7):
  revision: mark commit parents as NOT_USER_GIVEN
  list-objects: move tag processing into its own function
  list-objects: support filtering by tag and commit
  list-objects: implement object type filter
  pack-bitmap: implement object type filter
  pack-bitmap: implement combined filter
  rev-list: allow filtering of provided items

 Documentation/rev-list-options.txt  |   3 +
 builtin/rev-list.c                  |  14 ++++
 list-objects-filter-options.c       |  14 ++++
 list-objects-filter-options.h       |   8 ++
 list-objects-filter.c               | 116 ++++++++++++++++++++++++++++
 list-objects-filter.h               |   2 +
 list-objects.c                      |  32 +++++++-
 pack-bitmap.c                       |  71 +++++++++++++++--
 revision.c                          |   4 +-
 revision.h                          |   3 -
 t/t6112-rev-list-filters-objects.sh |  76 ++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  |  54 ++++++++++++-
 12 files changed, 380 insertions(+), 17 deletions(-)

-- 
2.30.1



* [PATCH 1/7] revision: mark commit parents as NOT_USER_GIVEN
  2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
@ 2021-03-01 12:20 ` Patrick Steinhardt
  2021-03-01 12:20 ` [PATCH 2/7] list-objects: move tag processing into its own function Patrick Steinhardt
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-01 12:20 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

The NOT_USER_GIVEN flag of an object marks whether the object was
explicitly provided by the user or not. The most important use case for
this is when filtering objects: only objects that were not explicitly
requested will get filtered.

The flag is currently only set for blobs and trees, which has been fine
given that there are no filters for tags or commits currently. We're
about to extend filtering capabilities by adding an object type filter
though, which requires us to set up the NOT_USER_GIVEN flag correctly --
if it's not set, the object wouldn't get filtered at all.

Mark unseen commit parents as NOT_USER_GIVEN when processing parents.
Like this, explicitly provided parents stay user-given and thus
unfiltered, while parents which get loaded as part of the graph walk
can be filtered.

This commit shouldn't have any user-visible impact, as there is no
logic to filter commits yet.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
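For illustration only, the marking rule described above can be modelled
in a few lines of standalone C. This is a sketch, not git code: the
`toy_*` names are hypothetical, the value of SEEN is chosen for the
example, and it assumes that objects given on the command line are
already marked SEEN before their parents are processed.

```c
#include <assert.h>
#include <stddef.h>

/* Flag bits modelled on revision.h; SEEN's value is illustrative. */
#define SEEN           (1u << 0)
#define NOT_USER_GIVEN (1u << 25)

/* Hypothetical stand-in for git's struct object. */
struct toy_object {
	unsigned flags;
};

/*
 * Models the changed lines in process_parents(): a parent that the walk
 * reaches for the first time is marked both SEEN and NOT_USER_GIVEN.
 * Returns 1 if the parent was newly marked, 0 if it was already SEEN.
 */
static int toy_mark_parent(struct toy_object *parent)
{
	assert(parent != NULL);
	if (parent->flags & SEEN)
		return 0;
	parent->flags |= (SEEN | NOT_USER_GIVEN);
	return 1;
}

/* Only objects reached by traversal are candidates for filtering. */
static int toy_may_filter(const struct toy_object *obj)
{
	return (obj->flags & NOT_USER_GIVEN) != 0;
}
```

In this model a parent that was also named explicitly keeps its
user-given status because it is already SEEN, while a parent first
reached by the walk becomes filterable.
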
 revision.c | 4 ++--
 revision.h | 3 ---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/revision.c b/revision.c
index b78733f508..26f422f50d 100644
--- a/revision.c
+++ b/revision.c
@@ -1123,7 +1123,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 				mark_parents_uninteresting(p);
 			if (p->object.flags & SEEN)
 				continue;
-			p->object.flags |= SEEN;
+			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
@@ -1165,7 +1165,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 		}
 		p->object.flags |= left_flag;
 		if (!(p->object.flags & SEEN)) {
-			p->object.flags |= SEEN;
+			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
diff --git a/revision.h b/revision.h
index e6be3c845e..f1f324a19b 100644
--- a/revision.h
+++ b/revision.h
@@ -44,9 +44,6 @@
 /*
  * Indicates object was reached by traversal. i.e. not given by user on
  * command-line or stdin.
- * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
- * filtering trees and blobs, but it may be useful to support filtering commits
- * in the future.
  */
 #define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-- 
2.30.1



* [PATCH 2/7] list-objects: move tag processing into its own function
  2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
  2021-03-01 12:20 ` [PATCH 1/7] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
@ 2021-03-01 12:20 ` Patrick Steinhardt
  2021-03-01 12:20 ` [PATCH 3/7] list-objects: support filtering by tag and commit Patrick Steinhardt
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-01 12:20 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

Move processing of tags into its own function to make the logic easier
to extend when we're going to implement filtering for tags. No change in
behaviour is expected from this commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 list-objects.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index e19589baa0..093adf85b1 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -213,6 +213,15 @@ static void process_tree(struct traversal_context *ctx,
 	free_tree_buffer(tree);
 }
 
+static void process_tag(struct traversal_context *ctx,
+			struct tag *tag,
+			struct strbuf *base,
+			const char *name)
+{
+	tag->object.flags |= SEEN;
+	ctx->show_object(&tag->object, name, ctx->show_data);
+}
+
 static void mark_edge_parents_uninteresting(struct commit *commit,
 					    struct rev_info *revs,
 					    show_edge_fn show_edge)
@@ -334,8 +343,7 @@ static void traverse_trees_and_blobs(struct traversal_context *ctx,
 		if (obj->flags & (UNINTERESTING | SEEN))
 			continue;
 		if (obj->type == OBJ_TAG) {
-			obj->flags |= SEEN;
-			ctx->show_object(obj, name, ctx->show_data);
+			process_tag(ctx, (struct tag *)obj, base, name);
 			continue;
 		}
 		if (!path)
-- 
2.30.1



* [PATCH 3/7] list-objects: support filtering by tag and commit
  2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
  2021-03-01 12:20 ` [PATCH 1/7] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
  2021-03-01 12:20 ` [PATCH 2/7] list-objects: move tag processing into its own function Patrick Steinhardt
@ 2021-03-01 12:20 ` Patrick Steinhardt
  2021-03-01 12:20 ` [PATCH 4/7] list-objects: implement object type filter Patrick Steinhardt
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-01 12:20 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

Object filters currently only support filtering blobs or trees based on
some criteria. This commit lays the foundation to also allow filtering
of tags and commits.

No change in behaviour is expected from this commit given that there are
no filters yet for those object types.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
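As a rough illustration of how the traversal is meant to consume a
filter result for tags and commits, here is a standalone sketch. It is
not git code: the `toy_*` names are hypothetical and the bit values are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SEEN (1u << 0)

/* Result bits modelled on enum list_objects_filter_result. */
enum toy_filter_result {
	TOY_LOFR_ZERO      = 0,
	TOY_LOFR_MARK_SEEN = 1 << 0,
	TOY_LOFR_DO_SHOW   = 1 << 1
};

/* Hypothetical stand-in for git's struct object. */
struct toy_object {
	unsigned flags;
};

/*
 * Applies a filter result the way process_tag() and do_traverse() do in
 * this patch: MARK_SEEN sets the SEEN flag, DO_SHOW decides whether the
 * caller should print the object. Returns 1 if the object is to be shown.
 */
static int toy_apply_result(struct toy_object *obj, int r)
{
	assert(obj != NULL);
	if (r & TOY_LOFR_MARK_SEEN)
		obj->flags |= TOY_SEEN;
	return (r & TOY_LOFR_DO_SHOW) != 0;
}
```

The two bits are independent, so a filter can mark an object as seen
without showing it, which is what a type filter needs for objects of
the wrong type.
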
 list-objects-filter.c | 40 ++++++++++++++++++++++++++++++++++++++++
 list-objects-filter.h |  2 ++
 list-objects.c        | 24 +++++++++++++++++++++---
 3 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index 4ec0041cfb..7def039435 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -82,6 +82,16 @@ static enum list_objects_filter_result filter_blobs_none(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -173,6 +183,16 @@ static enum list_objects_filter_result filter_trees_depth(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_END_TREE:
 		assert(obj->type == OBJ_TREE);
 		filter_data->current_depth--;
@@ -267,6 +287,16 @@ static enum list_objects_filter_result filter_blobs_limit(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -371,6 +401,16 @@ static enum list_objects_filter_result filter_sparse(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		dtype = DT_DIR;
diff --git a/list-objects-filter.h b/list-objects-filter.h
index cfd784e203..9e98814111 100644
--- a/list-objects-filter.h
+++ b/list-objects-filter.h
@@ -55,6 +55,8 @@ enum list_objects_filter_result {
 };
 
 enum list_objects_filter_situation {
+	LOFS_COMMIT,
+	LOFS_TAG,
 	LOFS_BEGIN_TREE,
 	LOFS_END_TREE,
 	LOFS_BLOB
diff --git a/list-objects.c b/list-objects.c
index 093adf85b1..3b63dfd4f2 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -218,8 +218,16 @@ static void process_tag(struct traversal_context *ctx,
 			struct strbuf *base,
 			const char *name)
 {
-	tag->object.flags |= SEEN;
-	ctx->show_object(&tag->object, name, ctx->show_data);
+	enum list_objects_filter_result r;
+
+	r = list_objects_filter__filter_object(ctx->revs->repo, LOFS_TAG,
+					       &tag->object, base->buf,
+					       &base->buf[base->len],
+					       ctx->filter);
+	if (r & LOFR_MARK_SEEN)
+		tag->object.flags |= SEEN;
+	if (r & LOFR_DO_SHOW)
+		ctx->show_object(&tag->object, name, ctx->show_data);
 }
 
 static void mark_edge_parents_uninteresting(struct commit *commit,
@@ -369,6 +377,12 @@ static void do_traverse(struct traversal_context *ctx)
 	strbuf_init(&csp, PATH_MAX);
 
 	while ((commit = get_revision(ctx->revs)) != NULL) {
+		enum list_objects_filter_result r;
+
+		r = list_objects_filter__filter_object(ctx->revs->repo,
+				LOFS_COMMIT, &commit->object,
+				NULL, NULL, ctx->filter);
+
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
@@ -383,7 +397,11 @@ static void do_traverse(struct traversal_context *ctx)
 			die(_("unable to load root tree for commit %s"),
 			      oid_to_hex(&commit->object.oid));
 		}
-		ctx->show_commit(commit, ctx->show_data);
+
+		if (r & LOFR_MARK_SEEN)
+			commit->object.flags |= SEEN;
+		if (r & LOFR_DO_SHOW)
+			ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
-- 
2.30.1



* [PATCH 4/7] list-objects: implement object type filter
  2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2021-03-01 12:20 ` [PATCH 3/7] list-objects: support filtering by tag and commit Patrick Steinhardt
@ 2021-03-01 12:20 ` Patrick Steinhardt
  2021-03-01 12:20 ` [PATCH 5/7] pack-bitmap: " Patrick Steinhardt
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-01 12:20 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

While it is already possible to filter objects by some criteria in
git-rev-list(1), it is not yet possible to select only objects of a
specific type. This makes some filters less useful. The `blob:limit`
filter for example filters blobs such that only those which are smaller
than the given limit are returned. But it cannot be used to ask for only
these small blobs, given that git-rev-list(1) will continue to print
tags, commits and trees.

Now that we have the infrastructure in place to also filter tags and
commits, we can improve this situation by implementing a new filter
which selects objects based on their type. The above query can thus
trivially be implemented with the following command:

    $ git rev-list --objects --filter=object:type=blob \
        --filter=blob:limit=200

Furthermore, this filter allows optimizing certain other cases: if for
example only tags or commits have been selected, there is no need to
walk down trees.

The new filter is not yet supported in bitmaps. This is going to be
implemented in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
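The decision logic of the new filter can be condensed into a small
standalone model, shown here purely as a sketch. The `toy_*` and `TOY_*`
names are hypothetical, the enum values are assumptions for the example,
and unlike the real filter this model returns zero instead of calling
BUG() for an unknown situation.

```c
#include <assert.h>

/* Object types, numbered in the same order as git's enum object_type. */
enum toy_type { TOY_COMMIT = 1, TOY_TREE, TOY_BLOB, TOY_TAG };

/* Situations modelled on enum list_objects_filter_situation. */
enum toy_situation {
	TOY_LOFS_COMMIT,
	TOY_LOFS_TAG,
	TOY_LOFS_BEGIN_TREE,
	TOY_LOFS_END_TREE,
	TOY_LOFS_BLOB
};

/* Result bits modelled on enum list_objects_filter_result. */
enum toy_result {
	TOY_ZERO      = 0,
	TOY_MARK_SEEN = 1 << 0,
	TOY_DO_SHOW   = 1 << 1,
	TOY_SKIP_TREE = 1 << 2
};

/*
 * Show an object only if its type matches the wanted one. When only
 * commits or tags are wanted, trees are skipped entirely so that the
 * walk never descends into them.
 */
static int toy_filter_object_type(enum toy_situation s, enum toy_type wanted)
{
	enum toy_type actual;

	switch (s) {
	case TOY_LOFS_COMMIT:
		actual = TOY_COMMIT;
		break;
	case TOY_LOFS_TAG:
		actual = TOY_TAG;
		break;
	case TOY_LOFS_BLOB:
		actual = TOY_BLOB;
		break;
	case TOY_LOFS_BEGIN_TREE:
		if (wanted == TOY_COMMIT || wanted == TOY_TAG)
			return TOY_SKIP_TREE;
		actual = TOY_TREE;
		break;
	case TOY_LOFS_END_TREE:
	default:
		return TOY_ZERO;
	}

	if (actual == wanted)
		return TOY_MARK_SEEN | TOY_DO_SHOW;
	return TOY_MARK_SEEN;
}
```

Note that a tree still has to be walked when blobs are wanted, which is
why the tree case only short-circuits for commits and tags.
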
 Documentation/rev-list-options.txt  |  3 ++
 list-objects-filter-options.c       | 14 ++++++
 list-objects-filter-options.h       |  2 +
 list-objects-filter.c               | 76 +++++++++++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh | 48 ++++++++++++++++++
 5 files changed, 143 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index b1c8f86c6e..3afa8fffbd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -892,6 +892,9 @@ or units.  n may be zero.  The suffixes k, m, and g can be used to name
 units in KiB, MiB, or GiB.  For example, 'blob:limit=1k' is the same
 as 'blob:limit=1024'.
 +
+The form '--filter=object:type=(tag|commit|tree|blob)' omits all objects
+which are not of the requested type.
++
 The form '--filter=sparse:oid=<blob-ish>' uses a sparse-checkout
 specification contained in the blob (or blob-expression) '<blob-ish>'
 to omit blobs that would not be not required for a sparse checkout on
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d2d1c81caf..bb6f6577d5 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -29,6 +29,8 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 		return "tree";
 	case LOFC_SPARSE_OID:
 		return "sparse:oid";
+	case LOFC_OBJECT_TYPE:
+		return "object:type";
 	case LOFC_COMBINE:
 		return "combine";
 	case LOFC__COUNT:
@@ -97,6 +99,18 @@ static int gently_parse_list_objects_filter(
 		}
 		return 1;
 
+	} else if (skip_prefix(arg, "object:type=", &v0)) {
+		int type = type_from_string_gently(v0, -1, 1);
+		if (type < 0) {
+			strbuf_addstr(errbuf, _("expected 'object:type=<type>'"));
+			return 1;
+		}
+
+		filter_options->object_type = type;
+		filter_options->choice = LOFC_OBJECT_TYPE;
+
+		return 0;
+
 	} else if (skip_prefix(arg, "combine:", &v0)) {
 		return parse_combine_filter(filter_options, v0, errbuf);
 
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 01767c3c96..4d0d0588cc 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -13,6 +13,7 @@ enum list_objects_filter_choice {
 	LOFC_BLOB_LIMIT,
 	LOFC_TREE_DEPTH,
 	LOFC_SPARSE_OID,
+	LOFC_OBJECT_TYPE,
 	LOFC_COMBINE,
 	LOFC__COUNT /* must be last */
 };
@@ -54,6 +55,7 @@ struct list_objects_filter_options {
 	char *sparse_oid_name;
 	unsigned long blob_limit_value;
 	unsigned long tree_exclude_depth;
+	enum object_type object_type;
 
 	/* LOFC_COMBINE values */
 
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 7def039435..650a7c2c80 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -545,6 +545,81 @@ static void filter_sparse_oid__init(
 	filter->free_fn = filter_sparse_free;
 }
 
+/*
+ * A filter for list-objects to show only objects of a given type.
+ * Objects of any other type are omitted from the output.
+ */
+struct filter_object_type_data {
+	enum object_type object_type;
+};
+
+static enum list_objects_filter_result filter_object_type(
+	struct repository *r,
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	struct oidset *omits,
+	void *filter_data_)
+{
+	struct filter_object_type_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		if (filter_data->object_type == OBJ_TAG)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		if (filter_data->object_type == OBJ_COMMIT)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+
+		/*
+		 * If we only want to show commits or tags, then there is no
+		 * need to walk down trees.
+		 */
+		if (filter_data->object_type == OBJ_COMMIT ||
+		    filter_data->object_type == OBJ_TAG)
+			return LOFR_SKIP_TREE;
+
+		if (filter_data->object_type == OBJ_TREE)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+		return LOFR_MARK_SEEN;
+
+	case LOFS_BLOB:
+		assert(obj->type == OBJ_BLOB);
+
+		if (filter_data->object_type == OBJ_BLOB)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_END_TREE:
+		return LOFR_ZERO;
+	}
+}
+
+static void filter_object_type__init(
+	struct list_objects_filter_options *filter_options,
+	struct filter *filter)
+{
+	struct filter_object_type_data *d = xcalloc(1, sizeof(*d));
+	d->object_type = filter_options->object_type;
+
+	filter->filter_data = d;
+	filter->filter_object_fn = filter_object_type;
+	filter->free_fn = free;
+}
+
 /* A filter which only shows objects shown by all sub-filters. */
 struct combine_filter_data {
 	struct subfilter *sub;
@@ -691,6 +766,7 @@ static filter_init_fn s_filters[] = {
 	filter_blobs_limit__init,
 	filter_trees_depth__init,
 	filter_sparse_oid__init,
+	filter_object_type__init,
 	filter_combine__init,
 };
 
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 31457d13b9..c79ec04060 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -159,6 +159,54 @@ test_expect_success 'verify blob:limit=1m' '
 	test_must_be_empty observed
 '
 
+# Test object:type=<type> filter.
+
+test_expect_success 'setup object-type' '
+	git init object-type &&
+	echo contents >object-type/blob &&
+	git -C object-type add blob &&
+	git -C object-type commit -m commit-message &&
+	git -C object-type tag tag -m tag-message
+'
+
+test_expect_success 'verify object:type= fails with invalid type' '
+	test_must_fail git -C object-type rev-list --objects --filter=object:type= HEAD &&
+	test_must_fail git -C object-type rev-list --objects --filter=object:type=invalid HEAD
+'
+
+test_expect_success 'verify object:type=blob prints blob and commit' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob)
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=blob HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tree prints tree and commit' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s \n" $(git -C object-type rev-parse HEAD^{tree})
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=tree HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=commit prints commit' '
+	git -C object-type rev-parse HEAD >expected &&
+	git -C object-type rev-list --objects --filter=object:type=commit HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tag prints tag' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s tag\n" $(git -C object-type rev-parse tag)
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=tag tag >actual &&
+	test_cmp expected actual
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
-- 
2.30.1



* [PATCH 5/7] pack-bitmap: implement object type filter
  2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2021-03-01 12:20 ` [PATCH 4/7] list-objects: implement object type filter Patrick Steinhardt
@ 2021-03-01 12:20 ` Patrick Steinhardt
  2021-03-01 12:20 ` [PATCH 6/7] pack-bitmap: implement combined filter Patrick Steinhardt
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-01 12:20 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

The preceding commit has added a new object filter for git-rev-list(1)
which allows filtering objects by type. Implement the equivalent filter
for packfile bitmaps so that we can answer these queries fast.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
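The idea behind the bitmap variant is to clear, from the result bitmap,
the bits of every type other than the requested one. The following
standalone sketch shows that idea on a single 64-bit word. It is not git
code: `toy_bitmap_index` and `toy_filter_by_type` are hypothetical, and
the model deliberately ignores the special handling of tip objects that
filter_bitmap_exclude_type() performs.

```c
#include <assert.h>
#include <stdint.h>

/* Object types, numbered in the same order as git's enum object_type. */
enum toy_type { TOY_COMMIT = 1, TOY_TREE, TOY_BLOB, TOY_TAG };

/*
 * Hypothetical per-type bitmaps: bit i of by_type[t] is set if object i
 * has type t. Index 0 is unused.
 */
struct toy_bitmap_index {
	uint64_t by_type[TOY_TAG + 1];
};

/*
 * Keep only the objects of the wanted type by masking out the bitmap of
 * every other type, one type at a time.
 */
static uint64_t toy_filter_by_type(const struct toy_bitmap_index *idx,
				   uint64_t to_filter,
				   enum toy_type wanted)
{
	int t;

	assert(wanted >= TOY_COMMIT && wanted <= TOY_TAG);

	for (t = TOY_COMMIT; t <= TOY_TAG; t++) {
		if (t == (int)wanted)
			continue;
		to_filter &= ~idx->by_type[t];
	}
	return to_filter;
}
```

Because the work is a handful of bitwise operations per word, no object
needs to be opened to learn its type, which is where the speedup of the
bitmap path comes from.
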
 pack-bitmap.c                      | 28 +++++++++++++++++++++++++---
 t/t6113-rev-list-bitmap-filters.sh | 25 ++++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1f69b5fa85..196d38c91d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -779,9 +779,6 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	eword_t mask;
 	uint32_t i;
 
-	if (type != OBJ_BLOB && type != OBJ_TREE)
-		BUG("filter_bitmap_exclude_type: unsupported type '%d'", type);
-
 	/*
 	 * The non-bitmap version of this filter never removes
 	 * objects which the other side specifically asked for,
@@ -911,6 +908,23 @@ static void filter_bitmap_tree_depth(struct bitmap_index *bitmap_git,
 				   OBJ_BLOB);
 }
 
+static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
+				      struct object_list *tip_objects,
+				      struct bitmap *to_filter,
+				      enum object_type object_type)
+{
+	enum object_type t;
+
+	if (object_type < OBJ_COMMIT || object_type > OBJ_TAG)
+		BUG("filter_bitmap_object_type given invalid object");
+
+	for (t = OBJ_COMMIT; t <= OBJ_TAG; t++) {
+		if (t == object_type)
+			continue;
+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, t);
+	}
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -943,6 +957,14 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
+	if (filter->choice == LOFC_OBJECT_TYPE) {
+		if (bitmap_git)
+			filter_bitmap_object_type(bitmap_git, tip_objects,
+						  to_filter,
+						  filter->object_type);
+		return 0;
+	}
+
 	/* filter choice not handled */
 	return -1;
 }
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index 3f889949ca..fb66735ac8 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -10,7 +10,8 @@ test_expect_success 'set up bitmapped repo' '
 	test_commit much-larger-blob-one &&
 	git repack -adb &&
 	test_commit two &&
-	test_commit much-larger-blob-two
+	test_commit much-larger-blob-two &&
+	git tag tag
 '
 
 test_expect_success 'filters fallback to non-bitmap traversal' '
@@ -75,4 +76,26 @@ test_expect_success 'tree:1 filter' '
 	test_cmp expect actual
 '
 
+test_expect_success 'object:type filter' '
+	git rev-list --objects --filter=object:type=tag tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=tag tag >actual &&
+	test_cmp expect actual &&
+
+	git rev-list --objects --filter=object:type=commit tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=commit tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter=object:type=tree tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=tree tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.30.1



* [PATCH 6/7] pack-bitmap: implement combined filter
  2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2021-03-01 12:20 ` [PATCH 5/7] pack-bitmap: " Patrick Steinhardt
@ 2021-03-01 12:20 ` Patrick Steinhardt
  2021-03-01 12:21 ` [PATCH 7/7] rev-list: allow filtering of provided items Patrick Steinhardt
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-01 12:20 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

When the user has specified multiple object filters, this is internally
represented by a "combined" filter. These combined filters aren't yet
supported by bitmap indices and thus cannot be accelerated.

Fix this by implementing support for these combined filters. The
implementation is quite trivial: when there's a combined filter, we
simply recurse into `filter_bitmap()` for all of the sub-filters.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
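The shape of the recursion can be sketched in standalone C as follows.
This is only a model under stated assumptions: the `toy_*` types are
hypothetical, and instead of touching a bitmap the apply step merely
counts how many leaf filters would be applied.

```c
#include <assert.h>
#include <stddef.h>

/* Filter kinds modelled on enum list_objects_filter_choice. */
enum toy_choice {
	TOY_BLOB_NONE,
	TOY_BLOB_LIMIT,
	TOY_TREE_DEPTH,
	TOY_SPARSE_OID,
	TOY_OBJECT_TYPE,
	TOY_COMBINE
};

/* Hypothetical stand-in for struct list_objects_filter_options. */
struct toy_filter {
	enum toy_choice choice;
	unsigned long tree_exclude_depth;
	struct toy_filter *sub;
	size_t sub_nr;
};

/* A combined filter is supported only if all of its sub-filters are. */
static int toy_filter_supported(const struct toy_filter *f)
{
	size_t i;

	switch (f->choice) {
	case TOY_BLOB_NONE:
	case TOY_BLOB_LIMIT:
	case TOY_OBJECT_TYPE:
		return 1;
	case TOY_TREE_DEPTH:
		return f->tree_exclude_depth == 0;
	case TOY_COMBINE:
		for (i = 0; i < f->sub_nr; i++)
			if (!toy_filter_supported(&f->sub[i]))
				return 0;
		return 1;
	default:
		return 0;
	}
}

/*
 * Check support for the whole filter up front, then recurse into the
 * sub-filters. Returns the number of leaf filters applied, or -1 if the
 * filter cannot be handled.
 */
static int toy_filter_bitmap(const struct toy_filter *f)
{
	size_t i;
	int applied = 0;

	if (!toy_filter_supported(f))
		return -1;
	if (f->choice != TOY_COMBINE)
		return 1;
	for (i = 0; i < f->sub_nr; i++)
		applied += toy_filter_bitmap(&f->sub[i]);
	return applied;
}
```

Checking support before applying anything means a combined filter with
one unsupported member is rejected as a whole, rather than leaving the
bitmap half-filtered.
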
 pack-bitmap.c                      | 40 +++++++++++++++++++++++++++---
 t/t6113-rev-list-bitmap-filters.sh |  7 ++++++
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 196d38c91d..e33805e076 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -925,6 +925,29 @@ static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
 	}
 }
 
+static int filter_supported(struct list_objects_filter_options *filter)
+{
+	int i;
+
+	switch (filter->choice) {
+	case LOFC_BLOB_NONE:
+	case LOFC_BLOB_LIMIT:
+	case LOFC_OBJECT_TYPE:
+		return 1;
+	case LOFC_TREE_DEPTH:
+		if (filter->tree_exclude_depth == 0)
+			return 1;
+		return 0;
+	case LOFC_COMBINE:
+		for (i = 0; i < filter->sub_nr; i++)
+			if (!filter_supported(&filter->sub[i]))
+				return 0;
+		return 1;
+	default:
+		return 0;
+	}
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -932,6 +955,8 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 {
 	if (!filter || filter->choice == LOFC_DISABLED)
 		return 0;
+	if (!filter_supported(filter))
+		return -1;
 
 	if (filter->choice == LOFC_BLOB_NONE) {
 		if (bitmap_git)
@@ -948,8 +973,7 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
-	if (filter->choice == LOFC_TREE_DEPTH &&
-	    filter->tree_exclude_depth == 0) {
+	if (filter->choice == LOFC_TREE_DEPTH) {
 		if (bitmap_git)
 			filter_bitmap_tree_depth(bitmap_git, tip_objects,
 						 to_filter,
@@ -965,8 +989,16 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
-	/* filter choice not handled */
-	return -1;
+	if (filter->choice == LOFC_COMBINE) {
+		int i;
+		for (i = 0; i < filter->sub_nr; i++) {
+			filter_bitmap(bitmap_git, tip_objects, to_filter,
+				      &filter->sub[i]);
+		}
+		return 0;
+	}
+
+	BUG("unsupported filter choice");
 }
 
 static int can_filter_bitmap(struct list_objects_filter_options *filter)
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index fb66735ac8..cb9db7df6f 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -98,4 +98,11 @@ test_expect_success 'object:type filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'combine filter' '
+	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.30.1



* [PATCH 7/7] rev-list: allow filtering of provided items
  2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2021-03-01 12:20 ` [PATCH 6/7] pack-bitmap: implement combined filter Patrick Steinhardt
@ 2021-03-01 12:21 ` Patrick Steinhardt
  2021-03-10 21:39 ` [PATCH 0/7] rev-parse: implement object type filter Jeff King
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-01 12:21 UTC (permalink / raw)
  To: git; +Cc: Christian Couder

[-- Attachment #1: Type: text/plain, Size: 6386 bytes --]

When providing an object filter, it is currently impossible to also
filter provided items. E.g. when executing `git rev-list HEAD`, the
commit this reference points to will be treated as user-provided and is
thus excluded from the filtering mechanism. This makes it harder than
necessary to properly use the new `--filter=object:type` filter given
that even if the user wants to only see blobs, he'll still see commits
of provided references.

Improve this by introducing a new `--filter-provided` option to the
git-rev-list(1) command. If given, then all user-provided references
will be subject to filtering.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/rev-list.c                  | 14 ++++++++++++++
 list-objects-filter-options.h       |  6 ++++++
 pack-bitmap.c                       |  3 ++-
 t/t6112-rev-list-filters-objects.sh | 28 ++++++++++++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  | 22 ++++++++++++++++++++++
 5 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index b4d8ea0a35..0f959b266d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -599,6 +599,10 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			list_objects_filter_set_no_filter(&filter_options);
 			continue;
 		}
+		if (!strcmp(arg, "--filter-provided")) {
+			filter_options.filter_wants = 1;
+			continue;
+		}
 		if (!strcmp(arg, "--filter-print-omitted")) {
 			arg_print_omitted = 1;
 			continue;
@@ -694,6 +698,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			return show_bisect_vars(&info, reaches, all);
 	}
 
+	if (filter_options.filter_wants) {
+		struct commit_list *c;
+		for (i = 0; i < revs.pending.nr; i++) {
+			struct object_array_entry *pending = revs.pending.objects + i;
+			pending->item->flags |= NOT_USER_GIVEN;
+		}
+		for (c = revs.commits; c; c = c->next)
+			c->item->object.flags |= NOT_USER_GIVEN;
+	}
+
 	if (arg_print_omitted)
 		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
 	if (arg_missing_action == MA_PRINT)
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 4d0d0588cc..5e609e307a 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -42,6 +42,12 @@ struct list_objects_filter_options {
 	 */
 	enum list_objects_filter_choice choice;
 
+	/*
+	 * "--filter-provided" was given by the user, instructing us to also
+	 * filter all explicitly provided objects.
+	 */
+	unsigned int filter_wants : 1;
+
 	/*
 	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
 	 */
diff --git a/pack-bitmap.c b/pack-bitmap.c
index e33805e076..5ff800316b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1101,7 +1101,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
-	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
+	filter_bitmap(bitmap_git, (filter && filter->filter_wants) ? NULL : wants,
+		      wants_bitmap, filter);
 
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c79ec04060..47c558ab0e 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -207,6 +207,34 @@ test_expect_success 'verify object:type=tag prints tag' '
 	test_cmp expected actual
 '
 
+test_expect_success 'verify object:type=blob prints only blob with --filter-provided' '
+	printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=blob --filter-provided HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tree prints only tree with --filter-provided' '
+	printf "%s \n" $(git -C object-type rev-parse HEAD^{tree}) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tree HEAD --filter-provided >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=commit prints only commit with --filter-provided' '
+	git -C object-type rev-parse HEAD >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=commit --filter-provided HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tag prints only tag with --filter-provided' '
+	printf "%s tag\n" $(git -C object-type rev-parse tag) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tag --filter-provided tag >actual &&
+	test_cmp expected actual
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index cb9db7df6f..fe3df0ee14 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -98,6 +98,28 @@ test_expect_success 'object:type filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'object:type filter with --filter-provided' '
+	git rev-list --objects --filter=object:type=tag --filter-provided tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=tag --filter-provided tag >actual &&
+	test_cmp expect actual &&
+
+	git rev-list --objects --filter=object:type=commit --filter-provided tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=commit --filter-provided tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter=object:type=tree --filter-provided tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=tree --filter-provided tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter=object:type=blob --filter-provided tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=blob --filter-provided tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_expect_success 'combine filter' '
 	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
 	git rev-list --use-bitmap-index \
-- 
2.30.1



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/7] rev-parse: implement object type filter
  2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
                   ` (6 preceding siblings ...)
  2021-03-01 12:21 ` [PATCH 7/7] rev-list: allow filtering of provided items Patrick Steinhardt
@ 2021-03-10 21:39 ` Jeff King
  2021-03-11 14:38   ` Patrick Steinhardt
  2021-03-15 11:25   ` Patrick Steinhardt
  2021-03-10 21:58 ` Taylor Blau
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
  9 siblings, 2 replies; 67+ messages in thread
From: Jeff King @ 2021-03-10 21:39 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

On Mon, Mar 01, 2021 at 01:20:26PM +0100, Patrick Steinhardt wrote:

> Altogether, this ends up with the following queries, both of which have
> been executed in a well-packed linux.git repository:
> 
>     # Previous query which uses object names as a heuristic to filter
>     # non-blob objects, which bars us from using bitmap indices because
>     # they cannot print paths.
>     $ time git rev-list --objects --filter=blob:limit=200 \
>         --object-names --all | sed -r '/^.{,41}$/d' | wc -l
>     4502300
> 
>     real 1m23.872s
>     user 1m30.076s
>     sys  0m6.002s
> 
>     # New query.
>     $ time git rev-list --objects --filter-provided \
>         --filter=object:type=blob --filter=blob:limit=200 \
>         --use-bitmap-index --all | wc -l
>     22585
> 
>     real 0m19.216s
>     user 0m16.768s
>     sys  0m2.450s

Those produce very different answers. I guess because in the first one,
you still have a bunch of tree objects, too. You'd do much better to get
the actual types from cat-file, and filter on that. That also lets you
use bitmaps for the traversal portion. E.g.:

  $ time git rev-list --use-bitmap-index --objects --filter=blob:limit=200 --all |
         git cat-file --buffer --batch-check='%(objecttype) %(objectname)' |
	 perl -lne 'print $1 if /^blob (.*)/' | wc -l
  14966
  
  real	0m6.248s
  user	0m7.810s
  sys	0m0.440s

which is faster than what you showed above (this is on linux.git, but my
result is different; maybe you have more refs than me?). But we should
be able to do better purely internally, so I suspect my computer is just
faster (or maybe your extra refs just aren't well-covered by bitmaps).
Running with your patches I get:

  $ time git rev-list --objects --use-bitmap-index --all \
             --filter-provided --filter=object:type=blob \
	     --filter=blob:limit=200 | wc -l
  16339

  real	0m1.309s
  user	0m1.234s
  sys	0m0.079s

which is indeed faster. It's quite curious that the answer is not the
same, though! I think yours has some bugs. If I sort and diff the
results, I see some commits mentioned in the output. Perhaps this is
--filter-provided not working, as they all seem to be ref tips.

> To be able to more efficiently answer this query, I've implemented
> multiple things:
> 
> - A new object type filter `--filter=object:type=<type>` for
>   git-rev-list(1), which is implemented both for normal graph walks and
>   for the packfile bitmap index.
> 
> - Given that above usecase requires two filters (the object type
>   and blob size filters), bitmap filters were extended to support
>   combined filters.

That's probably reasonable, especially because it lets us use bitmaps. I
do have a dream that we'll eventually be able to support more extensive
formatting via log/rev-list, which would allow:

  git rev-list --use-bitmap-index --objects --all \
               --format=%(objecttype) %(objectname) |
  perl -ne 'print $1 if /^blob (.*)/'

That should be faster than the separate cat-file (which has to re-lookup
each object, in addition to the extra pipe overhead), but I expect the
--filter solution should always be faster still, as it can very quickly
eliminate the majority of the objects at the bitmap level.

> - git-rev-list(1) doesn't filter user-provided objects and always prints
>   them. I don't want the listed commits though and only their referenced
>   potential LFS blobs. So I've added a new flag `--filter-provided`
>   which marks all provided objects as not-user-provided such that they
>   get filtered the same as all the other objects.

Yeah, this "user-provided" behavior was quite a surprise to me when I
started implementing the bitmap versions of the existing filters. It's
nice to have the option to specify which you want.

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/7] rev-parse: implement object type filter
  2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
                   ` (7 preceding siblings ...)
  2021-03-10 21:39 ` [PATCH 0/7] rev-parse: implement object type filter Jeff King
@ 2021-03-10 21:58 ` Taylor Blau
  2021-03-10 22:19   ` Jeff King
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
  9 siblings, 1 reply; 67+ messages in thread
From: Taylor Blau @ 2021-03-10 21:58 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

On Mon, Mar 01, 2021 at 01:20:26PM +0100, Patrick Steinhardt wrote:
> - A new object type filter `--filter=object:type=<type>` for
>   git-rev-list(1), which is implemented both for normal graph walks and
>   for the packfile bitmap index.

I understand what you're looking for here, but I worry that '--filter'
might be too leaky of an abstraction.

I was a little surprised to learn that you can clone a repository with
--filter=object:type=tree (excluding commits), but it does work. I'm
fine reusing a lot of the object filtering code if it makes this an
easier task, but I think it may be worthwhile to hide this new kind of
filter from upload-pack.

> - Given that above usecase requires two filters (the object type
>   and blob size filters), bitmap filters were extended to support
>   combined filters.

Nice. We didn't do this since the only previously supported filters were
blob:none and tree:0 (the latter implying the former), so there was no
need.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/7] rev-parse: implement object type filter
  2021-03-10 21:58 ` Taylor Blau
@ 2021-03-10 22:19   ` Jeff King
  2021-03-11 14:43     ` Patrick Steinhardt
  0 siblings, 1 reply; 67+ messages in thread
From: Jeff King @ 2021-03-10 22:19 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Patrick Steinhardt, git, Christian Couder

On Wed, Mar 10, 2021 at 04:58:16PM -0500, Taylor Blau wrote:

> On Mon, Mar 01, 2021 at 01:20:26PM +0100, Patrick Steinhardt wrote:
> > - A new object type filter `--filter=object:type=<type>` for
> >   git-rev-list(1), which is implemented both for normal graph walks and
> >   for the packfile bitmap index.
> 
> I understand what you're looking for here, but I worry that '--filter'
> might be too leaky of an abstraction.
> 
> I was a little surprised to learn that you can clone a repository with
> --filter=object:type=tree (excluding commits), but it does work. I'm
> fine reusing a lot of the object filtering code if it makes this an
> easier task, but I think it may be worthwhile to hide this new kind of
> filter from upload-pack.

I had a similar thought, but wouldn't the existing uploadpackfilter
config take care of this?

I guess the catch-all "allow" option defaults to "true", so we'd support
any new filters that are added. Which seems like a poor choice in
general, but flipping it would mean that servers have to update their
config.

I do wonder if it's that bad for clients to be able to specify something
like this, though. Even though there's not that much use for it with a
regular partial clone, it could conceivably be used for some special cases.
I do think it would be more useful if you could OR together multiple
types. Asking for "commits|tags|trees" is really the same as the already
useful "blob:none". And "commits|tags" is the same as tree:depth=0.
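
To make the OR idea concrete, here is a minimal sketch that models a set
of wanted object types as a bitmask. The enum, the macro names and
type_wanted() are hypothetical and exist only for illustration; they are
not Git's API, and the two masks merely mirror the equivalences stated
above.

```c
#include <assert.h>

/* Hypothetical type bits, for illustration only (not Git's enum). */
enum sketch_type {
	SKETCH_COMMIT = 1 << 0,
	SKETCH_TAG    = 1 << 1,
	SKETCH_TREE   = 1 << 2,
	SKETCH_BLOB   = 1 << 3
};

/* "commits|tags|trees": everything except blobs. */
#define TYPES_LIKE_BLOB_NONE (SKETCH_COMMIT | SKETCH_TAG | SKETCH_TREE)

/* "commits|tags": neither trees nor blobs. */
#define TYPES_LIKE_TREE_0 (SKETCH_COMMIT | SKETCH_TAG)

/* Return non-zero if an object of the given type passes the mask. */
static int type_wanted(unsigned wanted, enum sketch_type type)
{
	assert(type != 0);
	return (wanted & (unsigned)type) != 0;
}
```

With such a mask, a single type filter is just the special case where
exactly one bit is set.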

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/7] rev-parse: implement object type filter
  2021-03-10 21:39 ` [PATCH 0/7] rev-parse: implement object type filter Jeff King
@ 2021-03-11 14:38   ` Patrick Steinhardt
  2021-03-11 17:54     ` Jeff King
  2021-03-15 11:25   ` Patrick Steinhardt
  1 sibling, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-11 14:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder

On Wed, Mar 10, 2021 at 04:39:22PM -0500, Jeff King wrote:
> On Mon, Mar 01, 2021 at 01:20:26PM +0100, Patrick Steinhardt wrote:
> 
> > Altogether, this ends up with the following queries, both of which have
> > been executed in a well-packed linux.git repository:
> > 
> >     # Previous query which uses object names as a heuristic to filter
> >     # non-blob objects, which bars us from using bitmap indices because
> >     # they cannot print paths.
> >     $ time git rev-list --objects --filter=blob:limit=200 \
> >         --object-names --all | sed -r '/^.{,41}$/d' | wc -l
> >     4502300
> > 
> >     real 1m23.872s
> >     user 1m30.076s
> >     sys  0m6.002s
> > 
> >     # New query.
> >     $ time git rev-list --objects --filter-provided \
> >         --filter=object:type=blob --filter=blob:limit=200 \
> >         --use-bitmap-index --all | wc -l
> >     22585
> > 
> >     real 0m19.216s
> >     user 0m16.768s
> >     sys  0m2.450s
> 
> Those produce very different answers. I guess because in the first one,
> you still have a bunch of tree objects, too. You'd do much better to get
> the actual types from cat-file, and filter on that. That also lets you
> use bitmaps for the traversal portion. E.g.:

They do provide different answers, and you're right that `--batch-check`
would have helped to filter by type. Your idea doesn't really work in my
usecase though to identify LFS pointers, at least not without additional
tooling on top of what you've provided. There'd at least need to be two
git-cat-file(1) processes: one to do the `--batch-check` thing to
actually filter by object type, and one to then read the actual LFS
pointer candidates from disk in order to see whether they are LFS
pointers or not.

Actually, we currently are doing something similar to that at GitLab: we
list all potential candidates via git-rev-list(1), write the output into
`git-cat-file --batch-check`, and anything that is a blob then gets
forwarded into `git-cat-file --batch`.

>   $ time git rev-list --use-bitmap-index --objects --filter=blob:limit=200 --all |
>          git cat-file --buffer --batch-check='%(objecttype) %(objectname)' |
> 	 perl -lne 'print $1 if /^blob (.*)/' | wc -l
>   14966
>   
>   real	0m6.248s
>   user	0m7.810s
>   sys	0m0.440s
> 
> which is faster than what you showed above (this is on linux.git, but my
> result is different; maybe you have more refs than me?). But we should
> be able to do better purely internally, so I suspect my computer is just
> faster (or maybe your extra refs just aren't well-covered by bitmaps).
> Running with your patches I get:

I've got quite a beefy machine with a Ryzen 3 5800X, and I did do a `git
repack -Adfb` right before doing benchmarks. I do have the stable kernel
repository added though, which accounts for quite a lot of additional
references (3938) and objects (9.3M).

>   $ time git rev-list --objects --use-bitmap-index --all \
>              --filter-provided --filter=object:type=blob \
> 	     --filter=blob:limit=200 | wc -l
>   16339
> 
>   real	0m1.309s
>   user	0m1.234s
>   sys	0m0.079s
> 
> which is indeed faster. It's quite curious that the answer is not the
> same, though! I think yours has some bugs. If I sort and diff the
> results, I see some commits mentioned in the output. Perhaps this is
> --filter-provided not working, as they all seem to be ref tips.

I noticed it, too, and couldn't yet find an answer why that is.
Honestly, I found the NOT_USER_GIVEN flag quite confusing and I'm not at
all sure whether I've got all cases covered correctly. The previous way
this was handled (`USER_GIVEN` instead of `NOT_USER_GIVEN`) would've
been easier to figure out for this specific usecase. But I guess it was
converted due to specific reasons.

I'll invest some more time to figure out what's happening here.

> > To be able to more efficiently answer this query, I've implemented
> > multiple things:
> > 
> > - A new object type filter `--filter=object:type=<type>` for
> >   git-rev-list(1), which is implemented both for normal graph walks and
> >   for the packfile bitmap index.
> > 
> > - Given that above usecase requires two filters (the object type
> >   and blob size filters), bitmap filters were extended to support
> >   combined filters.
> 
> That's probably reasonable, especially because it lets us use bitmaps. I
> do have a dream that we'll eventually be able to support more extensive
> formatting via log/rev-list, which would allow:
> 
>   git rev-list --use-bitmap-index --objects --all \
>                --format=%(objecttype) %(objectname) |
>   perl -ne 'print $1 if /^blob (.*)/'
> 
> That should be faster than the separate cat-file (which has to re-lookup
> each object, in addition to the extra pipe overhead), but I expect the
> --filter solution should always be faster still, as it can very quickly
> eliminate the majority of the objects at the bitmap level.

That'd be nice, even though it wouldn't help in my particular usecase: I
need to read each candidate blob to see whether it's an LFS pointer or
not anyway.

> > - git-rev-list(1) doesn't filter user-provided objects and always prints
> >   them. I don't want the listed commits though and only their referenced
> >   potential LFS blobs. So I've added a new flag `--filter-provided`
> >   which marks all provided objects as not-user-provided such that they
> >   get filtered the same as all the other objects.
> 
> Yeah, this "user-provided" behavior was quite a surprise to me when I
> started implementing the bitmap versions of the existing filters. It's
> nice to have the option to specify which you want.
> 
> -Peff

Patrick

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/7] rev-parse: implement object type filter
  2021-03-10 22:19   ` Jeff King
@ 2021-03-11 14:43     ` Patrick Steinhardt
  2021-03-11 17:56       ` Jeff King
  0 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-11 14:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, Christian Couder

On Wed, Mar 10, 2021 at 05:19:44PM -0500, Jeff King wrote:
> On Wed, Mar 10, 2021 at 04:58:16PM -0500, Taylor Blau wrote:
> 
> > On Mon, Mar 01, 2021 at 01:20:26PM +0100, Patrick Steinhardt wrote:
> > > - A new object type filter `--filter=object:type=<type>` for
> > >   git-rev-list(1), which is implemented both for normal graph walks and
> > >   for the packfile bitmap index.
> > 
> > I understand what you're looking for here, but I worry that '--filter'
> > might be too leaky of an abstraction.
> > 
> > I was a little surprised to learn that you can clone a repository with
> > --filter=object:type=tree (excluding commits), but it does work. I'm
> > fine reusing a lot of the object filtering code if it makes this an
> > easier task, but I think it may be worthwhile to hide this new kind of
> > filter from upload-pack.
> 
> I had a similar thought, but wouldn't the existing uploadpackfilter
> config take care of this?
> 
> I guess the catch-all "allow" option defaults to "true", so we'd support
> any new filters that are added. Which seems like a poor choice in
> general, but flipping it would mean that servers have to update their
> config.
> 
> I do wonder if it's that bad for clients to be able to specify something
> like this, though. Even though there's not that much use for it with a
> > regular partial clone, it could conceivably be used for some special cases.
> I do think it would be more useful if you could OR together multiple
> types. Asking for "commits|tags|trees" is really the same as the already
> useful "blob:none". And "commits|tags" is the same as tree:depth=0.

I did waste a few thoughts on how this should be handled. I see two ways
of doing it:

    - We could just implement the new `object:type` filter such that it
      directly supports OR'ing. That's the easy way to do it, but it's
      inflexible.

    - We could extend combined filters to support OR-semantics in
      addition to the current AND-semantics. In the end, that'd be a
      much more flexible approach and potentially allow additional
      usecases.

I lean more towards the latter as it feels like the better design. But
it's more involved, and I'm not sure I want to do it as part of this
patch series.
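
To illustrate the difference between the two semantics, here is a
minimal sketch. It assumes each sub-filter has already been reduced to a
64-bit mask of the objects it lets through; `combine_mode` and
`combine_filters()` are hypothetical names, not part of Git. Applying
every sub-filter in turn to the same result, as the combined filter does
today, corresponds to the AND case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical combination modes, for illustration only. */
enum combine_mode { COMBINE_AND, COMBINE_OR };

/*
 * keep[i] has bit n set if sub-filter i lets object n through.
 * AND keeps an object only if every sub-filter keeps it;
 * OR keeps an object if at least one sub-filter keeps it.
 */
static uint64_t combine_filters(const uint64_t *keep, size_t nr,
				enum combine_mode mode)
{
	/* Identity element: all ones for AND, all zeroes for OR. */
	uint64_t result = (mode == COMBINE_AND) ? ~(uint64_t)0 : 0;
	size_t i;

	assert(keep || !nr);
	for (i = 0; i < nr; i++) {
		if (mode == COMBINE_AND)
			result &= keep[i];
		else
			result |= keep[i];
	}
	return result;
}
```

Note that with zero sub-filters this sketch keeps everything in AND mode
and nothing in OR mode, which is one of the corner cases an OR mode
would have to define.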

Patrick

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/7] rev-parse: implement object type filter
  2021-03-11 14:38   ` Patrick Steinhardt
@ 2021-03-11 17:54     ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2021-03-11 17:54 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder

On Thu, Mar 11, 2021 at 03:38:11PM +0100, Patrick Steinhardt wrote:

> > Those produce very different answers. I guess because in the first one,
> > you still have a bunch of tree objects, too. You'd do much better to get
> > the actual types from cat-file, and filter on that. That also lets you
> > use bitmaps for the traversal portion. E.g.:
> 
> They do provide different answers, and you're right that `--batch-check`
> would have helped to filter by type. Your idea doesn't really work in my
> usecase though to identify LFS pointers, at least not without additional
> tooling on top of what you've provided. There'd at least need to be two
> git-cat-file(1) processes: one to do the `--batch-check` thing to
> actually filter by object type, and one to then read the actual LFS
> pointer candidates from disk in order to see whether they are LFS
> pointers or not.
> 
> Actually, we currently are doing something similar to that at GitLab: we
> list all potential candidates via git-rev-list(1), write the output into
> `git-cat-file --batch-check`, and anything that is a blob then gets
> forwarded into `git-cat-file --batch`.

You'd need that final cat-file with your patch, too, though. So I think
it makes sense to think about "generate the list of blobs" as the
primary action.

You can of course do the type and content dump as a single cat-file, but
in my experience that is much slower (because we waste time dumping
object content that the caller ultimately won't care about).

Thinking in the opposite direction, if we are filtering by type via
cat-file, we could do the size filter there, too. So:

  git rev-list --use-bitmap-index --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname)' |
  perl -lne 'print $2 if /^blob (\d+) (.*)/ && $1 < 200'

which produces the same answer as my earlier:

> >   $ time git rev-list --use-bitmap-index --objects --filter=blob:limit=200 --all |
> >          git cat-file --buffer --batch-check='%(objecttype) %(objectname)' |
> > 	 perl -lne 'print $1 if /^blob (.*)/' | wc -l

but takes about twice as long. Which is really just a roundabout way of
saying that yes, shoving things into "rev-list" can provide substantial
speedups. :)

> > which is faster than what you showed above (this is on linux.git, but my
> > result is different; maybe you have more refs than me?). But we should
> > be able to do better purely internally, so I suspect my computer is just
> > faster (or maybe your extra refs just aren't well-covered by bitmaps).
> > Running with your patches I get:
> 
> I've got quite a beefy machine with a Ryzen 3 5800X, and I did do a `git
> repack -Adfb` right before doing benchmarks. I do have the stable kernel
> repository added though, which accounts for quite a lot of additional
> references (3938) and objects (9.3M).

Yeah, I wondered if it was something like that. Mine is just
torvalds/linux.git. Fetching stable/linux.git from kernel.org, running
"git repack -adb" on the result, and then repeating my timings gets me
numbers close to yours.

> > which is indeed faster. It's quite curious that the answer is not the
> > same, though! I think yours has some bugs. If I sort and diff the
> > results, I see some commits mentioned in the output. Perhaps this is
> > --filter-provided not working, as they all seem to be ref tips.
> 
> I noticed it, too, and couldn't yet find an answer why that is.
> Honestly, I found the NOT_USER_GIVEN flag quite confusing and I'm not at
> all sure whether I've got all cases covered correctly. The previous way
> this was handled (`USER_GIVEN` instead of `NOT_USER_GIVEN`) would've
> been easier to figure out for this specific usecase. But I guess it was
> converted due to specific reasons.
> 
> I'll invest some more time to figure out what's happening here.

Thanks. I also scratched my head at NOT_USER_GIVEN. I haven't looked at
this part of the filter code very much, but it seems like that is a
recipe for accidentally marking a commit as NOT_USER_GIVEN if we
traverse to it (even if it was originally _also_ given by the user).
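
A toy model of that concern, as a minimal sketch: the struct, the flag
values and all function names below are made up for illustration and are
not Git's traversal code. It only shows why a flag that is set while
walking can clobber the "user gave this" information, whereas a flag set
at input time cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the values are arbitrary. */
#define SKETCH_USER_GIVEN     (1u << 0) /* positive scheme */
#define SKETCH_NOT_USER_GIVEN (1u << 1) /* negative scheme */

struct sketch_obj {
	unsigned flags;
	struct sketch_obj *parent;
};

/* Input time: only the positive scheme has something to record. */
static void give(struct sketch_obj *o)
{
	assert(o);
	o->flags |= SKETCH_USER_GIVEN;
}

/* Walk time: everything reached through a parent link gets marked. */
static void walk_from(struct sketch_obj *tip)
{
	struct sketch_obj *p;

	assert(tip);
	for (p = tip->parent; p; p = p->parent)
		p->flags |= SKETCH_NOT_USER_GIVEN;
}

static int given_positive(const struct sketch_obj *o)
{
	return (o->flags & SKETCH_USER_GIVEN) != 0;
}

static int given_negative(const struct sketch_obj *o)
{
	return (o->flags & SKETCH_NOT_USER_GIVEN) == 0;
}
```

If both a tip and its parent are given and the walk starts at the tip,
the parent still counts as given under the positive scheme but no longer
under the negative one.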

-Peff

> > That's probably reasonable, especially because it lets us use bitmaps. I
> > do have a dream that we'll eventually be able to support more extensive
> > formatting via log/rev-list, which would allow:
> > 
> >   git rev-list --use-bitmap-index --objects --all \
> >                --format=%(objecttype) %(objectname) |
> >   perl -ne 'print $1 if /^blob (.*)/'
> > 
> > That should be faster than the separate cat-file (which has to re-lookup
> > each object, in addition to the extra pipe overhead), but I expect the
> > --filter solution should always be faster still, as it can very quickly
> > eliminate the majority of the objects at the bitmap level.
> 
> That'd be nice, even though it wouldn't help in my particular usecase: I
> need to read each candidate blob to see whether it's an LFS pointer or
> not anyway.

I think it works out roughly the same as the --filter solution, in the
sense that both generate a list of candidate blobs that you'd read with
"cat-file --batch" (but of course it's still slower).

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/7] rev-parse: implement object type filter
  2021-03-11 14:43     ` Patrick Steinhardt
@ 2021-03-11 17:56       ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2021-03-11 17:56 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Taylor Blau, git, Christian Couder

On Thu, Mar 11, 2021 at 03:43:39PM +0100, Patrick Steinhardt wrote:

> > I do wonder if it's that bad for clients to be able to specify something
> > like this, though. Even though there's not that much use for it with a
> > regular partial clone, it could conceivably be used for some special cases.
> > I do think it would be more useful if you could OR together multiple
> > types. Asking for "commits|tags|trees" is really the same as the already
> > useful "blob:none". And "commits|tags" is the same as tree:depth=0.
> 
> I did waste a few thoughts on how this should be handled. I see two ways
> of doing it:
> 
>     - We could just implement the new `object:type` filter such that it
>       directly supports OR'ing. That's the easy way to do it, but it's
>       inflexible.
> 
>     - We could extend combined filters to support OR-semantics in
>       addition to the current AND-semantics. In the end, that'd be a
>       much more flexible approach and potentially allow additional
>       usecases.
> 
> I lean more towards the latter as it feels like the better design. But
> it's more involved, and I'm not sure I want to do it as part of this
> patch series.

Yeah, I don't think that needs to be part of this series. The only thing
to consider for this series is whether it's a problem for clients to be
able to ask for type=blob from a server which has blindly turned on
uploadpack.allowFilter without restricting the types.

My gut is to say yes. Even if we don't have a particular use, I don't
think it hurts (and in general, I think people running public servers
with bitmaps really ought to set uploadpackfilter.allow=false anyway,
because stuff like non-zero tree-depth filters are expensive).

-Peff

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH 0/7] rev-parse: implement object type filter
  2021-03-10 21:39 ` [PATCH 0/7] rev-parse: implement object type filter Jeff King
  2021-03-11 14:38   ` Patrick Steinhardt
@ 2021-03-15 11:25   ` Patrick Steinhardt
  1 sibling, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-15 11:25 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder

On Wed, Mar 10, 2021 at 04:39:22PM -0500, Jeff King wrote:
> On Mon, Mar 01, 2021 at 01:20:26PM +0100, Patrick Steinhardt wrote:
> 
> > Altogether, this ends up with the following queries, both of which have
> > been executed in a well-packed linux.git repository:
> > 
> >     # Previous query which uses object names as a heuristic to filter
> >     # non-blob objects, which bars us from using bitmap indices because
> >     # they cannot print paths.
> >     $ time git rev-list --objects --filter=blob:limit=200 \
> >         --object-names --all | sed -r '/^.{,41}$/d' | wc -l
> >     4502300
> > 
> >     real 1m23.872s
> >     user 1m30.076s
> >     sys  0m6.002s
> > 
> >     # New query.
> >     $ time git rev-list --objects --filter-provided \
> >         --filter=object:type=blob --filter=blob:limit=200 \
> >         --use-bitmap-index --all | wc -l
> >     22585
> > 
> >     real 0m19.216s
> >     user 0m16.768s
> >     sys  0m2.450s
> 
> Those produce very different answers. I guess because in the first one,
> you still have a bunch of tree objects, too. You'd do much better to get
> the actual types from cat-file, and filter on that. That also lets you
> use bitmaps for the traversal portion. E.g.:
> 
>   $ time git rev-list --use-bitmap-index --objects --filter=blob:limit=200 --all |
>          git cat-file --buffer --batch-check='%(objecttype) %(objectname)' |
> 	 perl -lne 'print $1 if /^blob (.*)/' | wc -l
>   14966
>   
>   real	0m6.248s
>   user	0m7.810s
>   sys	0m0.440s
> 
> which is faster than what you showed above (this is on linux.git, but my
> result is different; maybe you have more refs than me?). But we should
> be able to do better purely internally, so I suspect my computer is just
> faster (or maybe your extra refs just aren't well-covered by bitmaps).
> Running with your patches I get:
> 
>   $ time git rev-list --objects --use-bitmap-index --all \
>              --filter-provided --filter=object:type=blob \
> 	     --filter=blob:limit=200 | wc -l
>   16339
> 
>   real	0m1.309s
>   user	0m1.234s
>   sys	0m0.079s
> 
> which is indeed faster. It's quite curious that the answer is not the
> same, though! I think yours has some bugs. If I sort and diff the
> results, I see some commits mentioned in the output. Perhaps this is
> --filter-provided not working, as they all seem to be ref tips.
[snip]

I've found the issue: when converting filters to a combined filter via
`transform_to_combine_type()`, we reset the top-level filter via a call
to `memset()`. So for combined filters, the option had no effect
whenever `--filter-provided` came before the second filter, because the
reset cleared it.
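
To illustrate the failure mode, here is a minimal sketch with made-up
names (`toy_filter_options`, `toy_to_combined_buggy()` and
`toy_to_combined_fixed()` are hypothetical, not the real Git code). It
only assumes that the "filter provided objects" setting lives in the same
struct that gets wiped during the conversion:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the filter options struct. */
struct toy_filter_options {
	int choice;        /* which filter kind is selected */
	int filter_wants;  /* assumed to be set by --filter-provided */
	int sub_nr;        /* number of sub-filters once combined */
};

enum { TOY_SINGLE = 1, TOY_COMBINE = 2 };

/* Buggy conversion: the memset() wipes filter_wants as well. */
static void toy_to_combined_buggy(struct toy_filter_options *o)
{
	assert(o != NULL);
	memset(o, 0, sizeof(*o));
	o->choice = TOY_COMBINE;
	o->sub_nr = 1;
}

/* Fixed conversion: remember the flag and restore it after the reset. */
static void toy_to_combined_fixed(struct toy_filter_options *o)
{
	int wants;

	assert(o != NULL);
	wants = o->filter_wants;
	memset(o, 0, sizeof(*o));
	o->filter_wants = wants;
	o->choice = TOY_COMBINE;
	o->sub_nr = 1;
}
```

With the buggy variant, a flag that was set before the conversion is
silently lost; the fixed variant carries it across the reset.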

Patrick


* [PATCH v2 0/8] rev-parse: implement object type filter
  2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
                   ` (8 preceding siblings ...)
  2021-03-10 21:58 ` Taylor Blau
@ 2021-03-15 13:14 ` Patrick Steinhardt
  2021-03-15 13:14   ` [PATCH v2 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
                     ` (9 more replies)
  9 siblings, 10 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-15 13:14 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 8412 bytes --]

Hi,

this is the second version of my patch series which implements a new
`object:type` filter for git-rev-list(1) and git-upload-pack(1) and
extends support for bitmap indices to work with combined filters.

Changes compared to v1:

    - I've added a patch up front which changes the uploadpack
      documentation to explicitly document that setting
      `uploadpackfilter.allow=true` will enable all future filters. I'm
      not yet saying that this is the correct thing to do, but rather
      added this patch such that we have a proper place to discuss this
      topic. In the context of object-type filters, I do think though
      that it's not an issue to default-enable type filters: they're not
      expensive to compute anyway.

    - `uploadpackfilter.<filter>.allow` documentation was updated to
      mention the new filter.

    - A bug was fixed which caused us to reset `--filter-provided` in
      case a normal filter was converted to a combined filter. I've
      added tests to more thoroughly verify that filters work as
      expected and also filter provided objects.

Please see the attached range-diff for more details.

Patrick

Patrick Steinhardt (8):
  uploadpack.txt: document implication of `uploadpackfilter.allow`
  revision: mark commit parents as NOT_USER_GIVEN
  list-objects: move tag processing into its own function
  list-objects: support filtering by tag and commit
  list-objects: implement object type filter
  pack-bitmap: implement object type filter
  pack-bitmap: implement combined filter
  rev-list: allow filtering of provided items

 Documentation/config/uploadpack.txt |   9 ++-
 Documentation/rev-list-options.txt  |   3 +
 builtin/rev-list.c                  |  14 ++++
 list-objects-filter-options.c       |  18 +++++
 list-objects-filter-options.h       |   8 ++
 list-objects-filter.c               | 116 ++++++++++++++++++++++++++++
 list-objects-filter.h               |   2 +
 list-objects.c                      |  32 +++++++-
 pack-bitmap.c                       |  71 +++++++++++++++--
 revision.c                          |   4 +-
 revision.h                          |   3 -
 t/t6112-rev-list-filters-objects.sh |  76 ++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  |  68 +++++++++++++++-
 13 files changed, 403 insertions(+), 21 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  270ff80dac uploadpack.txt: document implication of `uploadpackfilter.allow`
1:  f2ce5dac89 = 2:  ddbec75986 revision: mark commit parents as NOT_USER_GIVEN
2:  9feadba124 = 3:  d8da0b24f4 list-objects: move tag processing into its own function
3:  4aa13ee83f = 4:  5545c189c5 list-objects: support filtering by tag and commit
4:  01b9fdbb9c ! 5:  acf01472af list-objects: implement object type filter
    @@ Commit message
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    + ## Documentation/config/uploadpack.txt ##
    +@@ Documentation/config/uploadpack.txt: uploadpackfilter.allow::
    + uploadpackfilter.<filter>.allow::
    + 	Explicitly allow or ban the object filter corresponding to
    + 	`<filter>`, where `<filter>` may be one of: `blob:none`,
    +-	`blob:limit`, `tree`, `sparse:oid`, or `combine`. If using
    +-	combined filters, both `combine` and all of the nested filter
    +-	kinds must be allowed. Defaults to `uploadpackfilter.allow`.
    ++	`blob:limit`, `object:type`, `tree`, `sparse:oid`, or `combine`.
    ++	If using combined filters, both `combine` and all of the nested
    ++	filter kinds must be allowed. Defaults to `uploadpackfilter.allow`.
    + 
    + uploadpackfilter.tree.maxDepth::
    + 	Only allow `--filter=tree:<n>` when `<n>` is no more than the value of
    +
      ## Documentation/rev-list-options.txt ##
     @@ Documentation/rev-list-options.txt: or units.  n may be zero.  The suffixes k, m, and g can be used to name
      units in KiB, MiB, or GiB.  For example, 'blob:limit=1k' is the same
5:  c97fd28d8f = 6:  8073ab665b pack-bitmap: implement object type filter
6:  fe2b7a1e55 = 7:  fac3477d97 pack-bitmap: implement combined filter
7:  b43bf401df ! 8:  0e26fee8b3 rev-list: allow filtering of provided items
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
      	if (arg_missing_action == MA_PRINT)
     
    + ## list-objects-filter-options.c ##
    +@@ list-objects-filter-options.c: static void transform_to_combine_type(
    + 		memset(filter_options, 0, sizeof(*filter_options));
    + 		filter_options->sub = sub_array;
    + 		filter_options->sub_alloc = initial_sub_alloc;
    ++		filter_options->filter_wants = sub_array[0].filter_wants;
    + 	}
    + 	filter_options->sub_nr = 1;
    + 	filter_options->choice = LOFC_COMBINE;
    +@@ list-objects-filter-options.c: void parse_list_objects_filter(
    + 		parse_error = gently_parse_list_objects_filter(
    + 			&filter_options->sub[filter_options->sub_nr - 1], arg,
    + 			&errbuf);
    ++		if (!parse_error)
    ++			filter_options->sub[filter_options->sub_nr - 1].filter_wants =
    ++				filter_options->filter_wants;
    + 	}
    + 	if (parse_error)
    + 		die("%s", errbuf.buf);
    +
      ## list-objects-filter-options.h ##
     @@ list-objects-filter-options.h: struct list_objects_filter_options {
      	 */
    @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'object:type filter' '
      '
      
     +test_expect_success 'object:type filter with --filter-provided' '
    -+	git rev-list --objects --filter=object:type=tag --filter-provided tag >expect &&
    ++	git rev-list --objects --filter-provided --filter=object:type=tag tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter=object:type=tag --filter-provided tag >actual &&
    ++		     --objects --filter-provided --filter=object:type=tag tag >actual &&
     +	test_cmp expect actual &&
     +
    -+	git rev-list --objects --filter=object:type=commit --filter-provided tag >expect &&
    ++	git rev-list --objects --filter-provided --filter=object:type=commit tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter=object:type=commit --filter-provided tag >actual &&
    ++		     --objects --filter-provided --filter=object:type=commit tag >actual &&
     +	test_bitmap_traversal expect actual &&
     +
    -+	git rev-list --objects --filter=object:type=tree --filter-provided tag >expect &&
    ++	git rev-list --objects --filter-provided --filter=object:type=tree tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter=object:type=tree --filter-provided tag >actual &&
    ++		     --objects --filter-provided --filter=object:type=tree tag >actual &&
     +	test_bitmap_traversal expect actual &&
     +
    -+	git rev-list --objects --filter=object:type=blob --filter-provided tag >expect &&
    ++	git rev-list --objects --filter-provided --filter=object:type=blob tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter=object:type=blob --filter-provided tag >actual &&
    ++		     --objects --filter-provided --filter=object:type=blob tag >actual &&
     +	test_bitmap_traversal expect actual
     +'
     +
      test_expect_success 'combine filter' '
      	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
      	git rev-list --use-bitmap-index \
    +@@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'combine filter' '
    + 	test_bitmap_traversal expect actual
    + '
    + 
    ++test_expect_success 'combine filter with --filter-provided' '
    ++	git rev-list --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
    ++	git rev-list --use-bitmap-index \
    ++		     --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
    ++	test_bitmap_traversal expect actual &&
    ++
    ++	git cat-file --batch-check="%(objecttype) %(objectsize)" <actual >objects &&
    ++	while read objecttype objectsize
    ++	do
    ++		test "$objecttype" = blob || return 1
    ++		test "$objectsize" -le 1000 || return 1
    ++	done <objects
    ++'
    ++
    + test_done
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v2 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow`
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
@ 2021-03-15 13:14   ` Patrick Steinhardt
  2021-04-06 17:17     ` Jeff King
  2021-03-15 13:14   ` [PATCH v2 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-15 13:14 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 1317 bytes --]

When `uploadpackfilter.allow` is set to `true`, it means that filters
are enabled by default except in the case where a filter is explicitly
disabled via `uploadpackfilter.<filter>.allow`. This option will not only
enable the currently supported set of filters, but also any filters
which get added in the future. As such, an admin who wants to have
tight control over which filters are allowed and which aren't probably
shouldn't ever set `uploadpackfilter.allow=true`.

Amend the documentation to make the ramifications more explicit so that
admins are aware of this.
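
The lookup order described above can be sketched as follows. This is an
illustration only: `toy_setting` and `toy_filter_allowed()` are
hypothetical names, and the tri-state merely models "not configured" for
a per-filter setting.

```c
#include <assert.h>

/* Tri-state for a hypothetical per-filter setting. */
enum toy_setting { TOY_UNSET = -1, TOY_FALSE = 0, TOY_TRUE = 1 };

/*
 * Resolve whether a filter is allowed: an explicit per-filter value
 * wins, otherwise the global default applies. A filter added in the
 * future starts out TOY_UNSET and therefore inherits the default.
 */
static int toy_filter_allowed(enum toy_setting per_filter, int default_allow)
{
	assert(default_allow == 0 || default_allow == 1);
	if (per_filter != TOY_UNSET)
		return per_filter == TOY_TRUE;
	return default_allow;
}
```

With a default of "true", every filter without an explicit setting is
allowed, which is why new filters become available without any further
configuration change.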

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/uploadpack.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
index b0d761282c..6729a072ea 100644
--- a/Documentation/config/uploadpack.txt
+++ b/Documentation/config/uploadpack.txt
@@ -59,7 +59,8 @@ uploadpack.allowFilter::
 
 uploadpackfilter.allow::
 	Provides a default value for unspecified object filters (see: the
-	below configuration variable).
+	below configuration variable). If set to `true`, this will also
+	enable all filters which get added in the future.
 	Defaults to `true`.
 
 uploadpackfilter.<filter>.allow::
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v2 2/8] revision: mark commit parents as NOT_USER_GIVEN
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
  2021-03-15 13:14   ` [PATCH v2 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
@ 2021-03-15 13:14   ` Patrick Steinhardt
  2021-04-06 17:30     ` Jeff King
  2021-03-15 13:14   ` [PATCH v2 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-15 13:14 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 2338 bytes --]

The NOT_USER_GIVEN flag of an object marks whether the object was explicitly
provided by the user or not. The most important use case for this is
when filtering objects: only objects that were not explicitly requested
will get filtered.

The flag is currently only set for blobs and trees, which has been fine
given that there are no filters for tags or commits currently. We're
about to extend filtering capabilities to add an object type filter though,
which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's
not set, the object wouldn't get filtered at all.

Mark unseen commit parents as NOT_USER_GIVEN when processing parents.
Like this, explicitly provided parents stay user-given and thus
unfiltered, while parents which get loaded as part of the graph walk
can be filtered.

This commit shouldn't have any user-visible impact yet as there is no
logic to filter commits yet.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 revision.c | 4 ++--
 revision.h | 3 ---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/revision.c b/revision.c
index b78733f508..26f422f50d 100644
--- a/revision.c
+++ b/revision.c
@@ -1123,7 +1123,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 				mark_parents_uninteresting(p);
 			if (p->object.flags & SEEN)
 				continue;
-			p->object.flags |= SEEN;
+			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
@@ -1165,7 +1165,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 		}
 		p->object.flags |= left_flag;
 		if (!(p->object.flags & SEEN)) {
-			p->object.flags |= SEEN;
+			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
diff --git a/revision.h b/revision.h
index e6be3c845e..f1f324a19b 100644
--- a/revision.h
+++ b/revision.h
@@ -44,9 +44,6 @@
 /*
  * Indicates object was reached by traversal. i.e. not given by user on
  * command-line or stdin.
- * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
- * filtering trees and blobs, but it may be useful to support filtering commits
- * in the future.
  */
 #define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v2 3/8] list-objects: move tag processing into its own function
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
  2021-03-15 13:14   ` [PATCH v2 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
  2021-03-15 13:14   ` [PATCH v2 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
@ 2021-03-15 13:14   ` Patrick Steinhardt
  2021-04-06 17:39     ` Jeff King
  2021-03-15 13:14   ` [PATCH v2 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-15 13:14 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]

Move processing of tags into its own function to make the logic easier
to extend when we're going to implement filtering for tags. No change in
behaviour is expected from this commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 list-objects.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index e19589baa0..093adf85b1 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -213,6 +213,15 @@ static void process_tree(struct traversal_context *ctx,
 	free_tree_buffer(tree);
 }
 
+static void process_tag(struct traversal_context *ctx,
+			struct tag *tag,
+			struct strbuf *base,
+			const char *name)
+{
+	tag->object.flags |= SEEN;
+	ctx->show_object(&tag->object, name, ctx->show_data);
+}
+
 static void mark_edge_parents_uninteresting(struct commit *commit,
 					    struct rev_info *revs,
 					    show_edge_fn show_edge)
@@ -334,8 +343,7 @@ static void traverse_trees_and_blobs(struct traversal_context *ctx,
 		if (obj->flags & (UNINTERESTING | SEEN))
 			continue;
 		if (obj->type == OBJ_TAG) {
-			obj->flags |= SEEN;
-			ctx->show_object(obj, name, ctx->show_data);
+			process_tag(ctx, (struct tag *)obj, base, name);
 			continue;
 		}
 		if (!path)
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v2 4/8] list-objects: support filtering by tag and commit
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2021-03-15 13:14   ` [PATCH v2 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
@ 2021-03-15 13:14   ` Patrick Steinhardt
  2021-03-15 13:14   ` [PATCH v2 5/8] list-objects: implement object type filter Patrick Steinhardt
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-15 13:14 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 4850 bytes --]

Object filters currently only support filtering blobs or trees based on
some criteria. This commit lays the foundation to also allow filtering
of tags and commits.

No change in behaviour is expected from this commit given that there are
no filters yet for those object types.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 list-objects-filter.c | 40 ++++++++++++++++++++++++++++++++++++++++
 list-objects-filter.h |  2 ++
 list-objects.c        | 24 +++++++++++++++++++++---
 3 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index 4ec0041cfb..7def039435 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -82,6 +82,16 @@ static enum list_objects_filter_result filter_blobs_none(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -173,6 +183,16 @@ static enum list_objects_filter_result filter_trees_depth(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_END_TREE:
 		assert(obj->type == OBJ_TREE);
 		filter_data->current_depth--;
@@ -267,6 +287,16 @@ static enum list_objects_filter_result filter_blobs_limit(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -371,6 +401,16 @@ static enum list_objects_filter_result filter_sparse(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		dtype = DT_DIR;
diff --git a/list-objects-filter.h b/list-objects-filter.h
index cfd784e203..9e98814111 100644
--- a/list-objects-filter.h
+++ b/list-objects-filter.h
@@ -55,6 +55,8 @@ enum list_objects_filter_result {
 };
 
 enum list_objects_filter_situation {
+	LOFS_COMMIT,
+	LOFS_TAG,
 	LOFS_BEGIN_TREE,
 	LOFS_END_TREE,
 	LOFS_BLOB
diff --git a/list-objects.c b/list-objects.c
index 093adf85b1..3b63dfd4f2 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -218,8 +218,16 @@ static void process_tag(struct traversal_context *ctx,
 			struct strbuf *base,
 			const char *name)
 {
-	tag->object.flags |= SEEN;
-	ctx->show_object(&tag->object, name, ctx->show_data);
+	enum list_objects_filter_result r;
+
+	r = list_objects_filter__filter_object(ctx->revs->repo, LOFS_TAG,
+					       &tag->object, base->buf,
+					       &base->buf[base->len],
+					       ctx->filter);
+	if (r & LOFR_MARK_SEEN)
+		tag->object.flags |= SEEN;
+	if (r & LOFR_DO_SHOW)
+		ctx->show_object(&tag->object, name, ctx->show_data);
 }
 
 static void mark_edge_parents_uninteresting(struct commit *commit,
@@ -369,6 +377,12 @@ static void do_traverse(struct traversal_context *ctx)
 	strbuf_init(&csp, PATH_MAX);
 
 	while ((commit = get_revision(ctx->revs)) != NULL) {
+		enum list_objects_filter_result r;
+
+		r = list_objects_filter__filter_object(ctx->revs->repo,
+				LOFS_COMMIT, &commit->object,
+				NULL, NULL, ctx->filter);
+
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
@@ -383,7 +397,11 @@ static void do_traverse(struct traversal_context *ctx)
 			die(_("unable to load root tree for commit %s"),
 			      oid_to_hex(&commit->object.oid));
 		}
-		ctx->show_commit(commit, ctx->show_data);
+
+		if (r & LOFR_MARK_SEEN)
+			commit->object.flags |= SEEN;
+		if (r & LOFR_DO_SHOW)
+			ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v2 5/8] list-objects: implement object type filter
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2021-03-15 13:14   ` [PATCH v2 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
@ 2021-03-15 13:14   ` Patrick Steinhardt
  2021-04-06 17:42     ` Jeff King
  2021-03-15 13:14   ` [PATCH v2 6/8] pack-bitmap: " Patrick Steinhardt
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-15 13:14 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 9502 bytes --]

While it already is possible to filter objects by some criteria in
git-rev-list(1), it is not yet possible to filter out only a specific
type of objects. This makes some filters less useful. The `blob:limit`
filter for example filters blobs such that only those which are smaller
than the given limit are returned. But it is unfit to ask only for these
smallish blobs, given that git-rev-list(1) will continue to print tags,
commits and trees.

Now that we have the infrastructure in place to also filter tags and
commits, we can improve this situation by implementing a new filter
which selects objects based on their type. The above query can thus
trivially be implemented with the following command:

    $ git rev-list --objects --filter=object:type=blob \
        --filter=blob:limit=200

Furthermore, this filter allows us to optimize for certain other cases: if
for example only tags or commits have been selected, there is no need to
walk down trees.

The new filter is not yet supported in bitmaps. This is going to be
implemented in a subsequent commit.
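
How the two filters interact when given together can be sketched like
this. The names are hypothetical and the model is simplified: it assumes
AND-semantics for combined filters and that the size limit only ever
omits blobs whose size is not below the limit.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_obj {
	enum toy_type type;
	unsigned long size;
};

/* Sub-filter 1: keep only objects of the wanted type. */
static int toy_type_matches(const struct toy_obj *o, enum toy_type wanted)
{
	return o->type == wanted;
}

/* Sub-filter 2: the size limit only applies to blobs. */
static int toy_under_limit(const struct toy_obj *o, unsigned long limit)
{
	return o->type != TOY_BLOB || o->size < limit;
}

/* Combined filter: an object is shown only if every sub-filter shows it. */
static size_t toy_count_shown(const struct toy_obj *objs, size_t nr,
			      enum toy_type wanted, unsigned long limit)
{
	size_t i, shown = 0;

	assert(objs != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (toy_type_matches(&objs[i], wanted) &&
		    toy_under_limit(&objs[i], limit))
			shown++;
	return shown;
}
```

On its own, the size limit lets commits, trees and tags through; only in
combination with the type filter is the output reduced to small blobs.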

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/uploadpack.txt |  6 +--
 Documentation/rev-list-options.txt  |  3 ++
 list-objects-filter-options.c       | 14 ++++++
 list-objects-filter-options.h       |  2 +
 list-objects-filter.c               | 76 +++++++++++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh | 48 ++++++++++++++++++
 6 files changed, 146 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
index 6729a072ea..32fad5bbe8 100644
--- a/Documentation/config/uploadpack.txt
+++ b/Documentation/config/uploadpack.txt
@@ -66,9 +66,9 @@ uploadpackfilter.allow::
 uploadpackfilter.<filter>.allow::
 	Explicitly allow or ban the object filter corresponding to
 	`<filter>`, where `<filter>` may be one of: `blob:none`,
-	`blob:limit`, `tree`, `sparse:oid`, or `combine`. If using
-	combined filters, both `combine` and all of the nested filter
-	kinds must be allowed. Defaults to `uploadpackfilter.allow`.
+	`blob:limit`, `object:type`, `tree`, `sparse:oid`, or `combine`.
+	If using combined filters, both `combine` and all of the nested
+	filter kinds must be allowed. Defaults to `uploadpackfilter.allow`.
 
 uploadpackfilter.tree.maxDepth::
 	Only allow `--filter=tree:<n>` when `<n>` is no more than the value of
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index b1c8f86c6e..3afa8fffbd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -892,6 +892,9 @@ or units.  n may be zero.  The suffixes k, m, and g can be used to name
 units in KiB, MiB, or GiB.  For example, 'blob:limit=1k' is the same
 as 'blob:limit=1024'.
 +
+The form '--filter=object:type=(tag|commit|tree|blob)' omits all objects
+which are not of the requested type.
++
 The form '--filter=sparse:oid=<blob-ish>' uses a sparse-checkout
 specification contained in the blob (or blob-expression) '<blob-ish>'
 to omit blobs that would not be not required for a sparse checkout on
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d2d1c81caf..bb6f6577d5 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -29,6 +29,8 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 		return "tree";
 	case LOFC_SPARSE_OID:
 		return "sparse:oid";
+	case LOFC_OBJECT_TYPE:
+		return "object:type";
 	case LOFC_COMBINE:
 		return "combine";
 	case LOFC__COUNT:
@@ -97,6 +99,18 @@ static int gently_parse_list_objects_filter(
 		}
 		return 1;
 
+	} else if (skip_prefix(arg, "object:type=", &v0)) {
+		int type = type_from_string_gently(v0, -1, 1);
+		if (type < 0) {
+			strbuf_addstr(errbuf, _("expected 'object:type=<type>'"));
+			return 1;
+		}
+
+		filter_options->object_type = type;
+		filter_options->choice = LOFC_OBJECT_TYPE;
+
+		return 0;
+
 	} else if (skip_prefix(arg, "combine:", &v0)) {
 		return parse_combine_filter(filter_options, v0, errbuf);
 
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 01767c3c96..4d0d0588cc 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -13,6 +13,7 @@ enum list_objects_filter_choice {
 	LOFC_BLOB_LIMIT,
 	LOFC_TREE_DEPTH,
 	LOFC_SPARSE_OID,
+	LOFC_OBJECT_TYPE,
 	LOFC_COMBINE,
 	LOFC__COUNT /* must be last */
 };
@@ -54,6 +55,7 @@ struct list_objects_filter_options {
 	char *sparse_oid_name;
 	unsigned long blob_limit_value;
 	unsigned long tree_exclude_depth;
+	enum object_type object_type;
 
 	/* LOFC_COMBINE values */
 
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 7def039435..650a7c2c80 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -545,6 +545,81 @@ static void filter_sparse_oid__init(
 	filter->free_fn = filter_sparse_free;
 }
 
+/*
+ * A filter for list-objects to omit all objects which are not of the
+ * requested type.
+ */
+struct filter_object_type_data {
+	enum object_type object_type;
+};
+
+static enum list_objects_filter_result filter_object_type(
+	struct repository *r,
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	struct oidset *omits,
+	void *filter_data_)
+{
+	struct filter_object_type_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		if (filter_data->object_type == OBJ_TAG)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		if (filter_data->object_type == OBJ_COMMIT)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+
+		/*
+		 * If we only want to show commits or tags, then there is no
+		 * need to walk down trees.
+		 */
+		if (filter_data->object_type == OBJ_COMMIT ||
+		    filter_data->object_type == OBJ_TAG)
+			return LOFR_SKIP_TREE;
+
+		if (filter_data->object_type == OBJ_TREE)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+		return LOFR_MARK_SEEN;
+
+	case LOFS_BLOB:
+		assert(obj->type == OBJ_BLOB);
+
+		if (filter_data->object_type == OBJ_BLOB)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_END_TREE:
+		return LOFR_ZERO;
+	}
+}
+
+static void filter_object_type__init(
+	struct list_objects_filter_options *filter_options,
+	struct filter *filter)
+{
+	struct filter_object_type_data *d = xcalloc(1, sizeof(*d));
+	d->object_type = filter_options->object_type;
+
+	filter->filter_data = d;
+	filter->filter_object_fn = filter_object_type;
+	filter->free_fn = free;
+}
+
 /* A filter which only shows objects shown by all sub-filters. */
 struct combine_filter_data {
 	struct subfilter *sub;
@@ -691,6 +766,7 @@ static filter_init_fn s_filters[] = {
 	filter_blobs_limit__init,
 	filter_trees_depth__init,
 	filter_sparse_oid__init,
+	filter_object_type__init,
 	filter_combine__init,
 };
 
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 31457d13b9..c79ec04060 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -159,6 +159,54 @@ test_expect_success 'verify blob:limit=1m' '
 	test_must_be_empty observed
 '
 
+# Test object:type=<type> filter.
+
+test_expect_success 'setup object-type' '
+	git init object-type &&
+	echo contents >object-type/blob &&
+	git -C object-type add blob &&
+	git -C object-type commit -m commit-message &&
+	git -C object-type tag tag -m tag-message
+'
+
+test_expect_success 'verify object:type= fails with invalid type' '
+	test_must_fail git -C object-type rev-list --objects --filter=object:type= HEAD &&
+	test_must_fail git -C object-type rev-list --objects --filter=object:type=invalid HEAD
+'
+
+test_expect_success 'verify object:type=blob prints blob and commit' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob)
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=blob HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tree prints tree and commit' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s \n" $(git -C object-type rev-parse HEAD^{tree})
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=tree HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=commit prints commit' '
+	git -C object-type rev-parse HEAD >expected &&
+	git -C object-type rev-list --objects --filter=object:type=commit HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tag prints tag' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s tag\n" $(git -C object-type rev-parse tag)
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=tag tag >actual &&
+	test_cmp expected actual
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v2 6/8] pack-bitmap: implement object type filter
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2021-03-15 13:14   ` [PATCH v2 5/8] list-objects: implement object type filter Patrick Steinhardt
@ 2021-03-15 13:14   ` Patrick Steinhardt
  2021-04-06 17:48     ` Jeff King
  2021-03-15 13:14   ` [PATCH v2 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-15 13:14 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 3572 bytes --]

The preceding commit has added a new object filter for git-rev-list(1)
which allows filtering objects by type. Implement the equivalent filter
for packfile bitmaps so that we can answer these queries fast.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 pack-bitmap.c                      | 28 +++++++++++++++++++++++++---
 t/t6113-rev-list-bitmap-filters.sh | 25 ++++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1f69b5fa85..196d38c91d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -779,9 +779,6 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	eword_t mask;
 	uint32_t i;
 
-	if (type != OBJ_BLOB && type != OBJ_TREE)
-		BUG("filter_bitmap_exclude_type: unsupported type '%d'", type);
-
 	/*
 	 * The non-bitmap version of this filter never removes
 	 * objects which the other side specifically asked for,
@@ -911,6 +908,23 @@ static void filter_bitmap_tree_depth(struct bitmap_index *bitmap_git,
 				   OBJ_BLOB);
 }
 
+static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
+				      struct object_list *tip_objects,
+				      struct bitmap *to_filter,
+				      enum object_type object_type)
+{
+	enum object_type t;
+
+	if (object_type < OBJ_COMMIT || object_type > OBJ_TAG)
+		BUG("filter_bitmap_object_type given invalid object");
+
+	for (t = OBJ_COMMIT; t <= OBJ_TAG; t++) {
+		if (t == object_type)
+			continue;
+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, t);
+	}
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -943,6 +957,14 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
+	if (filter->choice == LOFC_OBJECT_TYPE) {
+		if (bitmap_git)
+			filter_bitmap_object_type(bitmap_git, tip_objects,
+						  to_filter,
+						  filter->object_type);
+		return 0;
+	}
+
 	/* filter choice not handled */
 	return -1;
 }
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index 3f889949ca..fb66735ac8 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -10,7 +10,8 @@ test_expect_success 'set up bitmapped repo' '
 	test_commit much-larger-blob-one &&
 	git repack -adb &&
 	test_commit two &&
-	test_commit much-larger-blob-two
+	test_commit much-larger-blob-two &&
+	git tag tag
 '
 
 test_expect_success 'filters fallback to non-bitmap traversal' '
@@ -75,4 +76,26 @@ test_expect_success 'tree:1 filter' '
 	test_cmp expect actual
 '
 
+test_expect_success 'object:type filter' '
+	git rev-list --objects --filter=object:type=tag tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=tag tag >actual &&
+	test_cmp expect actual &&
+
+	git rev-list --objects --filter=object:type=commit tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=commit tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter=object:type=tree tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=tree tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v2 7/8] pack-bitmap: implement combined filter
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2021-03-15 13:14   ` [PATCH v2 6/8] pack-bitmap: " Patrick Steinhardt
@ 2021-03-15 13:14   ` Patrick Steinhardt
  2021-04-06 17:54     ` Jeff King
  2021-03-15 13:15   ` [PATCH v2 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-15 13:14 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 3204 bytes --]

When the user has specified multiple object filters, this is
internally represented by a "combined" filter. These combined
filters aren't yet supported by bitmap indices and thus cannot be
accelerated.

Fix this by implementing support for these combined filters. The
implementation is quite trivial: when there's a combined filter, we
simply recurse into `filter_bitmap()` for all of the sub-filters.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 pack-bitmap.c                      | 40 +++++++++++++++++++++++++++---
 t/t6113-rev-list-bitmap-filters.sh |  7 ++++++
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 196d38c91d..e33805e076 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -925,6 +925,29 @@ static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
 	}
 }
 
+static int filter_supported(struct list_objects_filter_options *filter)
+{
+	int i;
+
+	switch (filter->choice) {
+	case LOFC_BLOB_NONE:
+	case LOFC_BLOB_LIMIT:
+	case LOFC_OBJECT_TYPE:
+		return 1;
+	case LOFC_TREE_DEPTH:
+		if (filter->tree_exclude_depth == 0)
+			return 1;
+		return 0;
+	case LOFC_COMBINE:
+		for (i = 0; i < filter->sub_nr; i++)
+			if (!filter_supported(&filter->sub[i]))
+				return 0;
+		return 1;
+	default:
+		return 0;
+	}
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -932,6 +955,8 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 {
 	if (!filter || filter->choice == LOFC_DISABLED)
 		return 0;
+	if (!filter_supported(filter))
+		return -1;
 
 	if (filter->choice == LOFC_BLOB_NONE) {
 		if (bitmap_git)
@@ -948,8 +973,7 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
-	if (filter->choice == LOFC_TREE_DEPTH &&
-	    filter->tree_exclude_depth == 0) {
+	if (filter->choice == LOFC_TREE_DEPTH) {
 		if (bitmap_git)
 			filter_bitmap_tree_depth(bitmap_git, tip_objects,
 						 to_filter,
@@ -965,8 +989,16 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
-	/* filter choice not handled */
-	return -1;
+	if (filter->choice == LOFC_COMBINE) {
+		int i;
+		for (i = 0; i < filter->sub_nr; i++) {
+			filter_bitmap(bitmap_git, tip_objects, to_filter,
+				      &filter->sub[i]);
+		}
+		return 0;
+	}
+
+	BUG("unsupported filter choice");
 }
 
 static int can_filter_bitmap(struct list_objects_filter_options *filter)
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index fb66735ac8..cb9db7df6f 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -98,4 +98,11 @@ test_expect_success 'object:type filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'combine filter' '
+	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v2 8/8] rev-list: allow filtering of provided items
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2021-03-15 13:14   ` [PATCH v2 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
@ 2021-03-15 13:15   ` Patrick Steinhardt
  2021-04-06 18:04     ` Jeff King
  2021-03-20 21:10   ` [PATCH v2 0/8] rev-parse: implement object type filter Junio C Hamano
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
  9 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-03-15 13:15 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 8030 bytes --]

When providing an object filter, it is currently impossible to also
filter provided items. E.g. when executing `git rev-list HEAD`, the
commit this reference points to will be treated as user-provided and is
thus excluded from the filtering mechanism. This makes it harder than
necessary to properly use the new `--filter=object:type` filter given
that even if the user wants to only see blobs, they'll still see commits
of provided references.

Improve this by introducing a new `--filter-provided` option to the
git-rev-list(1) command. If given, then all user-provided references
will be subject to filtering.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/rev-list.c                  | 14 +++++++++++
 list-objects-filter-options.c       |  4 ++++
 list-objects-filter-options.h       |  6 +++++
 pack-bitmap.c                       |  3 ++-
 t/t6112-rev-list-filters-objects.sh | 28 ++++++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  | 36 +++++++++++++++++++++++++++++
 6 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index b4d8ea0a35..0f959b266d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -599,6 +599,10 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			list_objects_filter_set_no_filter(&filter_options);
 			continue;
 		}
+		if (!strcmp(arg, "--filter-provided")) {
+			filter_options.filter_wants = 1;
+			continue;
+		}
 		if (!strcmp(arg, "--filter-print-omitted")) {
 			arg_print_omitted = 1;
 			continue;
@@ -694,6 +698,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			return show_bisect_vars(&info, reaches, all);
 	}
 
+	if (filter_options.filter_wants) {
+		struct commit_list *c;
+		for (i = 0; i < revs.pending.nr; i++) {
+			struct object_array_entry *pending = revs.pending.objects + i;
+			pending->item->flags |= NOT_USER_GIVEN;
+		}
+		for (c = revs.commits; c; c = c->next)
+			c->item->object.flags |= NOT_USER_GIVEN;
+	}
+
 	if (arg_print_omitted)
 		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
 	if (arg_missing_action == MA_PRINT)
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index bb6f6577d5..2877aa9e96 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -242,6 +242,7 @@ static void transform_to_combine_type(
 		memset(filter_options, 0, sizeof(*filter_options));
 		filter_options->sub = sub_array;
 		filter_options->sub_alloc = initial_sub_alloc;
+		filter_options->filter_wants = sub_array[0].filter_wants;
 	}
 	filter_options->sub_nr = 1;
 	filter_options->choice = LOFC_COMBINE;
@@ -290,6 +291,9 @@ void parse_list_objects_filter(
 		parse_error = gently_parse_list_objects_filter(
 			&filter_options->sub[filter_options->sub_nr - 1], arg,
 			&errbuf);
+		if (!parse_error)
+			filter_options->sub[filter_options->sub_nr - 1].filter_wants =
+				filter_options->filter_wants;
 	}
 	if (parse_error)
 		die("%s", errbuf.buf);
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 4d0d0588cc..5e609e307a 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -42,6 +42,12 @@ struct list_objects_filter_options {
 	 */
 	enum list_objects_filter_choice choice;
 
+	/*
+	 * "--filter-provided" was given by the user, instructing us to also
+	 * filter all explicitly provided objects.
+	 */
+	unsigned int filter_wants : 1;
+
 	/*
 	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
 	 */
diff --git a/pack-bitmap.c b/pack-bitmap.c
index e33805e076..5ff800316b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1101,7 +1101,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
-	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
+	filter_bitmap(bitmap_git, (filter && filter->filter_wants) ? NULL : wants,
+		      wants_bitmap, filter);
 
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c79ec04060..47c558ab0e 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -207,6 +207,34 @@ test_expect_success 'verify object:type=tag prints tag' '
 	test_cmp expected actual
 '
 
+test_expect_success 'verify object:type=blob prints only blob with --filter-provided' '
+	printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=blob --filter-provided HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tree prints only tree with --filter-provided' '
+	printf "%s \n" $(git -C object-type rev-parse HEAD^{tree}) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tree HEAD --filter-provided >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=commit prints only commit with --filter-provided' '
+	git -C object-type rev-parse HEAD >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=commit --filter-provided HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tag prints only tag with --filter-provided' '
+	printf "%s tag\n" $(git -C object-type rev-parse tag) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tag --filter-provided tag >actual &&
+	test_cmp expected actual
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index cb9db7df6f..9053ac5059 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -98,6 +98,28 @@ test_expect_success 'object:type filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'object:type filter with --filter-provided' '
+	git rev-list --objects --filter-provided --filter=object:type=tag tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=tag tag >actual &&
+	test_cmp expect actual &&
+
+	git rev-list --objects --filter-provided --filter=object:type=commit tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=commit tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter-provided --filter=object:type=tree tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=tree tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter-provided --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_expect_success 'combine filter' '
 	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
 	git rev-list --use-bitmap-index \
@@ -105,4 +127,18 @@ test_expect_success 'combine filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'combine filter with --filter-provided' '
+	git rev-list --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git cat-file --batch-check="%(objecttype) %(objectsize)" <actual >objects &&
+	while read objecttype objectsize
+	do
+		test "$objecttype" = blob || return 1
+		test "$objectsize" -le 1000 || return 1
+	done <objects
+'
+
 test_done
-- 
2.30.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH v2 0/8] rev-parse: implement object type filter
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2021-03-15 13:15   ` [PATCH v2 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
@ 2021-03-20 21:10   ` Junio C Hamano
  2021-04-06 18:08     ` Jeff King
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
  9 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2021-03-20 21:10 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau, Jeff King

Patrick Steinhardt <ps@pks.im> writes:

> this is the second version of my patch series which implements a new
> `object:type` filter for git-rev-parse(1) and git-upload-pack(1) and
> extends support for bitmap indices to work with combined filters.
> ...
> Please see the attached range-diff for more details.

Any comment from stakeholders?

Thanks.


* Re: [PATCH v2 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow`
  2021-03-15 13:14   ` [PATCH v2 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
@ 2021-04-06 17:17     ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2021-04-06 17:17 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau

On Mon, Mar 15, 2021 at 02:14:31PM +0100, Patrick Steinhardt wrote:

> When `uploadpackfilter.allow` is set to `true`, it means that filters
> are enabled by default except in the case where a filter is explicitly
> disabled via `uploadpackfilter.<filter>.allow`. This option will not only
> enable the currently supported set of filters, but also any filters
> which get added in the future. As such, an admin who wants to have
> tight control over which filters are allowed and which aren't probably
> shouldn't ever set `uploadpackfilter.allow=true`.
> 
> Amend the documentation to make the ramifications more explicit so that
> admins are aware of this.

It might help to guide the admin a bit more here. What are we really
worried about? Probably that an expensive filter would be added that
would make an admin with a public-facing server unhappy.

Maybe we should be more explicit about our recommendations, like:

  This defaults to `true` for historical reasons, but that includes
  expensive-to-compute filters (both existing ones like `sparse`, but
  also future ones). A safer value is to set this to `false` and
  mark individual filters as allowed.

But then of course somebody wonders which set are expensive and which
ones are not. And really, "expensive" here is not that expensive. It is
"do not support bitmaps".

So I wonder if this concern is overblown in the first place. People who
care about using only bitmap-supported filters probably already set this
to "false". And vaguely calling things "expensive" is probably being
overly scary. But in that case, I'm not sure we even need to add a
reminder that future ones will also be enabled (OTOH, I do not mind it
so much; it is encouraging people to set this to false and mark
individual ones as allowed).

-Peff


* Re: [PATCH v2 2/8] revision: mark commit parents as NOT_USER_GIVEN
  2021-03-15 13:14   ` [PATCH v2 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
@ 2021-04-06 17:30     ` Jeff King
  2021-04-09 10:19       ` Patrick Steinhardt
  0 siblings, 1 reply; 67+ messages in thread
From: Jeff King @ 2021-04-06 17:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau

On Mon, Mar 15, 2021 at 02:14:36PM +0100, Patrick Steinhardt wrote:

> The NOT_USER_GIVEN flag of an object marks whether the object was explicitly
> provided by the user or not. The most important use case for this is
> when filtering objects: only objects that were not explicitly requested
> will get filtered.
> 
> The flag is currently only set for blobs and trees, which has been fine
> given that there are no filters for tags or commits currently. We're
> about to extend filtering capabilities to add object type filter though,
> which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's
> not set, the object wouldn't get filtered at all.
> 
> Mark unseen commit parents as NOT_USER_GIVEN when processing parents.
> Like this, explicitly provided parents stay user-given and thus
> unfiltered, while parents which get loaded as part of the graph walk
> can be filtered.
> 
> This commit shouldn't have any user-visible impact yet as there is no
> logic to filter commits yet.

I'm still scratching my head a bit to understand how NOT_USER_GIVEN can
possibly be correct (as opposed to USER_GIVEN). If we visit the commit
in a not-user-given context and add the flag, how do we know it wasn't
_also_ visited in a user-given context?

Just guessing, but perhaps the SEEN flag is saving us here? If we visit
the user-given commit itself first, then we give it the SEEN flag. Then
if we try to visit it again via parent traversal, we've already
processed it and don't add the NOT_USER_GIVEN flag here.

That seems the opposite of the order we'd usually traverse, but I think
we set SEEN on each commit in prepare_revision_walk(), before we do any
traversing.
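
To make that guess concrete, here is a toy model in C. It is not git's
actual code: the struct, the flag values, and the helper names are all made
up for illustration. It assumes that user-given tips are marked SEEN up
front and that parent traversal only adds NOT_USER_GIVEN to objects that
are not yet SEEN:

```c
#include <assert.h>

/* Hypothetical flag bits; git's real flag values differ. */
#define TOY_SEEN           (1u << 0)
#define TOY_NOT_USER_GIVEN (1u << 1)

struct toy_object {
	unsigned flags;
};

/* Models the setup step: every tip named by the user is marked SEEN. */
static void toy_mark_tip(struct toy_object *obj)
{
	assert(obj);
	obj->flags |= TOY_SEEN;
}

/*
 * Models parent traversal: only objects reached for the first time are
 * tagged as not-user-given. Returns 1 if the object was newly visited,
 * 0 if it had already been seen and was left untouched.
 */
static int toy_visit_parent(struct toy_object *obj)
{
	assert(obj);
	if (obj->flags & TOY_SEEN)
		return 0;
	obj->flags |= TOY_SEEN | TOY_NOT_USER_GIVEN;
	return 1;
}

/* An object is subject to filtering only if it carries NOT_USER_GIVEN. */
static int toy_is_filterable(const struct toy_object *obj)
{
	assert(obj);
	return !!(obj->flags & TOY_NOT_USER_GIVEN);
}
```

In this model a tip that is also somebody's parent stays unfiltered only
because SEEN was set before the parent visit. Anything that clears SEEN in
between would let the parent visit tag it as NOT_USER_GIVEN after all.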

So I _think_ it all works even with your changes here, but I have to say
this NOT_USER_GIVEN thing seems really fragile to me. Not new in your
series, of course, but something we may want to look at.

Just grepping around, "rev-list -g" will happily remove SEEN flags, so I
suspect it interacts badly with --filter. Just trying "rev-list -g
--objects --filter=object:type=blob HEAD" shows that it produces quite a
lot of commits (which I think is a more fundamental problem: it is not
walking the parent chain at all to assign these NOT_USER_GIVEN flags).

-Peff


* Re: [PATCH v2 3/8] list-objects: move tag processing into its own function
  2021-03-15 13:14   ` [PATCH v2 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
@ 2021-04-06 17:39     ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2021-04-06 17:39 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau

On Mon, Mar 15, 2021 at 02:14:40PM +0100, Patrick Steinhardt wrote:

> Move processing of tags into its own function to make the logic easier
> to extend when we're going to implement filtering for tags. No change in
> behaviour is expected from this commit.

Makes sense. Even without extending the logic, it is nice to see the
symmetry with the tree/blob paths.

Although I think it's not quite symmetric in practice...

> +static void process_tag(struct traversal_context *ctx,
> +			struct tag *tag,
> +			struct strbuf *base,
> +			const char *name)
> +{
> +	tag->object.flags |= SEEN;
> +	ctx->show_object(&tag->object, name, ctx->show_data);
> +}

I'm skeptical that "base" will ever be meaningful here (as it would be
for trees and blobs), because we are never recursing a tree to hit a
tag. We do later pass it to filter_object(), but I think it will always
be the empty string (we even assert(base->len == 0) in the caller).

So I am tempted to say it should not take a base parameter at all, and
later the call to filter_object() added to process_tag() should just
pass an empty string as the base. That would make it clear we do not
expect any kind of "base". That's mostly academic, but I think it also
makes clear that the "name" field is not something that should be
appended to the base (unlike trees and blobs, it is the name we got from
the top-level parsing, not a pathname).

-Peff


* Re: [PATCH v2 5/8] list-objects: implement object type filter
  2021-03-15 13:14   ` [PATCH v2 5/8] list-objects: implement object type filter Patrick Steinhardt
@ 2021-04-06 17:42     ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2021-04-06 17:42 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau

On Mon, Mar 15, 2021 at 02:14:50PM +0100, Patrick Steinhardt wrote:

> While it already is possible to filter objects by some criteria in
> git-rev-list(1), it is not yet possible to filter out only a specific
> type of objects. This makes some filters less useful. The `blob:limit`
> filter for example filters blobs such that only those which are smaller
> than the given limit are returned. But it is unfit to ask only for these
> smallish blobs, given that git-rev-list(1) will continue to print tags,
> commits and trees.
> 
> Now that we have the infrastructure in place to also filter tags and
> commits, we can improve this situation by implementing a new filter
> which selects objects based on their type. Above query can thus
> trivially be implemented with the following command:
> 
>     $ git rev-list --objects --filter=object:type=blob \
>         --filter=blob:limit=200
> 
> Furthermore, this filter allows optimizing certain other cases: if
> for example only tags or commits have been selected, there is no need to
> walk down trees.

Makes sense, and the implementation looks reasonable to me.

-Peff


* Re: [PATCH v2 6/8] pack-bitmap: implement object type filter
  2021-03-15 13:14   ` [PATCH v2 6/8] pack-bitmap: " Patrick Steinhardt
@ 2021-04-06 17:48     ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2021-04-06 17:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau

On Mon, Mar 15, 2021 at 02:14:55PM +0100, Patrick Steinhardt wrote:

> The preceding commit has added a new object filter for git-rev-list(1)
> which allows filtering objects by type. Implement the equivalent filter
> for packfile bitmaps so that we can answer these queries fast.

Makes sense. The implementation looks pretty sensible. One observation:

> +static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
> +				      struct object_list *tip_objects,
> +				      struct bitmap *to_filter,
> +				      enum object_type object_type)
> +{
> +	enum object_type t;
> +
> +	if (object_type < OBJ_COMMIT || object_type > OBJ_TAG)
> +		BUG("filter_bitmap_object_type given invalid object");
> +
> +	for (t = OBJ_COMMIT; t <= OBJ_TAG; t++) {
> +		if (t == object_type)
> +			continue;
> +		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, t);
> +	}
> +}

The OBJ_* constants are a contiguous set between COMMIT and TAG, and it
has to remain this way (because we use them to decipher the type fields
in pack files). But I don't think we've generally baked that assumption
into the code in this way.

Writing it out long-hand would be something like:

  if (object_type != OBJ_COMMIT)
	filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_COMMIT);
  if (object_type != OBJ_TREE)
	filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_TREE);

and so on, which isn't too bad. I dunno. That may be overly picky.
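
For illustration, here is a self-contained toy version of that long-hand
form. It is only a sketch: the `toy_*` names are invented, and a single
`unsigned` with one bit per type stands in for the real bitmap. The point
is that each type is named explicitly, so nothing relies on the constants
being contiguous:

```c
#include <assert.h>

/* Toy stand-ins for OBJ_COMMIT..OBJ_TAG; the names are illustrative only. */
enum toy_type { TOY_COMMIT = 1, TOY_TREE, TOY_BLOB, TOY_TAG };

/* One bit per type records whether objects of that type remain. */
static unsigned toy_bit(enum toy_type t)
{
	return 1u << t;
}

/* Models filter_bitmap_exclude_type(): drop every object of type t. */
static void toy_exclude_type(unsigned *remaining, enum toy_type t)
{
	assert(remaining);
	*remaining &= ~toy_bit(t);
}

/* Long-hand variant: each type is spelled out instead of looped over. */
static void toy_keep_only(unsigned *remaining, enum toy_type wanted)
{
	if (wanted != TOY_COMMIT)
		toy_exclude_type(remaining, TOY_COMMIT);
	if (wanted != TOY_TREE)
		toy_exclude_type(remaining, TOY_TREE);
	if (wanted != TOY_BLOB)
		toy_exclude_type(remaining, TOY_BLOB);
	if (wanted != TOY_TAG)
		toy_exclude_type(remaining, TOY_TAG);
}
```

The cost is four near-identical statements, but adding or reordering enum
values can no longer change which types get excluded.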

-Peff


* Re: [PATCH v2 7/8] pack-bitmap: implement combined filter
  2021-03-15 13:14   ` [PATCH v2 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
@ 2021-04-06 17:54     ` Jeff King
  2021-04-09 10:31       ` Patrick Steinhardt
  2021-04-09 11:17       ` Patrick Steinhardt
  0 siblings, 2 replies; 67+ messages in thread
From: Jeff King @ 2021-04-06 17:54 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau

On Mon, Mar 15, 2021 at 02:14:59PM +0100, Patrick Steinhardt wrote:

> When the user has specified multiple object filters, this is
> internally represented by a "combined" filter. These combined
> filters aren't yet supported by bitmap indices and thus cannot be
> accelerated.
> 
> Fix this by implementing support for these combined filters. The
> implementation is quite trivial: when there's a combined filter, we
> simply recurse into `filter_bitmap()` for all of the sub-filters.

The goal makes sense.

Before this patch, I think your test:

> +test_expect_success 'combine filter' '
> +	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
> +	git rev-list --use-bitmap-index \
> +		     --objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
> +	test_bitmap_traversal expect actual
> +'

would pass anyway, because we'd just skip using bitmaps. Is there a way
we can tell that the bitmap code actually kicked in? Maybe a perf test
would make it clear (those aren't always run, but hopefully we'd
eventually notice a regression there).

> +static int filter_supported(struct list_objects_filter_options *filter)
> +{
> +	int i;
> +
> +	switch (filter->choice) {
> +	case LOFC_BLOB_NONE:
> +	case LOFC_BLOB_LIMIT:
> +	case LOFC_OBJECT_TYPE:
> +		return 1;
> +	case LOFC_TREE_DEPTH:
> +		if (filter->tree_exclude_depth == 0)
> +			return 1;
> +		return 0;
> +	case LOFC_COMBINE:
> +		for (i = 0; i < filter->sub_nr; i++)
> +			if (!filter_supported(&filter->sub[i]))
> +				return 0;
> +		return 1;
> +	default:
> +		return 0;
> +	}
> +}

Hmm. This is essentially reproducing the list in filter_bitmap() of
what's OK for bitmaps. So when adding a new filter, it would have to be
added in both places.

Can we preserve that property of the original code? I'd think that just
adding LOFC_COMBINE to filter_bitmap() would be sufficient. I.e., this
hunk:

> +	if (filter->choice == LOFC_COMBINE) {
> +		int i;
> +		for (i = 0; i < filter->sub_nr; i++) {
> +			filter_bitmap(bitmap_git, tip_objects, to_filter,
> +				      &filter->sub[i]);
> +		}
> +		return 0;
> +	}

...except that we need to see if filter_bitmap() returns "-1" for any of
the recursive calls. Which we probably should be doing anyway to
propagate any errors (though I think the only "errors" we'd return are
"not supported", at least for now).

-Peff
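
For illustration, the error propagation suggested above could look
roughly like the sketch below. It assumes simplified stand-in types;
`struct filter`, `apply_filter()` and the `applied` counter are
hypothetical and are not git's real definitions. The `applied` counter
also shows how much work has been done before an unsupported sub-filter
is found.

```c
#include <stddef.h>

/* Simplified stand-ins for git's filter types; not the real definitions. */
enum filter_choice { FILTER_BLOB_NONE, FILTER_TREE_DEPTH, FILTER_COMBINE };

struct filter {
	enum filter_choice choice;
	unsigned long tree_exclude_depth;
	struct filter *sub;
	size_t sub_nr;
};

/*
 * Returns 0 if the filter was applied, -1 if it cannot be handled.
 * "applied" counts how many leaf filters actually did their work.
 */
static int apply_filter(const struct filter *f, int *applied)
{
	size_t i;

	switch (f->choice) {
	case FILTER_BLOB_NONE:
		(*applied)++;
		return 0;
	case FILTER_TREE_DEPTH:
		/* Only depth 0 is supported in this model. */
		if (f->tree_exclude_depth != 0)
			return -1;
		(*applied)++;
		return 0;
	case FILTER_COMBINE:
		/* Stop at the first sub-filter that reports an error. */
		for (i = 0; i < f->sub_nr; i++)
			if (apply_filter(&f->sub[i], applied) < 0)
				return -1;
		return 0;
	}
	return -1;
}
```

With a combined filter made of a supported and an unsupported
sub-filter, the call returns -1, but only after the first sub-filter has
already been applied.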

* Re: [PATCH v2 8/8] rev-list: allow filtering of provided items
  2021-03-15 13:15   ` [PATCH v2 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
@ 2021-04-06 18:04     ` Jeff King
  2021-04-09 10:59       ` Patrick Steinhardt
  0 siblings, 1 reply; 67+ messages in thread
From: Jeff King @ 2021-04-06 18:04 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau

On Mon, Mar 15, 2021 at 02:15:05PM +0100, Patrick Steinhardt wrote:

> When providing an object filter, it is currently impossible to also
> filter provided items. E.g. when executing `git rev-list HEAD`, the
> commit this reference points to will be treated as user-provided and is
> thus excluded from the filtering mechanism. This makes it harder than
> necessary to properly use the new `--filter=object:type` filter given
> that even if the user wants to only see blobs, he'll still see commits
> of provided references.
> 
> Improve this by introducing a new `--filter-provided` option to the
> git-rev-list(1) command. If given, then all user-provided references
> will be subject to filtering.

I think this option is a good thing to have.

The name seems a little confusing to me, as I can read it as both
"please filter the provided objects" and "a filter has been provided".
I guess "--filter-print-provided" would be more clear. And also the
default, so you'd want "--no-filter-print-provided". That's kind of
clunky, though. Maybe "--filter-omit-provided"?

> @@ -694,6 +698,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  			return show_bisect_vars(&info, reaches, all);
>  	}
>  
> +	if (filter_options.filter_wants) {
> +		struct commit_list *c;
> +		for (i = 0; i < revs.pending.nr; i++) {
> +			struct object_array_entry *pending = revs.pending.objects + i;
> +			pending->item->flags |= NOT_USER_GIVEN;
> +		}
> +		for (c = revs.commits; c; c = c->next)
> +			c->item->object.flags |= NOT_USER_GIVEN;
> +	}

You store the flag inside the filter_options struct, which implies to me
that it's something that could be applied per-filter (at least in
theory; the command line option doesn't allow us to distinguish).

But here you treat it as a global flag that munges the NOT_USER_GIVEN
flags. Given that it's inside the filter_options struct, and that you
propagate it via transform_to_combine_type(), I'd have expected the LOFC
code to look at the flag and decide to ignore the whole user-given
concept completely.

To be clear, I don't mind at all having it as a global that applies to
all filters. I don't think the flexibility buys us anything. But since
it only applies to rev-list, why not just make it a global option within
rev-list?

And then these hunks:

> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index bb6f6577d5..2877aa9e96 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -242,6 +242,7 @@ static void transform_to_combine_type(
>  		memset(filter_options, 0, sizeof(*filter_options));
>  		filter_options->sub = sub_array;
>  		filter_options->sub_alloc = initial_sub_alloc;
> +		filter_options->filter_wants = sub_array[0].filter_wants;
>  	}
>  	filter_options->sub_nr = 1;
>  	filter_options->choice = LOFC_COMBINE;
> @@ -290,6 +291,9 @@ void parse_list_objects_filter(
>  		parse_error = gently_parse_list_objects_filter(
>  			&filter_options->sub[filter_options->sub_nr - 1], arg,
>  			&errbuf);
> +		if (!parse_error)
> +			filter_options->sub[filter_options->sub_nr - 1].filter_wants =
> +				filter_options->filter_wants;
>  	}
>  	if (parse_error)
>  		die("%s", errbuf.buf);
> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> index 4d0d0588cc..5e609e307a 100644
> --- a/list-objects-filter-options.h
> +++ b/list-objects-filter-options.h
> @@ -42,6 +42,12 @@ struct list_objects_filter_options {
>  	 */
>  	enum list_objects_filter_choice choice;
>  
> +	/*
> +	 * "--filter-provided" was given by the user, instructing us to also
> +	 * filter all explicitly provided objects.
> +	 */
> +	unsigned int filter_wants : 1;
> +
>  	/*
>  	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
>  	 */

would not be needed at all.

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index e33805e076..5ff800316b 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -1101,7 +1101,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
>  	if (haves_bitmap)
>  		bitmap_and_not(wants_bitmap, haves_bitmap);
>  
> -	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
> +	filter_bitmap(bitmap_git, (filter && filter->filter_wants) ? NULL : wants,
> +		      wants_bitmap, filter);
>  
>  	bitmap_git->result = wants_bitmap;
>  	bitmap_git->haves = haves_bitmap;

I guess we'd need to pass that flag into prepare_bitmap_walk() here so
it knows not to bother with the wants-filtering. But that seems less bad
than stuffing it into the filter struct.
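
A rough sketch of that shape, under heavy simplification: the flag is a
plain parameter of the walk setup and the filter never sees it. All
names here (`struct tips`, `filter_objects()`, `prepare_walk()`) and the
toy "drop odd ids" filter are made up for illustration and do not match
the real pack-bitmap code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the list of user-provided tips ("wants"). */
struct tips {
	const int *ids;
	size_t nr;
};

static int is_tip(const struct tips *tips, int id)
{
	size_t i;

	if (!tips)
		return 0;
	for (i = 0; i < tips->nr; i++)
		if (tips->ids[i] == id)
			return 1;
	return 0;
}

/*
 * Toy filter: drop odd ids unless they are protected tips. Compacts
 * "ids" in place and returns the number of ids kept.
 */
static size_t filter_objects(int *ids, size_t nr, const struct tips *protect)
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++)
		if (ids[i] % 2 == 0 || is_tip(protect, ids[i]))
			ids[kept++] = ids[i];
	return kept;
}

/* The flag is a plain parameter; the filter itself knows nothing about it. */
static size_t prepare_walk(int *ids, size_t nr, const struct tips *wants,
			   int filter_provided_revs)
{
	return filter_objects(ids, nr, filter_provided_revs ? NULL : wants);
}
```

Passing 0 keeps the provided tips regardless of the filter, passing 1
filters them like everything else.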

-Peff

* Re: [PATCH v2 0/8] rev-parse: implement object type filter
  2021-03-20 21:10   ` [PATCH v2 0/8] rev-parse: implement object type filter Junio C Hamano
@ 2021-04-06 18:08     ` Jeff King
  2021-04-09 11:14       ` Patrick Steinhardt
  0 siblings, 1 reply; 67+ messages in thread
From: Jeff King @ 2021-04-06 18:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Christian Couder, Taylor Blau

On Sat, Mar 20, 2021 at 02:10:41PM -0700, Junio C Hamano wrote:

> Patrick Steinhardt <ps@pks.im> writes:
> 
> > this is the second version of my patch series which implements a new
> > `object:type` filter for git-rev-list(1) and git-upload-pack(1) and
> > extends support for bitmap indices to work with combined filters.
> > ...
> > Please see the attached range-diff for more details.
> 
> Any comment from stakeholders?

Sorry, this languished on my to-review list for a while.

I took a careful look. I found a few small nits, but the code overall
looks pretty good.

I do still find the use of the filter code here a _little_ bit
off-putting. It makes perfect sense in some ways: we are asking rev-list
to filter the output, and it keeps our implementation nice and simple.
It took me a while to figure out what I think makes it weird, but I
think it's:

  - the partial-clone feature exposes the filter mechanism in a very
    transparent way. So while it's not _wrong_ to be able to ask for a
    partial clone of only trees, it's an odd thing that nobody would
    really use in practice. And so it's a bit funny that it gets
    documented alongside blob:limit, etc.

  - for the same reason, it's very rigid. We have no way to say "this
    filter OR that filter", and are unlikely to grow them (because this
    is all part of the network protocol). Whereas it's perfectly
    reasonable for somebody to ask for "trees and blobs" via rev-list.
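
To make the rigidity concrete, here is a small model of how combined
filters behave, assuming that an object survives only if every
sub-filter keeps it. The bitmask representation and the names
(`TYPE_*`, `combine_filters()`) are invented for this sketch and are not
how git stores filters.

```c
/* One bit per object type; a simplified model, not git's representation. */
enum {
	TYPE_COMMIT = 1 << 0,
	TYPE_TREE   = 1 << 1,
	TYPE_BLOB   = 1 << 2,
	TYPE_TAG    = 1 << 3,
	TYPE_ALL    = TYPE_COMMIT | TYPE_TREE | TYPE_BLOB | TYPE_TAG
};

/* Combined filters keep only what every sub-filter keeps (intersection). */
static unsigned combine_filters(const unsigned *keep, int nr)
{
	unsigned result = TYPE_ALL;
	int i;

	for (i = 0; i < nr; i++)
		result &= keep[i];
	return result;
}
```

In this model, combining a "trees only" filter with a "blobs only"
filter yields the empty set rather than "trees and blobs", which would
need a union that the filter spec has no way to express.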

I dunno. Those aren't objections exactly. Just trying to put my finger
on why my initial reaction was "huh, why --filter?".

-Peff

* Re: [PATCH v2 2/8] revision: mark commit parents as NOT_USER_GIVEN
  2021-04-06 17:30     ` Jeff King
@ 2021-04-09 10:19       ` Patrick Steinhardt
  0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 10:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Taylor Blau

On Tue, Apr 06, 2021 at 01:30:57PM -0400, Jeff King wrote:
> On Mon, Mar 15, 2021 at 02:14:36PM +0100, Patrick Steinhardt wrote:
> 
> > The NOT_USER_GIVEN flag of an object marks whether the object was explicitly
> > provided by the user or not. The most important use case for this is
> > when filtering objects: only objects that were not explicitly requested
> > will get filtered.
> > 
> > The flag is currently only set for blobs and trees, which has been fine
> > given that there are no filters for tags or commits currently. We're
> > about to extend filtering capabilities to add an object type filter though,
> > which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's
> > not set, the object wouldn't get filtered at all.
> > 
> > Mark unseen commit parents as NOT_USER_GIVEN when processing parents.
> > Like this, explicitly provided parents stay user-given and thus
> > unfiltered, while parents which get loaded as part of the graph walk
> > can be filtered.
> > 
> > This commit shouldn't have any user-visible impact yet as there is no
> > logic to filter commits yet.
> 
> I'm still scratching my head a bit to understand how NOT_USER_GIVEN can
> possibly be correct (as opposed to USER_GIVEN). If we visit the commit
> in a not-user-given context and add the flag, how do we know it wasn't
> _also_ visited in a user-given context?
> 
> Just guessing, but perhaps the SEEN flag is saving us here? If we visit
> the user-given commit itself first, then we give it the SEEN flag. Then
> if we try to visit it again via parent traversal, we've already
> processed it and don't add the NOT_USER_GIVEN flag here.

Yes, I think that's mostly it.
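
As a toy model of that interaction (the bit values, `struct commit`
layout and helper names below are illustrative assumptions, not the
definitions from revision.c): tips are marked SEEN up front, and a
parent only picks up NOT_USER_GIVEN if it has not been SEEN yet.

```c
#include <stddef.h>

/* Illustrative bit values; git's real flag bits differ. */
#define SEEN           (1u << 0)
#define NOT_USER_GIVEN (1u << 1)

struct commit {
	unsigned flags;
	struct commit *parent;
};

/* Models the setup step: every tip named by the user is marked SEEN. */
static void mark_tip(struct commit *c)
{
	c->flags |= SEEN;
}

/*
 * Models parent processing: a parent reached only through the walk is
 * marked NOT_USER_GIVEN, but one that is already SEEN is left alone.
 */
static void process_parent(struct commit *c)
{
	struct commit *p = c->parent;

	if (!p || (p->flags & SEEN))
		return;
	p->flags |= SEEN | NOT_USER_GIVEN;
}
```

Anything that clears SEEN between the setup step and the walk breaks the
guard in this model, which is the kind of fragility discussed here.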

> That seems the opposite of the order we'd usually traverse, but I think
> we set SEEN on each commit in prepare_revision_walk(), before we do any
> traversing.
> 
> So I _think_ it all works even with your changes here, but I have to say
> this NOT_USER_GIVEN thing seems really fragile to me. Not new in your
> series, of course, but something we may want to look at.
> 
> Just grepping around, "rev-list -g" will happily remove SEEN flags, so I
> suspect it interacts badly with --filter. Just trying "rev-list -g
> --objects --filter=object:type=blob HEAD" shows that it produces quite a
> lot of commits (which I think is a more fundamental problem: it is not
> walking the parent chain at all to assign these NOT_USER_GIVEN flags).

I totally agree that this feels fragile, and developing this series with
NOT_USER_GIVEN wasn't the most enjoyable experience either. I wouldn't
love to be doing the conversion back to USER_GIVEN as part of this
series, but I wouldn't oppose doing that job either. Right now I don't
feel like I'm sufficiently sure that it's working for all cases, and
indeed your example with "rev-list -g" already shows one case where it's
breaking.

So let me know whether I should add the conversion as a preparatory step.

Patrick

* Re: [PATCH v2 7/8] pack-bitmap: implement combined filter
  2021-04-06 17:54     ` Jeff King
@ 2021-04-09 10:31       ` Patrick Steinhardt
  2021-04-09 15:53         ` Jeff King
  2021-04-09 11:17       ` Patrick Steinhardt
  1 sibling, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 10:31 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Taylor Blau

On Tue, Apr 06, 2021 at 01:54:31PM -0400, Jeff King wrote:
> On Mon, Mar 15, 2021 at 02:14:59PM +0100, Patrick Steinhardt wrote:
> 
> > When the user has multiple object filters specified, then this is
> > internally represented by having a "combined" filter. These combined
> > filters aren't yet supported by bitmap indices and can thus not be
> > accelerated.
> > 
> > Fix this by implementing support for these combined filters. The
> > implementation is quite trivial: when there's a combined filter, we
> > simply recurse into `filter_bitmap()` for all of the sub-filters.
> 
> The goal makes sense.
> 
> Before this patch, I think your test:
> 
> > +test_expect_success 'combine filter' '
> > +	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
> > +	git rev-list --use-bitmap-index \
> > +		     --objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
> > +	test_bitmap_traversal expect actual
> > +'
> 
> would pass anyway, because we'd just skip using bitmaps. Is there a way
> we can tell that the bitmap code actually kicked in? Maybe a perf test
> would make it clear (those aren't always run, but hopefully we'd
> eventually notice a regression there).
> 
> > +static int filter_supported(struct list_objects_filter_options *filter)
> > +{
> > +	int i;
> > +
> > +	switch (filter->choice) {
> > +	case LOFC_BLOB_NONE:
> > +	case LOFC_BLOB_LIMIT:
> > +	case LOFC_OBJECT_TYPE:
> > +		return 1;
> > +	case LOFC_TREE_DEPTH:
> > +		if (filter->tree_exclude_depth == 0)
> > +			return 1;
> > +		return 0;
> > +	case LOFC_COMBINE:
> > +		for (i = 0; i < filter->sub_nr; i++)
> > +			if (!filter_supported(&filter->sub[i]))
> > +				return 0;
> > +		return 1;
> > +	default:
> > +		return 0;
> > +	}
> > +}
> 
> Hmm. This is essentially reproducing the list in filter_bitmap() of
> what's OK for bitmaps. So when adding a new filter, it would have to be
> added in both places.
> 
> Can we preserve that property of the original code? I'd think that just
> adding LOFC_COMBINE to filter_bitmap() would be sufficient. I.e., this
> hunk:
> 
> > +	if (filter->choice == LOFC_COMBINE) {
> > +		int i;
> > +		for (i = 0; i < filter->sub_nr; i++) {
> > +			filter_bitmap(bitmap_git, tip_objects, to_filter,
> > +				      &filter->sub[i]);
> > +		}
> > +		return 0;
> > +	}
> 
> ...except that we need to see if filter_bitmap() returns "-1" for any of
> the recursive calls. Which we probably should be doing anyway to
> propagate any errors (though I think the only "errors" we'd return are
> "not supported", at least for now).
> 
> -Peff

But wouldn't that mean that we're now needlessly filtering via bitmaps
all the way down the combined filters only to realize at the end that it
cannot work because we've got a tree filter with non-zero tree depth?
Granted, this will not be the common case. But it still feels like we're
doing needless work for cases where we know that bitmaps cannot answer
the query.

Patrick

* Re: [PATCH v2 8/8] rev-list: allow filtering of provided items
  2021-04-06 18:04     ` Jeff King
@ 2021-04-09 10:59       ` Patrick Steinhardt
  2021-04-09 15:58         ` Jeff King
  0 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 10:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Taylor Blau

On Tue, Apr 06, 2021 at 02:04:15PM -0400, Jeff King wrote:
> On Mon, Mar 15, 2021 at 02:15:05PM +0100, Patrick Steinhardt wrote:
> 
> > When providing an object filter, it is currently impossible to also
> > filter provided items. E.g. when executing `git rev-list HEAD`, the
> > commit this reference points to will be treated as user-provided and is
> > thus excluded from the filtering mechanism. This makes it harder than
> > necessary to properly use the new `--filter=object:type` filter given
> > that even if the user wants to only see blobs, he'll still see commits
> > of provided references.
> > 
> > Improve this by introducing a new `--filter-provided` option to the
> > git-rev-list(1) command. If given, then all user-provided references
> > will be subject to filtering.
> 
> I think this option is a good thing to have.
> 
> The name seems a little confusing to me, as I can read it as both
> "please filter the provided objects" and "a filter has been provided".
> I guess "--filter-print-provided" would be more clear. And also the
> default, so you'd want "--no-filter-print-provided". That's kind of
> clunky, though. Maybe "--filter-omit-provided"?

Hum, "--filter-omit-provided" doesn't sound good to me, either. Omit to
me sounds like it'd omit filtering provided items, but we're doing
the reverse thing.

How about "--filter-provided-revisions"? Verbose, but at least it cannot
be confused with a filter being provided.

> > @@ -694,6 +698,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >  			return show_bisect_vars(&info, reaches, all);
> >  	}
> >  
> > +	if (filter_options.filter_wants) {
> > +		struct commit_list *c;
> > +		for (i = 0; i < revs.pending.nr; i++) {
> > +			struct object_array_entry *pending = revs.pending.objects + i;
> > +			pending->item->flags |= NOT_USER_GIVEN;
> > +		}
> > +		for (c = revs.commits; c; c = c->next)
> > +			c->item->object.flags |= NOT_USER_GIVEN;
> > +	}
> 
> You store the flag inside the filter_options struct, which implies to me
> that it's something that could be applied per-filter (at least in
> theory; the command line option doesn't allow us to distinguish).
> 
> But here you treat it as a global flag that munges the NOT_USER_GIVEN
> flags. Given that it's inside the filter_options struct, and that you
> propagate it via transform_to_combine_type(), I'd have expected the LOFC
> code to look at the flag and decide to ignore the whole user-given
> concept completely.
> 
> To be clear, I don't mind at all having it as a global that applies to
> all filters. I don't think the flexibility buys us anything. But since
> it only applies to rev-list, why not just make it a global option within
> rev-list?
[snip]

Fair point. This probably stems from the confusion where I initially
didn't realize that the filter_options is not a "global" options
structure, but in fact the filter itself already. That's also why there
had been the initial bug where converting filter options into a combined
filter led to `filter_wants` being dropped.

In any case, the resulting code with it being global to rev-list.c
instead of part of the options is a lot cleaner.

Patrick

* Re: [PATCH v2 0/8] rev-parse: implement object type filter
  2021-04-06 18:08     ` Jeff King
@ 2021-04-09 11:14       ` Patrick Steinhardt
  2021-04-09 16:05         ` Jeff King
  0 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Christian Couder, Taylor Blau

On Tue, Apr 06, 2021 at 02:08:52PM -0400, Jeff King wrote:
> On Sat, Mar 20, 2021 at 02:10:41PM -0700, Junio C Hamano wrote:
> 
> > Patrick Steinhardt <ps@pks.im> writes:
> > 
> > > this is the second version of my patch series which implements a new
> > > `object:type` filter for git-rev-list(1) and git-upload-pack(1) and
> > > extends support for bitmap indices to work with combined filters.
> > > ...
> > > Please see the attached range-diff for more details.
> > 
> > Any comment from stakeholders?
> 
> Sorry, this languished on my to-review list for a while.
> 
> I took a careful look. I found a few small nits, but the code overall
> looks pretty good.
> 
> I do still find the use of the filter code here a _little_ bit
> off-putting. It makes perfect sense in some ways: we are asking rev-list
> to filter the output, and it keeps our implementation nice and simple.
> It took me a while to figure out what I think makes it weird, but I
> think it's:
> 
>   - the partial-clone feature exposes the filter mechanism in a very
>     transparent way. So while it's not _wrong_ to be able to ask for a
>     partial clone of only trees, it's an odd thing that nobody would
>     really use in practice. And so it's a bit funny that it gets
>     documented alongside blob:limit, etc.
> 
>   - for the same reason, it's very rigid. We have no way to say "this
>     filter OR that filter", and are unlikely to grow them (because this
>     is all part of the network protocol). Whereas it's perfectly
>     reasonable for somebody to ask for "trees and blobs" via rev-list.
> 
> I dunno. Those aren't objections exactly. Just trying to put my finger
> on why my initial reaction was "huh, why --filter?".

Yeah, I do kind of share these concerns. Ideally, we'd provide a nicer
only-user-facing interface to query the repository for various objects.
git-cat-file(1) would be the obvious thing that first comes to mind,
where it would be nice to have it filter stuff. But then on the other
hand, it's really rather a simple "Give me what I tell you to" binary,
which is probably a good thing. Other than that I don't think there's
any executable that'd be a good fit -- we could do this via a new
git-list-objects(1), but then again git-rev-list(1) already does most of
what git-list-objects(1) would do, so why bother.

It kind of feels like git-checkout(1) to me: it does many things, and if
you know how to wield it it works perfectly fine. But the user interface
is lacking, which is why it was split up into git-switch(1) and
git-restore(1). It's telling already that the summary of git-rev-list(1)
is "Lists commit objects in reverse chronological order". I mean yes,
that's what it does in many cases. But there are just as many cases where
it doesn't.

Patrick

* Re: [PATCH v2 7/8] pack-bitmap: implement combined filter
  2021-04-06 17:54     ` Jeff King
  2021-04-09 10:31       ` Patrick Steinhardt
@ 2021-04-09 11:17       ` Patrick Steinhardt
  2021-04-09 15:55         ` Jeff King
  1 sibling, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:17 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Taylor Blau

On Tue, Apr 06, 2021 at 01:54:31PM -0400, Jeff King wrote:
> On Mon, Mar 15, 2021 at 02:14:59PM +0100, Patrick Steinhardt wrote:
> 
> > When the user has multiple object filters specified, then this is
> > internally represented by having a "combined" filter. These combined
> > filters aren't yet supported by bitmap indices and can thus not be
> > accelerated.
> > 
> > Fix this by implementing support for these combined filters. The
> > implementation is quite trivial: when there's a combined filter, we
> > simply recurse into `filter_bitmap()` for all of the sub-filters.
> 
> The goal makes sense.
> 
> Before this patch, I think your test:
> 
> > +test_expect_success 'combine filter' '
> > +	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
> > +	git rev-list --use-bitmap-index \
> > +		     --objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
> > +	test_bitmap_traversal expect actual
> > +'
> 
> would pass anyway, because we'd just skip using bitmaps. Is there a way
> we can tell that the bitmap code actually kicked in? Maybe a perf test
> would make it clear (those aren't always run, but hopefully we'd
> eventually notice a regression there).

I think that's not actually true. Note that we're using
`test_bitmap_traversal`:

    test_bitmap_traversal () {
        if test "$1" = "--no-confirm-bitmaps"
        then
            shift
        elif cmp "$1" "$2"
        then
            echo >&2 "identical raw outputs; are you sure bitmaps were used?"
            return 1
        fi &&
        cut -d' ' -f1 "$1" | sort >"$1.normalized" &&
        sort "$2" >"$2.normalized" &&
        test_cmp "$1.normalized" "$2.normalized" &&
        rm -f "$1.normalized" "$2.normalized"
    }

The output is different when using bitmap indices, which is why the
function knows to fail in case output is the same in both cases. So we
know that it cannot be the same here and thus we also know that the
bitmap case kicked in.

Patrick

* [PATCH v3 0/8] rev-parse: implement object type filter
  2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2021-03-20 21:10   ` [PATCH v2 0/8] rev-parse: implement object type filter Junio C Hamano
@ 2021-04-09 11:27   ` Patrick Steinhardt
  2021-04-09 11:27     ` [PATCH v3 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
                       ` (9 more replies)
  9 siblings, 10 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau

Hi,

this is the third version of my patch series which implements a new
`object:type` filter for git-rev-list(1) and git-upload-pack(1) and
extends support for bitmap indices to work with combined filters.

This mostly addresses Peff's comments. Thanks for your feedback!

    - Removed the `base` parameter from `process_tag()`.

    - The object type filter doesn't assume ordering for the object type
      enum anymore.

    - Combined filters in the bitmap path now verify that
      `filter_bitmap` does not return any errors.

    - Renamed "--filter-provided" to "--filter-provided-revisions" and
      added documentation for it.

    - Refactored the code to not munge the `filter_provided` field in
      the filter options struct, but instead carry it in rev-list.c.

Please see the attached range-diff for more details.

Patrick

Patrick Steinhardt (8):
  uploadpack.txt: document implication of `uploadpackfilter.allow`
  revision: mark commit parents as NOT_USER_GIVEN
  list-objects: move tag processing into its own function
  list-objects: support filtering by tag and commit
  list-objects: implement object type filter
  pack-bitmap: implement object type filter
  pack-bitmap: implement combined filter
  rev-list: allow filtering of provided items

 Documentation/config/uploadpack.txt |   9 ++-
 Documentation/rev-list-options.txt  |   8 ++
 builtin/pack-objects.c              |   2 +-
 builtin/rev-list.c                  |  36 ++++++---
 list-objects-filter-options.c       |  14 ++++
 list-objects-filter-options.h       |   2 +
 list-objects-filter.c               | 116 ++++++++++++++++++++++++++++
 list-objects-filter.h               |   2 +
 list-objects.c                      |  29 ++++++-
 pack-bitmap.c                       |  76 +++++++++++++++---
 pack-bitmap.h                       |   3 +-
 reachable.c                         |   2 +-
 revision.c                          |   4 +-
 revision.h                          |   3 -
 t/t6112-rev-list-filters-objects.sh |  76 ++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  |  68 +++++++++++++++-
 16 files changed, 416 insertions(+), 34 deletions(-)

Range-diff against v2:
1:  270ff80dac = 1:  f80b9570d4 uploadpack.txt: document implication of `uploadpackfilter.allow`
2:  ddbec75986 = 2:  46c1952405 revision: mark commit parents as NOT_USER_GIVEN
3:  d8da0b24f4 ! 3:  3d792f6339 list-objects: move tag processing into its own function
    @@ list-objects.c: static void process_tree(struct traversal_context *ctx,
      
     +static void process_tag(struct traversal_context *ctx,
     +			struct tag *tag,
    -+			struct strbuf *base,
     +			const char *name)
     +{
     +	tag->object.flags |= SEEN;
    @@ list-objects.c: static void traverse_trees_and_blobs(struct traversal_context *c
      		if (obj->type == OBJ_TAG) {
     -			obj->flags |= SEEN;
     -			ctx->show_object(obj, name, ctx->show_data);
    -+			process_tag(ctx, (struct tag *)obj, base, name);
    ++			process_tag(ctx, (struct tag *)obj, name);
      			continue;
      		}
      		if (!path)
4:  5545c189c5 ! 4:  80193d6ba3 list-objects: support filtering by tag and commit
    @@ list-objects-filter.h: enum list_objects_filter_result {
     
      ## list-objects.c ##
     @@ list-objects.c: static void process_tag(struct traversal_context *ctx,
    - 			struct strbuf *base,
    + 			struct tag *tag,
      			const char *name)
      {
     -	tag->object.flags |= SEEN;
    @@ list-objects.c: static void process_tag(struct traversal_context *ctx,
     +	enum list_objects_filter_result r;
     +
     +	r = list_objects_filter__filter_object(ctx->revs->repo, LOFS_TAG,
    -+					       &tag->object, base->buf,
    -+					       &base->buf[base->len],
    -+					       ctx->filter);
    ++					       &tag->object, "", 0, ctx->filter);
     +	if (r & LOFR_MARK_SEEN)
     +		tag->object.flags |= SEEN;
     +	if (r & LOFR_DO_SHOW)
5:  acf01472af = 5:  e2a14abf92 list-objects: implement object type filter
6:  8073ab665b ! 6:  46d4450d38 pack-bitmap: implement object type filter
    @@ pack-bitmap.c: static void filter_bitmap_tree_depth(struct bitmap_index *bitmap_
     +				      struct bitmap *to_filter,
     +				      enum object_type object_type)
     +{
    -+	enum object_type t;
    -+
     +	if (object_type < OBJ_COMMIT || object_type > OBJ_TAG)
     +		BUG("filter_bitmap_object_type given invalid object");
     +
    -+	for (t = OBJ_COMMIT; t <= OBJ_TAG; t++) {
    -+		if (t == object_type)
    -+			continue;
    -+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, t);
    -+	}
    ++	if (object_type != OBJ_TAG)
    ++		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_TAG);
    ++	if (object_type != OBJ_COMMIT)
    ++		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_COMMIT);
    ++	if (object_type != OBJ_TREE)
    ++		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_TREE);
    ++	if (object_type != OBJ_BLOB)
    ++		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_BLOB);
     +}
     +
      static int filter_bitmap(struct bitmap_index *bitmap_git,
7:  fac3477d97 ! 7:  06a376399b pack-bitmap: implement combined filter
    @@ Commit message
     
      ## pack-bitmap.c ##
     @@ pack-bitmap.c: static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
    - 	}
    + 		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_BLOB);
      }
      
     +static int filter_supported(struct list_objects_filter_options *filter)
    @@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
     +	if (filter->choice == LOFC_COMBINE) {
     +		int i;
     +		for (i = 0; i < filter->sub_nr; i++) {
    -+			filter_bitmap(bitmap_git, tip_objects, to_filter,
    -+				      &filter->sub[i]);
    ++			if (filter_bitmap(bitmap_git, tip_objects, to_filter,
    ++					  &filter->sub[i]) < 0)
    ++				return -1;
     +		}
     +		return 0;
     +	}
8:  0e26fee8b3 ! 8:  796606f32b rev-list: allow filtering of provided items
    @@ Commit message
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    + ## Documentation/rev-list-options.txt ##
    +@@ Documentation/rev-list-options.txt: equivalent.
    + --no-filter::
    + 	Turn off any previous `--filter=` argument.
    + 
    ++--filter-provided-revisions::
    ++	Filter the list of explicitly provided revisions, which would otherwise
    ++	always be printed even if they did not match any of the filters. Only
    ++	useful with `--filter=`.
    ++
    + --filter-print-omitted::
    + 	Only useful with `--filter=`; prints a list of the objects omitted
    + 	by the filter.  Object IDs are prefixed with a ``~'' character.
    +
    + ## builtin/pack-objects.c ##
    +@@ builtin/pack-objects.c: static int pack_options_allow_reuse(void)
    + 
    + static int get_object_list_from_bitmap(struct rev_info *revs)
    + {
    +-	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options)))
    ++	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
    + 		return -1;
    + 
    + 	if (pack_options_allow_reuse() &&
    +
      ## builtin/rev-list.c ##
    +@@ builtin/rev-list.c: static inline int parse_missing_action_value(const char *value)
    + }
    + 
    + static int try_bitmap_count(struct rev_info *revs,
    +-			    struct list_objects_filter_options *filter)
    ++			    struct list_objects_filter_options *filter,
    ++			    int filter_provided_revs)
    + {
    + 	uint32_t commit_count = 0,
    + 		 tag_count = 0,
    +@@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs,
    + 	 */
    + 	max_count = revs->max_count;
    + 
    +-	bitmap_git = prepare_bitmap_walk(revs, filter);
    ++	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
    + 	if (!bitmap_git)
    + 		return -1;
    + 
    +@@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs,
    + }
    + 
    + static int try_bitmap_traversal(struct rev_info *revs,
    +-				struct list_objects_filter_options *filter)
    ++				struct list_objects_filter_options *filter,
    ++				int filter_provided_revs)
    + {
    + 	struct bitmap_index *bitmap_git;
    + 
    +@@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs,
    + 	if (revs->max_count >= 0)
    + 		return -1;
    + 
    +-	bitmap_git = prepare_bitmap_walk(revs, filter);
    ++	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
    + 	if (!bitmap_git)
    + 		return -1;
    + 
    +@@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs,
    + }
    + 
    + static int try_bitmap_disk_usage(struct rev_info *revs,
    +-				 struct list_objects_filter_options *filter)
    ++				 struct list_objects_filter_options *filter,
    ++				 int filter_provided_revs)
    + {
    + 	struct bitmap_index *bitmap_git;
    + 
    + 	if (!show_disk_usage)
    + 		return -1;
    + 
    +-	bitmap_git = prepare_bitmap_walk(revs, filter);
    ++	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
    + 	if (!bitmap_git)
    + 		return -1;
    + 
    +@@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
    + 	int bisect_show_vars = 0;
    + 	int bisect_find_all = 0;
    + 	int use_bitmap_index = 0;
    ++	int filter_provided_revs = 0;
    + 	const char *show_progress = NULL;
    + 
    + 	if (argc == 2 && !strcmp(argv[1], "-h"))
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
      			list_objects_filter_set_no_filter(&filter_options);
      			continue;
      		}
    -+		if (!strcmp(arg, "--filter-provided")) {
    -+			filter_options.filter_wants = 1;
    ++		if (!strcmp(arg, "--filter-provided-revisions")) {
    ++			filter_provided_revs = 1;
     +			continue;
     +		}
      		if (!strcmp(arg, "--filter-print-omitted")) {
      			arg_print_omitted = 1;
      			continue;
    +@@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
    + 		progress = start_delayed_progress(show_progress, 0);
    + 
    + 	if (use_bitmap_index) {
    +-		if (!try_bitmap_count(&revs, &filter_options))
    ++		if (!try_bitmap_count(&revs, &filter_options, filter_provided_revs))
    + 			return 0;
    +-		if (!try_bitmap_disk_usage(&revs, &filter_options))
    ++		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_revs))
    + 			return 0;
    +-		if (!try_bitmap_traversal(&revs, &filter_options))
    ++		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_revs))
    + 			return 0;
    + 	}
    + 
     @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *prefix)
      			return show_bisect_vars(&info, reaches, all);
      	}
      
    -+	if (filter_options.filter_wants) {
    ++	if (filter_provided_revs) {
     +		struct commit_list *c;
     +		for (i = 0; i < revs.pending.nr; i++) {
     +			struct object_array_entry *pending = revs.pending.objects + i;
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
      	if (arg_missing_action == MA_PRINT)
     
    - ## list-objects-filter-options.c ##
    -@@ list-objects-filter-options.c: static void transform_to_combine_type(
    - 		memset(filter_options, 0, sizeof(*filter_options));
    - 		filter_options->sub = sub_array;
    - 		filter_options->sub_alloc = initial_sub_alloc;
    -+		filter_options->filter_wants = sub_array[0].filter_wants;
    - 	}
    - 	filter_options->sub_nr = 1;
    - 	filter_options->choice = LOFC_COMBINE;
    -@@ list-objects-filter-options.c: void parse_list_objects_filter(
    - 		parse_error = gently_parse_list_objects_filter(
    - 			&filter_options->sub[filter_options->sub_nr - 1], arg,
    - 			&errbuf);
    -+		if (!parse_error)
    -+			filter_options->sub[filter_options->sub_nr - 1].filter_wants =
    -+				filter_options->filter_wants;
    - 	}
    - 	if (parse_error)
    - 		die("%s", errbuf.buf);
    -
    - ## list-objects-filter-options.h ##
    -@@ list-objects-filter-options.h: struct list_objects_filter_options {
    - 	 */
    - 	enum list_objects_filter_choice choice;
    - 
    -+	/*
    -+	 * "--filter-provided" was given by the user, instructing us to also
    -+	 * filter all explicitly provided objects.
    -+	 */
    -+	unsigned int filter_wants : 1;
    -+
    - 	/*
    - 	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
    - 	 */
    -
      ## pack-bitmap.c ##
    +@@ pack-bitmap.c: static int can_filter_bitmap(struct list_objects_filter_options *filter)
    + }
    + 
    + struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
    +-					 struct list_objects_filter_options *filter)
    ++					 struct list_objects_filter_options *filter,
    ++					 int filter_provided_revs)
    + {
    + 	unsigned int i;
    + 
     @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
      	if (haves_bitmap)
      		bitmap_and_not(wants_bitmap, haves_bitmap);
      
     -	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
    -+	filter_bitmap(bitmap_git, (filter && filter->filter_wants) ? NULL : wants,
    ++	filter_bitmap(bitmap_git, (filter && filter_provided_revs) ? NULL : wants,
     +		      wants_bitmap, filter);
      
      	bitmap_git->result = wants_bitmap;
      	bitmap_git->haves = haves_bitmap;
     
    + ## pack-bitmap.h ##
    +@@ pack-bitmap.h: void traverse_bitmap_commit_list(struct bitmap_index *,
    + 				 show_reachable_fn show_reachable);
    + void test_bitmap_walk(struct rev_info *revs);
    + struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
    +-					 struct list_objects_filter_options *filter);
    ++					 struct list_objects_filter_options *filter,
    ++					 int filter_provided_revs);
    + int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
    + 				       struct packed_git **packfile,
    + 				       uint32_t *entries,
    +
    + ## reachable.c ##
    +@@ reachable.c: void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
    + 	cp.progress = progress;
    + 	cp.count = 0;
    + 
    +-	bitmap_git = prepare_bitmap_walk(revs, NULL);
    ++	bitmap_git = prepare_bitmap_walk(revs, NULL, 0);
    + 	if (bitmap_git) {
    + 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
    + 		free_bitmap_index(bitmap_git);
    +
      ## t/t6112-rev-list-filters-objects.sh ##
     @@ t/t6112-rev-list-filters-objects.sh: test_expect_success 'verify object:type=tag prints tag' '
      	test_cmp expected actual
-- 
2.31.1



* [PATCH v3 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow`
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
@ 2021-04-09 11:27     ` Patrick Steinhardt
  2021-04-09 11:27     ` [PATCH v3 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 1317 bytes --]

When `uploadpackfilter.allow` is set to `true`, it means that filters
are enabled by default except in the case where a filter is explicitly
disabled via `uploadpackfilter.<filter>.allow`. This option will not only
enable the currently supported set of filters, but also any filters
which get added in the future. As such, an admin who wants to have
tight control over which filters are allowed and which aren't probably
shouldn't ever set `uploadpackfilter.allow=true`.

Amend the documentation to make the ramifications more explicit so that
admins are aware of this.
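
As an illustration of the fallback described above, here is a minimal C
sketch, not git's actual upload-pack code. The names `allow_state`,
`filter_override` and `filter_allowed` are hypothetical. The point is only
that a filter without an explicit `uploadpackfilter.<filter>.allow` entry,
including one added in a later release, inherits the default:

```c
#include <assert.h>
#include <string.h>

/* Tri-state for a per-filter setting; UNSET means "use the default". */
enum allow_state { ALLOW_UNSET = -1, ALLOW_FALSE = 0, ALLOW_TRUE = 1 };

/* Hypothetical stand-in for one uploadpackfilter.<filter>.allow entry. */
struct filter_override {
	const char *name;
	enum allow_state state;
};

/*
 * Return 1 if `filter` is allowed, 0 otherwise. Filters without an explicit
 * override, including names this table has never heard of, fall back to
 * `default_allow` (the value of uploadpackfilter.allow).
 */
int filter_allowed(const struct filter_override *overrides, size_t nr,
		   int default_allow, const char *filter)
{
	size_t i;

	assert(filter);
	for (i = 0; i < nr; i++) {
		if (!strcmp(overrides[i].name, filter) &&
		    overrides[i].state != ALLOW_UNSET)
			return overrides[i].state == ALLOW_TRUE;
	}
	return default_allow;
}
```

With a default of `true`, a filter name that has no override is accepted,
which is exactly the behaviour an admin with tight requirements may not
want.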

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/uploadpack.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
index b0d761282c..6729a072ea 100644
--- a/Documentation/config/uploadpack.txt
+++ b/Documentation/config/uploadpack.txt
@@ -59,7 +59,8 @@ uploadpack.allowFilter::
 
 uploadpackfilter.allow::
 	Provides a default value for unspecified object filters (see: the
-	below configuration variable).
+	below configuration variable). If set to `true`, this will also
+	enable all filters which get added in the future.
 	Defaults to `true`.
 
 uploadpackfilter.<filter>.allow::
-- 
2.31.1



* [PATCH v3 2/8] revision: mark commit parents as NOT_USER_GIVEN
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
  2021-04-09 11:27     ` [PATCH v3 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
@ 2021-04-09 11:27     ` Patrick Steinhardt
  2021-04-09 11:28     ` [PATCH v3 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
                       ` (7 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:27 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 2338 bytes --]

The NOT_USER_GIVEN flag of an object marks whether the object was
explicitly provided by the user or not. The most important use case for
this is when filtering objects: only objects that were not explicitly
requested will get filtered.

The flag is currently only set for blobs and trees, which has been fine
given that there are no filters for tags or commits currently. We're
about to extend filtering capabilities to add an object type filter though,
which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's
not set, the object wouldn't get filtered at all.

Mark unseen commit parents as NOT_USER_GIVEN when processing parents.
Like this, explicitly provided parents stay user-given and thus
unfiltered, while parents which get loaded as part of the graph walk
can be filtered.

This commit shouldn't have any user-visible impact, as there is no
logic to filter commits yet.
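
To make the flag handling concrete, here is a small self-contained C
sketch. It is a toy model rather than the code in revision.c: `walk_object`,
`toy_visit_parent` and `toy_may_filter` are made-up names, and the value of
SEEN is an assumption (only NOT_USER_GIVEN's value appears in this patch).
It assumes that explicitly provided commits are already marked SEEN before
their children are processed:

```c
#include <assert.h>

/* NOT_USER_GIVEN matches revision.h; SEEN's value is assumed here. */
#define SEEN           (1u<<0)
#define NOT_USER_GIVEN (1u<<25)

/* Toy stand-in for struct object, carrying only the flag word. */
struct walk_object {
	unsigned flags;
};

/*
 * Visit a parent during the graph walk. A parent that was already SEEN
 * (for example because the user listed it explicitly) is left untouched.
 * Returns 1 if the parent was newly reached, 0 otherwise.
 */
int toy_visit_parent(struct walk_object *p)
{
	assert(p);
	if (p->flags & SEEN)
		return 0;
	p->flags |= (SEEN | NOT_USER_GIVEN);
	return 1;
}

/* Only objects reached by traversal are candidates for filtering. */
int toy_may_filter(const struct walk_object *o)
{
	assert(o);
	return (o->flags & NOT_USER_GIVEN) != 0;
}
```

A parent that the user also named on the command line keeps its flags and
is therefore never a filtering candidate in this model.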

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 revision.c | 4 ++--
 revision.h | 3 ---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/revision.c b/revision.c
index 553c0faa9b..fd34c75e23 100644
--- a/revision.c
+++ b/revision.c
@@ -1123,7 +1123,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 				mark_parents_uninteresting(p);
 			if (p->object.flags & SEEN)
 				continue;
-			p->object.flags |= SEEN;
+			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
@@ -1165,7 +1165,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 		}
 		p->object.flags |= left_flag;
 		if (!(p->object.flags & SEEN)) {
-			p->object.flags |= SEEN;
+			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
diff --git a/revision.h b/revision.h
index a24f72dcd1..93aa012f51 100644
--- a/revision.h
+++ b/revision.h
@@ -44,9 +44,6 @@
 /*
  * Indicates object was reached by traversal. i.e. not given by user on
  * command-line or stdin.
- * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
- * filtering trees and blobs, but it may be useful to support filtering commits
- * in the future.
  */
 #define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-- 
2.31.1



* [PATCH v3 3/8] list-objects: move tag processing into its own function
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
  2021-04-09 11:27     ` [PATCH v3 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
  2021-04-09 11:27     ` [PATCH v3 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
@ 2021-04-09 11:28     ` Patrick Steinhardt
  2021-04-09 11:28     ` [PATCH v3 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:28 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 1259 bytes --]

Move processing of tags into its own function to make the logic easier
to extend when we're going to implement filtering for tags. No change in
behaviour is expected from this commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 list-objects.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index e19589baa0..a5a60301cb 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -213,6 +213,14 @@ static void process_tree(struct traversal_context *ctx,
 	free_tree_buffer(tree);
 }
 
+static void process_tag(struct traversal_context *ctx,
+			struct tag *tag,
+			const char *name)
+{
+	tag->object.flags |= SEEN;
+	ctx->show_object(&tag->object, name, ctx->show_data);
+}
+
 static void mark_edge_parents_uninteresting(struct commit *commit,
 					    struct rev_info *revs,
 					    show_edge_fn show_edge)
@@ -334,8 +342,7 @@ static void traverse_trees_and_blobs(struct traversal_context *ctx,
 		if (obj->flags & (UNINTERESTING | SEEN))
 			continue;
 		if (obj->type == OBJ_TAG) {
-			obj->flags |= SEEN;
-			ctx->show_object(obj, name, ctx->show_data);
+			process_tag(ctx, (struct tag *)obj, name);
 			continue;
 		}
 		if (!path)
-- 
2.31.1



* [PATCH v3 4/8] list-objects: support filtering by tag and commit
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
                       ` (2 preceding siblings ...)
  2021-04-09 11:28     ` [PATCH v3 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
@ 2021-04-09 11:28     ` Patrick Steinhardt
  2021-04-11  6:49       ` Junio C Hamano
  2021-04-09 11:28     ` [PATCH v3 5/8] list-objects: implement object type filter Patrick Steinhardt
                       ` (5 subsequent siblings)
  9 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:28 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 4789 bytes --]

Object filters currently only support filtering blobs or trees based on
some criteria. This commit lays the foundation to also allow filtering
of tags and commits.

No change in behaviour is expected from this commit given that there are
no filters yet for those object types.
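
The calling convention this relies on can be sketched in isolation. The
following C snippet is only a model: the `TOY_` names and `shown_object` are
hypothetical, and the bit values are assumptions chosen for the example.
It shows how a caller acts on a filter verdict, marking and showing being
two independent decisions:

```c
#include <assert.h>

/* Assumed bit values; only the LOFR_* names come from the patch. */
enum toy_filter_result {
	TOY_LOFR_ZERO      = 0,
	TOY_LOFR_MARK_SEEN = 1 << 0,
	TOY_LOFR_DO_SHOW   = 1 << 1
};

#define TOY_SEEN (1u << 0)

/* Toy object: a flag word plus a marker standing in for show_object(). */
struct shown_object {
	unsigned flags;
	int shown;
};

/* Act on a filter verdict: mark and show are handled separately. */
void toy_apply_result(struct shown_object *obj, int r)
{
	assert(obj);
	if (r & TOY_LOFR_MARK_SEEN)
		obj->flags |= TOY_SEEN;
	if (r & TOY_LOFR_DO_SHOW)
		obj->shown = 1;
}
```

All existing filters answer with both bits set for tags and commits, which
is why this commit on its own does not change any output.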

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 list-objects-filter.c | 40 ++++++++++++++++++++++++++++++++++++++++
 list-objects-filter.h |  2 ++
 list-objects.c        | 22 +++++++++++++++++++---
 3 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index 39e2f15333..0ebfa52966 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -82,6 +82,16 @@ static enum list_objects_filter_result filter_blobs_none(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -173,6 +183,16 @@ static enum list_objects_filter_result filter_trees_depth(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_END_TREE:
 		assert(obj->type == OBJ_TREE);
 		filter_data->current_depth--;
@@ -267,6 +287,16 @@ static enum list_objects_filter_result filter_blobs_limit(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -371,6 +401,16 @@ static enum list_objects_filter_result filter_sparse(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		dtype = DT_DIR;
diff --git a/list-objects-filter.h b/list-objects-filter.h
index cfd784e203..9e98814111 100644
--- a/list-objects-filter.h
+++ b/list-objects-filter.h
@@ -55,6 +55,8 @@ enum list_objects_filter_result {
 };
 
 enum list_objects_filter_situation {
+	LOFS_COMMIT,
+	LOFS_TAG,
 	LOFS_BEGIN_TREE,
 	LOFS_END_TREE,
 	LOFS_BLOB
diff --git a/list-objects.c b/list-objects.c
index a5a60301cb..0c524a81ac 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -217,8 +217,14 @@ static void process_tag(struct traversal_context *ctx,
 			struct tag *tag,
 			const char *name)
 {
-	tag->object.flags |= SEEN;
-	ctx->show_object(&tag->object, name, ctx->show_data);
+	enum list_objects_filter_result r;
+
+	r = list_objects_filter__filter_object(ctx->revs->repo, LOFS_TAG,
+					       &tag->object, "", 0, ctx->filter);
+	if (r & LOFR_MARK_SEEN)
+		tag->object.flags |= SEEN;
+	if (r & LOFR_DO_SHOW)
+		ctx->show_object(&tag->object, name, ctx->show_data);
 }
 
 static void mark_edge_parents_uninteresting(struct commit *commit,
@@ -368,6 +374,12 @@ static void do_traverse(struct traversal_context *ctx)
 	strbuf_init(&csp, PATH_MAX);
 
 	while ((commit = get_revision(ctx->revs)) != NULL) {
+		enum list_objects_filter_result r;
+
+		r = list_objects_filter__filter_object(ctx->revs->repo,
+				LOFS_COMMIT, &commit->object,
+				NULL, NULL, ctx->filter);
+
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
@@ -382,7 +394,11 @@ static void do_traverse(struct traversal_context *ctx)
 			die(_("unable to load root tree for commit %s"),
 			      oid_to_hex(&commit->object.oid));
 		}
-		ctx->show_commit(commit, ctx->show_data);
+
+		if (r & LOFR_MARK_SEEN)
+			commit->object.flags |= SEEN;
+		if (r & LOFR_DO_SHOW)
+			ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
-- 
2.31.1



* [PATCH v3 5/8] list-objects: implement object type filter
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
                       ` (3 preceding siblings ...)
  2021-04-09 11:28     ` [PATCH v3 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
@ 2021-04-09 11:28     ` Patrick Steinhardt
  2021-04-09 11:28     ` [PATCH v3 6/8] pack-bitmap: " Patrick Steinhardt
                       ` (4 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:28 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 9502 bytes --]

While it is already possible to filter objects by some criteria in
git-rev-list(1), it is not yet possible to select only objects of a
specific type. This makes some filters less useful. The `blob:limit`
filter for example filters blobs such that only those which are smaller
than the given limit are returned. But it is unfit for asking only for
these smallish blobs, given that git-rev-list(1) will continue to print
tags, commits and trees.

Now that we have the infrastructure in place to also filter tags and
commits, we can improve this situation by implementing a new filter
which selects objects based on their type. The above query can thus
trivially be implemented with the following command:

    $ git rev-list --objects --filter=object:type=blob \
        --filter=blob:limit=200

Furthermore, this filter allows us to optimize for certain other cases: if
for example only tags or commits have been selected, there is no need to
walk down trees.

The new filter is not yet supported in bitmaps. This is going to be
implemented in a subsequent commit.
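
How the two filters in the command above interact can be modelled with a
short C sketch. This is not the list-objects-filter.c implementation: the
`toy_` names are hypothetical, and the model only assumes that a combined
filter shows an object when every sub-filter would show it and that
`blob:limit` keeps blobs smaller than the limit:

```c
#include <assert.h>

/* Values follow git's object type numbering. */
enum toy_type { TOY_COMMIT = 1, TOY_TREE = 2, TOY_BLOB = 3, TOY_TAG = 4 };

/* Toy object with just the properties the two filters look at. */
struct toy_obj {
	enum toy_type type;
	unsigned long size;
};

/* object:type=<wanted>: show only objects of the requested type. */
int toy_type_shows(const struct toy_obj *o, enum toy_type wanted)
{
	assert(o);
	return o->type == wanted;
}

/* blob:limit=<limit>: non-blobs always pass, blobs only when smaller. */
int toy_limit_shows(const struct toy_obj *o, unsigned long limit)
{
	assert(o);
	return o->type != TOY_BLOB || o->size < limit;
}

/* Combined filter: an object is shown only if both sub-filters agree. */
int toy_combined_shows(const struct toy_obj *o, enum toy_type wanted,
		       unsigned long limit)
{
	return toy_type_shows(o, wanted) && toy_limit_shows(o, limit);
}
```

On its own, the limit filter lets every tree through, which is the noise
the type filter removes.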

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/uploadpack.txt |  6 +--
 Documentation/rev-list-options.txt  |  3 ++
 list-objects-filter-options.c       | 14 ++++++
 list-objects-filter-options.h       |  2 +
 list-objects-filter.c               | 76 +++++++++++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh | 48 ++++++++++++++++++
 6 files changed, 146 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
index 6729a072ea..32fad5bbe8 100644
--- a/Documentation/config/uploadpack.txt
+++ b/Documentation/config/uploadpack.txt
@@ -66,9 +66,9 @@ uploadpackfilter.allow::
 uploadpackfilter.<filter>.allow::
 	Explicitly allow or ban the object filter corresponding to
 	`<filter>`, where `<filter>` may be one of: `blob:none`,
-	`blob:limit`, `tree`, `sparse:oid`, or `combine`. If using
-	combined filters, both `combine` and all of the nested filter
-	kinds must be allowed. Defaults to `uploadpackfilter.allow`.
+	`blob:limit`, `object:type`, `tree`, `sparse:oid`, or `combine`.
+	If using combined filters, both `combine` and all of the nested
+	filter kinds must be allowed. Defaults to `uploadpackfilter.allow`.
 
 uploadpackfilter.tree.maxDepth::
 	Only allow `--filter=tree:<n>` when `<n>` is no more than the value of
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index b1c8f86c6e..3afa8fffbd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -892,6 +892,9 @@ or units.  n may be zero.  The suffixes k, m, and g can be used to name
 units in KiB, MiB, or GiB.  For example, 'blob:limit=1k' is the same
 as 'blob:limit=1024'.
 +
+The form '--filter=object:type=(tag|commit|tree|blob)' omits all objects
+which are not of the requested type.
++
 The form '--filter=sparse:oid=<blob-ish>' uses a sparse-checkout
 specification contained in the blob (or blob-expression) '<blob-ish>'
 to omit blobs that would not be not required for a sparse checkout on
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d2d1c81caf..bb6f6577d5 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -29,6 +29,8 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 		return "tree";
 	case LOFC_SPARSE_OID:
 		return "sparse:oid";
+	case LOFC_OBJECT_TYPE:
+		return "object:type";
 	case LOFC_COMBINE:
 		return "combine";
 	case LOFC__COUNT:
@@ -97,6 +99,18 @@ static int gently_parse_list_objects_filter(
 		}
 		return 1;
 
+	} else if (skip_prefix(arg, "object:type=", &v0)) {
+		int type = type_from_string_gently(v0, -1, 1);
+		if (type < 0) {
+			strbuf_addstr(errbuf, _("expected 'object:type=<type>'"));
+			return 1;
+		}
+
+		filter_options->object_type = type;
+		filter_options->choice = LOFC_OBJECT_TYPE;
+
+		return 0;
+
 	} else if (skip_prefix(arg, "combine:", &v0)) {
 		return parse_combine_filter(filter_options, v0, errbuf);
 
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 01767c3c96..4d0d0588cc 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -13,6 +13,7 @@ enum list_objects_filter_choice {
 	LOFC_BLOB_LIMIT,
 	LOFC_TREE_DEPTH,
 	LOFC_SPARSE_OID,
+	LOFC_OBJECT_TYPE,
 	LOFC_COMBINE,
 	LOFC__COUNT /* must be last */
 };
@@ -54,6 +55,7 @@ struct list_objects_filter_options {
 	char *sparse_oid_name;
 	unsigned long blob_limit_value;
 	unsigned long tree_exclude_depth;
+	enum object_type object_type;
 
 	/* LOFC_COMBINE values */
 
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 0ebfa52966..1c1ee3d1bb 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -545,6 +545,81 @@ static void filter_sparse_oid__init(
 	filter->free_fn = filter_sparse_free;
 }
 
+/*
+ * A filter for list-objects to omit all objects which are not of the
+ * requested type.
+ */
+struct filter_object_type_data {
+	enum object_type object_type;
+};
+
+static enum list_objects_filter_result filter_object_type(
+	struct repository *r,
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	struct oidset *omits,
+	void *filter_data_)
+{
+	struct filter_object_type_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		if (filter_data->object_type == OBJ_TAG)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		if (filter_data->object_type == OBJ_COMMIT)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+
+		/*
+		 * If we only want to show commits or tags, then there is no
+		 * need to walk down trees.
+		 */
+		if (filter_data->object_type == OBJ_COMMIT ||
+		    filter_data->object_type == OBJ_TAG)
+			return LOFR_SKIP_TREE;
+
+		if (filter_data->object_type == OBJ_TREE)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+		return LOFR_MARK_SEEN;
+
+	case LOFS_BLOB:
+		assert(obj->type == OBJ_BLOB);
+
+		if (filter_data->object_type == OBJ_BLOB)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_END_TREE:
+		return LOFR_ZERO;
+	}
+}
+
+static void filter_object_type__init(
+	struct list_objects_filter_options *filter_options,
+	struct filter *filter)
+{
+	struct filter_object_type_data *d = xcalloc(1, sizeof(*d));
+	d->object_type = filter_options->object_type;
+
+	filter->filter_data = d;
+	filter->filter_object_fn = filter_object_type;
+	filter->free_fn = free;
+}
+
 /* A filter which only shows objects shown by all sub-filters. */
 struct combine_filter_data {
 	struct subfilter *sub;
@@ -691,6 +766,7 @@ static filter_init_fn s_filters[] = {
 	filter_blobs_limit__init,
 	filter_trees_depth__init,
 	filter_sparse_oid__init,
+	filter_object_type__init,
 	filter_combine__init,
 };
 
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 31457d13b9..c79ec04060 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -159,6 +159,54 @@ test_expect_success 'verify blob:limit=1m' '
 	test_must_be_empty observed
 '
 
+# Test object:type=<type> filter.
+
+test_expect_success 'setup object-type' '
+	git init object-type &&
+	echo contents >object-type/blob &&
+	git -C object-type add blob &&
+	git -C object-type commit -m commit-message &&
+	git -C object-type tag tag -m tag-message
+'
+
+test_expect_success 'verify object:type= fails with invalid type' '
+	test_must_fail git -C object-type rev-list --objects --filter=object:type= HEAD &&
+	test_must_fail git -C object-type rev-list --objects --filter=object:type=invalid HEAD
+'
+
+test_expect_success 'verify object:type=blob prints blob and commit' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob)
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=blob HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tree prints tree and commit' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s \n" $(git -C object-type rev-parse HEAD^{tree})
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=tree HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=commit prints commit' '
+	git -C object-type rev-parse HEAD >expected &&
+	git -C object-type rev-list --objects --filter=object:type=commit HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tag prints tag' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s tag\n" $(git -C object-type rev-parse tag)
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=tag tag >actual &&
+	test_cmp expected actual
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
-- 
2.31.1



* [PATCH v3 6/8] pack-bitmap: implement object type filter
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
                       ` (4 preceding siblings ...)
  2021-04-09 11:28     ` [PATCH v3 5/8] list-objects: implement object type filter Patrick Steinhardt
@ 2021-04-09 11:28     ` Patrick Steinhardt
  2021-04-09 11:28     ` [PATCH v3 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:28 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 3827 bytes --]

The preceding commit has added a new object filter for git-rev-list(1)
which allows filtering objects by type. Implement the equivalent filter
for packfile bitmaps so that we can answer these queries quickly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
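
As an illustration only (not part of the patch), the toy C sketch below
shows the idea behind filter_bitmap_object_type(): keep one object type by
excluding the other three from the result. All toy_* names are
hypothetical, and a single 64-bit word stands in for Git's real
EWAH-compressed bitmaps. The tip handling is likewise an assumption based
on the comment in filter_bitmap_exclude_type().

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for illustration; these are NOT Git's real types. */
enum toy_type { TOY_COMMIT = 1, TOY_TREE = 2, TOY_BLOB = 3, TOY_TAG = 4 };

struct toy_index {
	/* type_mask[t]: bit i is set if object i has type t (slot 0 unused) */
	uint64_t type_mask[5];
};

/* Clear every object of type `t` from `result`, except protected tips. */
static void toy_exclude_type(const struct toy_index *idx, uint64_t tips,
			     uint64_t *result, enum toy_type t)
{
	*result &= ~(idx->type_mask[t] & ~tips);
}

/* Keep only objects of type `keep` by excluding the other three types. */
static void toy_filter_object_type(const struct toy_index *idx, uint64_t tips,
				   uint64_t *result, enum toy_type keep)
{
	int t;

	/* Mirrors the range check that guards the real function. */
	assert(keep >= TOY_COMMIT && keep <= TOY_TAG);

	for (t = TOY_COMMIT; t <= TOY_TAG; t++)
		if (t != (int)keep)
			toy_exclude_type(idx, tips, result, (enum toy_type)t);
}
```

In this toy model, passing an empty `tips` mask filters every object,
while a non-empty mask keeps the listed tips even if their type does not
match.
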
 pack-bitmap.c                      | 29 ++++++++++++++++++++++++++---
 t/t6113-rev-list-bitmap-filters.sh | 25 ++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index b4513f8672..cd3f5c433e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -779,9 +779,6 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	eword_t mask;
 	uint32_t i;
 
-	if (type != OBJ_BLOB && type != OBJ_TREE)
-		BUG("filter_bitmap_exclude_type: unsupported type '%d'", type);
-
 	/*
 	 * The non-bitmap version of this filter never removes
 	 * objects which the other side specifically asked for,
@@ -911,6 +908,24 @@ static void filter_bitmap_tree_depth(struct bitmap_index *bitmap_git,
 				   OBJ_BLOB);
 }
 
+static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
+				      struct object_list *tip_objects,
+				      struct bitmap *to_filter,
+				      enum object_type object_type)
+{
+	if (object_type < OBJ_COMMIT || object_type > OBJ_TAG)
+		BUG("filter_bitmap_object_type given invalid object");
+
+	if (object_type != OBJ_TAG)
+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_TAG);
+	if (object_type != OBJ_COMMIT)
+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_COMMIT);
+	if (object_type != OBJ_TREE)
+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_TREE);
+	if (object_type != OBJ_BLOB)
+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_BLOB);
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -943,6 +958,14 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
+	if (filter->choice == LOFC_OBJECT_TYPE) {
+		if (bitmap_git)
+			filter_bitmap_object_type(bitmap_git, tip_objects,
+						  to_filter,
+						  filter->object_type);
+		return 0;
+	}
+
 	/* filter choice not handled */
 	return -1;
 }
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index 3f889949ca..fb66735ac8 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -10,7 +10,8 @@ test_expect_success 'set up bitmapped repo' '
 	test_commit much-larger-blob-one &&
 	git repack -adb &&
 	test_commit two &&
-	test_commit much-larger-blob-two
+	test_commit much-larger-blob-two &&
+	git tag tag
 '
 
 test_expect_success 'filters fallback to non-bitmap traversal' '
@@ -75,4 +76,26 @@ test_expect_success 'tree:1 filter' '
 	test_cmp expect actual
 '
 
+test_expect_success 'object:type filter' '
+	git rev-list --objects --filter=object:type=tag tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=tag tag >actual &&
+	test_cmp expect actual &&
+
+	git rev-list --objects --filter=object:type=commit tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=commit tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter=object:type=tree tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=tree tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.31.1



* [PATCH v3 7/8] pack-bitmap: implement combined filter
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
                       ` (5 preceding siblings ...)
  2021-04-09 11:28     ` [PATCH v3 6/8] pack-bitmap: " Patrick Steinhardt
@ 2021-04-09 11:28     ` Patrick Steinhardt
  2021-04-09 11:28     ` [PATCH v3 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:28 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 3298 bytes --]

When the user specifies multiple object filters, they are internally
represented by a "combined" filter. Combined filters aren't yet
supported by bitmap indices and thus cannot be accelerated.

Fix this by implementing support for these combined filters. The
implementation is quite trivial: when there's a combined filter, we
simply recurse into `filter_bitmap()` for all of the sub-filters.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
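
As an illustration only (not part of the patch), the toy C sketch below
mirrors the shape of the change: a recursive "is this filter supported"
check, followed by a recursive application of each sub-filter to the same
result. All toy_* names are hypothetical, leaf filters are reduced to a
plain "keep" mask, and a 64-bit word stands in for the real bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration; these are NOT Git's real types. */
enum toy_choice { TOY_LEAF, TOY_COMBINE, TOY_UNSUPPORTED };

struct toy_filter {
	enum toy_choice choice;
	uint64_t keep_mask;		/* objects a TOY_LEAF filter keeps */
	const struct toy_filter *sub;	/* sub-filters of a TOY_COMBINE */
	size_t sub_nr;
};

/* A combined filter is supported only if all of its sub-filters are. */
static int toy_filter_supported(const struct toy_filter *f)
{
	size_t i;

	switch (f->choice) {
	case TOY_LEAF:
		return 1;
	case TOY_COMBINE:
		for (i = 0; i < f->sub_nr; i++)
			if (!toy_filter_supported(&f->sub[i]))
				return 0;
		return 1;
	default:
		return 0;
	}
}

/* Returns 0 on success, -1 if the filter cannot be handled. */
static int toy_filter_bitmap(uint64_t *result, const struct toy_filter *f)
{
	size_t i;

	/* Check the whole tree up front so `result` is never half-filtered. */
	if (!toy_filter_supported(f))
		return -1;

	if (f->choice == TOY_LEAF) {
		*result &= f->keep_mask;
		return 0;
	}

	/* TOY_COMBINE: each sub-filter narrows the same result further. */
	for (i = 0; i < f->sub_nr; i++)
		if (toy_filter_bitmap(result, &f->sub[i]) < 0)
			return -1;
	return 0;
}
```

Because every sub-filter only removes bits, applying them one after
another yields the intersection of what each would keep on its own, at
least in this simplified model.
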
 pack-bitmap.c                      | 41 +++++++++++++++++++++++++++---
 t/t6113-rev-list-bitmap-filters.sh |  7 +++++
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index cd3f5c433e..4385f15828 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -926,6 +926,29 @@ static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
 		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_BLOB);
 }
 
+static int filter_supported(struct list_objects_filter_options *filter)
+{
+	int i;
+
+	switch (filter->choice) {
+	case LOFC_BLOB_NONE:
+	case LOFC_BLOB_LIMIT:
+	case LOFC_OBJECT_TYPE:
+		return 1;
+	case LOFC_TREE_DEPTH:
+		if (filter->tree_exclude_depth == 0)
+			return 1;
+		return 0;
+	case LOFC_COMBINE:
+		for (i = 0; i < filter->sub_nr; i++)
+			if (!filter_supported(&filter->sub[i]))
+				return 0;
+		return 1;
+	default:
+		return 0;
+	}
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -933,6 +956,8 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 {
 	if (!filter || filter->choice == LOFC_DISABLED)
 		return 0;
+	if (!filter_supported(filter))
+		return -1;
 
 	if (filter->choice == LOFC_BLOB_NONE) {
 		if (bitmap_git)
@@ -949,8 +974,7 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
-	if (filter->choice == LOFC_TREE_DEPTH &&
-	    filter->tree_exclude_depth == 0) {
+	if (filter->choice == LOFC_TREE_DEPTH) {
 		if (bitmap_git)
 			filter_bitmap_tree_depth(bitmap_git, tip_objects,
 						 to_filter,
@@ -966,8 +990,17 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
-	/* filter choice not handled */
-	return -1;
+	if (filter->choice == LOFC_COMBINE) {
+		int i;
+		for (i = 0; i < filter->sub_nr; i++) {
+			if (filter_bitmap(bitmap_git, tip_objects, to_filter,
+					  &filter->sub[i]) < 0)
+				return -1;
+		}
+		return 0;
+	}
+
+	BUG("unsupported filter choice");
 }
 
 static int can_filter_bitmap(struct list_objects_filter_options *filter)
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index fb66735ac8..cb9db7df6f 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -98,4 +98,11 @@ test_expect_success 'object:type filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'combine filter' '
+	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.31.1



* [PATCH v3 8/8] rev-list: allow filtering of provided items
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
                       ` (6 preceding siblings ...)
  2021-04-09 11:28     ` [PATCH v3 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
@ 2021-04-09 11:28     ` Patrick Steinhardt
  2021-04-09 11:32       ` [RESEND PATCH " Patrick Steinhardt
  2021-04-09 15:00       ` [PATCH " Philip Oakley
  2021-04-11  6:02     ` [PATCH v3 0/8] rev-parse: implement object type filter Junio C Hamano
  2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
  9 siblings, 2 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:28 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 11964 bytes --]

When providing an object filter, it is currently impossible to also
filter provided items. E.g. when executing `git rev-list HEAD`, the
commit this reference points to will be treated as user-provided and is
thus excluded from the filtering mechanism. This makes it harder than
necessary to properly use the new `--filter=object:type` filter given
that even if the user wants to only see blobs, they'll still see commits
of provided references.

Improve this by introducing a new `--filter-provided-revisions` option
to the git-rev-list(1) command. If given, then all user-provided
references will be subject to filtering.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
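
As an illustration only (not part of the patch), the toy C sketch below
models the effect of the new flag on the bitmap path: without it, the
provided tips survive filtering; with it, the tip list is dropped (the
patch passes NULL instead of `wants`) and the tips are filtered like any
other object. The function name and the 64-bit masks are hypothetical
simplifications.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, NOT Git's real API.
 *   wants:           all objects reachable from the provided revisions
 *   tips:            the provided revisions themselves (subset of wants)
 *   keep:            objects that match the filter
 *   filter_provided: non-zero if provided revisions should be filtered too
 */
static uint64_t toy_apply_filter(uint64_t wants, uint64_t tips,
				 uint64_t keep, int filter_provided)
{
	/* With the flag set, no object is protected from the filter. */
	uint64_t protected_tips = filter_provided ? 0 : tips;

	assert((tips & ~wants) == 0);
	return wants & (keep | protected_tips);
}
```

With a commit as the only tip and a blob-only `keep` mask, the commit
remains in the result unless `filter_provided` is set.
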
 Documentation/rev-list-options.txt  |  5 ++++
 builtin/pack-objects.c              |  2 +-
 builtin/rev-list.c                  | 36 +++++++++++++++++++++--------
 pack-bitmap.c                       |  6 +++--
 pack-bitmap.h                       |  3 ++-
 reachable.c                         |  2 +-
 t/t6112-rev-list-filters-objects.sh | 28 ++++++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  | 36 +++++++++++++++++++++++++++++
 8 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 3afa8fffbd..7fa18fc6e6 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -933,6 +933,11 @@ equivalent.
 --no-filter::
 	Turn off any previous `--filter=` argument.
 
+--filter-provided-revisions::
+	Filter the list of explicitly provided revisions, which would otherwise
+	always be printed even if they did not match any of the filters. Only
+	useful with `--filter=`.
+
 --filter-print-omitted::
 	Only useful with `--filter=`; prints a list of the objects omitted
 	by the filter.  Object IDs are prefixed with a ``~'' character.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 525c2d8552..2f2026dc87 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3516,7 +3516,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index b4d8ea0a35..13f0ff3f8d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -398,7 +398,8 @@ static inline int parse_missing_action_value(const char *value)
 }
 
 static int try_bitmap_count(struct rev_info *revs,
-			    struct list_objects_filter_options *filter)
+			    struct list_objects_filter_options *filter,
+			    int filter_provided_revs)
 {
 	uint32_t commit_count = 0,
 		 tag_count = 0,
@@ -433,7 +434,7 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter);
+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
 	if (!bitmap_git)
 		return -1;
 
@@ -450,7 +451,8 @@ static int try_bitmap_count(struct rev_info *revs,
 }
 
 static int try_bitmap_traversal(struct rev_info *revs,
-				struct list_objects_filter_options *filter)
+				struct list_objects_filter_options *filter,
+				int filter_provided_revs)
 {
 	struct bitmap_index *bitmap_git;
 
@@ -461,7 +463,7 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter);
+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
 	if (!bitmap_git)
 		return -1;
 
@@ -471,14 +473,15 @@ static int try_bitmap_traversal(struct rev_info *revs,
 }
 
 static int try_bitmap_disk_usage(struct rev_info *revs,
-				 struct list_objects_filter_options *filter)
+				 struct list_objects_filter_options *filter,
+				 int filter_provided_revs)
 {
 	struct bitmap_index *bitmap_git;
 
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter);
+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
 	if (!bitmap_git)
 		return -1;
 
@@ -499,6 +502,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	int bisect_show_vars = 0;
 	int bisect_find_all = 0;
 	int use_bitmap_index = 0;
+	int filter_provided_revs = 0;
 	const char *show_progress = NULL;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
@@ -599,6 +603,10 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			list_objects_filter_set_no_filter(&filter_options);
 			continue;
 		}
+		if (!strcmp(arg, "--filter-provided-revisions")) {
+			filter_provided_revs = 1;
+			continue;
+		}
 		if (!strcmp(arg, "--filter-print-omitted")) {
 			arg_print_omitted = 1;
 			continue;
@@ -665,11 +673,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (!try_bitmap_count(&revs, &filter_options))
+		if (!try_bitmap_count(&revs, &filter_options, filter_provided_revs))
 			return 0;
-		if (!try_bitmap_disk_usage(&revs, &filter_options))
+		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_revs))
 			return 0;
-		if (!try_bitmap_traversal(&revs, &filter_options))
+		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_revs))
 			return 0;
 	}
 
@@ -694,6 +702,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			return show_bisect_vars(&info, reaches, all);
 	}
 
+	if (filter_provided_revs) {
+		struct commit_list *c;
+		for (i = 0; i < revs.pending.nr; i++) {
+			struct object_array_entry *pending = revs.pending.objects + i;
+			pending->item->flags |= NOT_USER_GIVEN;
+		}
+		for (c = revs.commits; c; c = c->next)
+			c->item->object.flags |= NOT_USER_GIVEN;
+	}
+
 	if (arg_print_omitted)
 		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
 	if (arg_missing_action == MA_PRINT)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 4385f15828..0576a19a28 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1009,7 +1009,8 @@ static int can_filter_bitmap(struct list_objects_filter_options *filter)
 }
 
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter)
+					 struct list_objects_filter_options *filter,
+					 int filter_provided_revs)
 {
 	unsigned int i;
 
@@ -1104,7 +1105,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
-	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
+	filter_bitmap(bitmap_git, (filter && filter_provided_revs) ? NULL : wants,
+		      wants_bitmap, filter);
 
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 36d99930d8..5d8ae3b590 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -50,7 +50,8 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter);
+					 struct list_objects_filter_options *filter,
+					 int filter_provided_revs);
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 				       struct packed_git **packfile,
 				       uint32_t *entries,
diff --git a/reachable.c b/reachable.c
index 77a60c70a5..fc833cae43 100644
--- a/reachable.c
+++ b/reachable.c
@@ -223,7 +223,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL);
+	bitmap_git = prepare_bitmap_walk(revs, NULL, 0);
 	if (bitmap_git) {
 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
 		free_bitmap_index(bitmap_git);
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c79ec04060..47c558ab0e 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -207,6 +207,34 @@ test_expect_success 'verify object:type=tag prints tag' '
 	test_cmp expected actual
 '
 
+test_expect_success 'verify object:type=blob prints only blob with --filter-provided' '
+	printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=blob --filter-provided HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tree prints only tree with --filter-provided' '
+	printf "%s \n" $(git -C object-type rev-parse HEAD^{tree}) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tree HEAD --filter-provided >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=commit prints only commit with --filter-provided' '
+	git -C object-type rev-parse HEAD >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=commit --filter-provided HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tag prints only tag with --filter-provided' '
+	printf "%s tag\n" $(git -C object-type rev-parse tag) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tag --filter-provided tag >actual &&
+	test_cmp expected actual
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index cb9db7df6f..9053ac5059 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -98,6 +98,28 @@ test_expect_success 'object:type filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'object:type filter with --filter-provided' '
+	git rev-list --objects --filter-provided --filter=object:type=tag tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=tag tag >actual &&
+	test_cmp expect actual &&
+
+	git rev-list --objects --filter-provided --filter=object:type=commit tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=commit tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter-provided --filter=object:type=tree tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=tree tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter-provided --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_expect_success 'combine filter' '
 	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
 	git rev-list --use-bitmap-index \
@@ -105,4 +127,18 @@ test_expect_success 'combine filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'combine filter with --filter-provided' '
+	git rev-list --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git cat-file --batch-check="%(objecttype) %(objectsize)" <actual >objects &&
+	while read objecttype objectsize
+	do
+		test "$objecttype" = blob || return 1
+		test "$objectsize" -le 1000 || return 1
+	done <objects
+'
+
 test_done
-- 
2.31.1



* [RESEND PATCH v3 8/8] rev-list: allow filtering of provided items
  2021-04-09 11:28     ` [PATCH v3 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
@ 2021-04-09 11:32       ` Patrick Steinhardt
  2021-04-09 15:00       ` [PATCH " Philip Oakley
  1 sibling, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-09 11:32 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 12244 bytes --]

When providing an object filter, it is currently impossible to also
filter provided items. E.g. when executing `git rev-list HEAD`, the
commit this reference points to will be treated as user-provided and is
thus excluded from the filtering mechanism. This makes it harder than
necessary to properly use the new `--filter=object:type` filter given
that even if the user wants to only see blobs, they'll still see commits
of provided references.

Improve this by introducing a new `--filter-provided-revisions` option
to the git-rev-list(1) command. If given, then all user-provided
references will be subject to filtering.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---

I forgot to stage the test changes that adjust for the renamed flag
(`--filter-provided` is now `--filter-provided-revisions`).

 Documentation/rev-list-options.txt  |  5 ++++
 builtin/pack-objects.c              |  2 +-
 builtin/rev-list.c                  | 36 +++++++++++++++++++++--------
 pack-bitmap.c                       |  6 +++--
 pack-bitmap.h                       |  3 ++-
 reachable.c                         |  2 +-
 t/t6112-rev-list-filters-objects.sh | 28 ++++++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  | 36 +++++++++++++++++++++++++++++
 8 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 3afa8fffbd..7fa18fc6e6 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -933,6 +933,11 @@ equivalent.
 --no-filter::
 	Turn off any previous `--filter=` argument.
 
+--filter-provided-revisions::
+	Filter the list of explicitly provided revisions, which would otherwise
+	always be printed even if they did not match any of the filters. Only
+	useful with `--filter=`.
+
 --filter-print-omitted::
 	Only useful with `--filter=`; prints a list of the objects omitted
 	by the filter.  Object IDs are prefixed with a ``~'' character.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 525c2d8552..2f2026dc87 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3516,7 +3516,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index b4d8ea0a35..13f0ff3f8d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -398,7 +398,8 @@ static inline int parse_missing_action_value(const char *value)
 }
 
 static int try_bitmap_count(struct rev_info *revs,
-			    struct list_objects_filter_options *filter)
+			    struct list_objects_filter_options *filter,
+			    int filter_provided_revs)
 {
 	uint32_t commit_count = 0,
 		 tag_count = 0,
@@ -433,7 +434,7 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter);
+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
 	if (!bitmap_git)
 		return -1;
 
@@ -450,7 +451,8 @@ static int try_bitmap_count(struct rev_info *revs,
 }
 
 static int try_bitmap_traversal(struct rev_info *revs,
-				struct list_objects_filter_options *filter)
+				struct list_objects_filter_options *filter,
+				int filter_provided_revs)
 {
 	struct bitmap_index *bitmap_git;
 
@@ -461,7 +463,7 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter);
+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
 	if (!bitmap_git)
 		return -1;
 
@@ -471,14 +473,15 @@ static int try_bitmap_traversal(struct rev_info *revs,
 }
 
 static int try_bitmap_disk_usage(struct rev_info *revs,
-				 struct list_objects_filter_options *filter)
+				 struct list_objects_filter_options *filter,
+				 int filter_provided_revs)
 {
 	struct bitmap_index *bitmap_git;
 
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter);
+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
 	if (!bitmap_git)
 		return -1;
 
@@ -499,6 +502,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	int bisect_show_vars = 0;
 	int bisect_find_all = 0;
 	int use_bitmap_index = 0;
+	int filter_provided_revs = 0;
 	const char *show_progress = NULL;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
@@ -599,6 +603,10 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			list_objects_filter_set_no_filter(&filter_options);
 			continue;
 		}
+		if (!strcmp(arg, "--filter-provided-revisions")) {
+			filter_provided_revs = 1;
+			continue;
+		}
 		if (!strcmp(arg, "--filter-print-omitted")) {
 			arg_print_omitted = 1;
 			continue;
@@ -665,11 +673,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (!try_bitmap_count(&revs, &filter_options))
+		if (!try_bitmap_count(&revs, &filter_options, filter_provided_revs))
 			return 0;
-		if (!try_bitmap_disk_usage(&revs, &filter_options))
+		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_revs))
 			return 0;
-		if (!try_bitmap_traversal(&revs, &filter_options))
+		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_revs))
 			return 0;
 	}
 
@@ -694,6 +702,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			return show_bisect_vars(&info, reaches, all);
 	}
 
+	if (filter_provided_revs) {
+		struct commit_list *c;
+		for (i = 0; i < revs.pending.nr; i++) {
+			struct object_array_entry *pending = revs.pending.objects + i;
+			pending->item->flags |= NOT_USER_GIVEN;
+		}
+		for (c = revs.commits; c; c = c->next)
+			c->item->object.flags |= NOT_USER_GIVEN;
+	}
+
 	if (arg_print_omitted)
 		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
 	if (arg_missing_action == MA_PRINT)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 4385f15828..0576a19a28 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1009,7 +1009,8 @@ static int can_filter_bitmap(struct list_objects_filter_options *filter)
 }
 
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter)
+					 struct list_objects_filter_options *filter,
+					 int filter_provided_revs)
 {
 	unsigned int i;
 
@@ -1104,7 +1105,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
-	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
+	filter_bitmap(bitmap_git, (filter && filter_provided_revs) ? NULL : wants,
+		      wants_bitmap, filter);
 
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 36d99930d8..5d8ae3b590 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -50,7 +50,8 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter);
+					 struct list_objects_filter_options *filter,
+					 int filter_provided_revs);
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 				       struct packed_git **packfile,
 				       uint32_t *entries,
diff --git a/reachable.c b/reachable.c
index 77a60c70a5..fc833cae43 100644
--- a/reachable.c
+++ b/reachable.c
@@ -223,7 +223,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL);
+	bitmap_git = prepare_bitmap_walk(revs, NULL, 0);
 	if (bitmap_git) {
 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
 		free_bitmap_index(bitmap_git);
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c79ec04060..0a305c9c49 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -207,6 +207,34 @@ test_expect_success 'verify object:type=tag prints tag' '
 	test_cmp expected actual
 '
 
+test_expect_success 'verify object:type=blob prints only blob with --filter-provided-revisions' '
+	printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=blob --filter-provided-revisions HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tree prints only tree with --filter-provided-revisions' '
+	printf "%s \n" $(git -C object-type rev-parse HEAD^{tree}) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tree HEAD --filter-provided-revisions >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=commit prints only commit with --filter-provided-revisions' '
+	git -C object-type rev-parse HEAD >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=commit --filter-provided-revisions HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tag prints only tag with --filter-provided-revisions' '
+	printf "%s tag\n" $(git -C object-type rev-parse tag) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tag --filter-provided-revisions tag >actual &&
+	test_cmp expected actual
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index cb9db7df6f..bfc9bdafa0 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -98,6 +98,28 @@ test_expect_success 'object:type filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'object:type filter with --filter-provided-revisions' '
+	git rev-list --objects --filter-provided-revisions --filter=object:type=tag tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided-revisions --filter=object:type=tag tag >actual &&
+	test_cmp expect actual &&
+
+	git rev-list --objects --filter-provided-revisions --filter=object:type=commit tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided-revisions --filter=object:type=commit tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter-provided-revisions --filter=object:type=tree tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided-revisions --filter=object:type=tree tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter-provided-revisions --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided-revisions --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_expect_success 'combine filter' '
 	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
 	git rev-list --use-bitmap-index \
@@ -105,4 +127,18 @@ test_expect_success 'combine filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'combine filter with --filter-provided-revisions' '
+	git rev-list --objects --filter-provided-revisions --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided-revisions --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git cat-file --batch-check="%(objecttype) %(objectsize)" <actual >objects &&
+	while read objecttype objectsize
+	do
+		test "$objecttype" = blob || return 1
+		test "$objectsize" -le 1000 || return 1
+	done <objects
+'
+
 test_done
-- 
2.31.1



* Re: [PATCH v3 8/8] rev-list: allow filtering of provided items
  2021-04-09 11:28     ` [PATCH v3 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
  2021-04-09 11:32       ` [RESEND PATCH " Patrick Steinhardt
@ 2021-04-09 15:00       ` Philip Oakley
  2021-04-12 13:15         ` Patrick Steinhardt
  1 sibling, 1 reply; 67+ messages in thread
From: Philip Oakley @ 2021-04-09 15:00 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Jeff King, Christian Couder, Taylor Blau

typo nit.
On 09/04/2021 12:28, Patrick Steinhardt wrote:
> When providing an object filter, it is currently impossible to also
> filter provided items. E.g. when executing `git rev-list HEAD` , the
> commit this reference points to will be treated as user-provided and is
> thus excluded from the filtering mechanism. This makes it harder than
> necessary to properly use the new `--filter=object:type` filter given
> that even if the user wants to only see blobs, he'll still see commits
> of provided references.
>
> Improve this by introducing a new `--filter-provided` option to the
s/--filter-provided/--filter-provided-revisions/

Also in some tests - I presume the option should be spelled out in full.

> git-rev-list(1) command. If given, then all user-provided references
> will be subject to filtering.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  Documentation/rev-list-options.txt  |  5 ++++
>  builtin/pack-objects.c              |  2 +-
>  builtin/rev-list.c                  | 36 +++++++++++++++++++++--------
>  pack-bitmap.c                       |  6 +++--
>  pack-bitmap.h                       |  3 ++-
>  reachable.c                         |  2 +-
>  t/t6112-rev-list-filters-objects.sh | 28 ++++++++++++++++++++++
>  t/t6113-rev-list-bitmap-filters.sh  | 36 +++++++++++++++++++++++++++++
>  8 files changed, 104 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index 3afa8fffbd..7fa18fc6e6 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -933,6 +933,11 @@ equivalent.
>  --no-filter::
>  	Turn off any previous `--filter=` argument.
>  
> +--filter-provided-revisions::
> +	Filter the list of explicitly provided revisions, which would otherwise
> +	always be printed even if they did not match any of the filters. Only
> +	useful with `--filter=`.
> +
>  --filter-print-omitted::
>  	Only useful with `--filter=`; prints a list of the objects omitted
>  	by the filter.  Object IDs are prefixed with a ``~'' character.
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 525c2d8552..2f2026dc87 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3516,7 +3516,7 @@ static int pack_options_allow_reuse(void)
>  
>  static int get_object_list_from_bitmap(struct rev_info *revs)
>  {
> -	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options)))
> +	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
>  		return -1;
>  
>  	if (pack_options_allow_reuse() &&
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index b4d8ea0a35..13f0ff3f8d 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -398,7 +398,8 @@ static inline int parse_missing_action_value(const char *value)
>  }
>  
>  static int try_bitmap_count(struct rev_info *revs,
> -			    struct list_objects_filter_options *filter)
> +			    struct list_objects_filter_options *filter,
> +			    int filter_provided_revs)
>  {
>  	uint32_t commit_count = 0,
>  		 tag_count = 0,
> @@ -433,7 +434,7 @@ static int try_bitmap_count(struct rev_info *revs,
>  	 */
>  	max_count = revs->max_count;
>  
> -	bitmap_git = prepare_bitmap_walk(revs, filter);
> +	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
>  	if (!bitmap_git)
>  		return -1;
>  
> @@ -450,7 +451,8 @@ static int try_bitmap_count(struct rev_info *revs,
>  }
>  
>  static int try_bitmap_traversal(struct rev_info *revs,
> -				struct list_objects_filter_options *filter)
> +				struct list_objects_filter_options *filter,
> +				int filter_provided_revs)
>  {
>  	struct bitmap_index *bitmap_git;
>  
> @@ -461,7 +463,7 @@ static int try_bitmap_traversal(struct rev_info *revs,
>  	if (revs->max_count >= 0)
>  		return -1;
>  
> -	bitmap_git = prepare_bitmap_walk(revs, filter);
> +	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
>  	if (!bitmap_git)
>  		return -1;
>  
> @@ -471,14 +473,15 @@ static int try_bitmap_traversal(struct rev_info *revs,
>  }
>  
>  static int try_bitmap_disk_usage(struct rev_info *revs,
> -				 struct list_objects_filter_options *filter)
> +				 struct list_objects_filter_options *filter,
> +				 int filter_provided_revs)
>  {
>  	struct bitmap_index *bitmap_git;
>  
>  	if (!show_disk_usage)
>  		return -1;
>  
> -	bitmap_git = prepare_bitmap_walk(revs, filter);
> +	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
>  	if (!bitmap_git)
>  		return -1;
>  
> @@ -499,6 +502,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  	int bisect_show_vars = 0;
>  	int bisect_find_all = 0;
>  	int use_bitmap_index = 0;
> +	int filter_provided_revs = 0;
>  	const char *show_progress = NULL;
>  
>  	if (argc == 2 && !strcmp(argv[1], "-h"))
> @@ -599,6 +603,10 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  			list_objects_filter_set_no_filter(&filter_options);
>  			continue;
>  		}
> +		if (!strcmp(arg, "--filter-provided-revisions")) {
> +			filter_provided_revs = 1;
> +			continue;
> +		}
>  		if (!strcmp(arg, "--filter-print-omitted")) {
>  			arg_print_omitted = 1;
>  			continue;
> @@ -665,11 +673,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  		progress = start_delayed_progress(show_progress, 0);
>  
>  	if (use_bitmap_index) {
> -		if (!try_bitmap_count(&revs, &filter_options))
> +		if (!try_bitmap_count(&revs, &filter_options, filter_provided_revs))
>  			return 0;
> -		if (!try_bitmap_disk_usage(&revs, &filter_options))
> +		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_revs))
>  			return 0;
> -		if (!try_bitmap_traversal(&revs, &filter_options))
> +		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_revs))
>  			return 0;
>  	}
>  
> @@ -694,6 +702,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  			return show_bisect_vars(&info, reaches, all);
>  	}
>  
> +	if (filter_provided_revs) {
> +		struct commit_list *c;
> +		for (i = 0; i < revs.pending.nr; i++) {
> +			struct object_array_entry *pending = revs.pending.objects + i;
> +			pending->item->flags |= NOT_USER_GIVEN;
> +		}
> +		for (c = revs.commits; c; c = c->next)
> +			c->item->object.flags |= NOT_USER_GIVEN;
> +	}
> +
>  	if (arg_print_omitted)
>  		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
>  	if (arg_missing_action == MA_PRINT)
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 4385f15828..0576a19a28 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -1009,7 +1009,8 @@ static int can_filter_bitmap(struct list_objects_filter_options *filter)
>  }
>  
>  struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
> -					 struct list_objects_filter_options *filter)
> +					 struct list_objects_filter_options *filter,
> +					 int filter_provided_revs)
>  {
>  	unsigned int i;
>  
> @@ -1104,7 +1105,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
>  	if (haves_bitmap)
>  		bitmap_and_not(wants_bitmap, haves_bitmap);
>  
> -	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
> +	filter_bitmap(bitmap_git, (filter && filter_provided_revs) ? NULL : wants,
> +		      wants_bitmap, filter);
>  
>  	bitmap_git->result = wants_bitmap;
>  	bitmap_git->haves = haves_bitmap;
> diff --git a/pack-bitmap.h b/pack-bitmap.h
> index 36d99930d8..5d8ae3b590 100644
> --- a/pack-bitmap.h
> +++ b/pack-bitmap.h
> @@ -50,7 +50,8 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
>  				 show_reachable_fn show_reachable);
>  void test_bitmap_walk(struct rev_info *revs);
>  struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
> -					 struct list_objects_filter_options *filter);
> +					 struct list_objects_filter_options *filter,
> +					 int filter_provided_revs);
>  int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
>  				       struct packed_git **packfile,
>  				       uint32_t *entries,
> diff --git a/reachable.c b/reachable.c
> index 77a60c70a5..fc833cae43 100644
> --- a/reachable.c
> +++ b/reachable.c
> @@ -223,7 +223,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
>  	cp.progress = progress;
>  	cp.count = 0;
>  
> -	bitmap_git = prepare_bitmap_walk(revs, NULL);
> +	bitmap_git = prepare_bitmap_walk(revs, NULL, 0);
>  	if (bitmap_git) {
>  		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
>  		free_bitmap_index(bitmap_git);
> diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
> index c79ec04060..47c558ab0e 100755
> --- a/t/t6112-rev-list-filters-objects.sh
> +++ b/t/t6112-rev-list-filters-objects.sh
> @@ -207,6 +207,34 @@ test_expect_success 'verify object:type=tag prints tag' '
>  	test_cmp expected actual
>  '
>  
> +test_expect_success 'verify object:type=blob prints only blob with --filter-provided' '
> +	printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob) >expected &&
> +	git -C object-type rev-list --objects \
> +		--filter=object:type=blob --filter-provided HEAD >actual &&
> +	test_cmp expected actual
> +'
> +
> +test_expect_success 'verify object:type=tree prints only tree with --filter-provided' '
> +	printf "%s \n" $(git -C object-type rev-parse HEAD^{tree}) >expected &&
> +	git -C object-type rev-list --objects \
> +		--filter=object:type=tree HEAD --filter-provided >actual &&
> +	test_cmp expected actual
> +'
> +
> +test_expect_success 'verify object:type=commit prints only commit with --filter-provided' '
> +	git -C object-type rev-parse HEAD >expected &&
> +	git -C object-type rev-list --objects \
> +		--filter=object:type=commit --filter-provided HEAD >actual &&
> +	test_cmp expected actual
> +'
> +
> +test_expect_success 'verify object:type=tag prints only tag with --filter-provided' '
> +	printf "%s tag\n" $(git -C object-type rev-parse tag) >expected &&
> +	git -C object-type rev-list --objects \
> +		--filter=object:type=tag --filter-provided tag >actual &&
> +	test_cmp expected actual
> +'
> +
>  # Test sparse:path=<path> filter.
>  # !!!!
>  # NOTE: sparse:path filter support has been dropped for security reasons,
> diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
> index cb9db7df6f..9053ac5059 100755
> --- a/t/t6113-rev-list-bitmap-filters.sh
> +++ b/t/t6113-rev-list-bitmap-filters.sh
> @@ -98,6 +98,28 @@ test_expect_success 'object:type filter' '
>  	test_bitmap_traversal expect actual
>  '
>  
> +test_expect_success 'object:type filter with --filter-provided' '
> +	git rev-list --objects --filter-provided --filter=object:type=tag tag >expect &&
> +	git rev-list --use-bitmap-index \
> +		     --objects --filter-provided --filter=object:type=tag tag >actual &&
> +	test_cmp expect actual &&
> +
> +	git rev-list --objects --filter-provided --filter=object:type=commit tag >expect &&
> +	git rev-list --use-bitmap-index \
> +		     --objects --filter-provided --filter=object:type=commit tag >actual &&
> +	test_bitmap_traversal expect actual &&
> +
> +	git rev-list --objects --filter-provided --filter=object:type=tree tag >expect &&
> +	git rev-list --use-bitmap-index \
> +		     --objects --filter-provided --filter=object:type=tree tag >actual &&
> +	test_bitmap_traversal expect actual &&
> +
> +	git rev-list --objects --filter-provided --filter=object:type=blob tag >expect &&
> +	git rev-list --use-bitmap-index \
> +		     --objects --filter-provided --filter=object:type=blob tag >actual &&
> +	test_bitmap_traversal expect actual
> +'
> +
>  test_expect_success 'combine filter' '
>  	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
>  	git rev-list --use-bitmap-index \
> @@ -105,4 +127,18 @@ test_expect_success 'combine filter' '
>  	test_bitmap_traversal expect actual
>  '
>  
> +test_expect_success 'combine filter with --filter-provided' '
> +	git rev-list --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
> +	git rev-list --use-bitmap-index \
> +		     --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
> +	test_bitmap_traversal expect actual &&
> +
> +	git cat-file --batch-check="%(objecttype) %(objectsize)" <actual >objects &&
> +	while read objecttype objectsize
> +	do
> +		test "$objecttype" = blob || return 1
> +		test "$objectsize" -le 1000 || return 1
> +	done <objects
> +'
> +
>  test_done



* Re: [PATCH v2 7/8] pack-bitmap: implement combined filter
  2021-04-09 10:31       ` Patrick Steinhardt
@ 2021-04-09 15:53         ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2021-04-09 15:53 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau

On Fri, Apr 09, 2021 at 12:31:42PM +0200, Patrick Steinhardt wrote:

> > Hmm. This is essentially reproducing the list in filter_bitmap() of
> > what's OK for bitmaps. So when adding a new filter, it would have to be
> > added in both places.
> > 
> > Can we preserve that property of the original code? I'd think that just
> > adding LOFC_COMBINE to filter_bitmap() would be sufficient. I.e., this
> > hunk:
> > 
> > > +	if (filter->choice == LOFC_COMBINE) {
> > > +		int i;
> > > +		for (i = 0; i < filter->sub_nr; i++) {
> > > +			filter_bitmap(bitmap_git, tip_objects, to_filter,
> > > +				      &filter->sub[i]);
> > > +		}
> > > +		return 0;
> > > +	}
> > 
> > ...except that we need to see if filter_bitmap() returns "-1" for any of
> > the recursive calls. Which we probably should be doing anyway to
> > propagate any errors (though I think the only "errors" we'd return are
> > "not supported", at least for now).
> > 
> > -Peff
> 
> But wouldn't that mean that we're now needlessly filtering via bitmaps
> all the way down the combined filters only to realize at the end that it
> cannot work because we've got a tree filter with non-zero tree depth?
> Granted, this will not be the common case. But it still feels like we're
> doing needless work for cases where we know that bitmaps cannot answer
> the query.

I don't think so. We first call can_filter_bitmap(filter), which passes
NULL for bitmap_git. And then in filter_bitmap(), we only do actual work
if bitmap_git is non-NULL.

This is the same thing that saves us from even loading the bitmaps
(which is itself a non-trivial amount of work) if the filter cannot be
satisfied by bitmaps.
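
To make that pattern concrete, here is a minimal standalone sketch. It is
not the pack-bitmap.c code: every name in it (`toy_filter`,
`toy_filter_bitmap()`, `toy_can_filter_bitmap()`) is made up for
illustration, and "only a tree depth of zero is supported" is a
simplifying assumption taken from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in git. */
enum toy_choice { TOY_BLOB_NONE, TOY_TREE_DEPTH, TOY_COMBINE };

struct toy_filter {
	enum toy_choice choice;
	unsigned long tree_depth;	/* used by TOY_TREE_DEPTH only */
	struct toy_filter *sub;		/* used by TOY_COMBINE only */
	size_t sub_nr;
};

struct toy_bitmap {
	int work_done;			/* counts how often we "filtered" */
};

/*
 * Returns 0 if the filter is supported, -1 otherwise. Work is only
 * done when `bitmap` is non-NULL, so passing NULL turns the call into
 * a cheap capability check.
 */
static int toy_filter_bitmap(struct toy_bitmap *bitmap,
			     const struct toy_filter *filter)
{
	size_t i;

	assert(filter != NULL);

	switch (filter->choice) {
	case TOY_BLOB_NONE:
		if (bitmap)
			bitmap->work_done++;
		return 0;
	case TOY_TREE_DEPTH:
		/* assumption of this sketch: only depth zero is supported */
		if (filter->tree_depth != 0)
			return -1;
		if (bitmap)
			bitmap->work_done++;
		return 0;
	case TOY_COMBINE:
		/* propagate "not supported" from any sub-filter */
		for (i = 0; i < filter->sub_nr; i++)
			if (toy_filter_bitmap(bitmap, &filter->sub[i]) < 0)
				return -1;
		return 0;
	}
	return -1;
}

/* Dry run: no bitmap is passed, so nothing is loaded or filtered. */
static int toy_can_filter_bitmap(const struct toy_filter *filter)
{
	return !toy_filter_bitmap(NULL, filter);
}
```

A caller would ask `toy_can_filter_bitmap()` first and only load the
bitmap and call `toy_filter_bitmap()` with a non-NULL bitmap when the
dry run succeeded.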

-Peff


* Re: [PATCH v2 7/8] pack-bitmap: implement combined filter
  2021-04-09 11:17       ` Patrick Steinhardt
@ 2021-04-09 15:55         ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2021-04-09 15:55 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau

On Fri, Apr 09, 2021 at 01:17:59PM +0200, Patrick Steinhardt wrote:

> > Before this patch, I think your test:
> > 
> > > +test_expect_success 'combine filter' '
> > > +	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
> > > +	git rev-list --use-bitmap-index \
> > > +		     --objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
> > > +	test_bitmap_traversal expect actual
> > > +'
> > 
> > would pass anyway, because we'd just skip using bitmaps. Is there a way
> > we can tell that the bitmap code actually kicked in? Maybe a perf test
> > would make it clear (those aren't always run, but hopefully we'd
> > eventually notice a regression there).
> 
> I think that's not actually true. Note that we're using
> `test_bitmap_traversal`:

Ah, right. I forgot about the hackery in test_bitmap_traversal() to let
us tell the difference (even though I was the one who wrote it, I still
consider it hackery ;) ).

So yes, this is a good test that we are allowing the combine filter.

-Peff


* Re: [PATCH v2 8/8] rev-list: allow filtering of provided items
  2021-04-09 10:59       ` Patrick Steinhardt
@ 2021-04-09 15:58         ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2021-04-09 15:58 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Christian Couder, Taylor Blau

On Fri, Apr 09, 2021 at 12:59:41PM +0200, Patrick Steinhardt wrote:

> > The name seems a little confusing to me, as I can read it as both
> > "please filter the provided objects" and "a filter has been provided".
> > I guess "--filter-print-provided" would be more clear. And also the
> > default, so you'd want "--no-filter-print-provided". That's kind of
> > clunky, though. Maybe "--filter-omit-provided"?
> 
> Hum, "--filter-omit-provided" doesn't sound good to me, either. Omit to
> me sounds like it'd omit filtering provided items, but we're doing
> the reverse thing.

Yeah, I can see that.

> How about "--filter-provided-revisions"? Verbose, but at least it cannot
> be confused with a filter being provided.

Yes, that works for me. Maybe "--filter-provided-objects", since you
could also provide a non-revision on the command line (though I think
other parts of the docs are happy to refer to "revisions" or "commits"
on the command line, even though you can clearly provide non-commits
when used with --objects).

-Peff


* Re: [PATCH v2 0/8] rev-parse: implement object type filter
  2021-04-09 11:14       ` Patrick Steinhardt
@ 2021-04-09 16:05         ` Jeff King
  0 siblings, 0 replies; 67+ messages in thread
From: Jeff King @ 2021-04-09 16:05 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, git, Christian Couder, Taylor Blau

On Fri, Apr 09, 2021 at 01:14:26PM +0200, Patrick Steinhardt wrote:

> > I dunno. Those aren't objections exactly. Just trying to put my finger
> > on why my initial reaction was "huh, why --filter?".
> 
> Yeah, I do kind of share these concerns. Ideally, we'd provide a nicer
> only-user-facing interface to query the repository for various objects.
> git-cat-file(1) would be the obvious thing that first gets into my mind,
> where it would be nice to have it filter stuff. But then on the other
> hand, it's really rather a simple "Give me what I tell you to" binary,
> which is probably a good thing. Other than that I don't think there's
> any executable that'd be a good fit -- we could do this via a new
> git-list-objects(1), but then again git-rev-list(1) already does most of
> what git-list-objects(1) would do, so why bother.

I don't think cat-file does quite the same thing. An important part of
rev-list is that it is traversing. So it is determining both
reachability, but also eliminating excluded objects. For example, there
is no cat-file equivalent (and can never be) of:

  git rev-list --objects --filter=object:type=blob $old..$new

Likewise for list-objects (which cat-file really _does_ cover, with
--batch-all-objects). Obviously you can pair rev-list with cat-file to
traverse and then filter, but the whole point of this series is to do so
more efficiently.

So I think putting this into rev-list is the only sensible option. The
question is just whether to use --filter, or if it should be:

  git rev-list --show-blobs --show-trees $old..$new

with rules like:

  - if no --show-X is given, show only commits

  - if one or more --show-X is given, show all of them (but nothing else)

  - --objects is equivalent to providing each of --show-commits
    --show-blobs --show-trees --show-tags
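
As a rough sketch of how those rules could resolve into a set of object
types (purely illustrative: the `--show-*` options are only the idea
floated above and do not exist in git, and `resolve_show_mask()` and the
`SHOW_*` constants are hypothetical names):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit flags, one per object type. */
enum {
	SHOW_COMMITS = 1 << 0,
	SHOW_TREES   = 1 << 1,
	SHOW_BLOBS   = 1 << 2,
	SHOW_TAGS    = 1 << 3
};
#define SHOW_ALL (SHOW_COMMITS | SHOW_TREES | SHOW_BLOBS | SHOW_TAGS)

/*
 * Scan the arguments and return the set of object types to show.
 * Arguments that are not recognized (e.g. revisions) are ignored.
 */
static unsigned resolve_show_mask(int argc, const char **argv)
{
	unsigned mask = 0;
	int i;

	assert(argc == 0 || argv != NULL);

	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--show-commits"))
			mask |= SHOW_COMMITS;
		else if (!strcmp(argv[i], "--show-trees"))
			mask |= SHOW_TREES;
		else if (!strcmp(argv[i], "--show-blobs"))
			mask |= SHOW_BLOBS;
		else if (!strcmp(argv[i], "--show-tags"))
			mask |= SHOW_TAGS;
		else if (!strcmp(argv[i], "--objects"))
			mask |= SHOW_ALL;	/* shorthand for all four */
	}

	/* no --show-X given at all: show only commits */
	return mask ? mask : SHOW_COMMITS;
}
```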

-Peff


* Re: [PATCH v3 0/8] rev-parse: implement object type filter
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
                       ` (7 preceding siblings ...)
  2021-04-09 11:28     ` [PATCH v3 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
@ 2021-04-11  6:02     ` Junio C Hamano
  2021-04-12 13:12       ` Patrick Steinhardt
  2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
  9 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2021-04-11  6:02 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Christian Couder, Taylor Blau

Patrick Steinhardt <ps@pks.im> writes:

> Subject: Re: [PATCH v3 0/8] rev-parse: implement object type filter
>
> this is the third version of my patch series which implements a new
> `object:type` filter for git-rev-parse(1) and git-upload-pack(1) and
> extends support for bitmap indices to work with combined filters.

Do you truly mean rev-parse, or is it just a typo for rev-list?

> This mostly addresses Peff's comments. Thanks for your feedback!
>
>     - Removed the `base` parameter from `process_tag()`.
>
>     - The object type filter doesn't assume ordering for the object type
>       enum anymore.
>
>     - Combined filters in the bitmap path now verify that
>       `filter_bitmap` does not return any errors.
>
>     - Renamed "--filter-provided" to "--filter-provided-revisions" and
>       added documentation for it.
>
>     - Refactored the code to not munge the `filter_provided` field in
>       the filter options struct, but instead carry it in rev-list.c.
>
> Please see the attached range-diff for more details.
>
> Patrick


* Re: [PATCH v3 4/8] list-objects: support filtering by tag and commit
  2021-04-09 11:28     ` [PATCH v3 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
@ 2021-04-11  6:49       ` Junio C Hamano
  0 siblings, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2021-04-11  6:49 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Christian Couder, Taylor Blau

Patrick Steinhardt <ps@pks.im> writes:

> Object filters currently only support filtering blobs or trees based on
> some criteria. This commit lays the foundation to also allow filtering
> of tags and commits.
>
> No change in behaviour is expected from this commit given that there are
> no filters yet for those object types.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  list-objects-filter.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  list-objects-filter.h |  2 ++
>  list-objects.c        | 22 +++++++++++++++++++---
>  3 files changed, 61 insertions(+), 3 deletions(-)
>
> diff --git a/list-objects-filter.c b/list-objects-filter.c
> index 39e2f15333..0ebfa52966 100644
> --- a/list-objects-filter.c
> +++ b/list-objects-filter.c
> @@ -82,6 +82,16 @@ static enum list_objects_filter_result filter_blobs_none(
>  	default:
>  		BUG("unknown filter_situation: %d", filter_situation);
>  
> +	case LOFS_TAG:
> +		assert(obj->type == OBJ_TAG);
> +		/* always include all tag objects */
> +		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
> +	case LOFS_COMMIT:
> +		assert(obj->type == OBJ_COMMIT);
> +		/* always include all commit objects */
> +		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
>  	case LOFS_BEGIN_TREE:
>  		assert(obj->type == OBJ_TREE);
>  		/* always include all tree objects */
> @@ -173,6 +183,16 @@ static enum list_objects_filter_result filter_trees_depth(
>  	default:
>  		BUG("unknown filter_situation: %d", filter_situation);
>  
> +	case LOFS_TAG:
> +		assert(obj->type == OBJ_TAG);
> +		/* always include all tag objects */
> +		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
> +	case LOFS_COMMIT:
> +		assert(obj->type == OBJ_COMMIT);
> +		/* always include all commit objects */
> +		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
>  	case LOFS_END_TREE:
>  		assert(obj->type == OBJ_TREE);
>  		filter_data->current_depth--;
> @@ -267,6 +287,16 @@ static enum list_objects_filter_result filter_blobs_limit(
>  	default:
>  		BUG("unknown filter_situation: %d", filter_situation);
>  
> +	case LOFS_TAG:
> +		assert(obj->type == OBJ_TAG);
> +		/* always include all tag objects */
> +		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
> +	case LOFS_COMMIT:
> +		assert(obj->type == OBJ_COMMIT);
> +		/* always include all commit objects */
> +		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
>  	case LOFS_BEGIN_TREE:
>  		assert(obj->type == OBJ_TREE);
>  		/* always include all tree objects */
> @@ -371,6 +401,16 @@ static enum list_objects_filter_result filter_sparse(
>  	default:
>  		BUG("unknown filter_situation: %d", filter_situation);
>  
> +	case LOFS_TAG:
> +		assert(obj->type == OBJ_TAG);
> +		/* always include all tag objects */
> +		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
> +	case LOFS_COMMIT:
> +		assert(obj->type == OBJ_COMMIT);
> +		/* always include all commit objects */
> +		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
>  	case LOFS_BEGIN_TREE:
>  		assert(obj->type == OBJ_TREE);
>  		dtype = DT_DIR;
> diff --git a/list-objects-filter.h b/list-objects-filter.h
> index cfd784e203..9e98814111 100644
> --- a/list-objects-filter.h
> +++ b/list-objects-filter.h
> @@ -55,6 +55,8 @@ enum list_objects_filter_result {
>  };
>  
>  enum list_objects_filter_situation {
> +	LOFS_COMMIT,
> +	LOFS_TAG,
>  	LOFS_BEGIN_TREE,
>  	LOFS_END_TREE,
>  	LOFS_BLOB
> diff --git a/list-objects.c b/list-objects.c
> index a5a60301cb..0c524a81ac 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -217,8 +217,14 @@ static void process_tag(struct traversal_context *ctx,
>  			struct tag *tag,
>  			const char *name)
>  {
> -	tag->object.flags |= SEEN;
> -	ctx->show_object(&tag->object, name, ctx->show_data);
> +	enum list_objects_filter_result r;
> +
> +	r = list_objects_filter__filter_object(ctx->revs->repo, LOFS_TAG,
> +					       &tag->object, "", 0, ctx->filter);

s/0/NULL/

> +	if (r & LOFR_MARK_SEEN)
> +		tag->object.flags |= SEEN;
> +	if (r & LOFR_DO_SHOW)
> +		ctx->show_object(&tag->object, name, ctx->show_data);
>  }
>  
>  static void mark_edge_parents_uninteresting(struct commit *commit,
> @@ -368,6 +374,12 @@ static void do_traverse(struct traversal_context *ctx)
>  	strbuf_init(&csp, PATH_MAX);
>  
>  	while ((commit = get_revision(ctx->revs)) != NULL) {
> +		enum list_objects_filter_result r;
> +
> +		r = list_objects_filter__filter_object(ctx->revs->repo,
> +				LOFS_COMMIT, &commit->object,
> +				NULL, NULL, ctx->filter);
> +
>  		/*
>  		 * an uninteresting boundary commit may not have its tree
>  		 * parsed yet, but we are not going to show them anyway
> @@ -382,7 +394,11 @@ static void do_traverse(struct traversal_context *ctx)
>  			die(_("unable to load root tree for commit %s"),
>  			      oid_to_hex(&commit->object.oid));
>  		}
> -		ctx->show_commit(commit, ctx->show_data);
> +
> +		if (r & LOFR_MARK_SEEN)
> +			commit->object.flags |= SEEN;
> +		if (r & LOFR_DO_SHOW)
> +			ctx->show_commit(commit, ctx->show_data);
>  
>  		if (ctx->revs->tree_blobs_in_commit_order)
>  			/*


* Re: [PATCH v3 0/8] rev-parse: implement object type filter
  2021-04-11  6:02     ` [PATCH v3 0/8] rev-parse: implement object type filter Junio C Hamano
@ 2021-04-12 13:12       ` Patrick Steinhardt
  0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 552 bytes --]

On Sat, Apr 10, 2021 at 11:02:55PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Subject: Re: [PATCH v3 0/8] rev-parse: implement object type filter
> >
> > this is the third version of my patch series which implements a new
> > `object:type` filter for git-rev-parse(1) and git-upload-pack(1) and
> > extends support for bitmap indices to work with combined filters.
> 
> Do you truly mean rev-parse, or is it just a typo for rev-list?

It's a typo both in the series' title and here in the text.

Patrick


* Re: [PATCH v3 8/8] rev-list: allow filtering of provided items
  2021-04-09 15:00       ` [PATCH " Philip Oakley
@ 2021-04-12 13:15         ` Patrick Steinhardt
  0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:15 UTC (permalink / raw)
  To: Philip Oakley; +Cc: git, Jeff King, Christian Couder, Taylor Blau

[-- Attachment #1: Type: text/plain, Size: 965 bytes --]

On Fri, Apr 09, 2021 at 04:00:26PM +0100, Philip Oakley wrote:
> typo nit.
> On 09/04/2021 12:28, Patrick Steinhardt wrote:
> > When providing an object filter, it is currently impossible to also
> > filter provided items. E.g. when executing `git rev-list HEAD` , the
> > commit this reference points to will be treated as user-provided and is
> > thus excluded from the filtering mechanism. This makes it harder than
> > necessary to properly use the new `--filter=object:type` filter given
> > that even if the user wants to only see blobs, he'll still see commits
> > of provided references.
> >
> > Improve this by introducing a new `--filter-provided` option to the
> s/--filter-provided/--filter-provided-revisions/
> 
> Also in some tests - I presume the option should be spelled out in full.

Right. I did fix these in the resend because I forgot to stage changes,
but still had it in the commit message.

Fixed now, thanks!

Patrick


* [PATCH v4 0/8] rev-list: implement object type filter
  2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
                       ` (8 preceding siblings ...)
  2021-04-11  6:02     ` [PATCH v3 0/8] rev-parse: implement object type filter Junio C Hamano
@ 2021-04-12 13:37     ` Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
                         ` (7 more replies)
  9 siblings, 8 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau, Philip Oakley

[-- Attachment #1: Type: text/plain, Size: 18246 bytes --]

Hi,

this is the fourth version of my patch series which implements a new
`object:type` filter for git-rev-list(1) and git-upload-pack(1) and
extends support for bitmap indices to work with combined filters.

Changes compared to v3:

    - Small style fix to not pass an empty string and `0`, but instead
      simply pass two `NULL` pointers to
      `list_objects_filter__filter_object()`, as pointed out by Junio.

    - I've changed patch 7/8 as proposed by Peff: support of combined
      filters is now determined directly in `filter_bitmap()`, without
      having to mirror all filter types in the new `filter_supported()`
      function.

    - Renamed `--filter-provided-revisions` to
      `--filter-provided-objects` as proposed by Peff and addressed both
      commit message and tests as pointed out by Philip.
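
To illustrate the second item, here is a minimal sketch of deciding
support for combined filters inside one recursive function, instead of
mirroring every filter type in a separate helper. All `sketch_`/`SK_`
names are hypothetical stand-ins and not git's real definitions; the
`apply` parameter is an assumption that plays the role of the
`if (bitmap_git)` guards visible in the range-diff below.

```c
#include <stddef.h>

/* Hypothetical stand-ins for git's filter choices, for illustration only. */
enum sketch_choice {
	SK_DISABLED,
	SK_BLOB_NONE,
	SK_OBJECT_TYPE,
	SK_SPARSE_OID, /* stands for a filter the bitmap code cannot handle */
	SK_COMBINE
};

struct sketch_filter {
	enum sketch_choice choice;
	const struct sketch_filter *sub; /* sub-filters, only for SK_COMBINE */
	size_t sub_nr;
};

/*
 * Returns 0 if the filter is supported, -1 otherwise. With apply == 0
 * this is a dry run that only answers "is this filter supported?".
 * *applied counts the leaf filters that were actually applied.
 */
static int sketch_filter_bitmap(const struct sketch_filter *f, int apply,
				int *applied)
{
	size_t i;

	if (!f || f->choice == SK_DISABLED)
		return 0;

	if (f->choice == SK_BLOB_NONE || f->choice == SK_OBJECT_TYPE) {
		if (apply)
			(*applied)++;
		return 0;
	}

	if (f->choice == SK_COMBINE) {
		/* A combined filter is supported iff all sub-filters are. */
		for (i = 0; i < f->sub_nr; i++)
			if (sketch_filter_bitmap(&f->sub[i], apply, applied) < 0)
				return -1;
		return 0;
	}

	/* filter choice not handled */
	return -1;
}
```

With this shape, adding a new leaf filter only requires touching one
function: the support check falls out of the same code path that
applies the filter.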

Thanks for all your feedback! As always, the range-diff is attached
below.

Patrick

Patrick Steinhardt (8):
  uploadpack.txt: document implication of `uploadpackfilter.allow`
  revision: mark commit parents as NOT_USER_GIVEN
  list-objects: move tag processing into its own function
  list-objects: support filtering by tag and commit
  list-objects: implement object type filter
  pack-bitmap: implement object type filter
  pack-bitmap: implement combined filter
  rev-list: allow filtering of provided items

 Documentation/config/uploadpack.txt |   9 ++-
 Documentation/rev-list-options.txt  |   8 ++
 builtin/pack-objects.c              |   2 +-
 builtin/rev-list.c                  |  36 ++++++---
 list-objects-filter-options.c       |  14 ++++
 list-objects-filter-options.h       |   2 +
 list-objects-filter.c               | 116 ++++++++++++++++++++++++++++
 list-objects-filter.h               |   2 +
 list-objects.c                      |  30 ++++++-
 pack-bitmap.c                       |  45 +++++++++--
 pack-bitmap.h                       |   3 +-
 reachable.c                         |   2 +-
 revision.c                          |   4 +-
 revision.h                          |   3 -
 t/t6112-rev-list-filters-objects.sh |  76 ++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  |  68 +++++++++++++++-
 16 files changed, 390 insertions(+), 30 deletions(-)

Range-diff against v3:
1:  f80b9570d4 = 1:  f80b9570d4 uploadpack.txt: document implication of `uploadpackfilter.allow`
2:  46c1952405 = 2:  46c1952405 revision: mark commit parents as NOT_USER_GIVEN
3:  3d792f6339 = 3:  3d792f6339 list-objects: move tag processing into its own function
4:  80193d6ba3 ! 4:  674da0f9ac list-objects: support filtering by tag and commit
    @@ list-objects.c: static void process_tag(struct traversal_context *ctx,
     +	enum list_objects_filter_result r;
     +
     +	r = list_objects_filter__filter_object(ctx->revs->repo, LOFS_TAG,
    -+					       &tag->object, "", 0, ctx->filter);
    ++					       &tag->object, NULL, NULL,
    ++					       ctx->filter);
     +	if (r & LOFR_MARK_SEEN)
     +		tag->object.flags |= SEEN;
     +	if (r & LOFR_DO_SHOW)
5:  e2a14abf92 = 5:  d22a5fd37d list-objects: implement object type filter
6:  46d4450d38 = 6:  17c9f66bbc pack-bitmap: implement object type filter
7:  06a376399b ! 7:  759ac54bb2 pack-bitmap: implement combined filter
    @@ Commit message
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## pack-bitmap.c ##
    -@@ pack-bitmap.c: static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
    - 		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_BLOB);
    - }
    - 
    -+static int filter_supported(struct list_objects_filter_options *filter)
    -+{
    -+	int i;
    -+
    -+	switch (filter->choice) {
    -+	case LOFC_BLOB_NONE:
    -+	case LOFC_BLOB_LIMIT:
    -+	case LOFC_OBJECT_TYPE:
    -+		return 1;
    -+	case LOFC_TREE_DEPTH:
    -+		if (filter->tree_exclude_depth == 0)
    -+			return 1;
    -+		return 0;
    -+	case LOFC_COMBINE:
    -+		for (i = 0; i < filter->sub_nr; i++)
    -+			if (!filter_supported(&filter->sub[i]))
    -+				return 0;
    -+		return 1;
    -+	default:
    -+		return 0;
    -+	}
    -+}
    -+
    - static int filter_bitmap(struct bitmap_index *bitmap_git,
    - 			 struct object_list *tip_objects,
    - 			 struct bitmap *to_filter,
    -@@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
    - {
    - 	if (!filter || filter->choice == LOFC_DISABLED)
    - 		return 0;
    -+	if (!filter_supported(filter))
    -+		return -1;
    - 
    - 	if (filter->choice == LOFC_BLOB_NONE) {
    - 		if (bitmap_git)
     @@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
      		return 0;
      	}
      
    --	if (filter->choice == LOFC_TREE_DEPTH &&
    --	    filter->tree_exclude_depth == 0) {
    -+	if (filter->choice == LOFC_TREE_DEPTH) {
    - 		if (bitmap_git)
    - 			filter_bitmap_tree_depth(bitmap_git, tip_objects,
    - 						 to_filter,
    -@@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
    - 		return 0;
    - 	}
    - 
    --	/* filter choice not handled */
    --	return -1;
     +	if (filter->choice == LOFC_COMBINE) {
     +		int i;
     +		for (i = 0; i < filter->sub_nr; i++) {
    @@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
     +		return 0;
     +	}
     +
    -+	BUG("unsupported filter choice");
    + 	/* filter choice not handled */
    + 	return -1;
      }
    - 
    - static int can_filter_bitmap(struct list_objects_filter_options *filter)
     
      ## t/t6113-rev-list-bitmap-filters.sh ##
     @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'object:type filter' '
8:  cf2297b413 ! 8:  c779d222cf rev-list: allow filtering of provided items
    @@ Commit message
         that even if the user wants to only see blobs, he'll still see commits
         of provided references.
     
    -    Improve this by introducing a new `--filter-provided` option to the
    -    git-rev-parse(1) command. If given, then all user-provided references
    -    will be subject to filtering.
    +    Improve this by introducing a new `--filter-provided-objects` option
    +    to the git-rev-parse(1) command. If given, then all user-provided
    +    references will be subject to filtering.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    @@ Documentation/rev-list-options.txt: equivalent.
      --no-filter::
      	Turn off any previous `--filter=` argument.
      
    -+--filter-provided-revisions::
    -+	Filter the list of explicitly provided revisions, which would otherwise
    ++--filter-provided-objects::
    ++	Filter the list of explicitly provided objects, which would otherwise
     +	always be printed even if they did not match any of the filters. Only
     +	useful with `--filter=`.
     +
    @@ builtin/rev-list.c: static inline int parse_missing_action_value(const char *val
      static int try_bitmap_count(struct rev_info *revs,
     -			    struct list_objects_filter_options *filter)
     +			    struct list_objects_filter_options *filter,
    -+			    int filter_provided_revs)
    ++			    int filter_provided_objects)
      {
      	uint32_t commit_count = 0,
      		 tag_count = 0,
    @@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs,
      	max_count = revs->max_count;
      
     -	bitmap_git = prepare_bitmap_walk(revs, filter);
    -+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
    ++	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
      	if (!bitmap_git)
      		return -1;
      
    @@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs,
      static int try_bitmap_traversal(struct rev_info *revs,
     -				struct list_objects_filter_options *filter)
     +				struct list_objects_filter_options *filter,
    -+				int filter_provided_revs)
    ++				int filter_provided_objects)
      {
      	struct bitmap_index *bitmap_git;
      
    @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs,
      		return -1;
      
     -	bitmap_git = prepare_bitmap_walk(revs, filter);
    -+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
    ++	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
      	if (!bitmap_git)
      		return -1;
      
    @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs,
      static int try_bitmap_disk_usage(struct rev_info *revs,
     -				 struct list_objects_filter_options *filter)
     +				 struct list_objects_filter_options *filter,
    -+				 int filter_provided_revs)
    ++				 int filter_provided_objects)
      {
      	struct bitmap_index *bitmap_git;
      
    @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs,
      		return -1;
      
     -	bitmap_git = prepare_bitmap_walk(revs, filter);
    -+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
    ++	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
      	if (!bitmap_git)
      		return -1;
      
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      	int bisect_show_vars = 0;
      	int bisect_find_all = 0;
      	int use_bitmap_index = 0;
    -+	int filter_provided_revs = 0;
    ++	int filter_provided_objects = 0;
      	const char *show_progress = NULL;
      
      	if (argc == 2 && !strcmp(argv[1], "-h"))
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      			list_objects_filter_set_no_filter(&filter_options);
      			continue;
      		}
    -+		if (!strcmp(arg, "--filter-provided-revisions")) {
    -+			filter_provided_revs = 1;
    ++		if (!strcmp(arg, "--filter-provided-objects")) {
    ++			filter_provided_objects = 1;
     +			continue;
     +		}
      		if (!strcmp(arg, "--filter-print-omitted")) {
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      
      	if (use_bitmap_index) {
     -		if (!try_bitmap_count(&revs, &filter_options))
    -+		if (!try_bitmap_count(&revs, &filter_options, filter_provided_revs))
    ++		if (!try_bitmap_count(&revs, &filter_options, filter_provided_objects))
      			return 0;
     -		if (!try_bitmap_disk_usage(&revs, &filter_options))
    -+		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_revs))
    ++		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_objects))
      			return 0;
     -		if (!try_bitmap_traversal(&revs, &filter_options))
    -+		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_revs))
    ++		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_objects))
      			return 0;
      	}
      
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      			return show_bisect_vars(&info, reaches, all);
      	}
      
    -+	if (filter_provided_revs) {
    ++	if (filter_provided_objects) {
     +		struct commit_list *c;
     +		for (i = 0; i < revs.pending.nr; i++) {
     +			struct object_array_entry *pending = revs.pending.objects + i;
    @@ pack-bitmap.c: static int can_filter_bitmap(struct list_objects_filter_options *
      struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
     -					 struct list_objects_filter_options *filter)
     +					 struct list_objects_filter_options *filter,
    -+					 int filter_provided_revs)
    ++					 int filter_provided_objects)
      {
      	unsigned int i;
      
    @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
      		bitmap_and_not(wants_bitmap, haves_bitmap);
      
     -	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
    -+	filter_bitmap(bitmap_git, (filter && filter_provided_revs) ? NULL : wants,
    ++	filter_bitmap(bitmap_git, (filter && filter_provided_objects) ? NULL : wants,
     +		      wants_bitmap, filter);
      
      	bitmap_git->result = wants_bitmap;
    @@ pack-bitmap.h: void traverse_bitmap_commit_list(struct bitmap_index *,
      struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
     -					 struct list_objects_filter_options *filter);
     +					 struct list_objects_filter_options *filter,
    -+					 int filter_provided_revs);
    ++					 int filter_provided_objects);
      int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
      				       struct packed_git **packfile,
      				       uint32_t *entries,
    @@ t/t6112-rev-list-filters-objects.sh: test_expect_success 'verify object:type=tag
      	test_cmp expected actual
      '
      
    -+test_expect_success 'verify object:type=blob prints only blob with --filter-provided-revisions' '
    ++test_expect_success 'verify object:type=blob prints only blob with --filter-provided-objects' '
     +	printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob) >expected &&
     +	git -C object-type rev-list --objects \
    -+		--filter=object:type=blob --filter-provided-revisions HEAD >actual &&
    ++		--filter=object:type=blob --filter-provided-objects HEAD >actual &&
     +	test_cmp expected actual
     +'
     +
    -+test_expect_success 'verify object:type=tree prints only tree with --filter-provided-revisions' '
    ++test_expect_success 'verify object:type=tree prints only tree with --filter-provided-objects' '
     +	printf "%s \n" $(git -C object-type rev-parse HEAD^{tree}) >expected &&
     +	git -C object-type rev-list --objects \
    -+		--filter=object:type=tree HEAD --filter-provided-revisions >actual &&
    ++		--filter=object:type=tree HEAD --filter-provided-objects >actual &&
     +	test_cmp expected actual
     +'
     +
    -+test_expect_success 'verify object:type=commit prints only commit with --filter-provided-revisions' '
    ++test_expect_success 'verify object:type=commit prints only commit with --filter-provided-objects' '
     +	git -C object-type rev-parse HEAD >expected &&
     +	git -C object-type rev-list --objects \
    -+		--filter=object:type=commit --filter-provided-revisions HEAD >actual &&
    ++		--filter=object:type=commit --filter-provided-objects HEAD >actual &&
     +	test_cmp expected actual
     +'
     +
    -+test_expect_success 'verify object:type=tag prints only tag with --filter-provided-revisions' '
    ++test_expect_success 'verify object:type=tag prints only tag with --filter-provided-objects' '
     +	printf "%s tag\n" $(git -C object-type rev-parse tag) >expected &&
     +	git -C object-type rev-list --objects \
    -+		--filter=object:type=tag --filter-provided-revisions tag >actual &&
    ++		--filter=object:type=tag --filter-provided-objects tag >actual &&
     +	test_cmp expected actual
     +'
     +
    @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'object:type filter' '
      	test_bitmap_traversal expect actual
      '
      
    -+test_expect_success 'object:type filter with --filter-provided-revisions' '
    -+	git rev-list --objects --filter-provided-revisions --filter=object:type=tag tag >expect &&
    ++test_expect_success 'object:type filter with --filter-provided-objects' '
    ++	git rev-list --objects --filter-provided-objects --filter=object:type=tag tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter-provided-revisions --filter=object:type=tag tag >actual &&
    ++		     --objects --filter-provided-objects --filter=object:type=tag tag >actual &&
     +	test_cmp expect actual &&
     +
    -+	git rev-list --objects --filter-provided-revisions --filter=object:type=commit tag >expect &&
    ++	git rev-list --objects --filter-provided-objects --filter=object:type=commit tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter-provided-revisions --filter=object:type=commit tag >actual &&
    ++		     --objects --filter-provided-objects --filter=object:type=commit tag >actual &&
     +	test_bitmap_traversal expect actual &&
     +
    -+	git rev-list --objects --filter-provided-revisions --filter=object:type=tree tag >expect &&
    ++	git rev-list --objects --filter-provided-objects --filter=object:type=tree tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter-provided-revisions --filter=object:type=tree tag >actual &&
    ++		     --objects --filter-provided-objects --filter=object:type=tree tag >actual &&
     +	test_bitmap_traversal expect actual &&
     +
    -+	git rev-list --objects --filter-provided-revisions --filter=object:type=blob tag >expect &&
    ++	git rev-list --objects --filter-provided-objects --filter=object:type=blob tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter-provided-revisions --filter=object:type=blob tag >actual &&
    ++		     --objects --filter-provided-objects --filter=object:type=blob tag >actual &&
     +	test_bitmap_traversal expect actual
     +'
     +
    @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'combine filter' '
      	test_bitmap_traversal expect actual
      '
      
    -+test_expect_success 'combine filter with --filter-provided-revisions' '
    -+	git rev-list --objects --filter-provided-revisions --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
    ++test_expect_success 'combine filter with --filter-provided-objects' '
    ++	git rev-list --objects --filter-provided-objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter-provided-revisions --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
    ++		     --objects --filter-provided-objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
     +	test_bitmap_traversal expect actual &&
     +
     +	git cat-file --batch-check="%(objecttype) %(objectsize)" <actual >objects &&
-- 
2.31.1



* [PATCH v4 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow`
  2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
@ 2021-04-12 13:37       ` Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
                         ` (6 subsequent siblings)
  7 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau, Philip Oakley

[-- Attachment #1: Type: text/plain, Size: 1317 bytes --]

When `uploadpackfilter.allow` is set to `true`, it means that filters
are enabled by default except in the case where a filter is explicitly
disabled via `uploadpackfilter.<filter>.allow`. This option will not only
enable the currently supported set of filters, but also any filters
which get added in the future. As such, an admin who wants to have
tight control over which filters are allowed and which aren't probably
shouldn't ever set `uploadpackfilter.allow=true`.

Amend the documentation to make the ramifications more explicit so that
admins are aware of this.
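
The lookup order described above can be sketched as follows. This is
not git's actual config code; `sketch_setting` and
`sketch_filter_allowed()` are hypothetical names chosen for
illustration, and the tri-state encoding is an assumption.

```c
/* Tri-state for a per-filter `uploadpackfilter.<filter>.allow` value. */
enum sketch_setting {
	SK_UNSET = -1, /* the per-filter variable is not configured */
	SK_DENY = 0,
	SK_ALLOW = 1
};

/*
 * default_allow models `uploadpackfilter.allow`. An explicit per-filter
 * value always wins; otherwise the default decides. A filter added in
 * the future starts out as SK_UNSET, so default_allow == 1 enables it.
 */
static int sketch_filter_allowed(int default_allow,
				 enum sketch_setting per_filter)
{
	if (per_filter != SK_UNSET)
		return per_filter == SK_ALLOW;
	return default_allow;
}
```

An admin who wants an explicit allow-list would thus keep the default
off and enable individual filters one by one.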

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/uploadpack.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
index b0d761282c..6729a072ea 100644
--- a/Documentation/config/uploadpack.txt
+++ b/Documentation/config/uploadpack.txt
@@ -59,7 +59,8 @@ uploadpack.allowFilter::
 
 uploadpackfilter.allow::
 	Provides a default value for unspecified object filters (see: the
-	below configuration variable).
+	below configuration variable). If set to `true`, this will also
+	enable all filters which get added in the future.
 	Defaults to `true`.
 
 uploadpackfilter.<filter>.allow::
-- 
2.31.1



* [PATCH v4 2/8] revision: mark commit parents as NOT_USER_GIVEN
  2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
@ 2021-04-12 13:37       ` Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
                         ` (5 subsequent siblings)
  7 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau, Philip Oakley

[-- Attachment #1: Type: text/plain, Size: 2338 bytes --]

The NOT_USER_GIVEN flag of an object marks whether the object was
explicitly provided by the user or not. The most important use case for
this is when filtering objects: only objects that were not explicitly
requested will get filtered.

The flag is currently only set for blobs and trees, which has been fine
given that there are no filters for tags or commits currently. We're
about to extend filtering capabilities to add an object type filter
though, which requires us to set up the NOT_USER_GIVEN flag correctly --
if it's not set, the object wouldn't get filtered at all.

Mark unseen commit parents as NOT_USER_GIVEN when processing parents.
This way, explicitly provided parents stay user-given and thus
unfiltered, while parents which get loaded as part of the graph walk
can be filtered.

This commit shouldn't have any user-visible impact, as there is no
logic to filter commits yet.
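
The flag handling can be sketched in isolation as below. The struct and
function names are hypothetical; the value of `SK_NOT_USER_GIVEN`
follows revision.h as shown in this patch, while the bit used for
`SK_SEEN` is an assumption. The sketch also assumes that a commit named
on the command line already carries the seen bit by the time it shows
up as somebody's parent.

```c
#define SK_SEEN           (1u << 0)  /* assumed bit position */
#define SK_NOT_USER_GIVEN (1u << 25) /* as in revision.h */

struct sketch_object {
	unsigned flags;
};

/*
 * Mirrors the two hunks in process_parents(): a parent that was already
 * seen is left alone, so it keeps its "user-given" status. A parent
 * discovered by the walk is marked seen and not-user-given in one step.
 * Returns 1 if the parent was newly discovered.
 */
static int sketch_visit_parent(struct sketch_object *p)
{
	if (p->flags & SK_SEEN)
		return 0;
	p->flags |= (SK_SEEN | SK_NOT_USER_GIVEN);
	return 1;
}

/* Only objects reached by traversal are subject to filtering. */
static int sketch_may_filter(const struct sketch_object *o)
{
	return (o->flags & SK_NOT_USER_GIVEN) != 0;
}
```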

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 revision.c | 4 ++--
 revision.h | 3 ---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/revision.c b/revision.c
index 553c0faa9b..fd34c75e23 100644
--- a/revision.c
+++ b/revision.c
@@ -1123,7 +1123,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 				mark_parents_uninteresting(p);
 			if (p->object.flags & SEEN)
 				continue;
-			p->object.flags |= SEEN;
+			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
@@ -1165,7 +1165,7 @@ static int process_parents(struct rev_info *revs, struct commit *commit,
 		}
 		p->object.flags |= left_flag;
 		if (!(p->object.flags & SEEN)) {
-			p->object.flags |= SEEN;
+			p->object.flags |= (SEEN | NOT_USER_GIVEN);
 			if (list)
 				commit_list_insert_by_date(p, list);
 			if (queue)
diff --git a/revision.h b/revision.h
index a24f72dcd1..93aa012f51 100644
--- a/revision.h
+++ b/revision.h
@@ -44,9 +44,6 @@
 /*
  * Indicates object was reached by traversal. i.e. not given by user on
  * command-line or stdin.
- * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
- * filtering trees and blobs, but it may be useful to support filtering commits
- * in the future.
  */
 #define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-- 
2.31.1



* [PATCH v4 3/8] list-objects: move tag processing into its own function
  2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
@ 2021-04-12 13:37       ` Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
                         ` (4 subsequent siblings)
  7 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau, Philip Oakley

[-- Attachment #1: Type: text/plain, Size: 1259 bytes --]

Move processing of tags into its own function to make the logic easier
to extend when we're going to implement filtering for tags. No change in
behaviour is expected from this commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 list-objects.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index e19589baa0..a5a60301cb 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -213,6 +213,14 @@ static void process_tree(struct traversal_context *ctx,
 	free_tree_buffer(tree);
 }
 
+static void process_tag(struct traversal_context *ctx,
+			struct tag *tag,
+			const char *name)
+{
+	tag->object.flags |= SEEN;
+	ctx->show_object(&tag->object, name, ctx->show_data);
+}
+
 static void mark_edge_parents_uninteresting(struct commit *commit,
 					    struct rev_info *revs,
 					    show_edge_fn show_edge)
@@ -334,8 +342,7 @@ static void traverse_trees_and_blobs(struct traversal_context *ctx,
 		if (obj->flags & (UNINTERESTING | SEEN))
 			continue;
 		if (obj->type == OBJ_TAG) {
-			obj->flags |= SEEN;
-			ctx->show_object(obj, name, ctx->show_data);
+			process_tag(ctx, (struct tag *)obj, name);
 			continue;
 		}
 		if (!path)
-- 
2.31.1



* [PATCH v4 4/8] list-objects: support filtering by tag and commit
  2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
                         ` (2 preceding siblings ...)
  2021-04-12 13:37       ` [PATCH v4 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
@ 2021-04-12 13:37       ` Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 5/8] list-objects: implement object type filter Patrick Steinhardt
                         ` (3 subsequent siblings)
  7 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau, Philip Oakley

[-- Attachment #1: Type: text/plain, Size: 4809 bytes --]

Object filters currently only support filtering blobs or trees based on
some criteria. This commit lays the foundation to also allow filtering
of tags and commits.

No change in behaviour is expected from this commit given that there are
no filters yet for those object types.
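
As a rough sketch of the contract between a filter and its caller: the
filter returns a bitmask, and the caller acts on each bit separately.
The names below are hypothetical simplifications (only three situations,
numeric values assumed), not the definitions from list-objects-filter.h.

```c
enum sketch_situation { SK_COMMIT, SK_TAG, SK_BLOB };

enum sketch_result {
	SK_ZERO = 0,
	SK_MARK_SEEN = 1 << 0, /* do not visit this object again */
	SK_DO_SHOW = 1 << 1    /* pass the object to the show callback */
};

#define SK_SEEN (1u << 0) /* assumed bit position */

struct sketch_object {
	unsigned flags;
	int shown;
};

/*
 * A blob-only filter in the style of `blob:none`: tags and commits are
 * always included, which is why this patch does not change behaviour.
 */
static enum sketch_result sketch_blobs_none(enum sketch_situation s)
{
	switch (s) {
	case SK_TAG:
	case SK_COMMIT:
		return SK_MARK_SEEN | SK_DO_SHOW;
	case SK_BLOB:
		return SK_MARK_SEEN; /* seen, but omitted from the output */
	}
	return SK_ZERO;
}

/* Caller side, as in process_tag() and do_traverse() after this patch. */
static void sketch_apply(struct sketch_object *o, enum sketch_result r)
{
	if (r & SK_MARK_SEEN)
		o->flags |= SK_SEEN;
	if (r & SK_DO_SHOW)
		o->shown = 1;
}
```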

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 list-objects-filter.c | 40 ++++++++++++++++++++++++++++++++++++++++
 list-objects-filter.h |  2 ++
 list-objects.c        | 23 ++++++++++++++++++++---
 3 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index 39e2f15333..0ebfa52966 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -82,6 +82,16 @@ static enum list_objects_filter_result filter_blobs_none(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -173,6 +183,16 @@ static enum list_objects_filter_result filter_trees_depth(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_END_TREE:
 		assert(obj->type == OBJ_TREE);
 		filter_data->current_depth--;
@@ -267,6 +287,16 @@ static enum list_objects_filter_result filter_blobs_limit(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -371,6 +401,16 @@ static enum list_objects_filter_result filter_sparse(
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		/* always include all tag objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		/* always include all commit objects */
+		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		dtype = DT_DIR;
diff --git a/list-objects-filter.h b/list-objects-filter.h
index cfd784e203..9e98814111 100644
--- a/list-objects-filter.h
+++ b/list-objects-filter.h
@@ -55,6 +55,8 @@ enum list_objects_filter_result {
 };
 
 enum list_objects_filter_situation {
+	LOFS_COMMIT,
+	LOFS_TAG,
 	LOFS_BEGIN_TREE,
 	LOFS_END_TREE,
 	LOFS_BLOB
diff --git a/list-objects.c b/list-objects.c
index a5a60301cb..7f404677d5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -217,8 +217,15 @@ static void process_tag(struct traversal_context *ctx,
 			struct tag *tag,
 			const char *name)
 {
-	tag->object.flags |= SEEN;
-	ctx->show_object(&tag->object, name, ctx->show_data);
+	enum list_objects_filter_result r;
+
+	r = list_objects_filter__filter_object(ctx->revs->repo, LOFS_TAG,
+					       &tag->object, NULL, NULL,
+					       ctx->filter);
+	if (r & LOFR_MARK_SEEN)
+		tag->object.flags |= SEEN;
+	if (r & LOFR_DO_SHOW)
+		ctx->show_object(&tag->object, name, ctx->show_data);
 }
 
 static void mark_edge_parents_uninteresting(struct commit *commit,
@@ -368,6 +375,12 @@ static void do_traverse(struct traversal_context *ctx)
 	strbuf_init(&csp, PATH_MAX);
 
 	while ((commit = get_revision(ctx->revs)) != NULL) {
+		enum list_objects_filter_result r;
+
+		r = list_objects_filter__filter_object(ctx->revs->repo,
+				LOFS_COMMIT, &commit->object,
+				NULL, NULL, ctx->filter);
+
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
@@ -382,7 +395,11 @@ static void do_traverse(struct traversal_context *ctx)
 			die(_("unable to load root tree for commit %s"),
 			      oid_to_hex(&commit->object.oid));
 		}
-		ctx->show_commit(commit, ctx->show_data);
+
+		if (r & LOFR_MARK_SEEN)
+			commit->object.flags |= SEEN;
+		if (r & LOFR_DO_SHOW)
+			ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
-- 
2.31.1



* [PATCH v4 5/8] list-objects: implement object type filter
  2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
                         ` (3 preceding siblings ...)
  2021-04-12 13:37       ` [PATCH v4 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
@ 2021-04-12 13:37       ` Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 6/8] pack-bitmap: " Patrick Steinhardt
                         ` (2 subsequent siblings)
  7 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau, Philip Oakley

[-- Attachment #1: Type: text/plain, Size: 9502 bytes --]

While it is already possible to filter objects by some criteria in
git-rev-list(1), it is not yet possible to select only objects of a
specific type. This makes some filters less useful. The `blob:limit`
filter for example filters blobs such that only those which are smaller
than the given limit are returned. But it cannot be used to ask for only
these small blobs, given that git-rev-list(1) will continue to print
tags, commits and trees.

Now that we have the infrastructure in place to also filter tags and
commits, we can improve this situation by implementing a new filter
which selects objects based on their type. The above query can thus
trivially be implemented with the following command:

    $ git rev-list --objects --filter=object:type=blob \
        --filter=blob:limit=200

Furthermore, this filter allows optimizing certain other cases: if, for
example, only tags or commits have been selected, there is no need to
walk down trees.

The new filter is not yet supported in bitmaps. This is going to be
implemented in a subsequent commit.
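
The per-object decision described above can be sketched in isolation.
This is a minimal, self-contained model and not the code from this
patch: the enum names and flag values are simplified stand-ins for git's
`LOFS_*` and `LOFR_*` constants, and `type_filter_decide()` is a
hypothetical helper introduced only for illustration.

```c
#include <assert.h>

/* Simplified stand-ins for git's object types (values are illustrative). */
enum obj_type { T_COMMIT = 1, T_TREE, T_BLOB, T_TAG };

/* Simplified stand-ins for the LOFR_* result flags. */
enum result {
	R_ZERO      = 0,
	R_MARK_SEEN = 1 << 0,
	R_DO_SHOW   = 1 << 1,
	R_SKIP_TREE = 1 << 2
};

/*
 * Hypothetical helper: decide what to do with an object of type `seen`
 * when the user asked for `wanted` via --filter=object:type=<wanted>.
 */
static int type_filter_decide(enum obj_type seen, enum obj_type wanted)
{
	assert(seen >= T_COMMIT && seen <= T_TAG);
	assert(wanted >= T_COMMIT && wanted <= T_TAG);

	/* Trees need not be walked when only commits or tags are wanted. */
	if (seen == T_TREE && (wanted == T_COMMIT || wanted == T_TAG))
		return R_SKIP_TREE;

	/* Objects of the requested type are shown. */
	if (seen == wanted)
		return R_MARK_SEEN | R_DO_SHOW;

	/* Everything else is hidden, but marked so it is not visited again. */
	return R_MARK_SEEN;
}
```

The tree shortcut is the only case where the decision depends on more
than a simple type comparison: blobs can only be reached through trees,
so a tree must still be walked when blobs are wanted even though the
tree itself is not shown.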

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/uploadpack.txt |  6 +--
 Documentation/rev-list-options.txt  |  3 ++
 list-objects-filter-options.c       | 14 ++++++
 list-objects-filter-options.h       |  2 +
 list-objects-filter.c               | 76 +++++++++++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh | 48 ++++++++++++++++++
 6 files changed, 146 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/uploadpack.txt b/Documentation/config/uploadpack.txt
index 6729a072ea..32fad5bbe8 100644
--- a/Documentation/config/uploadpack.txt
+++ b/Documentation/config/uploadpack.txt
@@ -66,9 +66,9 @@ uploadpackfilter.allow::
 uploadpackfilter.<filter>.allow::
 	Explicitly allow or ban the object filter corresponding to
 	`<filter>`, where `<filter>` may be one of: `blob:none`,
-	`blob:limit`, `tree`, `sparse:oid`, or `combine`. If using
-	combined filters, both `combine` and all of the nested filter
-	kinds must be allowed. Defaults to `uploadpackfilter.allow`.
+	`blob:limit`, `object:type`, `tree`, `sparse:oid`, or `combine`.
+	If using combined filters, both `combine` and all of the nested
+	filter kinds must be allowed. Defaults to `uploadpackfilter.allow`.
 
 uploadpackfilter.tree.maxDepth::
 	Only allow `--filter=tree:<n>` when `<n>` is no more than the value of
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index b1c8f86c6e..3afa8fffbd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -892,6 +892,9 @@ or units.  n may be zero.  The suffixes k, m, and g can be used to name
 units in KiB, MiB, or GiB.  For example, 'blob:limit=1k' is the same
 as 'blob:limit=1024'.
 +
+The form '--filter=object:type=(tag|commit|tree|blob)' omits all objects
+which are not of the requested type.
++
 The form '--filter=sparse:oid=<blob-ish>' uses a sparse-checkout
 specification contained in the blob (or blob-expression) '<blob-ish>'
 to omit blobs that would not be not required for a sparse checkout on
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d2d1c81caf..bb6f6577d5 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -29,6 +29,8 @@ const char *list_object_filter_config_name(enum list_objects_filter_choice c)
 		return "tree";
 	case LOFC_SPARSE_OID:
 		return "sparse:oid";
+	case LOFC_OBJECT_TYPE:
+		return "object:type";
 	case LOFC_COMBINE:
 		return "combine";
 	case LOFC__COUNT:
@@ -97,6 +99,18 @@ static int gently_parse_list_objects_filter(
 		}
 		return 1;
 
+	} else if (skip_prefix(arg, "object:type=", &v0)) {
+		int type = type_from_string_gently(v0, -1, 1);
+		if (type < 0) {
+			strbuf_addstr(errbuf, _("expected 'object:type=<type>'"));
+			return 1;
+		}
+
+		filter_options->object_type = type;
+		filter_options->choice = LOFC_OBJECT_TYPE;
+
+		return 0;
+
 	} else if (skip_prefix(arg, "combine:", &v0)) {
 		return parse_combine_filter(filter_options, v0, errbuf);
 
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 01767c3c96..4d0d0588cc 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -13,6 +13,7 @@ enum list_objects_filter_choice {
 	LOFC_BLOB_LIMIT,
 	LOFC_TREE_DEPTH,
 	LOFC_SPARSE_OID,
+	LOFC_OBJECT_TYPE,
 	LOFC_COMBINE,
 	LOFC__COUNT /* must be last */
 };
@@ -54,6 +55,7 @@ struct list_objects_filter_options {
 	char *sparse_oid_name;
 	unsigned long blob_limit_value;
 	unsigned long tree_exclude_depth;
+	enum object_type object_type;
 
 	/* LOFC_COMBINE values */
 
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 0ebfa52966..1c1ee3d1bb 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -545,6 +545,81 @@ static void filter_sparse_oid__init(
 	filter->free_fn = filter_sparse_free;
 }
 
+/*
+ * A filter for list-objects to omit all objects which are not of the
+ * requested type.
+ */
+struct filter_object_type_data {
+	enum object_type object_type;
+};
+
+static enum list_objects_filter_result filter_object_type(
+	struct repository *r,
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	struct oidset *omits,
+	void *filter_data_)
+{
+	struct filter_object_type_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_TAG:
+		assert(obj->type == OBJ_TAG);
+		if (filter_data->object_type == OBJ_TAG)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_COMMIT:
+		assert(obj->type == OBJ_COMMIT);
+		if (filter_data->object_type == OBJ_COMMIT)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+
+		/*
+		 * If we only want to show commits or tags, then there is no
+		 * need to walk down trees.
+		 */
+		if (filter_data->object_type == OBJ_COMMIT ||
+		    filter_data->object_type == OBJ_TAG)
+			return LOFR_SKIP_TREE;
+
+		if (filter_data->object_type == OBJ_TREE)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+		return LOFR_MARK_SEEN;
+
+	case LOFS_BLOB:
+		assert(obj->type == OBJ_BLOB);
+
+		if (filter_data->object_type == OBJ_BLOB)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+		return LOFR_MARK_SEEN;
+
+	case LOFS_END_TREE:
+		return LOFR_ZERO;
+	}
+}
+
+static void filter_object_type__init(
+	struct list_objects_filter_options *filter_options,
+	struct filter *filter)
+{
+	struct filter_object_type_data *d = xcalloc(1, sizeof(*d));
+	d->object_type = filter_options->object_type;
+
+	filter->filter_data = d;
+	filter->filter_object_fn = filter_object_type;
+	filter->free_fn = free;
+}
+
 /* A filter which only shows objects shown by all sub-filters. */
 struct combine_filter_data {
 	struct subfilter *sub;
@@ -691,6 +766,7 @@ static filter_init_fn s_filters[] = {
 	filter_blobs_limit__init,
 	filter_trees_depth__init,
 	filter_sparse_oid__init,
+	filter_object_type__init,
 	filter_combine__init,
 };
 
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 31457d13b9..c79ec04060 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -159,6 +159,54 @@ test_expect_success 'verify blob:limit=1m' '
 	test_must_be_empty observed
 '
 
+# Test object:type=<type> filter.
+
+test_expect_success 'setup object-type' '
+	git init object-type &&
+	echo contents >object-type/blob &&
+	git -C object-type add blob &&
+	git -C object-type commit -m commit-message &&
+	git -C object-type tag tag -m tag-message
+'
+
+test_expect_success 'verify object:type= fails with invalid type' '
+	test_must_fail git -C object-type rev-list --objects --filter=object:type= HEAD &&
+	test_must_fail git -C object-type rev-list --objects --filter=object:type=invalid HEAD
+'
+
+test_expect_success 'verify object:type=blob prints blob and commit' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob)
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=blob HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tree prints tree and commit' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s \n" $(git -C object-type rev-parse HEAD^{tree})
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=tree HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=commit prints commit' '
+	git -C object-type rev-parse HEAD >expected &&
+	git -C object-type rev-list --objects --filter=object:type=commit HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tag prints tag' '
+	(
+		git -C object-type rev-parse HEAD &&
+		printf "%s tag\n" $(git -C object-type rev-parse tag)
+	) >expected &&
+	git -C object-type rev-list --objects --filter=object:type=tag tag >actual &&
+	test_cmp expected actual
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
-- 
2.31.1



* [PATCH v4 6/8] pack-bitmap: implement object type filter
  2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
                         ` (4 preceding siblings ...)
  2021-04-12 13:37       ` [PATCH v4 5/8] list-objects: implement object type filter Patrick Steinhardt
@ 2021-04-12 13:37       ` Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
  7 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau, Philip Oakley

The preceding commit added a new object filter for git-rev-list(1)
which allows filtering objects by type. Implement the equivalent filter
for packfile bitmaps so that we can answer these queries fast.
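
The idea behind the bitmap variant can be sketched as follows: keeping
only one type is the same as excluding every other type from the result
bitmap. This is a toy model under stated assumptions, not git's
implementation: a single `uint32_t` stands in for a bitmap, and
`struct toy_index`, `exclude_type()` and `keep_only_type()` are
hypothetical names. As in the non-bitmap filter, objects the caller
named explicitly (the tips) are never removed.

```c
#include <assert.h>
#include <stdint.h>

enum { K_COMMIT, K_TREE, K_BLOB, K_TAG, K_NR };

/*
 * Hypothetical model: bit i of type_bits[k] is set if object i has
 * type k. Every object has exactly one type.
 */
struct toy_index {
	uint32_t type_bits[K_NR];
};

/* Clear all objects of the given kind from `result`, except for tips. */
static uint32_t exclude_type(const struct toy_index *idx, uint32_t result,
			     uint32_t tips, int kind)
{
	return result & ~(idx->type_bits[kind] & ~tips);
}

/* Keep only objects of type `wanted` by excluding all other types. */
static uint32_t keep_only_type(const struct toy_index *idx, uint32_t result,
			       uint32_t tips, int wanted)
{
	int k;

	assert(wanted >= 0 && wanted < K_NR);
	for (k = 0; k < K_NR; k++)
		if (k != wanted)
			result = exclude_type(idx, result, tips, k);
	return result;
}
```

Because each exclusion is a plain bitwise operation on the per-type
bitmaps, no object has to be opened to learn its type, which is where
the speedup over a graph walk comes from.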

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 pack-bitmap.c                      | 29 ++++++++++++++++++++++++++---
 t/t6113-rev-list-bitmap-filters.sh | 25 ++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index b4513f8672..cd3f5c433e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -779,9 +779,6 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	eword_t mask;
 	uint32_t i;
 
-	if (type != OBJ_BLOB && type != OBJ_TREE)
-		BUG("filter_bitmap_exclude_type: unsupported type '%d'", type);
-
 	/*
 	 * The non-bitmap version of this filter never removes
 	 * objects which the other side specifically asked for,
@@ -911,6 +908,24 @@ static void filter_bitmap_tree_depth(struct bitmap_index *bitmap_git,
 				   OBJ_BLOB);
 }
 
+static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
+				      struct object_list *tip_objects,
+				      struct bitmap *to_filter,
+				      enum object_type object_type)
+{
+	if (object_type < OBJ_COMMIT || object_type > OBJ_TAG)
+		BUG("filter_bitmap_object_type given invalid object");
+
+	if (object_type != OBJ_TAG)
+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_TAG);
+	if (object_type != OBJ_COMMIT)
+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_COMMIT);
+	if (object_type != OBJ_TREE)
+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_TREE);
+	if (object_type != OBJ_BLOB)
+		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_BLOB);
+}
+
 static int filter_bitmap(struct bitmap_index *bitmap_git,
 			 struct object_list *tip_objects,
 			 struct bitmap *to_filter,
@@ -943,6 +958,14 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
+	if (filter->choice == LOFC_OBJECT_TYPE) {
+		if (bitmap_git)
+			filter_bitmap_object_type(bitmap_git, tip_objects,
+						  to_filter,
+						  filter->object_type);
+		return 0;
+	}
+
 	/* filter choice not handled */
 	return -1;
 }
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index 3f889949ca..fb66735ac8 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -10,7 +10,8 @@ test_expect_success 'set up bitmapped repo' '
 	test_commit much-larger-blob-one &&
 	git repack -adb &&
 	test_commit two &&
-	test_commit much-larger-blob-two
+	test_commit much-larger-blob-two &&
+	git tag tag
 '
 
 test_expect_success 'filters fallback to non-bitmap traversal' '
@@ -75,4 +76,26 @@ test_expect_success 'tree:1 filter' '
 	test_cmp expect actual
 '
 
+test_expect_success 'object:type filter' '
+	git rev-list --objects --filter=object:type=tag tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=tag tag >actual &&
+	test_cmp expect actual &&
+
+	git rev-list --objects --filter=object:type=commit tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=commit tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter=object:type=tree tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=tree tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.31.1



* [PATCH v4 7/8] pack-bitmap: implement combined filter
  2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
                         ` (5 preceding siblings ...)
  2021-04-12 13:37       ` [PATCH v4 6/8] pack-bitmap: " Patrick Steinhardt
@ 2021-04-12 13:37       ` Patrick Steinhardt
  2021-04-12 13:37       ` [PATCH v4 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
  7 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau, Philip Oakley

When the user has specified multiple object filters, this is
internally represented by a "combined" filter. These combined
filters aren't yet supported by bitmap indices and thus cannot be
accelerated.

Fix this by implementing support for these combined filters. The
implementation is quite trivial: when there's a combined filter, we
simply recurse into `filter_bitmap()` for all of the sub-filters.
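
The recursion can be sketched with a toy model. The types and names
below (`struct toy_filter`, `toy_filter_bitmap()`, the `TOY_*` choices)
are hypothetical and only illustrate the control flow: every sub-filter
is applied to the same bitmap in turn, so the result contains only
objects that survive all of them, and one unsupported sub-filter makes
the whole combined filter unsupported. Passing a NULL bitmap only checks
whether the filter is supported, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_choice { TOY_DROP_MASK, TOY_COMBINE, TOY_UNSUPPORTED };

struct toy_filter {
	enum toy_choice choice;
	uint32_t mask;                 /* TOY_DROP_MASK: bits to clear */
	const struct toy_filter *sub;  /* TOY_COMBINE: sub-filters */
	size_t sub_nr;
};

/* Returns 0 on success, -1 if any (sub-)filter is not supported. */
static int toy_filter_bitmap(uint32_t *to_filter, const struct toy_filter *f)
{
	size_t i;

	switch (f->choice) {
	case TOY_DROP_MASK:
		/* A NULL bitmap means "only check whether this is supported". */
		if (to_filter)
			*to_filter &= ~f->mask;
		return 0;
	case TOY_COMBINE:
		/* Apply each sub-filter to the same bitmap, one after another. */
		for (i = 0; i < f->sub_nr; i++)
			if (toy_filter_bitmap(to_filter, &f->sub[i]) < 0)
				return -1;
		return 0;
	default:
		return -1;
	}
}
```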

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 pack-bitmap.c                      | 10 ++++++++++
 t/t6113-rev-list-bitmap-filters.sh |  7 +++++++
 2 files changed, 17 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index cd3f5c433e..7ce3ede7e4 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -966,6 +966,16 @@ static int filter_bitmap(struct bitmap_index *bitmap_git,
 		return 0;
 	}
 
+	if (filter->choice == LOFC_COMBINE) {
+		int i;
+		for (i = 0; i < filter->sub_nr; i++) {
+			if (filter_bitmap(bitmap_git, tip_objects, to_filter,
+					  &filter->sub[i]) < 0)
+				return -1;
+		}
+		return 0;
+	}
+
 	/* filter choice not handled */
 	return -1;
 }
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index fb66735ac8..cb9db7df6f 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -98,4 +98,11 @@ test_expect_success 'object:type filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'combine filter' '
+	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_done
-- 
2.31.1



* [PATCH v4 8/8] rev-list: allow filtering of provided items
  2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
                         ` (6 preceding siblings ...)
  2021-04-12 13:37       ` [PATCH v4 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
@ 2021-04-12 13:37       ` Patrick Steinhardt
  7 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2021-04-12 13:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Christian Couder, Taylor Blau, Philip Oakley

When providing an object filter, it is currently impossible to also
filter provided items. E.g. when executing `git rev-list HEAD`, the
commit this reference points to will be treated as user-provided and is
thus excluded from the filtering mechanism. This makes it harder than
necessary to properly use the new `--filter=object:type` filter, given
that even if the user only wants to see blobs, they'll still see the
commits of provided references.

Improve this by introducing a new `--filter-provided-objects` option
to the git-rev-list(1) command. If given, then all user-provided
references will be subject to filtering.
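
The effect of the option can be sketched with a toy model. The names
below (`struct toy_object`, `TOY_NOT_USER_GIVEN`, `toy_should_show()`,
`toy_filter_provided()`) are hypothetical stand-ins, and the model
assumes that an object bypasses the filter for as long as it is still
considered user-given.

```c
#include <assert.h>

enum { TOY_COMMIT = 1, TOY_TREE, TOY_BLOB, TOY_TAG };

#define TOY_NOT_USER_GIVEN (1u << 0)

struct toy_object {
	int type;
	unsigned flags;
};

/* Should this object be printed under --filter=object:type=<wanted>? */
static int toy_should_show(const struct toy_object *obj, int wanted)
{
	/* Objects still considered user-given bypass the filter. */
	if (!(obj->flags & TOY_NOT_USER_GIVEN))
		return 1;
	return obj->type == wanted;
}

/* Model of --filter-provided-objects: make provided objects filterable. */
static void toy_filter_provided(struct toy_object *objs, int nr)
{
	int i;

	assert(nr >= 0);
	for (i = 0; i < nr; i++)
		objs[i].flags |= TOY_NOT_USER_GIVEN;
}
```

In this model a provided commit is printed under a blob filter until
`toy_filter_provided()` has been called on it, after which it is treated
like any other object reached during the walk.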

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/rev-list-options.txt  |  5 ++++
 builtin/pack-objects.c              |  2 +-
 builtin/rev-list.c                  | 36 +++++++++++++++++++++--------
 pack-bitmap.c                       |  6 +++--
 pack-bitmap.h                       |  3 ++-
 reachable.c                         |  2 +-
 t/t6112-rev-list-filters-objects.sh | 28 ++++++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  | 36 +++++++++++++++++++++++++++++
 8 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 3afa8fffbd..5bf2a85f69 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -933,6 +933,11 @@ equivalent.
 --no-filter::
 	Turn off any previous `--filter=` argument.
 
+--filter-provided-objects::
+	Filter the list of explicitly provided objects, which would otherwise
+	always be printed even if they did not match any of the filters. Only
+	useful with `--filter=`.
+
 --filter-print-omitted::
 	Only useful with `--filter=`; prints a list of the objects omitted
 	by the filter.  Object IDs are prefixed with a ``~'' character.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 525c2d8552..2f2026dc87 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3516,7 +3516,7 @@ static int pack_options_allow_reuse(void)
 
 static int get_object_list_from_bitmap(struct rev_info *revs)
 {
-	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options)))
+	if (!(bitmap_git = prepare_bitmap_walk(revs, &filter_options, 0)))
 		return -1;
 
 	if (pack_options_allow_reuse() &&
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index b4d8ea0a35..7677b1af5a 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -398,7 +398,8 @@ static inline int parse_missing_action_value(const char *value)
 }
 
 static int try_bitmap_count(struct rev_info *revs,
-			    struct list_objects_filter_options *filter)
+			    struct list_objects_filter_options *filter,
+			    int filter_provided_objects)
 {
 	uint32_t commit_count = 0,
 		 tag_count = 0,
@@ -433,7 +434,7 @@ static int try_bitmap_count(struct rev_info *revs,
 	 */
 	max_count = revs->max_count;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter);
+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -450,7 +451,8 @@ static int try_bitmap_count(struct rev_info *revs,
 }
 
 static int try_bitmap_traversal(struct rev_info *revs,
-				struct list_objects_filter_options *filter)
+				struct list_objects_filter_options *filter,
+				int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
 
@@ -461,7 +463,7 @@ static int try_bitmap_traversal(struct rev_info *revs,
 	if (revs->max_count >= 0)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter);
+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -471,14 +473,15 @@ static int try_bitmap_traversal(struct rev_info *revs,
 }
 
 static int try_bitmap_disk_usage(struct rev_info *revs,
-				 struct list_objects_filter_options *filter)
+				 struct list_objects_filter_options *filter,
+				 int filter_provided_objects)
 {
 	struct bitmap_index *bitmap_git;
 
 	if (!show_disk_usage)
 		return -1;
 
-	bitmap_git = prepare_bitmap_walk(revs, filter);
+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
 	if (!bitmap_git)
 		return -1;
 
@@ -499,6 +502,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	int bisect_show_vars = 0;
 	int bisect_find_all = 0;
 	int use_bitmap_index = 0;
+	int filter_provided_objects = 0;
 	const char *show_progress = NULL;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
@@ -599,6 +603,10 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			list_objects_filter_set_no_filter(&filter_options);
 			continue;
 		}
+		if (!strcmp(arg, "--filter-provided-objects")) {
+			filter_provided_objects = 1;
+			continue;
+		}
 		if (!strcmp(arg, "--filter-print-omitted")) {
 			arg_print_omitted = 1;
 			continue;
@@ -665,11 +673,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		progress = start_delayed_progress(show_progress, 0);
 
 	if (use_bitmap_index) {
-		if (!try_bitmap_count(&revs, &filter_options))
+		if (!try_bitmap_count(&revs, &filter_options, filter_provided_objects))
 			return 0;
-		if (!try_bitmap_disk_usage(&revs, &filter_options))
+		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_objects))
 			return 0;
-		if (!try_bitmap_traversal(&revs, &filter_options))
+		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_objects))
 			return 0;
 	}
 
@@ -694,6 +702,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			return show_bisect_vars(&info, reaches, all);
 	}
 
+	if (filter_provided_objects) {
+		struct commit_list *c;
+		for (i = 0; i < revs.pending.nr; i++) {
+			struct object_array_entry *pending = revs.pending.objects + i;
+			pending->item->flags |= NOT_USER_GIVEN;
+		}
+		for (c = revs.commits; c; c = c->next)
+			c->item->object.flags |= NOT_USER_GIVEN;
+	}
+
 	if (arg_print_omitted)
 		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
 	if (arg_missing_action == MA_PRINT)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 7ce3ede7e4..6b790a834b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -986,7 +986,8 @@ static int can_filter_bitmap(struct list_objects_filter_options *filter)
 }
 
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter)
+					 struct list_objects_filter_options *filter,
+					 int filter_provided_objects)
 {
 	unsigned int i;
 
@@ -1081,7 +1082,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	if (haves_bitmap)
 		bitmap_and_not(wants_bitmap, haves_bitmap);
 
-	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
+	filter_bitmap(bitmap_git, (filter && filter_provided_objects) ? NULL : wants,
+		      wants_bitmap, filter);
 
 	bitmap_git->result = wants_bitmap;
 	bitmap_git->haves = haves_bitmap;
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 36d99930d8..bb45217d3b 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -50,7 +50,8 @@ void traverse_bitmap_commit_list(struct bitmap_index *,
 				 show_reachable_fn show_reachable);
 void test_bitmap_walk(struct rev_info *revs);
 struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
-					 struct list_objects_filter_options *filter);
+					 struct list_objects_filter_options *filter,
+					 int filter_provided_objects);
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 				       struct packed_git **packfile,
 				       uint32_t *entries,
diff --git a/reachable.c b/reachable.c
index 77a60c70a5..fc833cae43 100644
--- a/reachable.c
+++ b/reachable.c
@@ -223,7 +223,7 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 	cp.progress = progress;
 	cp.count = 0;
 
-	bitmap_git = prepare_bitmap_walk(revs, NULL);
+	bitmap_git = prepare_bitmap_walk(revs, NULL, 0);
 	if (bitmap_git) {
 		traverse_bitmap_commit_list(bitmap_git, revs, mark_object_seen);
 		free_bitmap_index(bitmap_git);
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c79ec04060..de751b65b4 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -207,6 +207,34 @@ test_expect_success 'verify object:type=tag prints tag' '
 	test_cmp expected actual
 '
 
+test_expect_success 'verify object:type=blob prints only blob with --filter-provided-objects' '
+	printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=blob --filter-provided-objects HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tree prints only tree with --filter-provided-objects' '
+	printf "%s \n" $(git -C object-type rev-parse HEAD^{tree}) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tree HEAD --filter-provided-objects >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=commit prints only commit with --filter-provided-objects' '
+	git -C object-type rev-parse HEAD >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=commit --filter-provided-objects HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'verify object:type=tag prints only tag with --filter-provided-objects' '
+	printf "%s tag\n" $(git -C object-type rev-parse tag) >expected &&
+	git -C object-type rev-list --objects \
+		--filter=object:type=tag --filter-provided-objects tag >actual &&
+	test_cmp expected actual
+'
+
 # Test sparse:path=<path> filter.
 # !!!!
 # NOTE: sparse:path filter support has been dropped for security reasons,
diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh
index cb9db7df6f..4d8e09167e 100755
--- a/t/t6113-rev-list-bitmap-filters.sh
+++ b/t/t6113-rev-list-bitmap-filters.sh
@@ -98,6 +98,28 @@ test_expect_success 'object:type filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'object:type filter with --filter-provided-objects' '
+	git rev-list --objects --filter-provided-objects --filter=object:type=tag tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided-objects --filter=object:type=tag tag >actual &&
+	test_cmp expect actual &&
+
+	git rev-list --objects --filter-provided-objects --filter=object:type=commit tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided-objects --filter=object:type=commit tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter-provided-objects --filter=object:type=tree tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided-objects --filter=object:type=tree tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git rev-list --objects --filter-provided-objects --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided-objects --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual
+'
+
 test_expect_success 'combine filter' '
 	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
 	git rev-list --use-bitmap-index \
@@ -105,4 +127,18 @@ test_expect_success 'combine filter' '
 	test_bitmap_traversal expect actual
 '
 
+test_expect_success 'combine filter with --filter-provided-objects' '
+	git rev-list --objects --filter-provided-objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
+	git rev-list --use-bitmap-index \
+		     --objects --filter-provided-objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
+	test_bitmap_traversal expect actual &&
+
+	git cat-file --batch-check="%(objecttype) %(objectsize)" <actual >objects &&
+	while read objecttype objectsize
+	do
+		test "$objecttype" = blob || return 1
+		test "$objectsize" -le 1000 || return 1
+	done <objects
+'
+
 test_done
-- 
2.31.1



Thread overview: 67+ messages
2021-03-01 12:20 [PATCH 0/7] rev-parse: implement object type filter Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 1/7] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 2/7] list-objects: move tag processing into its own function Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 3/7] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 4/7] list-objects: implement object type filter Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 5/7] pack-bitmap: " Patrick Steinhardt
2021-03-01 12:20 ` [PATCH 6/7] pack-bitmap: implement combined filter Patrick Steinhardt
2021-03-01 12:21 ` [PATCH 7/7] rev-list: allow filtering of provided items Patrick Steinhardt
2021-03-10 21:39 ` [PATCH 0/7] rev-parse: implement object type filter Jeff King
2021-03-11 14:38   ` Patrick Steinhardt
2021-03-11 17:54     ` Jeff King
2021-03-15 11:25   ` Patrick Steinhardt
2021-03-10 21:58 ` Taylor Blau
2021-03-10 22:19   ` Jeff King
2021-03-11 14:43     ` Patrick Steinhardt
2021-03-11 17:56       ` Jeff King
2021-03-15 13:14 ` [PATCH v2 0/8] " Patrick Steinhardt
2021-03-15 13:14   ` [PATCH v2 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-06 17:17     ` Jeff King
2021-03-15 13:14   ` [PATCH v2 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-06 17:30     ` Jeff King
2021-04-09 10:19       ` Patrick Steinhardt
2021-03-15 13:14   ` [PATCH v2 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-06 17:39     ` Jeff King
2021-03-15 13:14   ` [PATCH v2 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-03-15 13:14   ` [PATCH v2 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-06 17:42     ` Jeff King
2021-03-15 13:14   ` [PATCH v2 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-06 17:48     ` Jeff King
2021-03-15 13:14   ` [PATCH v2 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-06 17:54     ` Jeff King
2021-04-09 10:31       ` Patrick Steinhardt
2021-04-09 15:53         ` Jeff King
2021-04-09 11:17       ` Patrick Steinhardt
2021-04-09 15:55         ` Jeff King
2021-03-15 13:15   ` [PATCH v2 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
2021-04-06 18:04     ` Jeff King
2021-04-09 10:59       ` Patrick Steinhardt
2021-04-09 15:58         ` Jeff King
2021-03-20 21:10   ` [PATCH v2 0/8] rev-parse: implement object type filter Junio C Hamano
2021-04-06 18:08     ` Jeff King
2021-04-09 11:14       ` Patrick Steinhardt
2021-04-09 16:05         ` Jeff King
2021-04-09 11:27   ` [PATCH v3 " Patrick Steinhardt
2021-04-09 11:27     ` [PATCH v3 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-09 11:27     ` [PATCH v3 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-09 11:28     ` [PATCH v3 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-09 11:28     ` [PATCH v3 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-04-11  6:49       ` Junio C Hamano
2021-04-09 11:28     ` [PATCH v3 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-09 11:28     ` [PATCH v3 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-09 11:28     ` [PATCH v3 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-09 11:28     ` [PATCH v3 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
2021-04-09 11:32       ` [RESEND PATCH " Patrick Steinhardt
2021-04-09 15:00       ` [PATCH " Philip Oakley
2021-04-12 13:15         ` Patrick Steinhardt
2021-04-11  6:02     ` [PATCH v3 0/8] rev-parse: implement object type filter Junio C Hamano
2021-04-12 13:12       ` Patrick Steinhardt
2021-04-12 13:37     ` [PATCH v4 0/8] rev-list: " Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 1/8] uploadpack.txt: document implication of `uploadpackfilter.allow` Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 2/8] revision: mark commit parents as NOT_USER_GIVEN Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 3/8] list-objects: move tag processing into its own function Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 4/8] list-objects: support filtering by tag and commit Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 5/8] list-objects: implement object type filter Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 6/8] pack-bitmap: " Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 7/8] pack-bitmap: implement combined filter Patrick Steinhardt
2021-04-12 13:37       ` [PATCH v4 8/8] rev-list: allow filtering of provided items Patrick Steinhardt
