git@vger.kernel.org mailing list mirror (one of many)
* [RFC PATCH 0/5] filter: support for excluding all trees and blobs
@ 2018-08-09 22:44 Matthew DeVore
  2018-08-09 22:45 ` [PATCH 1/5] revision: invert meaning of the USER_GIVEN flag Matthew DeVore
                   ` (16 more replies)
  0 siblings, 17 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-09 22:44 UTC
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

This patch series does two things:

  a. (patches 1-2) introduce "--filter=only:commits", which filters out all
     trees and blobs
  b. (patches 3-5) improve support for promisor trees in the rev-list command

The intention is to enable initial partial clones to be very tiny by only
including commits. Partial clones with missing trees have already been possible
(there are tests for this), but no filter has produced them until now, so
patches 3-5 make rev-list handle missing trees better.

Thank you,

Matthew DeVore (5):
  revision: invert meaning of the USER_GIVEN flag
  list-objects-filter: implement filter only:commits
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  rev-list: handle missing tree objects properly

 Documentation/rev-list-options.txt     |   2 +
 builtin/rev-list.c                     |  12 +-
 list-objects-filter-options.c          |   4 +
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  43 +++--
 list-objects.c                         | 226 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  11 +-
 t/t5317-pack-objects-filter-objects.sh |  30 ++++
 t/t5616-partial-clone.sh               |  27 +++
 t/t6112-rev-list-filters-objects.sh    |  13 ++
 11 files changed, 242 insertions(+), 128 deletions(-)

-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH 1/5] revision: invert meaning of the USER_GIVEN flag
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-08-09 22:45 ` Matthew DeVore
  2018-08-10 18:43   ` Jonathan Tan
  2018-08-09 22:45 ` [PATCH 2/5] list-objects-filter: implement filter only:commits Matthew DeVore
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-09 22:45 UTC
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

Abandon the previous approach of mutating all new objects implicitly in
add_pending_object by inverting the meaning of the bit (it is now
NOT_USER_GIVEN) and only setting the flag when we need to.

This more accurately tracks whether a tree was provided directly by the user.
Without this patch, the root trees of all commits were erroneously
considered to be USER_GIVEN, which meant they could not be filtered. This
distinction is important in the next patch.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 31 ++++++++++++++++++-------------
 revision.c     |  1 -
 revision.h     | 10 +++++++---
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..482044bda 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -48,7 +48,7 @@ static void process_blob(struct rev_info *revs,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && filter_fn)
 		r = filter_fn(LOFS_BLOB, obj,
 			      path->buf, &path->buf[pathlen],
 			      filter_data);
@@ -133,7 +133,7 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && filter_fn)
 		r = filter_fn(LOFS_BEGIN_TREE, obj,
 			      base->buf, &base->buf[baselen],
 			      filter_data);
@@ -156,23 +156,25 @@ static void process_tree(struct rev_info *revs,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(revs,
-				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(revs, t, show, base, entry.path,
 				     cb_data, filter_fn, filter_data);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(revs, entry.oid->hash,
 					show, base, entry.path,
 					cb_data);
-		else
-			process_blob(revs,
-				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(revs, b, show, base, entry.path,
 				     cb_data, filter_fn, filter_data);
+		}
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && filter_fn) {
 		r = filter_fn(LOFS_END_TREE, obj,
 			      base->buf, &base->buf[baselen],
 			      filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct rev_info *revs,
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(revs, tree);
+		}
 		show_commit(commit, show_data);
 
 		if (revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index 062749437..6d355b43c 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index c599c34da..cd6b62313 100644
--- a/revision.h
+++ b/revision.h
@@ -8,7 +8,11 @@
 #include "diff.h"
 #include "commit-slab-decl.h"
 
-/* Remember to update object flag allocation in object.h */
+/* Remember to update object flag allocation in object.h
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
 #define SEEN		(1u<<0)
 #define UNINTERESTING   (1u<<1)
 #define TREESAME	(1u<<2)
@@ -20,9 +24,9 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+#define NOT_USER_GIVEN	(1u<<25) /* tree or blob not given directly by user */
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH 2/5] list-objects-filter: implement filter only:commits
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
  2018-08-09 22:45 ` [PATCH 1/5] revision: invert meaning of the USER_GIVEN flag Matthew DeVore
@ 2018-08-09 22:45 ` Matthew DeVore
  2018-08-10  0:14   ` Jonathan Tan
  2018-08-09 22:45 ` [PATCH 3/5] list-objects: store common func args in struct Matthew DeVore
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-09 22:45 UTC
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

Teach list-objects the "only:commits" filter which allows for filtering
out all non-commit and non-annotated tag objects (unless other objects
are explicitly specified by the user). The purpose of this patch is to
allow smaller partial clones.

The name of this filter - only:commits - is a bit inaccurate because it
still allows annotated tags to pass through. I chose it because it was
the only concise name I could think of that was pretty descriptive. I
considered and decided against "tree:none" because the code and
documentation for filters seem to lack the concept of "you're filtering
this, so we'll implicitly filter all referents of this." So "tree:none"
is vague, since some may think it filters blobs too, while some may not.
"only:commits" is specific and makes it easier to match it to a
potential use case.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  2 ++
 list-objects-filter-options.c          |  4 +++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 43 ++++++++++++++++++--------
 t/t5317-pack-objects-filter-objects.sh | 30 ++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 13 ++++++++
 6 files changed, 80 insertions(+), 13 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..3a60a490a 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -743,6 +743,8 @@ specification contained in <path>.
 	A debug option to help with future "partial clone" development.
 	This option specifies how missing objects are handled.
 +
+The form '--filter=only:commits' omits all blobs and trees.
++
 The form '--missing=error' requests that rev-list stop with an error if
 a missing object is encountered.  This is the default action.
 +
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..aaaaae508 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -69,6 +69,10 @@ static int gently_parse_list_objects_filter(
 		filter_options->choice = LOFC_SPARSE_PATH;
 		filter_options->sparse_path_value = strdup(v0);
 		return 0;
+
+	} else if (!strcmp(arg, "only:commits")) {
+		filter_options->choice = LOFC_ONLY_COMMITS;
+		return 0;
 	}
 
 	if (errbuf) {
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..a68df42c8 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -12,6 +12,7 @@ enum list_objects_filter_choice {
 	LOFC_BLOB_LIMIT,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
+	LOFC_ONLY_COMMITS,
 	LOFC__COUNT /* must be last */
 };
 
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..f0a064b4b 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -26,38 +26,39 @@
 #define FILTER_SHOWN_BUT_REVISIT (1<<21)
 
 /*
- * A filter for list-objects to omit ALL blobs from the traversal.
- * And to OPTIONALLY collect a list of the omitted OIDs.
+ * A filter for list-objects to omit ALL blobs from the traversal, and possibly
+ * trees as well.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
  */
-struct filter_blobs_none_data {
+struct filter_none_of_type_data {
+	unsigned omit_trees : 1;
 	struct oidset *omits;
 };
 
-static enum list_objects_filter_result filter_blobs_none(
+static enum list_objects_filter_result filter_none_of_type(
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
 	void *filter_data_)
 {
-	struct filter_blobs_none_data *filter_data = filter_data_;
+	struct filter_none_of_type_data *filter_data = filter_data_;
 
 	switch (filter_situation) {
 	default:
 		die("unknown filter_situation");
 		return LOFR_ZERO;
 
-	case LOFS_BEGIN_TREE:
-		assert(obj->type == OBJ_TREE);
-		/* always include all tree objects */
-		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
-
 	case LOFS_END_TREE:
 		assert(obj->type == OBJ_TREE);
 		return LOFR_ZERO;
 
+	case LOFS_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		if (!filter_data->omit_trees)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
 	case LOFS_BLOB:
-		assert(obj->type == OBJ_BLOB);
 		assert((obj->flags & SEEN) == 0);
 
 		if (filter_data->omits)
@@ -72,10 +73,25 @@ static void *filter_blobs_none__init(
 	filter_object_fn *filter_fn,
 	filter_free_fn *filter_free_fn)
 {
-	struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d));
+	struct filter_none_of_type_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_none_of_type;
+	*filter_free_fn = free;
+	return d;
+}
+
+static void* filter_only_commits__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_none_of_type_data *d = xcalloc(1, sizeof(*d));
+	d->omit_trees = 1;
 	d->omits = omitted;
 
-	*filter_fn = filter_blobs_none;
+	*filter_fn = filter_none_of_type;
 	*filter_free_fn = free;
 	return d;
 }
@@ -376,6 +392,7 @@ static filter_init_fn s_filters[] = {
 	filter_blobs_limit__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
+	filter_only_commits__init,
 };
 
 void *list_objects_filter__init(
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..600d153f9 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,36 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'setup for tests of only:commits' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" > r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify only:commits packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=only:commits >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack \
+		| grep -E "tree|blob" \
+		| sort >observed &&
+	test_line_count = 0 observed
+'
+
+test_expect_success 'grab tree directly when using only:commits' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=only:commits >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack \
+		| grep -E "tree|blob" \
+		| sort >observed &&
+	test_line_count = 1 observed
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 0a37dd5f9..6dbd9477c 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -196,6 +196,19 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+# Test only:commits filter.
+
+test_expect_success 'verify only:commits includes trees in "filtered" output' '
+	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=only:commits \
+		| awk -f print_1.awk \
+		| sed s/~// \
+		| xargs -n1 git -C r3 cat-file -t \
+		| sort -u >filtered_types &&
+	printf "blob\ntree\n" > expected &&
+	test_cmp filtered_types expected
+'
+
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH 3/5] list-objects: store common func args in struct
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
  2018-08-09 22:45 ` [PATCH 1/5] revision: invert meaning of the USER_GIVEN flag Matthew DeVore
  2018-08-09 22:45 ` [PATCH 2/5] list-objects-filter: implement filter only:commits Matthew DeVore
@ 2018-08-09 22:45 ` Matthew DeVore
  2018-08-09 22:45 ` [PATCH 4/5] list-objects: refactor to process_tree_contents Matthew DeVore
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-09 22:45 UTC
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

Bundle the arguments shared by the traversal functions into a struct
traversal_context. This will make utility functions easier to create, as
done by the next patch.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 152 +++++++++++++++++++++++--------------------------
 1 file changed, 71 insertions(+), 81 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 482044bda..fa34fbf58 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if ((obj->flags & NOT_USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if ((obj->flags & NOT_USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -159,29 +159,26 @@ static void process_tree(struct rev_info *revs,
 		if (S_ISDIR(entry.mode)) {
 			struct tree *t = lookup_tree(the_repository, entry.oid);
 			t->object.flags |= NOT_USER_GIVEN;
-			process_tree(revs, t, show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+			process_tree(ctx, t, base, entry.path);
 		}
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
 		else {
 			struct blob *b = lookup_blob(the_repository, entry.oid);
 			b->object.flags |= NOT_USER_GIVEN;
-			process_blob(revs, b, show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+			process_blob(ctx, b, base, entry.path);
 		}
 	}
 
-	if ((obj->flags & NOT_USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -244,19 +241,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -264,41 +257,32 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
@@ -306,23 +290,19 @@ static void do_traverse(struct rev_info *revs,
 		if (get_commit_tree(commit)) {
 			struct tree *tree = get_commit_tree(commit);
 			tree->object.flags |= NOT_USER_GIVEN;
-			add_pending_tree(revs, tree);
+			add_pending_tree(ctx->revs, tree);
 		}
-		show_commit(commit, show_data);
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -331,7 +311,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -342,14 +329,17 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
 
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH 4/5] list-objects: refactor to process_tree_contents
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (2 preceding siblings ...)
  2018-08-09 22:45 ` [PATCH 3/5] list-objects: store common func args in struct Matthew DeVore
@ 2018-08-09 22:45 ` Matthew DeVore
  2018-08-09 22:45 ` [PATCH 5/5] rev-list: handle missing tree objects properly Matthew DeVore
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-09 22:45 UTC
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

This will be used in a follow-up patch to reduce the indentation needed when
invoking the logic conditionally, i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 73 +++++++++++++++++++++++++++++---------------------
 1 file changed, 43 insertions(+), 30 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index fa34fbf58..7ecdb95ce 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,48 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,32 +182,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode)) {
-			struct tree *t = lookup_tree(the_repository, entry.oid);
-			t->object.flags |= NOT_USER_GIVEN;
-			process_tree(ctx, t, base, entry.path);
-		}
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash,
-					base, entry.path);
-		else {
-			struct blob *b = lookup_blob(the_repository, entry.oid);
-			b->object.flags |= NOT_USER_GIVEN;
-			process_blob(ctx, b, base, entry.path);
-		}
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH 5/5] rev-list: handle missing tree objects properly
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (3 preceding siblings ...)
  2018-08-09 22:45 ` [PATCH 4/5] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-08-09 22:45 ` Matthew DeVore
  2018-08-10  0:24   ` Jonathan Tan
  2018-08-10 19:03 ` [RFC PATCH 0/5] filter: support for excluding all trees and blobs Jonathan Tan
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-09 22:45 UTC
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. A missing tree
will cause an error if --missing indicates an error should be caused,
and the hash is printed even if the tree is missing.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/rev-list.c       | 12 ++++++++----
 list-objects.c           |  8 ++++++--
 revision.h               |  1 +
 t/t5616-partial-clone.sh | 27 +++++++++++++++++++++++++++
 4 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..c870d4fe6 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	revs.show_missing_trees = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
@@ -389,6 +392,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		if (!strcmp(arg, "--exclude-promisor-objects")) {
 			fetch_if_missing = 0;
 			revs.exclude_promisor_objects = 1;
+			revs.show_missing_trees = 0;
 			break;
 		}
 	}
diff --git a/list-objects.c b/list-objects.c
index 7ecdb95ce..b0291c45a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -146,7 +146,9 @@ static void process_tree(struct traversal_context *ctx,
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
+		     revs->show_missing_trees ||
 		     revs->exclude_promisor_objects;
+	int parse_result;
 
 	if (!revs->tree_objects)
 		return;
@@ -154,7 +156,8 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	parse_result = parse_tree_gently(tree, gently);
+	if (parse_result < 0 && !revs->show_missing_trees) {
 		if (revs->ignore_missing_links)
 			return;
 
@@ -182,7 +185,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (parse_result >= 0)
+		process_tree_contents(ctx, tree, base);
 
 	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index cd6b62313..34ff99f05 100644
--- a/revision.h
+++ b/revision.h
@@ -128,6 +128,7 @@ struct rev_info {
 			first_parent_only:1,
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
+			show_missing_trees:1,
 
 			/* for internal use only */
 			exclude_promisor_objects:1;
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..8a0ca0a74 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -170,6 +170,33 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
 	git -C dst fsck
 '
 
+test_expect_success 'can use only:commits to filter partial clone' '
+	rm -rf dst &&
+	git clone --no-checkout --filter=only:commits "file://$(pwd)/srv.bare" dst &&
+	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	cat fetched_objects \
+		| awk -f print_1.awk \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	echo commit > unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected
+'
+
+test_expect_success 'show missing tree objects with --missing=print' '
+	git -C dst rev-list master --missing=print --quiet --objects >missing_objs &&
+	sed "s/?//" missing_objs \
+		| xargs -n1 git -C srv.bare cat-file -t \
+		>missing_types &&
+	sort -u missing_types >missing_types.uniq &&
+	echo tree >expected &&
+	test_cmp missing_types.uniq expected
+'
+
+test_expect_success 'do not complain when a missing tree cannot be parsed' '
+	git -C dst rev-list master --missing=print --quiet --objects 2>rev_list_err >&2 &&
+	! grep -q "Could not read " rev_list_err
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.18.0.597.ga71716f1ad-goog



* Re: [PATCH 2/5] list-objects-filter: implement filter only:commits
  2018-08-09 22:45 ` [PATCH 2/5] list-objects-filter: implement filter only:commits Matthew DeVore
@ 2018-08-10  0:14   ` Jonathan Tan
  0 siblings, 0 replies; 151+ messages in thread
From: Jonathan Tan @ 2018-08-10  0:14 UTC (permalink / raw)
  To: matvore; +Cc: git, jeffhost, peff, stefanbeller, jonathantanmy

> Teach list-objects the "only:commits" filter which allows for filtering
> out all non-commit and non-annotated tag objects (unless other objects
> are explicitly specified by the user). The purpose of this patch is to
> allow smaller partial clones.
> 
> The name of this filter - only:commits - is a bit inaccurate because it
> still allows annotated tags to pass through. I chose it because it was
> the only concise name I could think of that was pretty descriptive. I
> considered and decided against "tree:none" because the code and
> documentation for filters seems to lack the concept of "you're filtering
> this, so we'll implicitly filter all referents of this." So "tree:none"
> is vague, since some may think it filters blobs too, while some may not.
> "only:commits" is specific and makes it easier to match it to a
> potential use case.

I'll do a fuller review tomorrow, but here are my initial thoughts.

I'm undecided about whether "only:commits" or "tree:none" is better -
one argument in favor of the latter is that blobs are not of much use
without any trees referring to them, so it makes sense that omitting
trees means omitting blobs. But that requires some thought and is not
immediately obvious.

>  /*
> - * A filter for list-objects to omit ALL blobs from the traversal.
> - * And to OPTIONALLY collect a list of the omitted OIDs.
> + * A filter for list-objects to omit ALL blobs from the traversal, and possibly
> + * trees as well.
> + * Can OPTIONALLY collect a list of the omitted OIDs.
>   */
> -struct filter_blobs_none_data {
> +struct filter_none_of_type_data {
> +	unsigned omit_trees : 1;
>  	struct oidset *omits;
>  };

I know that it's documented above that blobs are always omitted, but
maybe it's worth it to add a comment /* blobs are always omitted */.

> -	case LOFS_BEGIN_TREE:
> -		assert(obj->type == OBJ_TREE);
> -		/* always include all tree objects */
> -		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> -
>  	case LOFS_END_TREE:
>  		assert(obj->type == OBJ_TREE);
>  		return LOFR_ZERO;
>  
> +	case LOFS_BEGIN_TREE:
> +		assert(obj->type == OBJ_TREE);
> +		if (!filter_data->omit_trees)
> +			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
>  	case LOFS_BLOB:
> -		assert(obj->type == OBJ_BLOB);
>  		assert((obj->flags & SEEN) == 0);

Moving the case LOFS_BEGIN_TREE and removing the assert is unnecessary,
I think.

Also, there's fallthrough. If that's on purpose, add /* fallthrough */,
although I think that it complicates the code unnecessarily here.
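
To make the two options concrete, here is a minimal sketch (not the code from the patch). The `sketch_*` names are hypothetical stand-ins for the LOFS_*/LOFR_* enums, and the blob result is assumed to be "mark seen, do not show". The first function keeps the intentional fallthrough and annotates it; the second gives every case its own return, which is arguably easier to read.

```c
/* Hypothetical stand-ins for the filter situation and result enums. */
enum sketch_situation { SKETCH_BEGIN_TREE, SKETCH_END_TREE, SKETCH_BLOB };
enum sketch_result {
	SKETCH_ZERO      = 0,
	SKETCH_MARK_SEEN = 1 << 0,
	SKETCH_DO_SHOW   = 1 << 1
};

/* Variant 1: an omitted tree deliberately shares the blob handling. */
static int sketch_filter_fallthrough(enum sketch_situation s, int omit_trees)
{
	switch (s) {
	case SKETCH_END_TREE:
		return SKETCH_ZERO;

	case SKETCH_BEGIN_TREE:
		if (!omit_trees)
			return SKETCH_MARK_SEEN | SKETCH_DO_SHOW;
		/* fallthrough */
	case SKETCH_BLOB:
		/* assumed result for an omitted object: seen, not shown */
		return SKETCH_MARK_SEEN;
	}
	return SKETCH_ZERO;
}

/* Variant 2: same results, but no case depends on the one below it. */
static int sketch_filter_explicit(enum sketch_situation s, int omit_trees)
{
	switch (s) {
	case SKETCH_BEGIN_TREE:
		if (omit_trees)
			return SKETCH_MARK_SEEN;
		return SKETCH_MARK_SEEN | SKETCH_DO_SHOW;

	case SKETCH_END_TREE:
		return SKETCH_ZERO;

	case SKETCH_BLOB:
		return SKETCH_MARK_SEEN;
	}
	return SKETCH_ZERO;
}
```

Both variants return the same value for every input; the difference is only in how much the reader has to track across case labels.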

> +test_expect_success 'verify only:commits packfile has no blobs or trees' '
> +	git -C r1 pack-objects --rev --stdout --filter=only:commits >commitsonly.pack <<-EOF &&
> +	HEAD
> +	EOF
> +	git -C r1 index-pack ../commitsonly.pack &&
> +	git -C r1 verify-pack -v ../commitsonly.pack \
> +		| grep -E "tree|blob" \
> +		| sort >observed &&
> +	test_line_count = 0 observed
> +'

Shell pipelines conceal the exit codes of every command except the last.
Here it's OK, but it might be better to write the verify-pack on its own
line, redirecting its output to a file, and then run
'! grep -E "tree|blob"' on that file - you don't need to sort or
test_line_count.

> +test_expect_success 'grab tree directly when using only:commits' '
> +	# We should get the tree specified directly but not its blobs or subtrees.
> +	git -C r1 pack-objects --rev --stdout --filter=only:commits >commitsonly.pack <<-EOF &&
> +	HEAD:
> +	EOF
> +	git -C r1 index-pack ../commitsonly.pack &&
> +	git -C r1 verify-pack -v ../commitsonly.pack \
> +		| grep -E "tree|blob" \
> +		| sort >observed &&
> +	test_line_count = 1 observed
> +'

Similar comment as above, except you can redirect the output of grep to
a file, then test_line_count on that file. No need for sort.


* Re: [PATCH 5/5] rev-list: handle missing tree objects properly
  2018-08-09 22:45 ` [PATCH 5/5] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-08-10  0:24   ` Jonathan Tan
  0 siblings, 0 replies; 151+ messages in thread
From: Jonathan Tan @ 2018-08-10  0:24 UTC (permalink / raw)
  To: matvore; +Cc: git, jeffhost, peff, stefanbeller, jonathantanmy

> @@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
>  	 */
>  	switch (arg_missing_action) {
>  	case MA_ERROR:
> -		die("missing blob object '%s'", oid_to_hex(&obj->oid));
> +		die("missing %s object '%s'",
> +		    type_name(obj->type), oid_to_hex(&obj->oid));
>  		return;
>  
>  	case MA_ALLOW_ANY:
> @@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
>  	case MA_ALLOW_PROMISOR:
>  		if (is_promisor_object(&obj->oid))
>  			return;
> -		die("unexpected missing blob object '%s'",
> -		    oid_to_hex(&obj->oid));
> +		die("unexpected missing %s object '%s'",
> +		    type_name(obj->type), oid_to_hex(&obj->oid));
>  		return;

Once again, I'll do a fuller review tomorrow.

These are fine (obj->type is populated), because the types of objects
are known during traversal.

> -	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
> +	if (!has_object_file(&obj->oid)) {
>  		finish_object__ma(obj);
>  		return 1;

And this is also fine, because finish_object__ma can now handle any
object type.

> +	revs.show_missing_trees = 1;

(and elsewhere)

Could we just show missing trees all the time? We do that for blobs and
already rely on the caller (eventually, show_object() in
builtin/rev-list.c) to determine whether the object actually exists or
not; we could do the same for trees. This allows us to not include this
extra knob.
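
A rough sketch of that division of labor, with hypothetical `sketch_*` names and a simplified set of --missing actions (the promisor case is left out): the traversal always reports the object, and the caller alone decides what a missing one means.

```c
/* Simplified, hypothetical model of the --missing=<action> choices. */
enum sketch_missing_action {
	SKETCH_MA_ERROR,
	SKETCH_MA_ALLOW_ANY,
	SKETCH_MA_PRINT
};

/* What the caller ends up doing with an object it was handed. */
enum sketch_outcome {
	SKETCH_SHOW,          /* object exists: print it normally */
	SKETCH_SKIP,          /* missing, tolerated silently */
	SKETCH_PRINT_MISSING, /* missing, listed with a '?' prefix */
	SKETCH_DIE            /* missing, treated as fatal */
};

/*
 * The traversal does not check for existence itself; it passes every
 * tree or blob here, and this caller-side function applies the policy
 * uniformly regardless of object type.
 */
static enum sketch_outcome sketch_finish_object(int exists,
						enum sketch_missing_action ma)
{
	if (exists)
		return SKETCH_SHOW;

	switch (ma) {
	case SKETCH_MA_ALLOW_ANY:
		return SKETCH_SKIP;
	case SKETCH_MA_PRINT:
		return SKETCH_PRINT_MISSING;
	case SKETCH_MA_ERROR:
		break;
	}
	return SKETCH_DIE;
}
```

With this shape there is nothing tree-specific left for the traversal to be told, which is why the extra flag looks avoidable.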

> -	if (parse_tree_gently(tree, gently) < 0) {
> +	parse_result = parse_tree_gently(tree, gently);
> +	if (parse_result < 0 && !revs->show_missing_trees) {
>  		if (revs->ignore_missing_links)
>  			return;
>  
> @@ -182,7 +185,8 @@ static void process_tree(struct traversal_context *ctx,
>  	if (base->len)
>  		strbuf_addch(base, '/');
>  
> -	process_tree_contents(ctx, tree, base);
> +	if (parse_result >= 0)
> +		process_tree_contents(ctx, tree, base);
>  
>  	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
>  		r = ctx->filter_fn(LOFS_END_TREE, obj,

Is it possible to call the appropriate callbacks and then return
immediately, instead of going through the whole function checking
parse_result when necessary? When doing the latter, the reader needs to
keep on checking if each function still works if the tree is
unparseable.
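
Roughly what I have in mind, as a self-contained sketch rather than a drop-in change. The `sketch_*` types and helpers are hypothetical; they only model "show the object" and "walk its entries" so that the control flow is visible.

```c
/* Hypothetical model of a tree that may have failed to parse. */
struct sketch_tree {
	int parsed_ok;   /* 0 models parse_tree_gently() returning < 0 */
	int nr_entries;
};

/* Records which steps ran, standing in for the real callbacks. */
struct sketch_log {
	int shown;
	int entries_walked;
};

static void sketch_show(struct sketch_log *log)
{
	log->shown++;
}

static void sketch_walk_contents(const struct sketch_tree *tree,
				 struct sketch_log *log)
{
	log->entries_walked += tree->nr_entries;
}

static void sketch_process_tree(const struct sketch_tree *tree,
				struct sketch_log *log)
{
	if (!tree->parsed_ok) {
		/*
		 * Missing tree: call the callbacks that still make
		 * sense, then leave. Nothing below this block ever
		 * sees an unparsed tree.
		 */
		sketch_show(log);
		return;
	}

	sketch_show(log);
	sketch_walk_contents(tree, log);
}
```

The benefit is that the code after the early return can assume a parsed tree without re-checking a result variable at each step.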


* Re: [PATCH 1/5] revision: invert meaning of the USER_GIVEN flag
  2018-08-09 22:45 ` [PATCH 1/5] revision: invert meaning of the USER_GIVEN flag Matthew DeVore
@ 2018-08-10 18:43   ` Jonathan Tan
  0 siblings, 0 replies; 151+ messages in thread
From: Jonathan Tan @ 2018-08-10 18:43 UTC (permalink / raw)
  To: matvore; +Cc: git, jeffhost, peff, stefanbeller, jonathantanmy

> Abandon the previous approach of mutating all new objects implicitly in
> add_pending_object by inverting the meaning of the bit (it is now
> NOT_USER_GIVEN) and only setting the flag when we need to.
> 
> This more accurately tracks if a tree was provided directly by the user.
> Without this patch, the root tree of all commits were erroneously
> considered to be USER_GIVEN, which meant they cannot be filtered. This
> distinction is important in the next patch.

After rereading this patch, I think the thought process is:

 - the existing code inaccurately makes root trees of commits USER_GIVEN
 - instead of trying to fix that, it is easier to invert the meaning of this
   flag, and since we only need to track trees and blobs, let's do so in this
   patch

So a better commit message might be:

  revision: mark non-user-given objects instead

  Currently, list-objects.c incorrectly treats all root trees of commits
  as USER_GIVEN. Also, it would be easier to mark objects that are
  non-user-given instead of user-given, since the places in the code
  where we access an object through a reference are more obvious than
  the places where we access an object that was given by the user.

  Resolve these two problems by introducing a flag NOT_USER_GIVEN that
  marks blobs and trees that are non-user-given, replacing USER_GIVEN.
  (Only blobs and trees are marked because this mark is only used when
  filtering objects, and filtering of other types of objects is not
  supported yet.)

The patch itself looks good to me.
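
For readers following along, the inverted flag can be illustrated with a small sketch. The `sketch_*` names and bit positions are hypothetical; only the shape of the check mirrors the `(obj->flags & NOT_USER_GIVEN) && ctx->filter_fn` condition in the series.

```c
/* Hypothetical flag bits; the real values live in revision.h. */
#define SKETCH_SEEN           (1u << 0)
#define SKETCH_NOT_USER_GIVEN (1u << 1)

struct sketch_object {
	unsigned flags;
};

/*
 * Called at the few places where an object is reached through another
 * object (a commit's root tree, a tree entry). Objects named directly
 * by the user never pass through here, so their flag stays clear.
 */
static void sketch_mark_reached_by_reference(struct sketch_object *obj)
{
	obj->flags |= SKETCH_NOT_USER_GIVEN;
}

/* Only objects the user did not ask for by name may be filtered. */
static int sketch_may_filter(const struct sketch_object *obj, int have_filter)
{
	return (obj->flags & SKETCH_NOT_USER_GIVEN) && have_filter;
}
```

The point of the inversion is that the default (flag clear) is the safe one: an object is treated as user-given unless the traversal explicitly says otherwise.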


* Re: [RFC PATCH 0/5] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (4 preceding siblings ...)
  2018-08-09 22:45 ` [PATCH 5/5] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-08-10 19:03 ` Jonathan Tan
  2018-08-10 23:06 ` [PATCH v2 " Matthew DeVore
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 151+ messages in thread
From: Jonathan Tan @ 2018-08-10 19:03 UTC (permalink / raw)
  To: matvore; +Cc: git, jeffhost, peff, stefanbeller, jonathantanmy

> Matthew DeVore (5):
>   revision: invert meaning of the USER_GIVEN flag
>   list-objects-filter: implement filter only:commits
>   list-objects: store common func args in struct
>   list-objects: refactor to process_tree_contents
>   rev-list: handle missing tree objects properly

Firstly, run every patch with "make DEVELOPER=1" - there is at least one
"mixed declarations and code", which the Git coding style does not
allow.
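
As a generic illustration of that rule (not the offending code from the series), the hypothetical helper below keeps all declarations at the top of the block. As far as I know, the warning involved is GCC's -Wdeclaration-after-statement; treat that as an assumption about what DEVELOPER=1 turns on.

```c
/*
 * Not allowed by the coding style, shown only for contrast:
 *
 *	if (n < 0)
 *		return -1;
 *	int total = 0;	<- declaration after a statement
 */
static int sketch_sum_positive(const int *values, int n)
{
	/* All declarations come first, before any statement. */
	int i;
	int total = 0;

	if (n < 0)
		return -1;

	for (i = 0; i < n; i++)
		if (values[i] > 0)
			total += values[i];

	return total;
}
```

Moving a declaration up sometimes means splitting it from its initializer, as the series would need to do for a variable that is declared early and assigned later.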

I've already replied to patches 1, 2, and 5. Patches 3 and 4 look OK to
me and seem like good changes (patch 4, in addition to reducing
indentation, also reduces the scope of the local variables - so it is a
good change).

One last thing is that I'm not sure that this order of patches is the
best order - in particular, if I run the tests at the 5th patch using a
binary compiled at the 4th patch, I notice that cloning with
"--filter=only:commits" fails with a cryptic error "fatal: bad tree
object e891efadd67ca0c01b1c518a2fd91130d40f5904". This makes bisecting
for errors difficult, but perhaps with this problem manifesting in only
a few commits, it is not so bad.

The ideal order is to put patches 3-5 before 1-2. I've tried the
rearrangement myself and found many instances where I had to rewrite
code because one patch introduces "ctx" and the other, NOT_USER_GIVEN.
So as a reviewer, I'm on the fence about suggesting that the patches be
reordered.


* [PATCH v2 0/5] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (5 preceding siblings ...)
  2018-08-10 19:03 ` [RFC PATCH 0/5] filter: support for excluding all trees and blobs Jonathan Tan
@ 2018-08-10 23:06 ` " Matthew DeVore
  2018-08-10 23:06   ` [PATCH v2 1/5] list-objects: store common func args in struct Matthew DeVore
                     ` (4 more replies)
  2018-08-13 18:14 ` [PATCH v3 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (9 subsequent siblings)
  16 siblings, 5 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-10 23:06 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

Changes applied, as suggested by jonathantanmy@google.com:
- Re-ordered patches so 3-5 actually come first
- Sadly, as a result of the above, many of the tests in the "treat
  missing trees like missing blobs" patch had to be moved to the filter
  implementation patch, since it doesn't seem possible to create
  promisor objects that are really recognized as promisor objects in
  tests (unless you actually do a partial clone). Overall, I thought
  this ordering was more elegant, so I kept it.
- Reworded NOT_USER_GIVEN commit message as suggested
- Fixed style error in list-objects.c (variable declaration mixed with code)
- Added missing /* fallthrough */ and explanation why
- Removed the show_missing_trees flag - now we won't show any error if
  the only problem is the object is missing and it's a promisor object.
- Renamed only:commits to tree:none and updated commit message
  accordingly
- Added /* blobs are always omitted */ comment in list-objects-filter.c
- Fixed up tests in t5317-pack-objects-filter-objects.sh to not use
  unnecessary sorts, and to do more commands on lines of their own
  rather than in pipes


Matthew DeVore (5):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: implement filter tree:none

 Documentation/rev-list-options.txt     |   2 +
 builtin/rev-list.c                     |  10 +-
 list-objects-filter-options.c          |   4 +
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  49 +++--
 list-objects.c                         | 236 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  10 +-
 t/t5317-pack-objects-filter-objects.sh |  40 +++++
 t/t5616-partial-clone.sh               |  27 +++
 t/t6112-rev-list-filters-objects.sh    |  13 ++
 11 files changed, 259 insertions(+), 134 deletions(-)

-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH v2 1/5] list-objects: store common func args in struct
  2018-08-10 23:06 ` [PATCH v2 " Matthew DeVore
@ 2018-08-10 23:06   ` Matthew DeVore
  2018-08-10 23:06   ` [PATCH v2 2/5] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-10 23:06 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH v2 2/5] list-objects: refactor to process_tree_contents
  2018-08-10 23:06 ` [PATCH v2 " Matthew DeVore
  2018-08-10 23:06   ` [PATCH v2 1/5] list-objects: store common func args in struct Matthew DeVore
@ 2018-08-10 23:06   ` Matthew DeVore
  2018-08-10 23:06   ` [PATCH v2 3/5] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-10 23:06 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally. i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH v2 3/5] rev-list: handle missing tree objects properly
  2018-08-10 23:06 ` [PATCH v2 " Matthew DeVore
  2018-08-10 23:06   ` [PATCH v2 1/5] list-objects: store common func args in struct Matthew DeVore
  2018-08-10 23:06   ` [PATCH v2 2/5] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-08-10 23:06   ` Matthew DeVore
  2018-08-13 18:20     ` Jonathan Tan
  2018-08-10 23:06   ` [PATCH v2 4/5] revision: mark non-user-given objects instead Matthew DeVore
  2018-08-10 23:06   ` [PATCH v2 5/5] list-objects-filter: implement filter tree:none Matthew DeVore
  4 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-10 23:06 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. A missing tree
will cause an error if --missing indicates an error should be caused,
and the hash is printed even if the tree is missing.

In list-objects.c we no longer print a message to stderr if a tree
object is missing (quiet_on_missing is always true). I couldn't find
any place where this would matter, or where the caller of
traverse_commit_list would need to be fixed to show the error. However,
in the future it would be trivial to make the caller show the message if
we needed to.

This is not tested very thoroughly, since we cannot create promisor
objects in tests without using an actual partial clone. t0410 has a
promise_and_delete utility function, but the is_promisor_object function
does not return 1 for objects deleted in this way. More tests will
come in a patch that implements a filter that can be used with git
clone.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/rev-list.c                     | 10 ++++++----
 list-objects.c                         | 17 +++++++++--------
 t/t5317-pack-objects-filter-objects.sh | 13 +++++++++++++
 3 files changed, 28 insertions(+), 12 deletions(-)
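
For review, the decision that process_tree() now makes when a tree fails
to parse can be modelled roughly as below. This is a simplified sketch,
not the code in the diff: classify_tree, struct walk_opts, struct
tree_state and enum tree_outcome are hypothetical names made up for
illustration, and the real function also checks revs->tree_objects and
the UNINTERESTING/SEEN flags before it gets this far.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for what we know about one tree object. */
struct tree_state {
	bool parses;      /* parse_tree_gently() would succeed */
	bool is_promisor; /* is_promisor_object() would return 1 */
};

/* Hypothetical stand-in for the two rev_info fields that matter here. */
struct walk_opts {
	bool ignore_missing_links;
	bool exclude_promisor_objects;
};

enum tree_outcome {
	TREE_SKIP,             /* return early, nothing is shown */
	TREE_DIE,              /* die("bad tree object ...") */
	TREE_SHOW_ONLY,        /* hand to filter/show, but do not recurse */
	TREE_SHOW_AND_RECURSE  /* normal case */
};

static enum tree_outcome classify_tree(const struct walk_opts *opts,
				       const struct tree_state *t)
{
	assert(opts && t);

	if (t->parses)
		return TREE_SHOW_AND_RECURSE;
	if (opts->ignore_missing_links)
		return TREE_SKIP;
	if (!t->is_promisor)
		return TREE_DIE;
	if (opts->exclude_promisor_objects)
		return TREE_SKIP;
	/* Missing promisor tree: listed, contents not walked. */
	return TREE_SHOW_ONLY;
}
```

In the TREE_SHOW_ONLY case the object still reaches finish_object() in
rev-list, which is where the --missing action is applied.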

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..ea0daf0c4 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..aedcd0228 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
+	int parsed;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,20 +151,21 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	parsed = parse_tree_gently(tree, /*quiet_on_missing=*/1) >= 0;
+	if (!parsed) {
 		if (revs->ignore_missing_links)
 			return;
 
+		if (!is_promisor_object(&obj->oid))
+			die("bad tree object %s", oid_to_hex(&obj->oid));
+
 		/*
 		 * Pre-filter known-missing tree objects when explicitly
 		 * requested.  This may cause the actual filter to report
 		 * an incomplete list of missing objects.
 		 */
-		if (revs->exclude_promisor_objects &&
-		    is_promisor_object(&obj->oid))
+		if (revs->exclude_promisor_objects)
 			return;
-
-		die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -180,7 +180,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (parsed)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..5e35f33bf 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo > r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep -q "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH v2 4/5] revision: mark non-user-given objects instead
  2018-08-10 23:06 ` [PATCH v2 " Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-08-10 23:06   ` [PATCH v2 3/5] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-08-10 23:06   ` Matthew DeVore
  2018-08-10 23:06   ` [PATCH v2 5/5] list-objects-filter: implement filter tree:none Matthew DeVore
  4 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-10 23:06 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 31 ++++++++++++++++++-------------
 revision.c     |  1 -
 revision.h     | 10 +++++++---
 3 files changed, 25 insertions(+), 17 deletions(-)
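
The inverted flag can be illustrated with the minimal model below. It is
only a sketch under stated assumptions: struct object is reduced to its
flags word, and mark_reached_by_traversal and should_run_filter are
hypothetical helper names that do not exist in the tree. Only the
NOT_USER_GIVEN bit value is taken from the diff.

```c
#include <assert.h>

#define NOT_USER_GIVEN (1u << 25) /* same bit as in revision.h below */

/* Reduced stand-in for git's struct object. */
struct object {
	unsigned flags;
};

/* Hypothetical helper: called where a tree or blob is reached via a
 * commit or a parent tree, i.e. not named directly by the user. */
static void mark_reached_by_traversal(struct object *obj)
{
	assert(obj);
	obj->flags |= NOT_USER_GIVEN;
}

/* Hypothetical helper: the filter callback is consulted only for
 * objects that carry the mark. */
static int should_run_filter(const struct object *obj, int have_filter)
{
	assert(obj);
	return (obj->flags & NOT_USER_GIVEN) && have_filter;
}
```

An object that never passes through mark_reached_by_traversal() keeps a
clear bit, so it bypasses the filter, which is the behaviour wanted for
objects given on the command line.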

diff --git a/list-objects.c b/list-objects.c
index aedcd0228..fd522a59a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -169,7 +171,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -183,7 +185,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (parsed)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -299,8 +301,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index 062749437..6d355b43c 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index c599c34da..cd6b62313 100644
--- a/revision.h
+++ b/revision.h
@@ -8,7 +8,11 @@
 #include "diff.h"
 #include "commit-slab-decl.h"
 
-/* Remember to update object flag allocation in object.h */
+/* Remember to update object flag allocation in object.h
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
 #define SEEN		(1u<<0)
 #define UNINTERESTING   (1u<<1)
 #define TREESAME	(1u<<2)
@@ -20,9 +24,9 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+#define NOT_USER_GIVEN	(1u<<25) /* tree or blob not given directly by user */
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH v2 5/5] list-objects-filter: implement filter tree:none
  2018-08-10 23:06 ` [PATCH v2 " Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-08-10 23:06   ` [PATCH v2 4/5] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-08-10 23:06   ` Matthew DeVore
  2018-08-13 16:38     ` Jeff Hostetler
  2018-08-13 18:29     ` Jonathan Tan
  4 siblings, 2 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-10 23:06 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, jeffhost, peff, stefanbeller, jonathantanmy

Teach list-objects the "tree:none" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:none - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  2 ++
 list-objects-filter-options.c          |  4 +++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 49 +++++++++++++++++++-------
 t/t5317-pack-objects-filter-objects.sh | 27 ++++++++++++++
 t/t5616-partial-clone.sh               | 27 ++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 13 +++++++
 7 files changed, 110 insertions(+), 13 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..68b4b9552 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -743,6 +743,8 @@ specification contained in <path>.
 	A debug option to help with future "partial clone" development.
 	This option specifies how missing objects are handled.
 +
+The form '--filter=tree:none' omits all blobs and trees.
++
 The form '--missing=error' requests that rev-list stop with an error if
 a missing object is encountered.  This is the default action.
 +
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..523cb00a0 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (!strcmp(arg, "tree:none")) {
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..22c894093 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -26,38 +26,45 @@
 #define FILTER_SHOWN_BUT_REVISIT (1<<21)
 
 /*
- * A filter for list-objects to omit ALL blobs from the traversal.
- * And to OPTIONALLY collect a list of the omitted OIDs.
+ * A filter for list-objects to omit ALL blobs from the traversal, and possibly
+ * trees as well.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
  */
-struct filter_blobs_none_data {
+struct filter_none_of_type_data {
+	/* blobs are always omitted */
+	unsigned omit_trees : 1;
 	struct oidset *omits;
 };
 
-static enum list_objects_filter_result filter_blobs_none(
+static enum list_objects_filter_result filter_none_of_type(
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
 	void *filter_data_)
 {
-	struct filter_blobs_none_data *filter_data = filter_data_;
+	struct filter_none_of_type_data *filter_data = filter_data_;
 
 	switch (filter_situation) {
 	default:
 		die("unknown filter_situation");
 		return LOFR_ZERO;
 
-	case LOFS_BEGIN_TREE:
-		assert(obj->type == OBJ_TREE);
-		/* always include all tree objects */
-		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
-
 	case LOFS_END_TREE:
 		assert(obj->type == OBJ_TREE);
 		return LOFR_ZERO;
 
+	case LOFS_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		if (!filter_data->omit_trees)
+			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
+
+		/*
+		 * Fallthrough to insert into omitted list for trees as well as
+		 * blobs.
+		 */
+		/* fallthrough */
 	case LOFS_BLOB:
-		assert(obj->type == OBJ_BLOB);
 		assert((obj->flags & SEEN) == 0);
 
 		if (filter_data->omits)
@@ -72,10 +79,25 @@ static void *filter_blobs_none__init(
 	filter_object_fn *filter_fn,
 	filter_free_fn *filter_free_fn)
 {
-	struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d));
+	struct filter_none_of_type_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_none_of_type;
+	*filter_free_fn = free;
+	return d;
+}
+
+static void* filter_tree_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_none_of_type_data *d = xcalloc(1, sizeof(*d));
+	d->omit_trees = 1;
 	d->omits = omitted;
 
-	*filter_fn = filter_blobs_none;
+	*filter_fn = filter_none_of_type;
 	*filter_free_fn = free;
 	return d;
 }
@@ -374,6 +396,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_tree_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 5e35f33bf..28a8c916a 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,33 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:none' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" > r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:none packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:none >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:none' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:none >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	grep -E "tree|blob" objs >trees_and_blobs &&
+	test_line_count = 1 trees_and_blobs
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..4fc068716 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -170,6 +170,33 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
 	git -C dst fsck
 '
 
+test_expect_success 'can use tree:none to filter partial clone' '
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:none "file://$(pwd)/srv.bare" dst &&
+	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	cat fetched_objects \
+		| awk -f print_1.awk \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	echo commit > unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected
+'
+
+test_expect_success 'show missing tree objects with --missing=print' '
+	git -C dst rev-list master --missing=print --quiet --objects >missing_objs &&
+	sed "s/?//" missing_objs \
+		| xargs -n1 git -C srv.bare cat-file -t \
+		>missing_types &&
+	sort -u missing_types >missing_types.uniq &&
+	echo tree >expected &&
+	test_cmp missing_types.uniq expected
+'
+
+test_expect_success 'do not complain when a missing tree cannot be parsed' '
+	git -C dst rev-list master --missing=print --quiet --objects 2>rev_list_err >&2 &&
+	! grep -q "Could not read " rev_list_err
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 0a37dd5f9..ecdf6b4c3 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -196,6 +196,19 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+# Test tree:none filter.
+
+test_expect_success 'verify tree:none includes trees in "filtered" output' '
+	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:none \
+		| awk -f print_1.awk \
+		| sed s/~// \
+		| xargs -n1 git -C r3 cat-file -t \
+		| sort -u >filtered_types &&
+	printf "blob\ntree\n" > expected &&
+	test_cmp filtered_types expected
+'
+
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.18.0.597.ga71716f1ad-goog



* Re: [PATCH v2 5/5] list-objects-filter: implement filter tree:none
  2018-08-10 23:06   ` [PATCH v2 5/5] list-objects-filter: implement filter tree:none Matthew DeVore
@ 2018-08-13 16:38     ` Jeff Hostetler
  2018-08-14  0:57       ` Matthew DeVore
  2018-08-13 18:29     ` Jonathan Tan
  1 sibling, 1 reply; 151+ messages in thread
From: Jeff Hostetler @ 2018-08-13 16:38 UTC (permalink / raw)
  To: Matthew DeVore, git; +Cc: jeffhost, peff, stefanbeller, jonathantanmy



On 8/10/2018 7:06 PM, Matthew DeVore wrote:
> Teach list-objects the "tree:none" filter which allows for filtering
> out all tree and blob objects (unless other objects are explicitly
> specified by the user). The purpose of this patch is to allow smaller
> partial clones.
> 
> The name of this filter - tree:none - does not explicitly specify that
> it also filters out all blobs, but this should not cause much confusion
> because blobs are not at all useful without the trees that refer to
> them.
> 
> I also considered only:commits as a name, but this is inaccurate because
> it suggests that annotated tags are omitted, but actually they are
> included.
> 
> Signed-off-by: Matthew DeVore <matvore@google.com>
> ---
>   Documentation/rev-list-options.txt     |  2 ++
>   list-objects-filter-options.c          |  4 +++
>   list-objects-filter-options.h          |  1 +
>   list-objects-filter.c                  | 49 +++++++++++++++++++-------
>   t/t5317-pack-objects-filter-objects.sh | 27 ++++++++++++++
>   t/t5616-partial-clone.sh               | 27 ++++++++++++++
>   t/t6112-rev-list-filters-objects.sh    | 13 +++++++
>   7 files changed, 110 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index 7b273635d..68b4b9552 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -743,6 +743,8 @@ specification contained in <path>.
>   	A debug option to help with future "partial clone" development.
>   	This option specifies how missing objects are handled.
>   +
> +The form '--filter=tree:none' omits all blobs and trees.
> ++
>   The form '--missing=error' requests that rev-list stop with an error if
>   a missing object is encountered.  This is the default action.
>   +
> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index c0e2bd6a0..523cb00a0 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
>   			return 0;
>   		}
>   
> +	} else if (!strcmp(arg, "tree:none")) {
> +		filter_options->choice = LOFC_TREE_NONE;
> +		return 0;
> +
>   	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
>   		struct object_context oc;
>   		struct object_id sparse_oid;
> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> index 0000a61f8..af64e5c66 100644
> --- a/list-objects-filter-options.h
> +++ b/list-objects-filter-options.h
> @@ -10,6 +10,7 @@ enum list_objects_filter_choice {
>   	LOFC_DISABLED = 0,
>   	LOFC_BLOB_NONE,
>   	LOFC_BLOB_LIMIT,
> +	LOFC_TREE_NONE,
>   	LOFC_SPARSE_OID,
>   	LOFC_SPARSE_PATH,
>   	LOFC__COUNT /* must be last */
> diff --git a/list-objects-filter.c b/list-objects-filter.c
> index a0ba78b20..22c894093 100644
> --- a/list-objects-filter.c
> +++ b/list-objects-filter.c
> @@ -26,38 +26,45 @@
>   #define FILTER_SHOWN_BUT_REVISIT (1<<21)
>   
>   /*
> - * A filter for list-objects to omit ALL blobs from the traversal.
> - * And to OPTIONALLY collect a list of the omitted OIDs.
> + * A filter for list-objects to omit ALL blobs from the traversal, and possibly
> + * trees as well.
> + * Can OPTIONALLY collect a list of the omitted OIDs.
>    */
> -struct filter_blobs_none_data {
> +struct filter_none_of_type_data {
> +	/* blobs are always omitted */
> +	unsigned omit_trees : 1;
>   	struct oidset *omits;
>   };
>   

I'm not sure I'd convert the existing filter types.
When I created this file, I created a set of function pairs
for each filter type:
     filter_<name>() and filter_<name>__init()

with the latter being added to the s_filters[] array and created
a choice enum having corresponding values
     LOFC_<name>

Here you're adding a new _init() and LOFC_ key, but mapping both
the original "blob:none" and the new "tree:none" to a combined
filter function that blends these 2 modes.

Style-wise, I'd keep the original filters as they were and add a
new function pair for the new tree:none filter.  Then you can
simplify the logic inside your new filter.  For example, in your
filter "filter_data->omit_trees" will always be true, so you can
just do the "if (filter_data->omits) oidset_insert(...); return _SEEN"
and not have the fallthru stuff -- or get rid of the asserts() and put
the case labels together.
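
Roughly something like the following, as a sketch only. The types are
simplified stand-ins so that it compiles on its own: filter_trees_none
and struct omit_counter are names made up for illustration, the counter
stands in for the oidset, and the real callback would keep the usual
obj/pathname/filename arguments and call oidset_insert().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the enums in list-objects-filter.h. */
enum filter_situation { LOFS_BEGIN_TREE, LOFS_END_TREE, LOFS_BLOB };
enum filter_result {
	LOFR_ZERO = 0,
	LOFR_MARK_SEEN = 1 << 0,
	LOFR_DO_SHOW = 1 << 1
};

/* Stand-in for struct oidset: only counts insertions. */
struct omit_counter {
	size_t nr;
};

struct filter_trees_none_data {
	struct omit_counter *omits; /* optional, may be NULL */
};

static enum filter_result filter_trees_none(enum filter_situation situation,
					    void *filter_data_)
{
	struct filter_trees_none_data *filter_data = filter_data_;

	assert(filter_data);
	switch (situation) {
	case LOFS_END_TREE:
		return LOFR_ZERO;

	case LOFS_BEGIN_TREE:
	case LOFS_BLOB:
		/* Trees and blobs are treated alike: record and omit. */
		if (filter_data->omits)
			filter_data->omits->nr++;
		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW */
	}
	return LOFR_ZERO;
}
```

With the case labels together there is no omit_trees flag to test and no
fallthrough comment to maintain.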

One of the things I wanted to do (when I found some free time) was to
add a "tree:none" and maybe a "tree:root" filter.  (The latter only
including the root trees associated with the fetched commits, since
there are/were some places where we implicitly also load the root tree
when loading the commit object.)  So in that vein, it might be that we
would want a "tree:<depth>" filter instead with 0 = none and 1 = root.
I wasn't ready to propose that when I did the filtering, but I had that
in mind.  (And that is partially why I suggest keeping your new filter
independent of the existing ones.)
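
To make the "0 = none and 1 = root" idea concrete, here is a tiny sketch
of the rule I have in mind. Nothing like this exists yet:
tree_within_depth is a hypothetical name, and counting the root tree of
a commit as level 0 is an assumption of this sketch.

```c
#include <assert.h>

/*
 * Hypothetical rule for a "tree:<depth>" filter.
 * tree_level: 0 for the root tree of a commit, 1 for its direct
 *             subtrees, and so on (assumed numbering).
 * max_depth:  the <depth> value given to the filter.
 * Returns 1 if a tree at that level would be included, 0 otherwise.
 */
static int tree_within_depth(unsigned tree_level, unsigned max_depth)
{
	return tree_level < max_depth;
}
```

So tree_within_depth(0, 0) is false (no trees at all), while
tree_within_depth(0, 1) is true and tree_within_depth(1, 1) is false
(root trees only).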

Jeff


> -static enum list_objects_filter_result filter_blobs_none(
> +static enum list_objects_filter_result filter_none_of_type(
>   	enum list_objects_filter_situation filter_situation,
>   	struct object *obj,
>   	const char *pathname,
>   	const char *filename,
>   	void *filter_data_)
>   {
> -	struct filter_blobs_none_data *filter_data = filter_data_;
> +	struct filter_none_of_type_data *filter_data = filter_data_;
>   
>   	switch (filter_situation) {
>   	default:
>   		die("unknown filter_situation");
>   		return LOFR_ZERO;
>   
> -	case LOFS_BEGIN_TREE:
> -		assert(obj->type == OBJ_TREE);
> -		/* always include all tree objects */
> -		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> -
>   	case LOFS_END_TREE:
>   		assert(obj->type == OBJ_TREE);
>   		return LOFR_ZERO;
>   
> +	case LOFS_BEGIN_TREE:
> +		assert(obj->type == OBJ_TREE);
> +		if (!filter_data->omit_trees)
> +			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
> +		/*
> +		 * Fallthrough to insert into omitted list for trees as well as
> +		 * blobs.
> +		 */
> +		/* fallthrough */
>   	case LOFS_BLOB:
> -		assert(obj->type == OBJ_BLOB);
>   		assert((obj->flags & SEEN) == 0);
>   
>   		if (filter_data->omits)
> @@ -72,10 +79,25 @@ static void *filter_blobs_none__init(
>   	filter_object_fn *filter_fn,
>   	filter_free_fn *filter_free_fn)
>   {
> -	struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d));
> +	struct filter_none_of_type_data *d = xcalloc(1, sizeof(*d));
> +	d->omits = omitted;
> +
> +	*filter_fn = filter_none_of_type;
> +	*filter_free_fn = free;
> +	return d;
> +}
> +
> +static void* filter_tree_none__init(
> +	struct oidset *omitted,
> +	struct list_objects_filter_options *filter_options,
> +	filter_object_fn *filter_fn,
> +	filter_free_fn *filter_free_fn)
> +{
> +	struct filter_none_of_type_data *d = xcalloc(1, sizeof(*d));
> +	d->omit_trees = 1;
>   	d->omits = omitted;
>   
> -	*filter_fn = filter_blobs_none;
> +	*filter_fn = filter_none_of_type;
>   	*filter_free_fn = free;
>   	return d;
>   }
> @@ -374,6 +396,7 @@ static filter_init_fn s_filters[] = {
>   	NULL,
>   	filter_blobs_none__init,
>   	filter_blobs_limit__init,
> +	filter_tree_none__init,
>   	filter_sparse_oid__init,
>   	filter_sparse_path__init,
>   };
> diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
> index 5e35f33bf..28a8c916a 100755
> --- a/t/t5317-pack-objects-filter-objects.sh
> +++ b/t/t5317-pack-objects-filter-objects.sh
> @@ -72,6 +72,33 @@ test_expect_success 'get an error for missing tree object' '
>   	grep -q "bad tree object" bad_tree
>   '
>   
> +test_expect_success 'setup for tests of tree:none' '
> +	mkdir r1/subtree &&
> +	echo "This is a file in a subtree" > r1/subtree/file &&
> +	git -C r1 add subtree/file &&
> +	git -C r1 commit -m subtree
> +'
> +
> +test_expect_success 'verify tree:none packfile has no blobs or trees' '
> +	git -C r1 pack-objects --rev --stdout --filter=tree:none >commitsonly.pack <<-EOF &&
> +	HEAD
> +	EOF
> +	git -C r1 index-pack ../commitsonly.pack &&
> +	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> +	! grep -E "tree|blob" objs
> +'
> +
> +test_expect_success 'grab tree directly when using tree:none' '
> +	# We should get the tree specified directly but not its blobs or subtrees.
> +	git -C r1 pack-objects --rev --stdout --filter=tree:none >commitsonly.pack <<-EOF &&
> +	HEAD:
> +	EOF
> +	git -C r1 index-pack ../commitsonly.pack &&
> +	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> +	grep -E "tree|blob" objs >trees_and_blobs &&
> +	test_line_count = 1 trees_and_blobs
> +'
> +
>   # Test blob:limit=<n>[kmg] filter.
>   # We boundary test around the size parameter.  The filter is strictly less than
>   # the value, so size 500 and 1000 should have the same results, but 1001 should
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index bbbe7537d..4fc068716 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -170,6 +170,33 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
>   	git -C dst fsck
>   '
>   
> +test_expect_success 'can use tree:none to filter partial clone' '
> +	rm -rf dst &&
> +	git clone --no-checkout --filter=tree:none "file://$(pwd)/srv.bare" dst &&
> +	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> +	cat fetched_objects \
> +		| awk -f print_1.awk \
> +		| xargs -n1 git -C dst cat-file -t >fetched_types &&
> +	sort fetched_types -u >unique_types.observed &&
> +	echo commit > unique_types.expected &&
> +	test_cmp unique_types.observed unique_types.expected
> +'
> +
> +test_expect_success 'show missing tree objects with --missing=print' '
> +	git -C dst rev-list master --missing=print --quiet --objects >missing_objs &&
> +	sed "s/?//" missing_objs \
> +		| xargs -n1 git -C srv.bare cat-file -t \
> +		>missing_types &&
> +	sort -u missing_types >missing_types.uniq &&
> +	echo tree >expected &&
> +	test_cmp missing_types.uniq expected
> +'
> +
> +test_expect_success 'do not complain when a missing tree cannot be parsed' '
> +	git -C dst rev-list master --missing=print --quiet --objects 2>rev_list_err >&2 &&
> +	! grep -q "Could not read " rev_list_err
> +'
> +
>   . "$TEST_DIRECTORY"/lib-httpd.sh
>   start_httpd
>   
> diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
> index 0a37dd5f9..ecdf6b4c3 100755
> --- a/t/t6112-rev-list-filters-objects.sh
> +++ b/t/t6112-rev-list-filters-objects.sh
> @@ -196,6 +196,19 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
>   	test_cmp observed expected
>   '
>   
> +# Test tree:none filter.
> +
> +test_expect_success 'verify tree:none includes trees in "filtered" output' '
> +	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:none \
> +		| awk -f print_1.awk \
> +		| sed s/~// \
> +		| xargs -n1 git -C r3 cat-file -t \
> +		| sort -u >filtered_types &&
> +	printf "blob\ntree\n" > expected &&
> +	test_cmp filtered_types expected
> +'
> +
> +
>   # Delete some loose objects and use rev-list, but WITHOUT any filtering.
>   # This models previously omitted objects that we did not receive.
>   
> 


* [PATCH v3 0/5] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (6 preceding siblings ...)
  2018-08-10 23:06 ` [PATCH v2 " Matthew DeVore
@ 2018-08-13 18:14 ` Matthew DeVore
  2018-08-13 18:14   ` [PATCH v3 1/5] list-objects: store common func args in struct Matthew DeVore
                     ` (4 more replies)
  2018-08-14 17:28 ` [PATCH v4 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (8 subsequent siblings)
  16 siblings, 5 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-13 18:14 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

Applied the following changes suggested by git@jeffhostetler.com:
 - Changed the filter name from tree:none to tree:0 to make room for
   future improvements which allow filtering based on depth.
 - Made a separate filter logic function and filter data struct for
   tree:0 rather than share it with blob:none. I would usually prefer
   making the code less redundant whenever possible, but since there
   are plans to extend this in the future, it makes more sense to have
   this separate.

Matthew DeVore (5):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   3 +
 builtin/rev-list.c                     |  10 +-
 list-objects-filter-options.c          |   4 +
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  50 ++++++
 list-objects.c                         | 236 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  10 +-
 t/t5317-pack-objects-filter-objects.sh |  40 +++++
 t/t5616-partial-clone.sh               |  27 +++
 t/t6112-rev-list-filters-objects.sh    |  13 ++
 11 files changed, 274 insertions(+), 121 deletions(-)

-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH v3 1/5] list-objects: store common func args in struct
  2018-08-13 18:14 ` [PATCH v3 0/5] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-08-13 18:14   ` Matthew DeVore
  2018-08-13 18:14   ` [PATCH v3 2/5] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-13 18:14 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

This will make utility functions easier to create, as done by the next
patch.
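
As an illustration only (not part of the patch), the refactoring pattern
can be sketched in isolation. Every name below (walk_context, visit_fn,
sum_with_context) is hypothetical; the sketch only shows how a context
struct lets helpers take one pointer instead of a long, repeated
argument list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type, standing in for show_object_fn. */
typedef void (*visit_fn)(int value, void *data);

/* Bundles the arguments that every helper would otherwise repeat. */
struct walk_context {
	visit_fn visit;
	void *visit_data;
};

/* Helpers take the context instead of a long parameter list. */
static void walk_one(struct walk_context *ctx, int value)
{
	assert(ctx->visit != NULL);
	ctx->visit(value, ctx->visit_data);
}

static void walk_all(struct walk_context *ctx, const int *values, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		walk_one(ctx, values[i]);
}

/* Example callback: accumulate visited values into an int. */
static void sum_visit(int value, void *data)
{
	*(int *)data += value;
}

/* Entry point: fill the context once, then pass it down. */
int sum_with_context(const int *values, size_t n)
{
	int total = 0;
	struct walk_context ctx;

	ctx.visit = sum_visit;
	ctx.visit_data = &total;
	walk_all(&ctx, values, n);
	return total;
}
```

Adding a new helper then only requires passing ctx along, which is the
property the next patch relies on.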

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH v3 2/5] list-objects: refactor to process_tree_contents
  2018-08-13 18:14 ` [PATCH v3 0/5] filter: support for excluding all trees and blobs Matthew DeVore
  2018-08-13 18:14   ` [PATCH v3 1/5] list-objects: store common func args in struct Matthew DeVore
@ 2018-08-13 18:14   ` Matthew DeVore
  2018-08-13 18:14   ` [PATCH v3 3/5] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-13 18:14 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally, i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH v3 3/5] rev-list: handle missing tree objects properly
  2018-08-13 18:14 ` [PATCH v3 0/5] filter: support for excluding all trees and blobs Matthew DeVore
  2018-08-13 18:14   ` [PATCH v3 1/5] list-objects: store common func args in struct Matthew DeVore
  2018-08-13 18:14   ` [PATCH v3 2/5] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-08-13 18:14   ` Matthew DeVore
  2018-08-13 18:14   ` [PATCH v3 4/5] revision: mark non-user-given objects instead Matthew DeVore
  2018-08-13 18:14   ` [PATCH v3 5/5] list-objects-filter: implement filter tree:0 Matthew DeVore
  4 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-13 18:14 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. A missing tree
will cause an error if --missing indicates an error should be caused,
and the hash is printed even if the tree is missing.

In list-objects.c we no longer print a message to stderr if a tree
object is missing (quiet_on_missing is always true). I couldn't find
any place where this would matter, or where the caller of
traverse_commit_list would need to be fixed to show the error. However,
in the future it would be trivial to make the caller show the message if
we needed to.

This is not tested very thoroughly, since we cannot create promisor
objects in tests without using an actual partial clone. t0410 has a
promise_and_delete utility function, but the is_promisor_object function
does not return 1 for objects deleted in this way. More tests will
come in a later patch that implements a filter that can be used with
git clone.
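
For readers following the control flow, here is a rough sketch of the
decision process_tree makes after this patch when a tree may be missing.
The enum and the function decide_tree_action are hypothetical and exist
only for illustration; the real code returns early or calls die() in
place, and "show" is still subject to any active filter.

```c
#include <assert.h>

/* Hypothetical outcome names; list-objects.c does not define this enum. */
enum tree_action {
	TREE_SKIP,         /* return without showing the tree */
	TREE_DIE,          /* die("bad tree object ...") */
	TREE_SHOW_ONLY,    /* show the hash, do not descend */
	TREE_SHOW_AND_WALK /* show the hash and walk its entries */
};

/* All four inputs are booleans (0 or 1). */
enum tree_action decide_tree_action(int parsed,
				    int ignore_missing_links,
				    int is_promisor,
				    int exclude_promisor_objects)
{
	assert(parsed == 0 || parsed == 1);

	if (parsed)
		return TREE_SHOW_AND_WALK;
	if (ignore_missing_links)
		return TREE_SKIP;
	if (!is_promisor)
		return TREE_DIE;
	if (exclude_promisor_objects)
		return TREE_SKIP;
	/* Missing promisor tree: report it, but there is nothing to walk. */
	return TREE_SHOW_ONLY;
}
```

The TREE_SHOW_ONLY case is the new behavior: the hash of a missing
promisor tree reaches the show callback, which lets rev-list apply the
--missing policy to it.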

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/rev-list.c                     | 10 ++++++----
 list-objects.c                         | 17 +++++++++--------
 t/t5317-pack-objects-filter-objects.sh | 13 +++++++++++++
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..ea0daf0c4 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..aedcd0228 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
+	int parsed;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,20 +151,21 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	parsed = parse_tree_gently(tree, /*quiet_on_missing=*/1) >= 0;
+	if (!parsed) {
 		if (revs->ignore_missing_links)
 			return;
 
+		if (!is_promisor_object(&obj->oid))
+			die("bad tree object %s", oid_to_hex(&obj->oid));
+
 		/*
 		 * Pre-filter known-missing tree objects when explicitly
 		 * requested.  This may cause the actual filter to report
 		 * an incomplete list of missing objects.
 		 */
-		if (revs->exclude_promisor_objects &&
-		    is_promisor_object(&obj->oid))
+		if (revs->exclude_promisor_objects)
 			return;
-
-		die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -180,7 +180,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (parsed)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..5e35f33bf 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo > r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep -q "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH v3 4/5] revision: mark non-user-given objects instead
  2018-08-13 18:14 ` [PATCH v3 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-08-13 18:14   ` [PATCH v3 3/5] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-08-13 18:14   ` Matthew DeVore
  2018-08-13 18:14   ` [PATCH v3 5/5] list-objects-filter: implement filter tree:0 Matthew DeVore
  4 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-13 18:14 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)
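
A minimal sketch of the inverted flag, for illustration only. The struct
obj type and both helper functions are hypothetical; only the flag name
and bit position are taken from this patch.

```c
#include <assert.h>
#include <stddef.h>

#define NOT_USER_GIVEN (1u << 25) /* tree or blob not given directly by user */

/* Hypothetical stand-in for struct object. */
struct obj {
	unsigned flags;
};

/* Called where traversal reaches an object through a commit or tree. */
void mark_reached_by_traversal(struct obj *o)
{
	assert(o != NULL);
	o->flags |= NOT_USER_GIVEN;
}

/* A filter is consulted only for objects the user did not name. */
int should_filter(const struct obj *o, int have_filter)
{
	return (o->flags & NOT_USER_GIVEN) && have_filter;
}
```

An object named on the command line never passes through
mark_reached_by_traversal, so its flag stays clear and it bypasses the
filter without revision.c having to mark it.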

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 31 ++++++++++++++++++-------------
 revision.c     |  1 -
 revision.h     | 10 +++++++---
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index aedcd0228..fd522a59a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -169,7 +171,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -183,7 +185,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (parsed)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -299,8 +301,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index 062749437..6d355b43c 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index c599c34da..cd6b62313 100644
--- a/revision.h
+++ b/revision.h
@@ -8,7 +8,11 @@
 #include "diff.h"
 #include "commit-slab-decl.h"
 
-/* Remember to update object flag allocation in object.h */
+/* Remember to update object flag allocation in object.h
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
 #define SEEN		(1u<<0)
 #define UNINTERESTING   (1u<<1)
 #define TREESAME	(1u<<2)
@@ -20,9 +24,9 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+#define NOT_USER_GIVEN	(1u<<25) /* tree or blob not given directly by user */
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
-- 
2.18.0.597.ga71716f1ad-goog



* [PATCH v3 5/5] list-objects-filter: implement filter tree:0
  2018-08-13 18:14 ` [PATCH v3 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-08-13 18:14   ` [PATCH v3 4/5] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-08-13 18:14   ` Matthew DeVore
  2018-08-14 15:13     ` Jeff Hostetler
  4 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-13 18:14 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, when in fact they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.
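
To make the "hard omit" behavior concrete, here is a self-contained
sketch of how the filter's result bits are meant to be read. The names
(filter_result, situation, item, omit_trees_and_blobs, apply_result) are
hypothetical and only modelled on LOFR_MARK_SEEN, LOFR_DO_SHOW and the
LOFS_* situations; this is not the code added by the patch.

```c
#include <assert.h>

/* Hypothetical result bits, modelled on LOFR_MARK_SEEN and LOFR_DO_SHOW. */
enum filter_result {
	RESULT_ZERO      = 0,
	RESULT_MARK_SEEN = 1 << 0,
	RESULT_DO_SHOW   = 1 << 1
};

enum situation { BEGIN_TREE, END_TREE, BLOB };

struct item {
	int seen;
	int shown;
};

/* Mirrors the tree:0 decision: hide every tree and blob, visit each once. */
enum filter_result omit_trees_and_blobs(enum situation s)
{
	switch (s) {
	case BEGIN_TREE:
	case BLOB:
		return RESULT_MARK_SEEN; /* seen, but never shown */
	case END_TREE:
		return RESULT_ZERO;
	}
	assert(0 && "unknown situation");
	return RESULT_ZERO;
}

/* How a caller would act on the returned bits. */
void apply_result(struct item *it, enum filter_result r)
{
	if (r & RESULT_MARK_SEEN)
		it->seen = 1;
	if (r & RESULT_DO_SHOW)
		it->shown = 1;
}
```

Returning the "seen" bit without the "show" bit is what keeps an object
out of the output while also preventing it from being visited again.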

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  3 ++
 list-objects-filter-options.c          |  4 +++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 50 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 27 ++++++++++++++
 t/t5616-partial-clone.sh               | 27 ++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 13 +++++++
 7 files changed, 125 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..9e351ec2a 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -743,6 +743,9 @@ specification contained in <path>.
 	A debug option to help with future "partial clone" development.
 	This option specifies how missing objects are handled.
 +
+The form '--filter=tree:<depth>' omits all blobs and trees deeper than
+<depth> from the root tree. Currently, only <depth>=0 is supported.
++
 The form '--missing=error' requests that rev-list stop with an error if
 a missing object is encountered.  This is the default action.
 +
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..a28382940 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (!strcmp(arg, "tree:0")) {
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..8e3caf5bf 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		die("unknown filter_situation");
+		return LOFR_ZERO;
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void* filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -374,6 +423,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 5e35f33bf..65f2cf446 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,33 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" > r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	grep -E "tree|blob" objs >trees_and_blobs &&
+	test_line_count = 1 trees_and_blobs
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..fc4d182c0 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -170,6 +170,33 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
 	git -C dst fsck
 '
 
+test_expect_success 'can use tree:0 to filter partial clone' '
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	cat fetched_objects \
+		| awk -f print_1.awk \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort -u fetched_types >unique_types.observed &&
+	echo commit > unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected
+'
+
+test_expect_success 'show missing tree objects with --missing=print' '
+	git -C dst rev-list master --missing=print --quiet --objects >missing_objs &&
+	sed "s/?//" missing_objs \
+		| xargs -n1 git -C srv.bare cat-file -t \
+		>missing_types &&
+	sort -u missing_types >missing_types.uniq &&
+	echo tree >expected &&
+	test_cmp missing_types.uniq expected
+'
+
+test_expect_success 'do not complain when a missing tree cannot be parsed' '
+	git -C dst rev-list master --missing=print --quiet --objects 2>rev_list_err >&2 &&
+	! grep -q "Could not read " rev_list_err
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 0a37dd5f9..6ccffddbc 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -196,6 +196,19 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
+		| awk -f print_1.awk \
+		| sed s/~// \
+		| xargs -n1 git -C r3 cat-file -t \
+		| sort -u >filtered_types &&
+	printf "blob\ntree\n" > expected &&
+	test_cmp filtered_types expected
+'
+
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.18.0.597.ga71716f1ad-goog



* Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly
  2018-08-10 23:06   ` [PATCH v2 3/5] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-08-13 18:20     ` Jonathan Tan
  2018-08-14  0:22       ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Jonathan Tan @ 2018-08-13 18:20 UTC (permalink / raw)
  To: matvore; +Cc: git, jeffhost, peff, stefanbeller, jonathantanmy

> In list-objects.c we no longer print a message to stderr if a tree
> object is missing (quiet_on_missing is always true). I couldn't find
> any place where this would matter, or where the caller of
> traverse_commit_list would need to be fixed to show the error. However,
> in the future it would be trivial to make the caller show the message if
> we needed to.

Indeed, and I'm not sure why the message was there in the first place -
if parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the "die" call, so any message printed by parse_tree_gently() seems
superfluous.

It might be better to add an additional commit that removes the "gently"
condition (in other words, always parsing gently), with a message
explaining the above. Also, in that commit, I prefer not to add the
"/*quiet_on_missing*/" explanation (we don't seem to do that in Git
code); I also know that the ">= 0" is a holdover from the existing "< 0"
code, but we don't need to do that either.

> This is not tested very thoroughly, since we cannot create promisor
> objects in tests without using an actual partial clone. t0410 has a
> promise_and_delete utility function, but the is_promisor_object function
> does not return 1 for objects deleted in this way. More tests will
> come in a patch that implements a filter that can be used with git
> clone.

is_promisor_object() should. If you still have the code you used to
verify that, can you share it? In particular, pay attention to the path
of the repo - promise_and_delete is hardcoded to use one particular
path.

Whether you test in this patch or in the last patch, make sure that the
following are tested:
 git rev-list --missing=error, allow-any, allow-promisor, print
 git rev-list --exclude-promisor-objects

Also, test when a tree pointed to by a commit is missing, and when a
tree pointed to by a tree is missing.

> @@ -152,20 +151,21 @@ static void process_tree(struct traversal_context *ctx,
>  		die("bad tree object");
>  	if (obj->flags & (UNINTERESTING | SEEN))
>  		return;
> -	if (parse_tree_gently(tree, gently) < 0) {
> +	parsed = parse_tree_gently(tree, /*quiet_on_missing=*/1) >= 0;
> +	if (!parsed) {
>  		if (revs->ignore_missing_links)
>  			return;
>  
> +		if (!is_promisor_object(&obj->oid))
> +			die("bad tree object %s", oid_to_hex(&obj->oid));
> +
>  		/*
>  		 * Pre-filter known-missing tree objects when explicitly
>  		 * requested.  This may cause the actual filter to report
>  		 * an incomplete list of missing objects.
>  		 */
> -		if (revs->exclude_promisor_objects &&
> -		    is_promisor_object(&obj->oid))
> +		if (revs->exclude_promisor_objects)
>  			return;
> -
> -		die("bad tree object %s", oid_to_hex(&obj->oid));
>  	}

The missing mechanism (for error, allow-any, print) should work without
needing to consult whether an object is a promisor object or not - it
should just print whatever is missing, so the "if
(!is_promisor_object..." line looks out of place.

In my original review [1], I suggested that we always show a tree if we
have its hash - if we don't have the object, we just don't recurse into it.
This would be the same as your patch, except that the 'die("bad tree
object...' is totally removed instead of merely moved. I still think
this solution has some merit - all the tests still pass (except that we
need to check for "unable to read" instead of "bad tree object" in error
messages), but I just realized that it might still be backwards
incompatible in that a basic "rev-list --objects" would now succeed
instead of fail if a tree was missing (I haven't tested this though).

We might need a flag called "do_not_die_on_missing_tree" (much like your
original idea of "show_missing_trees") so that callers that are prepared
to deal with missing trees can set this. Sorry for the churn. You can
document it as such:

 Blobs are shown without regard for their existence. But not so for
 trees: unless exclude_promisor_objects is set and the tree in question
 is a promisor object, or ignore_missing_links is set (and in this case,
 the tree in question may or may not be a promisor object), the revision
 walker dies with a "bad tree object" message when encountering a
 missing tree.

 For callers that can handle missing trees and want them to be
 filterable and showable, set this to true. The revision walker will
 filter and show such a missing tree as usual, but will not attempt to
 recurse into this tree object.
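
As a rough illustration of the behavior documented above, here is a
minimal sketch of the decision the walker would make once a tree fails
to parse. It is not the code from the patch: the enum, the struct
walk_flags type, and on_missing_tree() are hypothetical names made up
for this example, and the tree_is_promisor argument stands in for a
call to is_promisor_object().

```c
#include <assert.h>

/* Hypothetical outcomes for a tree object that could not be parsed. */
enum missing_tree_action {
	MISSING_TREE_SKIP,            /* return silently; do not show */
	MISSING_TREE_SHOW_NO_RECURSE, /* filter and show, but do not recurse */
	MISSING_TREE_DIE              /* die("bad tree object ...") */
};

/* Hypothetical subset of the rev_info flags discussed in this thread. */
struct walk_flags {
	unsigned ignore_missing_links : 1;
	unsigned exclude_promisor_objects : 1;
	unsigned do_not_die_on_missing_tree : 1;
};

/*
 * Decide what to do with a missing tree. The promisor status is only
 * consulted when exclude_promisor_objects is set.
 */
static enum missing_tree_action on_missing_tree(const struct walk_flags *f,
						int tree_is_promisor)
{
	if (f->ignore_missing_links)
		return MISSING_TREE_SKIP;
	if (f->exclude_promisor_objects && tree_is_promisor)
		return MISSING_TREE_SKIP;
	if (f->do_not_die_on_missing_tree)
		return MISSING_TREE_SHOW_NO_RECURSE;
	return MISSING_TREE_DIE;
}
```

With all three flags clear this yields MISSING_TREE_DIE, which matches
the existing behavior of a plain "rev-list --objects"; only callers that
opt in by setting do_not_die_on_missing_tree see the missing tree shown.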

[1] https://public-inbox.org/git/20180810002411.13447-1-jonathantanmy@google.com/


* Re: [PATCH v2 5/5] list-objects-filter: implement filter tree:none
  2018-08-10 23:06   ` [PATCH v2 5/5] list-objects-filter: implement filter tree:none Matthew DeVore
  2018-08-13 16:38     ` Jeff Hostetler
@ 2018-08-13 18:29     ` Jonathan Tan
  2018-08-14  0:55       ` Matthew DeVore
  1 sibling, 1 reply; 151+ messages in thread
From: Jonathan Tan @ 2018-08-13 18:29 UTC (permalink / raw)
  To: matvore; +Cc: git, jeffhost, peff, stefanbeller, jonathantanmy

> -	case LOFS_BEGIN_TREE:
> -		assert(obj->type == OBJ_TREE);
> -		/* always include all tree objects */
> -		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> -
>  	case LOFS_END_TREE:
>  		assert(obj->type == OBJ_TREE);
>  		return LOFR_ZERO;
>  
> +	case LOFS_BEGIN_TREE:
> +		assert(obj->type == OBJ_TREE);
> +		if (!filter_data->omit_trees)
> +			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +
> +		/*
> +		 * Fallthrough to insert into omitted list for trees as well as
> +		 * blobs.
> +		 */
> +		/* fallthrough */
>  	case LOFS_BLOB:
> -		assert(obj->type == OBJ_BLOB);
>  		assert((obj->flags & SEEN) == 0);

After looking at the resulting file, I don't think saving a few lines of
code (to add the OID, then return LOFR_MARK_SEEN) is worth rearranging
the cases and falling through. Can you just add the OID-adding code to
the LOFS_BEGIN_TREE case?

> +test_expect_success 'can use tree:none to filter partial clone' '
> +	rm -rf dst &&
> +	git clone --no-checkout --filter=tree:none "file://$(pwd)/srv.bare" dst &&
> +	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> +	cat fetched_objects \
> +		| awk -f print_1.awk \
> +		| xargs -n1 git -C dst cat-file -t >fetched_types &&
> +	sort fetched_types -u >unique_types.observed &&
> +	echo commit > unique_types.expected &&
> +	test_cmp unique_types.observed unique_types.expected
> +'

We also need to verify that the resulting partial clone works - after
all relevant tests, can you also ensure that:
 - fsck works
 - a cat-file on an indirectly missing tree works (i.e. if you have
   commit -> A -> B and both A and B are missing, cat-file the B)
 - fsck still works after the cat-file

There is another potential issue about expanding the documentation of
the pack protocol because we now support a new type of filter, but that
is fine because the protocol currently points us to the rev-list
documentation (which is updated). We probably need a way for clients to
query servers about which filters they support, but that is definitely
beyond the scope of this patch set.

> +test_expect_success 'show missing tree objects with --missing=print' '
> +	git -C dst rev-list master --missing=print --quiet --objects >missing_objs &&
> +	sed "s/?//" missing_objs \
> +		| xargs -n1 git -C srv.bare cat-file -t \
> +		>missing_types &&
> +	sort -u missing_types >missing_types.uniq &&
> +	echo tree >expected &&
> +	test_cmp missing_types.uniq expected
> +'

As stated in my review of patch 3, also test the other --missing
arguments.

Patches 1, 2, and 4 look good to me. (Writing this here so that I don't
need to send one e-mail for each.)


* Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly
  2018-08-13 18:20     ` Jonathan Tan
@ 2018-08-14  0:22       ` Matthew DeVore
  2018-08-14 16:03         ` Jonathan Tan
  0 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14  0:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jeffhost, peff, stefanbeller

Resending this in plain-text mode so that git@vger.kernel.org won't
bounce it. Sorry for those of you receiving this twice.

On Mon, Aug 13, 2018 at 11:20 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > In list-objects.c we no longer print a message to stderr if a tree
> > object is missing (quiet_on_missing is always true). I couldn't find
> > any place where this would matter, or where the caller of
> > traverse_commit_list would need to be fixed to show the error. However,
> > in the future it would be trivial to make the caller show the message if
> > we needed to.
>
> Indeed, and I'm not sure why the message was there in the first place -
> if parsing fails when revs->ignore_missing_links and
> revs->exclude_promisor_objects are both false, we print the OID anyway
> in the "die" call, so any message printed by parse_tree_gently() seems
> superfluous.
>
> It might be better to add an additional commit that removes the "gently"
> condition (in other words, always parsing gently), with a message
> explaining the above. Also, in that commit, I prefer not to add the
> "/*quiet_on_missing*/" explanation (we don't seem to do that in Git
> code); I also know that the ">= 0" is a holdover from the existing "< 0"
> code, but we don't need to do that either.
Good idea. I've added a new commit which replaces the calculation with
a hard-coded "1".

I don't understand the point about the ">= 0". What should I replace it
with? Maybe you mean that the return value is never positive, so I can
change:

parse_tree_gently(tree, 1) >= 0

to:
!parse_tree_gently(tree, 1)

?
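
To make the question concrete, here is a small sketch of why the two
spellings would agree. It assumes, as guessed above, that the parse
function returns 0 on success and a negative value on failure, and
never a positive value. fake_parse_tree() is a hypothetical stand-in
written for this example, not the real parse_tree_gently().

```c
#include <assert.h>

/*
 * Hypothetical stand-in for parse_tree_gently(). ASSUMPTION: returns 0
 * on success and a negative value on failure, never a positive value.
 */
static int fake_parse_tree(int object_is_missing)
{
	return object_is_missing ? -1 : 0;
}

/* The spelling used in the patch under review. */
static int parsed_with_ge(int object_is_missing)
{
	return fake_parse_tree(object_is_missing) >= 0;
}

/* The shorter spelling asked about above. */
static int parsed_with_not(int object_is_missing)
{
	return !fake_parse_tree(object_is_missing);
}
```

Under that assumption both helpers return 1 for a readable tree and 0
for a missing one. If the function could ever return a positive value,
the two would differ, so the shorter form is only safe as long as the
assumption holds.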

>
> > This is not tested very thoroughly, since we cannot create promisor
> > objects in tests without using an actual partial clone. t0410 has a
> > promise_and_delete utility function, but the is_promisor_object function
> > does not return 1 for objects deleted in this way. More tests will
> > come in a patch that implements a filter that can be used with git
> > clone.
>
> is_promisor_object() should. If you still have the code you used to
> verify that, can you share it? In particular, pay attention to the path
> of the repo - promise_and_delete is hardcoded to use one particular
> path.
It turns out I wasn't setting the extensions.partialClone config in
my test, and that's why nothing was working. So I've moved all
the feasible tests back to the earlier commit. Cool :)

>
> Whether you test in this patch or in the last patch, make sure that the
> following are tested:
>  git rev-list --missing=error, allow-any, allow-promisor, print
>  git rev-list --exclude-promisor-objects
>
Added --missing=print, --missing=allow-any, and
--exclude-promisor-objects to t0410.
--missing=allow-promisor did not seem sufficiently interesting or
different from allow-any to justify adding it.
I had to put --missing=error into the commit that introduces the tree:0
filter, since that flag causes an automatic attempt to fetch the
missing object, which t0410 does not seem to support. So I added the
test case "auto-fetching of trees with --missing=error" to t5616.

> Also, test when a tree pointed to by a commit is missing, and when a
> tree pointed to by a tree is missing.
The former is done multiple times already; I added the latter to t0410
as "missing non-root tree object and rev-list".
>
> > @@ -152,20 +151,21 @@ static void process_tree(struct traversal_context *ctx,
> >               die("bad tree object");
> >       if (obj->flags & (UNINTERESTING | SEEN))
> >               return;
> > -     if (parse_tree_gently(tree, gently) < 0) {
> > +     parsed = parse_tree_gently(tree, /*quiet_on_missing=*/1) >= 0;
> > +     if (!parsed) {
> >               if (revs->ignore_missing_links)
> >                       return;
> >
> > +             if (!is_promisor_object(&obj->oid))
> > +                     die("bad tree object %s", oid_to_hex(&obj->oid));
> > +
> >               /*
> >                * Pre-filter known-missing tree objects when explicitly
> >                * requested.  This may cause the actual filter to report
> >                * an incomplete list of missing objects.
> >                */
> > -             if (revs->exclude_promisor_objects &&
> > -                 is_promisor_object(&obj->oid))
> > +             if (revs->exclude_promisor_objects)
> >                       return;
> > -
> > -             die("bad tree object %s", oid_to_hex(&obj->oid));
> >       }
>
> The missing mechanism (for error, allow-any, print) should work without
> needing to consult whether an object is a promisor object or not - it
> should just print whatever is missing, so the "if
> (!is_promisor_object..." line looks out of place.
Done. I considered a missing object which is not a promisor object to
be a serious error, so I had it die here. But now that I've added the
do_not_die_on_missing_tree flag, it's more natural to keep the
previous promisor check as-is. Also, is_promisor_object is an
expensive check, and it would be better to skip it during the common
execution path (which should be when exclude_promisor_objects, an
internal-use-only flag, is *not* set, which means we never call
is_promisor_object).
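
The point about skipping the expensive check relies on ordinary C
short-circuit evaluation of "&&". Here is a minimal sketch of that
effect, with a call counter standing in for the cost. The names
fake_is_promisor_object(), promisor_checks, and skip_missing_tree() are
hypothetical, and the even/odd rule is arbitrary; none of this is the
real implementation.

```c
#include <assert.h>

/* Counts how often the (pretend) expensive check actually runs. */
static int promisor_checks;

/*
 * Hypothetical stand-in for is_promisor_object(). The real check is
 * costly; here even ids are arbitrarily treated as promisor objects.
 */
static int fake_is_promisor_object(int oid)
{
	promisor_checks++;
	return oid % 2 == 0;
}

/*
 * Returns 1 if a missing tree should be skipped silently. The flag is
 * tested first, so the check on the right-hand side of "&&" is never
 * evaluated when exclude_promisor_objects is 0.
 */
static int skip_missing_tree(int exclude_promisor_objects, int oid)
{
	return exclude_promisor_objects && fake_is_promisor_object(oid);
}
```

Calling skip_missing_tree(0, oid) leaves promisor_checks untouched,
while skip_missing_tree(1, oid) increments it once per call. Swapping
the operands would run the check on every missing tree.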

>
> In my original review [1], I suggested that we always show a tree if we
> have its hash - if we don't have the object, we just don't recurse into it.
> This would be the same as your patch, except that the 'die("bad tree
> object...' is totally removed instead of merely moved. I still think
> this solution has some merit - all the tests still pass (except that we
> need to check for "unable to read" instead of "bad tree object" in error
> messages), but I just realized that it might still be backwards
> incompatible in that a basic "rev-list --objects" would now succeed
> instead of fail if a tree was missing (I haven't tested this though).
The presence of the die if !is_promisor_object is what justified
changing parse_tree_gently to always parse gently, since the die is
what showed the OID. Can we really remove both? Maybe in a different
patch set, since I'm no longer touching that line?

>
> We might need a flag called "do_not_die_on_missing_tree" (much like your
> original idea of "show_missing_trees") so that callers that are prepared
> to deal with missing trees can set this. Sorry for the churn. You can
> document it as such:
Added it, but not with a command-line flag, only in revision.h (in
struct rev_info). We can always add a flag later if people have been
relying on the existing behavior of git rev-list to balk at missing
trees. (That seems unlikely though, considering there was no filter
that could produce missing trees before this patchset.)

>
>  Blobs are shown without regard for their existence. But not so for
>  trees: unless exclude_promisor_objects is set and the tree in question
>  is a promisor object, or ignore_missing_links is set (and in this case,
>  the tree in question may or may not be a promisor object), the revision
>  walker dies with a "bad tree object" message when encountering a
>  missing tree.
>
>  For callers that can handle missing trees and want them to be
>  filterable and showable, set this to true. The revision walker will
>  filter and show such a missing tree as usual, but will not attempt to
>  recurse into this tree object.
>
> [1] https://public-inbox.org/git/20180810002411.13447-1-jonathantanmy@google.com/


* Re: [PATCH v2 5/5] list-objects-filter: implement filter tree:none
  2018-08-13 18:29     ` Jonathan Tan
@ 2018-08-14  0:55       ` Matthew DeVore
  0 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14  0:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jeffhost, peff, stefanbeller

On Mon, Aug 13, 2018 at 11:29 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > -     case LOFS_BEGIN_TREE:
> > -             assert(obj->type == OBJ_TREE);
> > -             /* always include all tree objects */
> > -             return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> > -
> >       case LOFS_END_TREE:
> >               assert(obj->type == OBJ_TREE);
> >               return LOFR_ZERO;
> >
> > +     case LOFS_BEGIN_TREE:
> > +             assert(obj->type == OBJ_TREE);
> > +             if (!filter_data->omit_trees)
> > +                     return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> > +
> > +             /*
> > +              * Fallthrough to insert into omitted list for trees as well as
> > +              * blobs.
> > +              */
> > +             /* fallthrough */
> >       case LOFS_BLOB:
> > -             assert(obj->type == OBJ_BLOB);
> >               assert((obj->flags & SEEN) == 0);
>
> After looking at the resulting file, I don't think saving a few lines of
> code (to add the OID, then return LOFR_MARK_SEEN) is worth rearranging
> the cases and falling through. Can you just add the OID-adding code to
> the LOFS_BEGIN_TREE case?

I've followed Jeff's suggestion of splitting up the functions. The
code is now somewhat redundant, but it is ready for some planned
future improvements, and it's clearer and more consistent with the
other filter functions.

>
> > +test_expect_success 'can use tree:none to filter partial clone' '
> > +     rm -rf dst &&
> > +     git clone --no-checkout --filter=tree:none "file://$(pwd)/srv.bare" dst &&
> > +     git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> > +     cat fetched_objects \
> > +             | awk -f print_1.awk \
> > +             | xargs -n1 git -C dst cat-file -t >fetched_types &&
> > +     sort fetched_types -u >unique_types.observed &&
> > +     echo commit > unique_types.expected &&
> > +     test_cmp unique_types.observed unique_types.expected
> > +'
>
> We also need to verify that the resulting partial clone works - after
> all relevant tests, can you also ensure that:
>  - fsck works
>  - a cat-file on an indirectly missing tree works (i.e. if you have
>    commit -> A -> B and both A and B are missing, cat-file the B)
>  - fsck still works after the cat-file
Done - added it to t5616 in the last commit of the patchset.

>
> There is another potential issue about expanding the documentation of
> the pack protocol because we now support a new type of filter, but that
> is fine because the protocol currently points us to the rev-list
> documentation (which is updated). We probably need a way for clients to
> query servers about which filters they support, but that is definitely
> beyond the scope of this patch set.
>
> > +test_expect_success 'show missing tree objects with --missing=print' '
> > +     git -C dst rev-list master --missing=print --quiet --objects >missing_objs &&
> > +     sed "s/?//" missing_objs \
> > +             | xargs -n1 git -C srv.bare cat-file -t \
> > +             >missing_types &&
> > +     sort -u missing_types >missing_types.uniq &&
> > +     echo tree >expected &&
> > +     test_cmp missing_types.uniq expected
> > +'
>
> As stated in my review of patch 3, also test the other --missing
> arguments.
Done, mostly in t0410 in patch 3.

>
> Patches 1, 2, and 4 look good to me. (Writing this here so that I don't
> need to send one e-mail for each.)


* Re: [PATCH v2 5/5] list-objects-filter: implement filter tree:none
  2018-08-13 16:38     ` Jeff Hostetler
@ 2018-08-14  0:57       ` Matthew DeVore
  0 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14  0:57 UTC (permalink / raw)
  To: git; +Cc: git, jeffhost, peff, stefanbeller, Jonathan Tan

On Mon, Aug 13, 2018 at 9:38 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
>
>
> On 8/10/2018 7:06 PM, Matthew DeVore wrote:
> > Teach list-objects the "tree:none" filter which allows for filtering
> > out all tree and blob objects (unless other objects are explicitly
> > specified by the user). The purpose of this patch is to allow smaller
> > partial clones.
> >
> > The name of this filter - tree:none - does not explicitly specify that
> > it also filters out all blobs, but this should not cause much confusion
> > because blobs are not at all useful without the trees that refer to
> > them.
> >
> > I also considered only:commits as a name, but this is inaccurate because
> > it suggests that annotated tags are omitted, but actually they are
> > included.
> >
> > Signed-off-by: Matthew DeVore <matvore@google.com>
> > ---
> >   Documentation/rev-list-options.txt     |  2 ++
> >   list-objects-filter-options.c          |  4 +++
> >   list-objects-filter-options.h          |  1 +
> >   list-objects-filter.c                  | 49 +++++++++++++++++++-------
> >   t/t5317-pack-objects-filter-objects.sh | 27 ++++++++++++++
> >   t/t5616-partial-clone.sh               | 27 ++++++++++++++
> >   t/t6112-rev-list-filters-objects.sh    | 13 +++++++
> >   7 files changed, 110 insertions(+), 13 deletions(-)
> >
> > diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> > index 7b273635d..68b4b9552 100644
> > --- a/Documentation/rev-list-options.txt
> > +++ b/Documentation/rev-list-options.txt
> > @@ -743,6 +743,8 @@ specification contained in <path>.
> >       A debug option to help with future "partial clone" development.
> >       This option specifies how missing objects are handled.
> >   +
> > +The form '--filter=tree:none' omits all blobs and trees.
> > ++
> >   The form '--missing=error' requests that rev-list stop with an error if
> >   a missing object is encountered.  This is the default action.
> >   +
> > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> > index c0e2bd6a0..523cb00a0 100644
> > --- a/list-objects-filter-options.c
> > +++ b/list-objects-filter-options.c
> > @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
> >                       return 0;
> >               }
> >
> > +     } else if (!strcmp(arg, "tree:none")) {
> > +             filter_options->choice = LOFC_TREE_NONE;
> > +             return 0;
> > +
> >       } else if (skip_prefix(arg, "sparse:oid=", &v0)) {
> >               struct object_context oc;
> >               struct object_id sparse_oid;
> > diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> > index 0000a61f8..af64e5c66 100644
> > --- a/list-objects-filter-options.h
> > +++ b/list-objects-filter-options.h
> > @@ -10,6 +10,7 @@ enum list_objects_filter_choice {
> >       LOFC_DISABLED = 0,
> >       LOFC_BLOB_NONE,
> >       LOFC_BLOB_LIMIT,
> > +     LOFC_TREE_NONE,
> >       LOFC_SPARSE_OID,
> >       LOFC_SPARSE_PATH,
> >       LOFC__COUNT /* must be last */
> > diff --git a/list-objects-filter.c b/list-objects-filter.c
> > index a0ba78b20..22c894093 100644
> > --- a/list-objects-filter.c
> > +++ b/list-objects-filter.c
> > @@ -26,38 +26,45 @@
> >   #define FILTER_SHOWN_BUT_REVISIT (1<<21)
> >
> >   /*
> > - * A filter for list-objects to omit ALL blobs from the traversal.
> > - * And to OPTIONALLY collect a list of the omitted OIDs.
> > + * A filter for list-objects to omit ALL blobs from the traversal, and possibly
> > + * trees as well.
> > + * Can OPTIONALLY collect a list of the omitted OIDs.
> >    */
> > -struct filter_blobs_none_data {
> > +struct filter_none_of_type_data {
> > +     /* blobs are always omitted */
> > +     unsigned omit_trees : 1;
> >       struct oidset *omits;
> >   };
> >
>
> I'm not sure I'd convert the existing filter types.
> When I created this file, I created a set of function pairs
> for each filter type:
>      filter_<name>() and filter_<name>__init()
>
> with the latter being added to the s_filters[] array and created
> a choice enum having corresponding values
>      LOFC_<name>
>
> Here you're adding a new _init() and LOFC_ key, but mapping both
> the original "blob:none" and the new "tree:none" to a combined
> filter function and blends these 2 modes.
>
> Style-wise, I'd keep the original filters as they were and add a
> new function pair for the new tree:none filter.  Then you can
> simplify the logic inside your new filter.  For example, in your
> filter "filter_data->omit_trees" will always be true, so you can
> just do the "if (filter_data->omits) oidset_insert(...); return _SEEN"
> and not have the fallthru stuff -- or get rid of the asserts() and put
> the case labels together.
>
> One of the things I wanted to do (when I found some free time) was to
> add a "tree:none" and maybe a "tree:root" filter.  (The latter only
> including the root trees associated with the fetched commits, since
> there are/were some places where we implicitly also load the root tree
> when loading the commit object.)  So in that vein, it might be that we
> would want a "tree:<depth>" filter instead with 0 = none and 1 = root.
> I wasn't ready to propose that when I did the filtering, but I had that
> in mind.  (And is partially why I suggest keeping your new filter
> independent of the existing ones.)

That's fair. I've split up the functions to be completely separate,
and changed the filter name to tree:0 so it can later be extended as
you suggest.
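
For reference, here is a rough sketch of how a general "tree:<depth>"
argument might be parsed if the filter is extended later. The current
patch only accepts the literal string "tree:0", so this is speculative.
parse_tree_depth_filter() is a hypothetical helper written for this
example and uses plain libc rather than any of Git's own parsing
helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse "tree:<depth>" where <depth> is a
 * non-negative decimal integer. Returns 0 and stores the depth on
 * success; returns -1 and leaves *depth untouched on any error.
 */
static int parse_tree_depth_filter(const char *arg, unsigned long *depth)
{
	static const char prefix[] = "tree:";
	const char *num;
	char *end;
	unsigned long value;

	if (strncmp(arg, prefix, strlen(prefix)))
		return -1;
	num = arg + strlen(prefix);

	/*
	 * Require a leading digit: strtoul() alone would accept leading
	 * whitespace and a sign, e.g. "tree:-1".
	 */
	if (*num < '0' || *num > '9')
		return -1;

	errno = 0;
	value = strtoul(num, &end, 10);
	if (errno || *end)
		return -1; /* overflow or trailing garbage */

	*depth = value;
	return 0;
}
```

With this shape, "tree:0" and "tree:1" parse to depths 0 and 1, while
"tree:O" (capital O), "tree:none", and "tree:1x" are all rejected. A
caller could then map depth 0 to LOFC_TREE_NONE and report larger
depths as unsupported until they are implemented.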

>
> Jeff
>
>
> > -static enum list_objects_filter_result filter_blobs_none(
> > +static enum list_objects_filter_result filter_none_of_type(
> >       enum list_objects_filter_situation filter_situation,
> >       struct object *obj,
> >       const char *pathname,
> >       const char *filename,
> >       void *filter_data_)
> >   {
> > -     struct filter_blobs_none_data *filter_data = filter_data_;
> > +     struct filter_none_of_type_data *filter_data = filter_data_;
> >
> >       switch (filter_situation) {
> >       default:
> >               die("unknown filter_situation");
> >               return LOFR_ZERO;
> >
> > -     case LOFS_BEGIN_TREE:
> > -             assert(obj->type == OBJ_TREE);
> > -             /* always include all tree objects */
> > -             return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> > -
> >       case LOFS_END_TREE:
> >               assert(obj->type == OBJ_TREE);
> >               return LOFR_ZERO;
> >
> > +     case LOFS_BEGIN_TREE:
> > +             assert(obj->type == OBJ_TREE);
> > +             if (!filter_data->omit_trees)
> > +                     return LOFR_MARK_SEEN | LOFR_DO_SHOW;
> > +
> > +             /*
> > +              * Fallthrough to insert into omitted list for trees as well as
> > +              * blobs.
> > +              */
> > +             /* fallthrough */
> >       case LOFS_BLOB:
> > -             assert(obj->type == OBJ_BLOB);
> >               assert((obj->flags & SEEN) == 0);
> >
> >               if (filter_data->omits)
> > @@ -72,10 +79,25 @@ static void *filter_blobs_none__init(
> >       filter_object_fn *filter_fn,
> >       filter_free_fn *filter_free_fn)
> >   {
> > -     struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d));
> > +     struct filter_none_of_type_data *d = xcalloc(1, sizeof(*d));
> > +     d->omits = omitted;
> > +
> > +     *filter_fn = filter_none_of_type;
> > +     *filter_free_fn = free;
> > +     return d;
> > +}
> > +
> > +static void* filter_tree_none__init(
> > +     struct oidset *omitted,
> > +     struct list_objects_filter_options *filter_options,
> > +     filter_object_fn *filter_fn,
> > +     filter_free_fn *filter_free_fn)
> > +{
> > +     struct filter_none_of_type_data *d = xcalloc(1, sizeof(*d));
> > +     d->omit_trees = 1;
> >       d->omits = omitted;
> >
> > -     *filter_fn = filter_blobs_none;
> > +     *filter_fn = filter_none_of_type;
> >       *filter_free_fn = free;
> >       return d;
> >   }
> > @@ -374,6 +396,7 @@ static filter_init_fn s_filters[] = {
> >       NULL,
> >       filter_blobs_none__init,
> >       filter_blobs_limit__init,
> > +     filter_tree_none__init,
> >       filter_sparse_oid__init,
> >       filter_sparse_path__init,
> >   };
> > diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
> > index 5e35f33bf..28a8c916a 100755
> > --- a/t/t5317-pack-objects-filter-objects.sh
> > +++ b/t/t5317-pack-objects-filter-objects.sh
> > @@ -72,6 +72,33 @@ test_expect_success 'get an error for missing tree object' '
> >       grep -q "bad tree object" bad_tree
> >   '
> >
> > +test_expect_success 'setup for tests of tree:none' '
> > +     mkdir r1/subtree &&
> > +     echo "This is a file in a subtree" > r1/subtree/file &&
> > +     git -C r1 add subtree/file &&
> > +     git -C r1 commit -m subtree
> > +'
> > +
> > +test_expect_success 'verify tree:none packfile has no blobs or trees' '
> > +     git -C r1 pack-objects --rev --stdout --filter=tree:none >commitsonly.pack <<-EOF &&
> > +     HEAD
> > +     EOF
> > +     git -C r1 index-pack ../commitsonly.pack &&
> > +     git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> > +     ! grep -E "tree|blob" objs
> > +'
> > +
> > +test_expect_success 'grab tree directly when using tree:none' '
> > +     # We should get the tree specified directly but not its blobs or subtrees.
> > +     git -C r1 pack-objects --rev --stdout --filter=tree:none >commitsonly.pack <<-EOF &&
> > +     HEAD:
> > +     EOF
> > +     git -C r1 index-pack ../commitsonly.pack &&
> > +     git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> > +     grep -E "tree|blob" objs >trees_and_blobs &&
> > +     test_line_count = 1 trees_and_blobs
> > +'
> > +
> >   # Test blob:limit=<n>[kmg] filter.
> >   # We boundary test around the size parameter.  The filter is strictly less than
> >   # the value, so size 500 and 1000 should have the same results, but 1001 should
> > diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> > index bbbe7537d..4fc068716 100755
> > --- a/t/t5616-partial-clone.sh
> > +++ b/t/t5616-partial-clone.sh
> > @@ -170,6 +170,33 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
> >       git -C dst fsck
> >   '
> >
> > +test_expect_success 'can use tree:none to filter partial clone' '
> > +     rm -rf dst &&
> > +     git clone --no-checkout --filter=tree:none "file://$(pwd)/srv.bare" dst &&
> > +     git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> > +     cat fetched_objects \
> > +             | awk -f print_1.awk \
> > +             | xargs -n1 git -C dst cat-file -t >fetched_types &&
> > +     sort fetched_types -u >unique_types.observed &&
> > +     echo commit > unique_types.expected &&
> > +     test_cmp unique_types.observed unique_types.expected
> > +'
> > +
> > +test_expect_success 'show missing tree objects with --missing=print' '
> > +     git -C dst rev-list master --missing=print --quiet --objects >missing_objs &&
> > +     sed "s/?//" missing_objs \
> > +             | xargs -n1 git -C srv.bare cat-file -t \
> > +             >missing_types &&
> > +     sort -u missing_types >missing_types.uniq &&
> > +     echo tree >expected &&
> > +     test_cmp missing_types.uniq expected
> > +'
> > +
> > +test_expect_success 'do not complain when a missing tree cannot be parsed' '
> > +     git -C dst rev-list master --missing=print --quiet --objects 2>rev_list_err >&2 &&
> > +     ! grep -q "Could not read " rev_list_err
> > +'
> > +
> >   . "$TEST_DIRECTORY"/lib-httpd.sh
> >   start_httpd
> >
> > diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
> > index 0a37dd5f9..ecdf6b4c3 100755
> > --- a/t/t6112-rev-list-filters-objects.sh
> > +++ b/t/t6112-rev-list-filters-objects.sh
> > @@ -196,6 +196,19 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
> >       test_cmp observed expected
> >   '
> >
> > +# Test tree:none filter.
> > +
> > +test_expect_success 'verify tree:none includes trees in "filtered" output' '
> > +     git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:none \
> > +             | awk -f print_1.awk \
> > +             | sed s/~// \
> > +             | xargs -n1 git -C r3 cat-file -t \
> > +             | sort -u >filtered_types &&
> > +     printf "blob\ntree\n" > expected &&
> > +     test_cmp filtered_types expected
> > +'
> > +
> > +
> >   # Delete some loose objects and use rev-list, but WITHOUT any filtering.
> >   # This models previously omitted objects that we did not receive.
> >
> >

* Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0
  2018-08-13 18:14   ` [PATCH v3 5/5] list-objects-filter: implement filter tree:0 Matthew DeVore
@ 2018-08-14 15:13     ` Jeff Hostetler
  2018-08-14 17:25       ` Matthew DeVore
  2018-10-03 19:00       ` Matthew DeVore
  0 siblings, 2 replies; 151+ messages in thread
From: Jeff Hostetler @ 2018-08-14 15:13 UTC (permalink / raw)
  To: Matthew DeVore, git; +Cc: jeffhost, peff, stefanbeller, jonathantanmy



On 8/13/2018 2:14 PM, Matthew DeVore wrote:
> Teach list-objects the "tree:0" filter which allows for filtering
> out all tree and blob objects (unless other objects are explicitly
> specified by the user). The purpose of this patch is to allow smaller
> partial clones.
> 
> The name of this filter - tree:0 - does not explicitly specify that
> it also filters out all blobs, but this should not cause much confusion
> because blobs are not at all useful without the trees that refer to
> them.
> 
> I also considered only:commits as a name, but this is inaccurate because
> it suggests that annotated tags are omitted, but actually they are
> included.
> 
> The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> would filter out all but the root tree and blobs. In order to avoid
> confusion between 0 and capital O, the documentation was worded in a
> somewhat round-about way that also hints at this future improvement to
> the feature.
> 
> Signed-off-by: Matthew DeVore <matvore@google.com>
> ---
>   Documentation/rev-list-options.txt     |  3 ++
>   list-objects-filter-options.c          |  4 +++
>   list-objects-filter-options.h          |  1 +
>   list-objects-filter.c                  | 50 ++++++++++++++++++++++++++
>   t/t5317-pack-objects-filter-objects.sh | 27 ++++++++++++++
>   t/t5616-partial-clone.sh               | 27 ++++++++++++++
>   t/t6112-rev-list-filters-objects.sh    | 13 +++++++
>   7 files changed, 125 insertions(+)
> 
> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index 7b273635d..9e351ec2a 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -743,6 +743,9 @@ specification contained in <path>.
>   	A debug option to help with future "partial clone" development.
>   	This option specifies how missing objects are handled.
>   +
> +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> +<depth> from the root tree. Currently, only <depth>=0 is supported.
> ++
>   The form '--missing=error' requests that rev-list stop with an error if
>   a missing object is encountered.  This is the default action.
>   +
> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index c0e2bd6a0..a28382940 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
>   			return 0;
>   		}
>   
> +	} else if (!strcmp(arg, "tree:0")) {
> +		filter_options->choice = LOFC_TREE_NONE;
> +		return 0;
> +
>   	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
>   		struct object_context oc;
>   		struct object_id sparse_oid;
> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> index 0000a61f8..af64e5c66 100644
> --- a/list-objects-filter-options.h
> +++ b/list-objects-filter-options.h
> @@ -10,6 +10,7 @@ enum list_objects_filter_choice {
>   	LOFC_DISABLED = 0,
>   	LOFC_BLOB_NONE,
>   	LOFC_BLOB_LIMIT,
> +	LOFC_TREE_NONE,
>   	LOFC_SPARSE_OID,
>   	LOFC_SPARSE_PATH,
>   	LOFC__COUNT /* must be last */
> diff --git a/list-objects-filter.c b/list-objects-filter.c
> index a0ba78b20..8e3caf5bf 100644
> --- a/list-objects-filter.c
> +++ b/list-objects-filter.c
> @@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
>   	return d;
>   }
>   
> +/*
> + * A filter for list-objects to omit ALL trees and blobs from the traversal.
> + * Can OPTIONALLY collect a list of the omitted OIDs.
> + */
> +struct filter_trees_none_data {
> +	struct oidset *omits;
> +};
> +
> +static enum list_objects_filter_result filter_trees_none(
> +	enum list_objects_filter_situation filter_situation,
> +	struct object *obj,
> +	const char *pathname,
> +	const char *filename,
> +	void *filter_data_)
> +{
> +	struct filter_trees_none_data *filter_data = filter_data_;
> +
> +	switch (filter_situation) {
> +	default:
> +		die("unknown filter_situation");
> +		return LOFR_ZERO;
> +
> +	case LOFS_BEGIN_TREE:
> +	case LOFS_BLOB:
> +		if (filter_data->omits)
> +			oidset_insert(filter_data->omits, &obj->oid);
> +		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
> +
> +	case LOFS_END_TREE:
> +		assert(obj->type == OBJ_TREE);
> +		return LOFR_ZERO;
> +
> +	}
> +}

There are a couple of options here:
[] If we really want to omit all trees and blobs (and we DO NOT want
    the oidset of everything omitted), then we might be able to
    shortcut the traversal and speed things up.

    {} add a LOFR_SKIP_TREE bit to list_objects_filter_result
    {} test this bit in process_tree() and avoid the init_tree_desc() and
       the while loop and some adjacent setup/tear-down code.
    {} make this filter something like:

	case LOFS_BEGIN_TREE:
		if (filter_data->omits) {
			oidset_insert(filter_data->omits, &obj->oid);
			return LOFR_MARK_SEEN; /* ... (hard omit) */
		} else
			return LOFR_SKIP_TREE;
	case LOFS_BLOB:
		if (filter_data->omits) {
			oidset_insert(filter_data->omits, &obj->oid);
			return LOFR_MARK_SEEN; /* ... (hard omit) */
		} else
			assert(...should not happen...);

[] Later, if we choose to actually support a depth>0, we'll probably
    want a different filter function to conditionally include/exclude
    blobs, include shallow tree[node]s, and do some of the provisional-
    omit logic on deep tree[nodes] (in case a tree appears at multiple
    places/depths in the history).  But that can wait.
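
The first option above can be sketched as a self-contained program
fragment. Everything here is a stand-in: the enums mirror the names used in
list-objects-filter.c but their numeric values are illustrative, the
LOFR_SKIP_TREE bit is only the proposal from this mail and does not exist
yet, and struct omit_log is a toy replacement for struct oidset that merely
counts insertions.

```c
#include <stddef.h>

/* Stand-ins for git's enums; the values are illustrative only. */
enum filter_situation { LOFS_BEGIN_TREE, LOFS_END_TREE, LOFS_BLOB };
enum filter_result {
	LOFR_ZERO      = 0,
	LOFR_MARK_SEEN = 1 << 0,
	LOFR_DO_SHOW   = 1 << 1,
	LOFR_SKIP_TREE = 1 << 2, /* proposed bit, hypothetical */
};

/* Toy replacement for struct oidset: counts what would be recorded. */
struct omit_log {
	size_t count;
};

/*
 * Sketch of the tree:0 filter with the proposed shortcut. When the caller
 * does not want the list of omitted objects (omits == NULL), the filter
 * asks the traversal to skip the tree contents entirely.
 */
static int filter_trees_none_sketch(enum filter_situation situation,
				    struct omit_log *omits)
{
	switch (situation) {
	case LOFS_BEGIN_TREE:
		if (omits) {
			omits->count++;
			return LOFR_MARK_SEEN; /* hard omit, no LOFR_DO_SHOW */
		}
		return LOFR_SKIP_TREE;

	case LOFS_BLOB:
		/* Only reachable when trees are walked, i.e. omits != NULL. */
		if (omits)
			omits->count++;
		return LOFR_MARK_SEEN;

	case LOFS_END_TREE:
		return LOFR_ZERO;
	}
	return LOFR_ZERO;
}
```

The traversal side would then test the returned value for LOFR_SKIP_TREE
before setting up the tree descriptor and entering the entry loop.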

Jeff


> +
> +static void* filter_trees_none__init(
> +	struct oidset *omitted,
> +	struct list_objects_filter_options *filter_options,
> +	filter_object_fn *filter_fn,
> +	filter_free_fn *filter_free_fn)
> +{
> +	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
> +	d->omits = omitted;
> +
> +	*filter_fn = filter_trees_none;
> +	*filter_free_fn = free;
> +	return d;
> +}
> +
>   /*
>    * A filter for list-objects to omit large blobs.
>    * And to OPTIONALLY collect a list of the omitted OIDs.
> @@ -374,6 +423,7 @@ static filter_init_fn s_filters[] = {
>   	NULL,
>   	filter_blobs_none__init,
>   	filter_blobs_limit__init,
> +	filter_trees_none__init,
>   	filter_sparse_oid__init,
>   	filter_sparse_path__init,
>   };
> diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
> index 5e35f33bf..65f2cf446 100755
> --- a/t/t5317-pack-objects-filter-objects.sh
> +++ b/t/t5317-pack-objects-filter-objects.sh
> @@ -72,6 +72,33 @@ test_expect_success 'get an error for missing tree object' '
>   	grep -q "bad tree object" bad_tree
>   '
>   
> +test_expect_success 'setup for tests of tree:0' '
> +	mkdir r1/subtree &&
> +	echo "This is a file in a subtree" > r1/subtree/file &&
> +	git -C r1 add subtree/file &&
> +	git -C r1 commit -m subtree
> +'
> +
> +test_expect_success 'verify tree:0 packfile has no blobs or trees' '
> +	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
> +	HEAD
> +	EOF
> +	git -C r1 index-pack ../commitsonly.pack &&
> +	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> +	! grep -E "tree|blob" objs
> +'
> +
> +test_expect_success 'grab tree directly when using tree:0' '
> +	# We should get the tree specified directly but not its blobs or subtrees.
> +	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
> +	HEAD:
> +	EOF
> +	git -C r1 index-pack ../commitsonly.pack &&
> +	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> +	grep -E "tree|blob" objs >trees_and_blobs &&
> +	test_line_count = 1 trees_and_blobs
> +'
> +
>   # Test blob:limit=<n>[kmg] filter.
>   # We boundary test around the size parameter.  The filter is strictly less than
>   # the value, so size 500 and 1000 should have the same results, but 1001 should
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index bbbe7537d..fc4d182c0 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -170,6 +170,33 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
>   	git -C dst fsck
>   '
>   
> +test_expect_success 'can use tree:0 to filter partial clone' '
> +	rm -rf dst &&
> +	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> +	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> +	cat fetched_objects \
> +		| awk -f print_1.awk \
> +		| xargs -n1 git -C dst cat-file -t >fetched_types &&
> +	sort fetched_types -u >unique_types.observed &&
> +	echo commit > unique_types.expected &&
> +	test_cmp unique_types.observed unique_types.expected
> +'
> +
> +test_expect_success 'show missing tree objects with --missing=print' '
> +	git -C dst rev-list master --missing=print --quiet --objects >missing_objs &&
> +	sed "s/?//" missing_objs \
> +		| xargs -n1 git -C srv.bare cat-file -t \
> +		>missing_types &&
> +	sort -u missing_types >missing_types.uniq &&
> +	echo tree >expected &&
> +	test_cmp missing_types.uniq expected
> +'
> +
> +test_expect_success 'do not complain when a missing tree cannot be parsed' '
> +	git -C dst rev-list master --missing=print --quiet --objects 2>rev_list_err >&2 &&
> +	! grep -q "Could not read " rev_list_err
> +'
> +
>   . "$TEST_DIRECTORY"/lib-httpd.sh
>   start_httpd
>   
> diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
> index 0a37dd5f9..6ccffddbc 100755
> --- a/t/t6112-rev-list-filters-objects.sh
> +++ b/t/t6112-rev-list-filters-objects.sh
> @@ -196,6 +196,19 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
>   	test_cmp observed expected
>   '
>   
> +# Test tree:0 filter.
> +
> +test_expect_success 'verify tree:0 includes trees in "filtered" output' '
> +	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
> +		| awk -f print_1.awk \
> +		| sed s/~// \
> +		| xargs -n1 git -C r3 cat-file -t \
> +		| sort -u >filtered_types &&
> +	printf "blob\ntree\n" > expected &&
> +	test_cmp filtered_types expected
> +'
> +
> +
>   # Delete some loose objects and use rev-list, but WITHOUT any filtering.
>   # This models previously omitted objects that we did not receive.
>   
> 

* Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly
  2018-08-14  0:22       ` Matthew DeVore
@ 2018-08-14 16:03         ` Jonathan Tan
  0 siblings, 0 replies; 151+ messages in thread
From: Jonathan Tan @ 2018-08-14 16:03 UTC (permalink / raw)
  To: matvore; +Cc: jonathantanmy, git, jeffhost, peff, stefanbeller

> I don't understand about the ">= 0". What should I replace it with?
> Maybe you mean the return is never positive so I can change:
> 
> parse_tree_gently(tree, 1) >= 0
> 
> to:
> !parse_tree_gently(tree, 1)
> 
> ?

Sorry for the lack of clarity - that is what I meant.
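
A small sketch of why the two spellings agree. It assumes, as discussed
above, that the parse function returns 0 on success and a negative value on
failure, and never a positive one. parse_stub() is a stand-in written for
this example, not git's parse_tree_gently().

```c
/*
 * Stand-in for parse_tree_gently(): assumed to return 0 on success and
 * -1 on failure, never a positive value.
 */
static int parse_stub(int have_tree)
{
	return have_tree ? 0 : -1;
}

/* The original spelling: success means "not negative". */
static int parsed_ok_ge(int have_tree)
{
	return parse_stub(have_tree) >= 0;
}

/* The suggested spelling: success means "exactly zero". */
static int parsed_ok_not(int have_tree)
{
	return !parse_stub(have_tree);
}
```

The two checks would only diverge if the function could return a positive
value, which is the case the ">= 0" form needlessly allows for.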

> > The missing mechanism (for error, allow-any, print) should work without
> > needing to consult whether an object is a promisor object or not - it
> > should just print whatever is missing, so the "if
> > (!is_promisor_object..." line looks out of place.
> Done. I considered that a missing object which is not a promisor is a
> serious error, so I had it die here.

It is a serious error, but as far as I can tell, that is what the
--missing flags are supposed to help diagnose (so we can't die since we
need the diagnoses to be printed). See, for example, 'rev-list W/
--missing=print' in t6112 - the "r1" repository does not have partial
clone enabled (which I verified by inserting a test_pause then cat-ting
r1/.git/config), but nothing dies.

> But now that I've added the
> do_not_die_on_missing_tree flag, it's more natural to keep the
> previous promisor check as-is.

OK, I'll take a look once you send out v4.

> Also, is_promisor_object is an
> expensive check, and it would be better to skip it during the common
> execution path (which should be when exclude_promisor_objects, an
> internal-use-only flag, is *not* set, which means we never call
> is_promisor_object).

That's true.

> > In my original review [1], I suggested that we always show a tree if we
> > have its hash - if we don't have the object, we just recurse into it.
> > This would be the same as your patch, except that the 'die("bad tree
> > object...' is totally removed instead of merely moved. I still think
> > this solution has some merit - all the tests still pass (except that we
> > need to check for "unable to read" instead of "bad tree object" in error
> > messages), but I just realized that it might still be backwards
> > incompatible in that a basic "rev-list --objects" would now succeed
> > instead of fail if a tree was missing (I haven't tested this though).
> The presence of the die if !is_promisor_object is what justified the
> changing of the parse_tree_gently to always be gently, since it is
> what showed the OID. Can we really remove both? Maybe in a different
> patch set, since I'm no longer touching that line?

That's true - the idea of removing both needs more thought, and if we
were to do so, we definitely can do it in a different patch set.

> > We might need a flag called "do_not_die_on_missing_tree" (much like your
> > original idea of "show_missing_trees") so that callers that are prepared
> > to deal with missing trees can set this. Sorry for the churn. You can
> > document it as such:
> Added it, but not with a command-line flag, only in rev-info.h. We can
> always add a flag later if people have been relying on the existing
> behavior of git rev-list to balk at missing trees. (That seems
> unlikely though, considering there is no filter to enable that before
> this patchset).

By flag, I indeed meant in rev-info.h - sorry for the confusion. That
sounds good.

* Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0
  2018-08-14 15:13     ` Jeff Hostetler
@ 2018-08-14 17:25       ` Matthew DeVore
  2018-10-03 19:00       ` Matthew DeVore
  1 sibling, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 17:25 UTC (permalink / raw)
  To: git; +Cc: git, jeffhost, Jeff King, Stefan Beller, Jonathan Tan

On Tue, Aug 14, 2018 at 8:13 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
>
>
> On 8/13/2018 2:14 PM, Matthew DeVore wrote:
> > Teach list-objects the "tree:0" filter which allows for filtering
> > out all tree and blob objects (unless other objects are explicitly
> > specified by the user). The purpose of this patch is to allow smaller
> > partial clones.
> >
> > The name of this filter - tree:0 - does not explicitly specify that
> > it also filters out all blobs, but this should not cause much confusion
> > because blobs are not at all useful without the trees that refer to
> > them.
> >
> > I also considered only:commits as a name, but this is inaccurate because
> > it suggests that annotated tags are omitted, but actually they are
> > included.
> >
> > The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> > would filter out all but the root tree and blobs. In order to avoid
> > confusion between 0 and capital O, the documentation was worded in a
> > somewhat round-about way that also hints at this future improvement to
> > the feature.
> >
> > Signed-off-by: Matthew DeVore <matvore@google.com>
> > ---
> >   Documentation/rev-list-options.txt     |  3 ++
> >   list-objects-filter-options.c          |  4 +++
> >   list-objects-filter-options.h          |  1 +
> >   list-objects-filter.c                  | 50 ++++++++++++++++++++++++++
> >   t/t5317-pack-objects-filter-objects.sh | 27 ++++++++++++++
> >   t/t5616-partial-clone.sh               | 27 ++++++++++++++
> >   t/t6112-rev-list-filters-objects.sh    | 13 +++++++
> >   7 files changed, 125 insertions(+)
> >
> > diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> > index 7b273635d..9e351ec2a 100644
> > --- a/Documentation/rev-list-options.txt
> > +++ b/Documentation/rev-list-options.txt
> > @@ -743,6 +743,9 @@ specification contained in <path>.
> >       A debug option to help with future "partial clone" development.
> >       This option specifies how missing objects are handled.
> >   +
> > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> > +<depth> from the root tree. Currently, only <depth>=0 is supported.
> > ++
> >   The form '--missing=error' requests that rev-list stop with an error if
> >   a missing object is encountered.  This is the default action.
> >   +
> > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> > index c0e2bd6a0..a28382940 100644
> > --- a/list-objects-filter-options.c
> > +++ b/list-objects-filter-options.c
> > @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
> >                       return 0;
> >               }
> >
> > +     } else if (!strcmp(arg, "tree:0")) {
> > +             filter_options->choice = LOFC_TREE_NONE;
> > +             return 0;
> > +
> >       } else if (skip_prefix(arg, "sparse:oid=", &v0)) {
> >               struct object_context oc;
> >               struct object_id sparse_oid;
> > diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> > index 0000a61f8..af64e5c66 100644
> > --- a/list-objects-filter-options.h
> > +++ b/list-objects-filter-options.h
> > @@ -10,6 +10,7 @@ enum list_objects_filter_choice {
> >       LOFC_DISABLED = 0,
> >       LOFC_BLOB_NONE,
> >       LOFC_BLOB_LIMIT,
> > +     LOFC_TREE_NONE,
> >       LOFC_SPARSE_OID,
> >       LOFC_SPARSE_PATH,
> >       LOFC__COUNT /* must be last */
> > diff --git a/list-objects-filter.c b/list-objects-filter.c
> > index a0ba78b20..8e3caf5bf 100644
> > --- a/list-objects-filter.c
> > +++ b/list-objects-filter.c
> > @@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
> >       return d;
> >   }
> >
> > +/*
> > + * A filter for list-objects to omit ALL trees and blobs from the traversal.
> > + * Can OPTIONALLY collect a list of the omitted OIDs.
> > + */
> > +struct filter_trees_none_data {
> > +     struct oidset *omits;
> > +};
> > +
> > +static enum list_objects_filter_result filter_trees_none(
> > +     enum list_objects_filter_situation filter_situation,
> > +     struct object *obj,
> > +     const char *pathname,
> > +     const char *filename,
> > +     void *filter_data_)
> > +{
> > +     struct filter_trees_none_data *filter_data = filter_data_;
> > +
> > +     switch (filter_situation) {
> > +     default:
> > +             die("unknown filter_situation");
> > +             return LOFR_ZERO;
> > +
> > +     case LOFS_BEGIN_TREE:
> > +     case LOFS_BLOB:
> > +             if (filter_data->omits)
> > +                     oidset_insert(filter_data->omits, &obj->oid);
> > +             return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
> > +
> > +     case LOFS_END_TREE:
> > +             assert(obj->type == OBJ_TREE);
> > +             return LOFR_ZERO;
> > +
> > +     }
> > +}
>
> There are a couple of options here:
> [] If we really want to omit all trees and blobs (and we DO NOT want
>     the oidset of everything omitted), then we might be able to
>     shortcut the traversal and speed things up.
>
>     {} add a LOFR_SKIP_TREE bit to list_objects_filter_result
>     {} test this bit in process_tree() and avoid the init_tree_desc() and
>        the while loop and some adjacent setup/tear-down code.
>     {} make this filter something like:
>
>         case LOFS_BEGIN_TREE:
>                 if (filter_data->omits) {
>                         oidset_insert(filter_data->omits, &obj->oid);
>                         return LOFR_MARK_SEEN; /* ... (hard omit) */
>                 } else
>                         return LOFR_SKIP_TREE;
>         case LOFS_BLOB:
>                 if (filter_data->omits) {
>                         oidset_insert(filter_data->omits, &obj->oid);
>                         return LOFR_MARK_SEEN; /* ... (hard omit) */
>                 } else
>                         assert(...should not happen...);
I like this - it will considerably reduce the amount of work the
server needs to do on a partial clone. I'd prefer to do this in a
follow-up patchset, so I added a NEEDSWORK in the commit that adds
proper handling for filtered tree objects. I want to make sure the
unit tests are thorough when I apply your suggestion, and this
patchset is already a bit more complex than I was expecting.

>
> [] Later, if we choose to actually support a depth>0, we'll probably
>     want a different filter function to conditionally include/exclude
>     blobs, include shallow tree[node]s, and do some of the provisional-
>     omit logic on deep tree[nodes] (in case a tree appears at multiple
>     places/depths in the history).  But that can wait.
>
> Jeff
>
>
> > +
> > +static void* filter_trees_none__init(
> > +     struct oidset *omitted,
> > +     struct list_objects_filter_options *filter_options,
> > +     filter_object_fn *filter_fn,
> > +     filter_free_fn *filter_free_fn)
> > +{
> > +     struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
> > +     d->omits = omitted;
> > +
> > +     *filter_fn = filter_trees_none;
> > +     *filter_free_fn = free;
> > +     return d;
> > +}
> > +
> >   /*
> >    * A filter for list-objects to omit large blobs.
> >    * And to OPTIONALLY collect a list of the omitted OIDs.
> > @@ -374,6 +423,7 @@ static filter_init_fn s_filters[] = {
> >       NULL,
> >       filter_blobs_none__init,
> >       filter_blobs_limit__init,
> > +     filter_trees_none__init,
> >       filter_sparse_oid__init,
> >       filter_sparse_path__init,
> >   };
> > diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
> > index 5e35f33bf..65f2cf446 100755
> > --- a/t/t5317-pack-objects-filter-objects.sh
> > +++ b/t/t5317-pack-objects-filter-objects.sh
> > @@ -72,6 +72,33 @@ test_expect_success 'get an error for missing tree object' '
> >       grep -q "bad tree object" bad_tree
> >   '
> >
> > +test_expect_success 'setup for tests of tree:0' '
> > +     mkdir r1/subtree &&
> > +     echo "This is a file in a subtree" > r1/subtree/file &&
> > +     git -C r1 add subtree/file &&
> > +     git -C r1 commit -m subtree
> > +'
> > +
> > +test_expect_success 'verify tree:0 packfile has no blobs or trees' '
> > +     git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
> > +     HEAD
> > +     EOF
> > +     git -C r1 index-pack ../commitsonly.pack &&
> > +     git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> > +     ! grep -E "tree|blob" objs
> > +'
> > +
> > +test_expect_success 'grab tree directly when using tree:0' '
> > +     # We should get the tree specified directly but not its blobs or subtrees.
> > +     git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
> > +     HEAD:
> > +     EOF
> > +     git -C r1 index-pack ../commitsonly.pack &&
> > +     git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> > +     grep -E "tree|blob" objs >trees_and_blobs &&
> > +     test_line_count = 1 trees_and_blobs
> > +'
> > +
> >   # Test blob:limit=<n>[kmg] filter.
> >   # We boundary test around the size parameter.  The filter is strictly less than
> >   # the value, so size 500 and 1000 should have the same results, but 1001 should
> > diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> > index bbbe7537d..fc4d182c0 100755
> > --- a/t/t5616-partial-clone.sh
> > +++ b/t/t5616-partial-clone.sh
> > @@ -170,6 +170,33 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
> >       git -C dst fsck
> >   '
> >
> > +test_expect_success 'can use tree:0 to filter partial clone' '
> > +     rm -rf dst &&
> > +     git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> > +     git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> > +     cat fetched_objects \
> > +             | awk -f print_1.awk \
> > +             | xargs -n1 git -C dst cat-file -t >fetched_types &&
> > +     sort fetched_types -u >unique_types.observed &&
> > +     echo commit > unique_types.expected &&
> > +     test_cmp unique_types.observed unique_types.expected
> > +'
> > +
> > +test_expect_success 'show missing tree objects with --missing=print' '
> > +     git -C dst rev-list master --missing=print --quiet --objects >missing_objs &&
> > +     sed "s/?//" missing_objs \
> > +             | xargs -n1 git -C srv.bare cat-file -t \
> > +             >missing_types &&
> > +     sort -u missing_types >missing_types.uniq &&
> > +     echo tree >expected &&
> > +     test_cmp missing_types.uniq expected
> > +'
> > +
> > +test_expect_success 'do not complain when a missing tree cannot be parsed' '
> > +     git -C dst rev-list master --missing=print --quiet --objects 2>rev_list_err >&2 &&
> > +     ! grep -q "Could not read " rev_list_err
> > +'
> > +
> >   . "$TEST_DIRECTORY"/lib-httpd.sh
> >   start_httpd
> >
> > diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
> > index 0a37dd5f9..6ccffddbc 100755
> > --- a/t/t6112-rev-list-filters-objects.sh
> > +++ b/t/t6112-rev-list-filters-objects.sh
> > @@ -196,6 +196,19 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
> >       test_cmp observed expected
> >   '
> >
> > +# Test tree:0 filter.
> > +
> > +test_expect_success 'verify tree:0 includes trees in "filtered" output' '
> > +     git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
> > +             | awk -f print_1.awk \
> > +             | sed s/~// \
> > +             | xargs -n1 git -C r3 cat-file -t \
> > +             | sort -u >filtered_types &&
> > +     printf "blob\ntree\n" > expected &&
> > +     test_cmp filtered_types expected
> > +'
> > +
> > +
> >   # Delete some loose objects and use rev-list, but WITHOUT any filtering.
> >   # This models previously omitted objects that we did not receive.
> >
> >


* [PATCH v4 0/6] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (7 preceding siblings ...)
  2018-08-13 18:14 ` [PATCH v3 0/5] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-08-14 17:28 ` Matthew DeVore
  2018-08-14 17:28   ` [PATCH v4 1/6] list-objects: store common func args in struct Matthew DeVore
                     ` (5 more replies)
  2018-08-15  0:22 ` [PATCH v5 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (7 subsequent siblings)
  16 siblings, 6 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 17:28 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

I've applied or responded to all changes suggested by Jonathan and Jeff.

Matthew DeVore (6):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   3 +
 builtin/rev-list.c                     |  11 +-
 list-objects-filter-options.c          |   4 +
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  50 ++++++
 list-objects.c                         | 238 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  11 +-
 t/t0410-partial-clone.sh               |  66 +++++++
 t/t5317-pack-objects-filter-objects.sh |  40 +++++
 t/t5616-partial-clone.sh               |  38 ++++
 t/t6112-rev-list-filters-objects.sh    |  13 ++
 12 files changed, 358 insertions(+), 118 deletions(-)

-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v4 1/6] list-objects: store common func args in struct
  2018-08-14 17:28 ` [PATCH v4 0/6] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-08-14 17:28   ` Matthew DeVore
  2018-08-14 17:28   ` [PATCH v4 2/6] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 17:28 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
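Note for readers: the refactor described above follows the usual
"context struct" pattern. The following is only a minimal sketch with
hypothetical stand-in types (walk_context, show_fn, filter_fn are
invented for illustration and are not the Git definitions); it shows
how helpers take one pointer once the callbacks and their data travel
together.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the show/filter callback types. */
typedef void (*show_fn)(int id, void *show_data);
typedef int (*filter_fn)(int id, void *filter_data);

/* All arguments shared by the helpers live in one place. */
struct walk_context {
	show_fn show;
	void *show_data;
	filter_fn filter;	/* may be NULL: no filtering */
	void *filter_data;
};

/* A helper needs only the context plus its own specific argument. */
static void visit(struct walk_context *ctx, int id)
{
	if (ctx->filter && !ctx->filter(id, ctx->filter_data))
		return;
	ctx->show(id, ctx->show_data);
}

/* New utility functions can be added without growing parameter lists. */
static void visit_range(struct walk_context *ctx, int from, int to)
{
	int i;

	for (i = from; i < to; i++)
		visit(ctx, i);
}

static void count_shown(int id, void *data)
{
	(void)id;
	++*(int *)data;
}

static int keep_even(int id, void *data)
{
	(void)data;
	return id % 2 == 0;
}

/* Counts the even ids in [from, to) by walking with a filter. */
int count_even_in_range(int from, int to)
{
	int shown = 0;
	struct walk_context ctx;

	ctx.show = count_shown;
	ctx.show_data = &shown;
	ctx.filter = keep_even;
	ctx.filter_data = NULL;
	visit_range(&ctx, from, to);
	return shown;
}
```

In this toy version, adding visit_range() required no changes to the
signature of visit(), which is the property the patch is after.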
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v4 2/6] list-objects: refactor to process_tree_contents
  2018-08-14 17:28 ` [PATCH v4 0/6] filter: support for excluding all trees and blobs Matthew DeVore
  2018-08-14 17:28   ` [PATCH v4 1/6] list-objects: store common func args in struct Matthew DeVore
@ 2018-08-14 17:28   ` Matthew DeVore
  2018-08-14 17:28   ` [PATCH v4 3/6] list-objects: always parse trees gently Matthew DeVore
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 17:28 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally, i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v4 3/6] list-objects: always parse trees gently
  2018-08-14 17:28 ` [PATCH v4 0/6] filter: support for excluding all trees and blobs Matthew DeVore
  2018-08-14 17:28   ` [PATCH v4 1/6] list-objects: store common func args in struct Matthew DeVore
  2018-08-14 17:28   ` [PATCH v4 2/6] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-08-14 17:28   ` Matthew DeVore
  2018-08-14 17:28   ` [PATCH v4 4/6] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 17:28 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
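Note for readers: a toy model of why the extra message is redundant.
This is a sketch only; parse_item() and messages_on_failure() are
hypothetical names, and a counter stands in for the text that the real
code would write to stderr.

```c
/*
 * Hypothetical parser: returns -1 on failure and, unless "quiet" is
 * set, emits one diagnostic of its own (modelled here as a counter).
 */
static int parse_item(int readable, int quiet, int *messages)
{
	if (readable)
		return 0;
	if (!quiet)
		++*messages;	/* stands in for the parser's own message */
	return -1;
}

/*
 * The caller always reports the failure itself, so a non-quiet parser
 * produces two messages for a single problem.
 */
int messages_on_failure(int quiet)
{
	int messages = 0;

	if (parse_item(0, quiet, &messages) < 0)
		++messages;	/* stands in for die("bad tree object ...") */
	return messages;
}
```

With quiet set to 0 the failure is reported twice; with quiet set to 1
only the caller's message remains, which is the behaviour this patch
chooses unconditionally.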
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	if (parse_tree_gently(tree, 1) < 0) {
 		if (revs->ignore_missing_links)
 			return;
 
-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v4 4/6] rev-list: handle missing tree objects properly
  2018-08-14 17:28 ` [PATCH v4 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-08-14 17:28   ` [PATCH v4 3/6] list-objects: always parse trees gently Matthew DeVore
@ 2018-08-14 17:28   ` Matthew DeVore
  2018-08-14 18:06     ` Jonathan Tan
  2018-08-14 17:28   ` [PATCH v4 5/6] revision: mark non-user-given objects instead Matthew DeVore
  2018-08-14 17:28   ` [PATCH v4 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
  5 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 17:28 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. A missing tree
causes an error when the --missing mode calls for one,
and the hash is printed even if the tree is missing.

In list-objects.c we no longer print a message to stderr if a tree
object is missing (quiet_on_missing is always true). I couldn't find
any place where this would matter, or where the caller of
traverse_commit_list would need to be fixed to show the error. However,
in the future it would be trivial to make the caller show the message if
we needed to.

This is not tested very thoroughly, since we cannot create promisor
objects in tests without using an actual partial clone. t0410 has a
promise_and_delete utility function, but the is_promisor_object function
does not return 1 for objects deleted in this way. More tests will
come in a patch that implements a filter that can be used with git
clone.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
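Note for readers: the decision process_tree() makes after this patch
can be summarized as below. This is a simplified sketch, not the real
function: the enum and classify_tree() are hypothetical names, filters
and the UNINTERESTING/SEEN checks are left out, and it assumes the
context lines elided from the hunk test exclude_promisor_objects
together with is_promisor_object().

```c
/* Hypothetical summary of what happens to one tree during traversal. */
enum tree_action {
	TREE_SKIP,		/* return early, nothing is shown */
	TREE_DIE,		/* die("bad tree object ...") */
	TREE_SHOW_ONLY,		/* reach the show callback, do not descend */
	TREE_SHOW_AND_DESCEND	/* tree parsed, traverse its entries */
};

enum tree_action classify_tree(int parsed,
			       int ignore_missing_links,
			       int exclude_promisor_objects,
			       int is_promisor,
			       int do_not_die_on_missing_tree)
{
	if (parsed)
		return TREE_SHOW_AND_DESCEND;
	if (ignore_missing_links)
		return TREE_SKIP;
	if (exclude_promisor_objects && is_promisor)
		return TREE_SKIP;
	if (!do_not_die_on_missing_tree)
		return TREE_DIE;
	/* Unparsed tree: the caller decides what "missing" means. */
	return TREE_SHOW_ONLY;
}
```

Since rev-list sets do_not_die_on_missing_tree, it ends up in the last
case and lets finish_object() apply the --missing policy, while callers
that leave the bit unset keep the "bad tree object" error.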
 builtin/rev-list.c                     | 11 +++--
 list-objects.c                         | 17 +++++--
 revision.h                             |  1 +
 t/t0410-partial-clone.sh               | 66 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 13 +++++
 5 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..e88474a2d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int parsed;
 
 	if (!revs->tree_objects)
 		return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, 1) < 0) {
+
+	parsed = parse_tree_gently(tree, 1) >= 0;
+	if (!parsed) {
 		if (revs->ignore_missing_links)
 			return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
 		    is_promisor_object(&obj->oid))
 			return;
 
-		die("bad tree object %s", oid_to_hex(&obj->oid));
+		if (!revs->do_not_die_on_missing_tree)
+			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -178,7 +182,14 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	/*
+	 * NEEDSWORK: we should not have to process this tree's contents if the
+	 * filter wants to exclude all its contents AND the filter doesn't need
+	 * to collect the omitted OIDs. We should add a LOFR_SKIP_TREE bit which
+	 * allows skipping all children.
+	 */
+	if (parsed)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index c599c34da..c94243543 100644
--- a/revision.h
+++ b/revision.h
@@ -124,6 +124,7 @@ struct rev_info {
 			first_parent_only:1,
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
+			do_not_die_on_missing_tree:1,
 
 			/* for internal use only */
 			exclude_promisor_objects:1;
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4984ca583..74e3c5767 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -186,6 +186,72 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	! grep $FOO out
 '
 
+test_expect_success 'show missing tree objects with --missing=print' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	TREE=$(git -C repo rev-parse bar^{tree}) &&
+
+	promise_and_delete $TREE &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+	git -C repo rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
+	echo "?$TREE" >expected &&
+	test_cmp expected missing_objs &&
+
+	# do not complain when a missing tree cannot be parsed
+	! grep -q "Could not read " rev_list_err
+'
+
+test_expect_success 'missing tree objects with --missing=allow-any and --exclude-promisor-objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	promise_and_delete $(git -C repo rev-parse bar^{tree}) &&
+	promise_and_delete $(git -C repo rev-parse foo^{tree}) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	test_line_count = 0 rev_list_err &&
+	# 3 commits, 3 blobs, and 1 tree
+	test_line_count = 7 objs &&
+
+	# Do the same for --exclude-promisor-objects, but with all trees gone.
+	promise_and_delete $(git -C repo rev-parse baz^{tree}) &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+	test_line_count = 0 rev_list_err &&
+	# 3 commits, no blobs or trees
+	test_line_count = 3 objs
+'
+
+test_expect_success 'missing non-root tree object and rev-list' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	mkdir repo/dir &&
+	echo foo > repo/dir/foo &&
+	git -C repo add dir/foo &&
+	git -C repo commit -m "commit dir/foo" &&
+
+	promise_and_delete $(git -C repo rev-parse HEAD:dir) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	test_line_count = 0 rev_list_err &&
+	# 1 commit and 1 tree
+	test_line_count = 2 objs
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised tree' '
 	rm -rf repo &&
 	test_create_repo repo &&
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..5e35f33bf 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo > r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep -q "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v4 5/6] revision: mark non-user-given objects instead
  2018-08-14 17:28 ` [PATCH v4 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-08-14 17:28   ` [PATCH v4 4/6] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-08-14 17:28   ` Matthew DeVore
  2018-08-14 17:28   ` [PATCH v4 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 17:28 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

Signed-off-by: Matthew DeVore <matvore@google.com>
---
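Note for readers: a toy illustration of the inverted flag. Only the
NOT_USER_GIVEN bit value is taken from the patch; struct toy_object and
the helper names are hypothetical and exist only for this sketch.

```c
#define NOT_USER_GIVEN	(1u<<25) /* same bit the patch assigns */

/* Hypothetical stand-in for struct object; only the flags matter. */
struct toy_object {
	unsigned flags;
};

/* Called at the places where an object is reached through a reference. */
static void mark_reached_via_reference(struct toy_object *o)
{
	o->flags |= NOT_USER_GIVEN;
}

/* Filters are consulted only for objects carrying the mark. */
static int filter_applies(const struct toy_object *o, int have_filter)
{
	return (o->flags & NOT_USER_GIVEN) && have_filter;
}

/* Convenience wrapper: builds an object and asks whether it is filtered. */
int demo_filter_applies(int reached_via_reference, int have_filter)
{
	struct toy_object o;

	o.flags = 0;
	if (reached_via_reference)
		mark_reached_via_reference(&o);
	return filter_applies(&o, have_filter);
}
```

An object that never receives the mark is treated as user-given by
default, so it bypasses the filter without any bookkeeping at the point
where the user supplied it.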
 list-objects.c | 31 ++++++++++++++++++-------------
 revision.c     |  1 -
 revision.h     | 10 +++++++---
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index e88474a2d..3aee1b8a4 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -191,7 +193,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (parsed)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -307,8 +309,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index 062749437..6d355b43c 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index c94243543..55d47004d 100644
--- a/revision.h
+++ b/revision.h
@@ -8,7 +8,11 @@
 #include "diff.h"
 #include "commit-slab-decl.h"
 
-/* Remember to update object flag allocation in object.h */
+/* Remember to update object flag allocation in object.h
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
 #define SEEN		(1u<<0)
 #define UNINTERESTING   (1u<<1)
 #define TREESAME	(1u<<2)
@@ -20,9 +24,9 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+#define NOT_USER_GIVEN	(1u<<25) /* tree or blob not given directly by user */
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-14 17:28 ` [PATCH v4 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (4 preceding siblings ...)
  2018-08-14 17:28   ` [PATCH v4 5/6] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-08-14 17:28   ` Matthew DeVore
  2018-08-14 18:18     ` Jonathan Tan
  2018-08-14 20:01     ` Jeff King
  5 siblings, 2 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 17:28 UTC (permalink / raw)
  To: git; +Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, when actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  3 ++
 list-objects-filter-options.c          |  4 +++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 50 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 27 ++++++++++++++
 t/t5616-partial-clone.sh               | 38 ++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 13 +++++++
 7 files changed, 136 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..9e351ec2a 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -743,6 +743,9 @@ specification contained in <path>.
 	A debug option to help with future "partial clone" development.
 	This option specifies how missing objects are handled.
 +
+The form '--filter=tree:<depth>' omits all blobs and trees deeper than
+<depth> from the root tree. Currently, only <depth>=0 is supported.
++
 The form '--missing=error' requests that rev-list stop with an error if
 a missing object is encountered.  This is the default action.
 +
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..a28382940 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (!strcmp(arg, "tree:0")) {
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..8e3caf5bf 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		die("unknown filter_situation");
+		return LOFR_ZERO;
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void *filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -374,6 +423,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 5e35f33bf..65f2cf446 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,33 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" > r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	grep -E "tree|blob" objs >trees_and_blobs &&
+	test_line_count = 1 trees_and_blobs
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..d2859aba1 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,6 +154,22 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'use fsck before and after manually fetching a missing subtree' '
+	# push new commit so server has a subtree
+	mkdir src/dir &&
+	echo "in dir" > src/dir/file.txt &&
+	git -C src add dir/file.txt &&
+	git -C src commit -m "file in dir" &&
+	git -C src push -u srv master &&
+	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
+
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst fsck &&
+	git -C dst cat-file -p $SUBTREE >tree_contents 2>err &&
+	git -C dst fsck
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
@@ -170,6 +186,28 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
 	git -C dst fsck
 '
 
+test_expect_success 'can use tree:0 to filter partial clone' '
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	cat fetched_objects \
+		| awk -f print_1.awk \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	echo commit > unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected
+'
+
+test_expect_success 'auto-fetching of trees with --missing=error' '
+	git -C dst rev-list master --missing=error --objects >fetched_objects &&
+	cat fetched_objects \
+		| awk -f print_1.awk \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	printf "blob\ncommit\ntree\n" >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 0a37dd5f9..6ccffddbc 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -196,6 +196,19 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
+		| awk -f print_1.awk \
+		| sed s/~// \
+		| xargs -n1 git -C r3 cat-file -t \
+		| sort -u >filtered_types &&
+	printf "blob\ntree\n" > expected &&
+	test_cmp filtered_types expected
+'
+
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.18.0.865.gffc8e1a3cd6-goog



* Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly
  2018-08-14 17:28   ` [PATCH v4 4/6] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-08-14 18:06     ` Jonathan Tan
  2018-08-14 22:43       ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Jonathan Tan @ 2018-08-14 18:06 UTC (permalink / raw)
  To: matvore; +Cc: git, git, jeffhost, peff, stefanbeller, jonathantanmy

> Previously, we assumed only blob objects could be missing. This patch
> makes rev-list handle missing trees like missing blobs. A missing tree
> will cause an error if --missing indicates an error should be caused,
> and the hash is printed even if the tree is missing.

The last sentence is difficult to understand - probably better to say
that all --missing= arguments and --exclude-promisor-objects work for
missing trees like they currently do for blobs (and do not fixate on
just --missing=error). And also demonstrate this in tests, like in
t6112.

> In list-objects.c we no longer print a message to stderr if a tree
> object is missing (quiet_on_missing is always true). I couldn't find
> any place where this would matter, or where the caller of
> traverse_commit_list would need to be fixed to show the error. However,
> in the future it would be trivial to make the caller show the message if
> we needed to.
> 
> This is not tested very thoroughly, since we cannot create promisor
> objects in tests without using an actual partial clone. t0410 has a
> promise_and_delete utility function, but the is_promisor_object function
> does not return 1 for objects deleted in this way. More tests will
> come in a patch that implements a filter that can be used with git
> clone.

These two paragraphs are no longer applicable, I think.

> @@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  	init_revisions(&revs, prefix);
>  	revs.abbrev = DEFAULT_ABBREV;
>  	revs.commit_format = CMIT_FMT_UNSPECIFIED;
> +	revs.do_not_die_on_missing_tree = 1;

Is this correct? I would have expected this to be set only if --missing
was set.

> -	process_tree_contents(ctx, tree, base);
> +	/*
> +	 * NEEDSWORK: we should not have to process this tree's contents if the
> +	 * filter wants to exclude all its contents AND the filter doesn't need
> +	 * to collect the omitted OIDs. We should add a LOFR_SKIP_TREE bit which
> +	 * allows skipping all children.
> +	 */
> +	if (parsed)
> +		process_tree_contents(ctx, tree, base);

I agree with Jeff Hostetler in [1] that a LOFR_SKIP_TREE bit is
desirable, but I don't think that this patch is the right place to
introduce this NEEDSWORK. For me, this patch is about skipping iterating
over the contents of a tree because the tree does not exist; this
NEEDSWORK is about skipping iterating over the contents of a tree
because we don't want its contents, and it is quite confusing to
conflate the two.

[1] https://public-inbox.org/git/d751d56b-84bb-a03d-5f2a-7dbaf8d947cc@jeffhostetler.com/
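
To illustrate the distinction, here is a minimal sketch in C of the two
separate reasons for not descending into a tree. Everything prefixed with
sketch_/SKETCH_ is hypothetical, including the numeric values of the bits
and the SKIP_TREE bit itself, which is only proposed in the NEEDSWORK
above; this is not the actual list-objects.c code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the filter result bits (values are assumed). */
enum sketch_filter_result {
	SKETCH_LOFR_ZERO      = 0,
	SKETCH_LOFR_MARK_SEEN = 1 << 0,
	SKETCH_LOFR_DO_SHOW   = 1 << 1,
	SKETCH_LOFR_SKIP_TREE = 1 << 2 /* proposed bit, not part of this series */
};

enum sketch_tree_action {
	SKETCH_DESCEND = 0,
	SKETCH_SKIP_MISSING, /* the tree could not be parsed (this patch) */
	SKETCH_SKIP_FILTERED /* the filter does not want the contents (NEEDSWORK) */
};

/*
 * Decide what to do with a tree's contents. "parsed" is nonzero when the
 * tree object was found and read; "filter_result" is a mask of the bits
 * above, as a filter callback might return them.
 */
static enum sketch_tree_action sketch_tree_action(int parsed,
						  unsigned filter_result)
{
	if (!parsed)
		return SKETCH_SKIP_MISSING;
	if (filter_result & SKETCH_LOFR_SKIP_TREE)
		return SKETCH_SKIP_FILTERED;
	return SKETCH_DESCEND;
}

/* Usage example: the two skip reasons are reported separately. */
static void sketch_tree_action_demo(void)
{
	assert(sketch_tree_action(0, SKETCH_LOFR_ZERO) == SKETCH_SKIP_MISSING);
	assert(sketch_tree_action(1, SKETCH_LOFR_SKIP_TREE) == SKETCH_SKIP_FILTERED);
	assert(sketch_tree_action(1, SKETCH_LOFR_MARK_SEEN) == SKETCH_DESCEND);
}
```

Keeping the two checks apart in the code, as in the sketch, is the same
separation I am asking for between this patch and the NEEDSWORK.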

> @@ -124,6 +124,7 @@ struct rev_info {
>  			first_parent_only:1,
>  			line_level_traverse:1,
>  			tree_blobs_in_commit_order:1,
> +			do_not_die_on_missing_tree:1,

I know that the other flags don't have documentation, but I think it's
worth documenting this one because it is rather complicated. I have
provided a sample one in my earlier review - feel free to use that or
come up with your own.

> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 4984ca583..74e3c5767 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -186,6 +186,72 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
>  	! grep $FOO out
>  '
>  
> +test_expect_success 'show missing tree objects with --missing=print' '
> +	rm -rf repo &&
> +	test_create_repo repo &&
> +	test_commit -C repo foo &&
> +	test_commit -C repo bar &&
> +	test_commit -C repo baz &&
> +
> +	TREE=$(git -C repo rev-parse bar^{tree}) &&
> +
> +	promise_and_delete $TREE &&
> +
> +	git -C repo config core.repositoryformatversion 1 &&
> +	git -C repo config extensions.partialclone "arbitrary string" &&
> +	git -C repo rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
> +	echo "?$TREE" >expected &&
> +	test_cmp expected missing_objs &&
> +
> +	# do not complain when a missing tree cannot be parsed
> +	! grep -q "Could not read " rev_list_err
> +'

I think that the --exclude-promisor-objects tests can go in t0410 as you
have done, but the --missing tests (except for --missing=allow-promisor)
should go in t6112. (And like the existing --missing tests, they should
be done without setting extensions.partialclone.)

As for --missing=allow-promisor, I don't see them being tested anywhere
:-( so feel free to make a suggestion. I would put them in t6112 for
easy comparison with the other --missing tests.


* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-14 17:28   ` [PATCH v4 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
@ 2018-08-14 18:18     ` Jonathan Tan
  2018-08-14 20:00       ` Matthew DeVore
  2018-08-14 20:01     ` Jeff King
  1 sibling, 1 reply; 151+ messages in thread
From: Jonathan Tan @ 2018-08-14 18:18 UTC (permalink / raw)
  To: matvore; +Cc: git, git, jeffhost, peff, stefanbeller, jonathantanmy

> @@ -743,6 +743,9 @@ specification contained in <path>.
>  	A debug option to help with future "partial clone" development.
>  	This option specifies how missing objects are handled.
>  +
> +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> +<depth> from the root tree. Currently, only <depth>=0 is supported.
> ++
>  The form '--missing=error' requests that rev-list stop with an error if
>  a missing object is encountered.  This is the default action.
>  +

The "--filter" documentation should go with the other "--filter"
information, not right after --missing.

> +test_expect_success 'setup for tests of tree:0' '
> +	mkdir r1/subtree &&
> +	echo "This is a file in a subtree" > r1/subtree/file &&
> +	git -C r1 add subtree/file &&
> +	git -C r1 commit -m subtree
> +'

Style: no space after >

> +test_expect_success 'grab tree directly when using tree:0' '
> +	# We should get the tree specified directly but not its blobs or subtrees.
> +	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
> +	HEAD:
> +	EOF
> +	git -C r1 index-pack ../commitsonly.pack &&
> +	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> +	grep -E "tree|blob" objs >trees_and_blobs &&
> +	test_line_count = 1 trees_and_blobs
> +'

Can we also verify that the SHA-1 in trees_and_blobs is what we
expected?

> +test_expect_success 'use fsck before and after manually fetching a missing subtree' '
> +	# push new commit so server has a subtree
> +	mkdir src/dir &&
> +	echo "in dir" > src/dir/file.txt &&

No space after >

> +	git -C src add dir/file.txt &&
> +	git -C src commit -m "file in dir" &&
> +	git -C src push -u srv master &&
> +	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
> +
> +	rm -rf dst &&
> +	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> +	git -C dst fsck &&
> +	git -C dst cat-file -p $SUBTREE >tree_contents 2>err &&
> +	git -C dst fsck
> +'

If you don't need to redirect to err, don't do so.

Before the cat-file, also verify that the tree is missing, most likely
through a "git rev-list" with "--missing=print".

And I would grep on the tree_contents to ensure that the filename
("file.txt") is there, so that we know that we got the correct tree.

> +test_expect_success 'can use tree:0 to filter partial clone' '
> +	rm -rf dst &&
> +	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> +	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> +	cat fetched_objects \
> +		| awk -f print_1.awk \
> +		| xargs -n1 git -C dst cat-file -t >fetched_types &&
> +	sort fetched_types -u >unique_types.observed &&
> +	echo commit > unique_types.expected &&
> +	test_cmp unique_types.observed unique_types.expected
> +'
> +
> +test_expect_success 'auto-fetching of trees with --missing=error' '
> +	git -C dst rev-list master --missing=error --objects >fetched_objects &&
> +	cat fetched_objects \
> +		| awk -f print_1.awk \
> +		| xargs -n1 git -C dst cat-file -t >fetched_types &&
> +	sort fetched_types -u >unique_types.observed &&
> +	printf "blob\ncommit\ntree\n" >unique_types.expected &&
> +	test_cmp unique_types.observed unique_types.expected
> +'

These two tests seem redundant with the 'use fsck before and after
manually fetching a missing subtree' test (after the latter is
appropriately renamed). I think we only need to test this sequence once,
which can be placed in one or spread over multiple tests:

 1. partial clone with --filter=tree:0
 2. fsck works
 3. verify that trees are indeed missing
 4. autofetch a tree
 5. fsck still works


* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-14 18:18     ` Jonathan Tan
@ 2018-08-14 20:00       ` Matthew DeVore
  2018-08-14 20:19         ` Jonathan Tan
  0 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 20:00 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, git, jeffhost, Jeff King, Stefan Beller

On Tue, Aug 14, 2018 at 11:18 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > @@ -743,6 +743,9 @@ specification contained in <path>.
> >       A debug option to help with future "partial clone" development.
> >       This option specifies how missing objects are handled.
> >  +
> > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> > +<depth> from the root tree. Currently, only <depth>=0 is supported.
> > ++
> >  The form '--missing=error' requests that rev-list stop with an error if
> >  a missing object is encountered.  This is the default action.
> >  +
>
> The "--filter" documentation should go with the other "--filter"
> information, not right after --missing.
Fixed. My problem was that I didn't know what the + meant - I guess it
means that the paragraph before and after are in the same section?

>
> > +test_expect_success 'setup for tests of tree:0' '
> > +     mkdir r1/subtree &&
> > +     echo "This is a file in a subtree" > r1/subtree/file &&
> > +     git -C r1 add subtree/file &&
> > +     git -C r1 commit -m subtree
> > +'
>
> Style: no space after >
Fixed.

>
> > +test_expect_success 'grab tree directly when using tree:0' '
> > +     # We should get the tree specified directly but not its blobs or subtrees.
> > +     git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
> > +     HEAD:
> > +     EOF
> > +     git -C r1 index-pack ../commitsonly.pack &&
> > +     git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> > +     grep -E "tree|blob" objs >trees_and_blobs &&
> > +     test_line_count = 1 trees_and_blobs
> > +'
>
> Can we also verify that the SHA-1 in trees_and_blobs is what we
> expected?
Done - Now I'm comparing to the output of `git rev-parse HEAD:` and I
don't need the separate line count check either.
>
> > +test_expect_success 'use fsck before and after manually fetching a missing subtree' '
> > +     # push new commit so server has a subtree
> > +     mkdir src/dir &&
> > +     echo "in dir" > src/dir/file.txt &&
>
> No space after >
Fixed.

>
> > +     git -C src add dir/file.txt &&
> > +     git -C src commit -m "file in dir" &&
> > +     git -C src push -u srv master &&
> > +     SUBTREE=$(git -C src rev-parse HEAD:dir) &&
> > +
> > +     rm -rf dst &&
> > +     git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> > +     git -C dst fsck &&
> > +     git -C dst cat-file -p $SUBTREE >tree_contents 2>err &&
> > +     git -C dst fsck
> > +'
>
> If you don't need to redirect to err, don't do so.
>
> Before the cat-file, also verify that the tree is missing, most likely
> through a "git rev-list" with "--missing=print".
That won't work though - the subtree's hash is not known because its
parent tree is not there. I've merged the three tests in this file,
and as a result am now using the check which makes sure the object
types are only "commit".
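
A rough sketch in C of why the subtree cannot be reported as missing
(all names here are made up for illustration and the model is heavily
simplified to a single chain of trees; it is not how rev-list is
implemented):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified tree object with at most one subtree. */
struct sketch_tree {
	const char *oid;
	int present; /* 0 = omitted from the partial clone */
	/* The walk may follow this only after reading a present tree. */
	const struct sketch_tree *subtree;
};

/*
 * Return the OID of the missing tree that a walk from "root" can name,
 * or NULL if nothing is missing. The walk stops at the first missing
 * tree because the entries below it cannot be read.
 */
static const char *sketch_first_missing(const struct sketch_tree *root)
{
	const struct sketch_tree *t;

	for (t = root; t; t = t->subtree)
		if (!t->present)
			return t->oid;
	return NULL;
}

/* Usage example: after a tree:0 clone both trees are absent. */
static const char *sketch_missing_demo(void)
{
	static const struct sketch_tree dir = { "dir-oid", 0, NULL };
	static const struct sketch_tree root = { "root-oid", 0, &dir };
	static const struct sketch_tree root_present = { "root-oid", 1, &dir };

	/* Only once the root tree is present can the subtree be named. */
	assert(strcmp(sketch_first_missing(&root_present), "dir-oid") == 0);
	return sketch_first_missing(&root);
}
```

In the demo the walk over the tree:0 clone names only "root-oid"; the
subtree shows up only in the variant where the root tree is present.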

>
> And I would grep on the tree_contents to ensure that the filename
> ("file.txt") is there, so that we know that we got the correct tree.
Done.

>
> > +test_expect_success 'can use tree:0 to filter partial clone' '
> > +     rm -rf dst &&
> > +     git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> > +     git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> > +     cat fetched_objects \
> > +             | awk -f print_1.awk \
> > +             | xargs -n1 git -C dst cat-file -t >fetched_types &&
> > +     sort fetched_types -u >unique_types.observed &&
> > +     echo commit > unique_types.expected &&
> > +     test_cmp unique_types.observed unique_types.expected
> > +'
> > +
> > +test_expect_success 'auto-fetching of trees with --missing=error' '
> > +     git -C dst rev-list master --missing=error --objects >fetched_objects &&
> > +     cat fetched_objects \
> > +             | awk -f print_1.awk \
> > +             | xargs -n1 git -C dst cat-file -t >fetched_types &&
> > +     sort fetched_types -u >unique_types.observed &&
> > +     printf "blob\ncommit\ntree\n" >unique_types.expected &&
> > +     test_cmp unique_types.observed unique_types.expected
> > +'
>
> These two tests seem redundant with the 'use fsck before and after
> manually fetching a missing subtree' test (after the latter is
> appropriately renamed). I think we only need to test this sequence once,
> which can be placed in one or spread over multiple tests:
>
>  1. partial clone with --filter=tree:0
>  2. fsck works
>  3. verify that trees are indeed missing
>  4. autofetch a tree
>  5. fsck still works
Done - that's much nicer. Thanks!

Here is an interdiff from v4 of the patch:

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 9e351ec2a..0b5f77ad3 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,9 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees deeper than
+<depth> from the root tree. Currently, only <depth>=0 is supported.

 --no-filter::
  Turn off any previous `--filter=` argument.
@@ -743,9 +746,6 @@ specification contained in <path>.
  A debug option to help with future "partial clone" development.
  This option specifies how missing objects are handled.
 +
-The form '--filter=tree:<depth>' omits all blobs and trees deeper than
-<depth> from the root tree. Currently, only <depth>=0 is supported.
-+
 The form '--missing=error' requests that rev-list stop with an error if
 a missing object is encountered.  This is the default action.
 +
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 65f2cf446..fe7c13a03 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -74,7 +74,7 @@ test_expect_success 'get an error for missing tree object' '

 test_expect_success 'setup for tests of tree:0' '
  mkdir r1/subtree &&
- echo "This is a file in a subtree" > r1/subtree/file &&
+ echo "This is a file in a subtree" >r1/subtree/file &&
  git -C r1 add subtree/file &&
  git -C r1 commit -m subtree
 '
@@ -95,8 +95,10 @@ test_expect_success 'grab tree directly when using tree:0' '
  EOF
  git -C r1 index-pack ../commitsonly.pack &&
  git -C r1 verify-pack -v ../commitsonly.pack >objs &&
- grep -E "tree|blob" objs >trees_and_blobs &&
- test_line_count = 1 trees_and_blobs
+ grep -E "tree|blob" objs \
+ | awk -f print_1.awk >trees_and_blobs &&
+ git -C r1 rev-parse HEAD: >expected &&
+ test_cmp trees_and_blobs expected
 '

 # Test blob:limit=<n>[kmg] filter.
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index d2859aba1..c3186d934 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -157,7 +157,7 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 test_expect_success 'use fsck before and after manually fetching a missing subtree' '
  # push new commit so server has a subtree
  mkdir src/dir &&
- echo "in dir" > src/dir/file.txt &&
+ echo "in dir" >src/dir/file.txt &&
  git -C src add dir/file.txt &&
  git -C src commit -m "file in dir" &&
  git -C src push -u srv master &&
@@ -166,8 +166,32 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
  rm -rf dst &&
  git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
  git -C dst fsck &&
- git -C dst cat-file -p $SUBTREE >tree_contents 2>err &&
- git -C dst fsck
+
+ # Make sure we only have commits, and all trees and blobs are missing.
+ git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+ cat fetched_objects \
+ | awk -f print_1.awk \
+ | xargs -n1 git -C dst cat-file -t >fetched_types &&
+ sort fetched_types -u >unique_types.observed &&
+ echo commit >unique_types.expected &&
+ test_cmp unique_types.observed unique_types.expected &&
+
+ # Auto-fetch a tree with cat-file.
+ git -C dst cat-file -p $SUBTREE >tree_contents &&
+ grep file.txt tree_contents &&
+
+ # fsck still works after an auto-fetch of a tree.
+ git -C dst fsck &&
+
+ # Auto-fetch all remaining trees and blobs with --missing=error
+ git -C dst rev-list master --missing=error --objects >fetched_objects &&
+ test_line_count = 70 fetched_objects &&
+ cat fetched_objects \
+ | awk -f print_1.awk \
+ | xargs -n1 git -C dst cat-file -t >fetched_types &&
+ sort fetched_types -u >unique_types.observed &&
+ printf "blob\ncommit\ntree\n" >unique_types.expected &&
+ test_cmp unique_types.observed unique_types.expected
 '

 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
@@ -186,28 +210,6 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
  git -C dst fsck
 '

-test_expect_success 'can use tree:0 to filter partial clone' '
- rm -rf dst &&
- git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
- git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
- cat fetched_objects \
- | awk -f print_1.awk \
- | xargs -n1 git -C dst cat-file -t >fetched_types &&
- sort fetched_types -u >unique_types.observed &&
- echo commit > unique_types.expected &&
- test_cmp unique_types.observed unique_types.expected
-'
-
-test_expect_success 'auto-fetching of trees with --missing=error' '
- git -C dst rev-list master --missing=error --objects >fetched_objects &&
- cat fetched_objects \
- | awk -f print_1.awk \
- | xargs -n1 git -C dst cat-file -t >fetched_types &&
- sort fetched_types -u >unique_types.observed &&
- printf "blob\ncommit\ntree\n" >unique_types.expected &&
- test_cmp unique_types.observed unique_types.expected
-'
-
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd


* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-14 17:28   ` [PATCH v4 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
  2018-08-14 18:18     ` Jonathan Tan
@ 2018-08-14 20:01     ` Jeff King
  2018-08-14 23:55       ` Matthew DeVore
  1 sibling, 1 reply; 151+ messages in thread
From: Jeff King @ 2018-08-14 20:01 UTC (permalink / raw)
  To: Matthew DeVore; +Cc: git, git, jeffhost, stefanbeller, jonathantanmy

On Tue, Aug 14, 2018 at 10:28:13AM -0700, Matthew DeVore wrote:

> The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> would filter out all but the root tree and blobs. In order to avoid
> confusion between 0 and capital O, the documentation was worded in a
> somewhat round-about way that also hints at this future improvement to
> the feature.

I'm OK with this as a name, since we're explicitly not supporting deeper
depths. But I'd note that "depth" is actually a tricky characteristic,
as it's not a property of the object itself, but rather who refers to
it. So:

  - it's expensive to compute, because you have to actually walk all of
    the possible commits and trees that could refer to it. This
    prohibits a lot of other optimizations like reachability bitmaps
    (though with some complexity you could cache the depths, too).

  - you have to define it as something like "the minimum depth at which
    this object is found", since there may be multiple depths

I think you can read that second definition between the lines of:

> +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> +<depth> from the root tree. Currently, only <depth>=0 is supported.

But I wonder if we should be more precise. It doesn't matter now, but it
may help set expectations if the feature does come later.
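
To make the "minimum depth" definition concrete, here is a rough sketch
over a toy object graph. The struct, the function names and the
fixed-size arrays are all made up for illustration and are not git's
data structures. It only shows that an object reachable at several
depths ends up recorded at the shallowest one, with the root tree
counted as depth 0.

```c
#include <assert.h>
#include <limits.h>

#define MAX_OBJECTS 16
#define MAX_CHILDREN 4

/* Hypothetical toy object: a tree lists children, a blob has none. */
struct toy_object {
	int nr_children;
	int children[MAX_CHILDREN];	/* indices into the object table */
};

/* Mark every object as "not reached yet". */
static void init_depths(int *min_depth, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		min_depth[i] = INT_MAX;
}

/*
 * Walk from objs[idx], which is found at "depth", and record the
 * minimum depth at which each reachable object is seen. An object
 * already recorded at an equal or shallower depth is not walked again.
 */
static void record_min_depth(const struct toy_object *objs, int idx,
			     int depth, int *min_depth)
{
	int i;

	assert(idx >= 0 && idx < MAX_OBJECTS);
	if (min_depth[idx] <= depth)
		return;
	min_depth[idx] = depth;
	for (i = 0; i < objs[idx].nr_children; i++)
		record_min_depth(objs, objs[idx].children[i],
				 depth + 1, min_depth);
}
```

A caller would run init_depths() once and then record_min_depth() with
depth 0 for the root tree of every commit in the traversal. Needing
that full walk before any depth is final is the cost described in the
first bullet above.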

-Peff

* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-14 20:00       ` Matthew DeVore
@ 2018-08-14 20:19         ` Jonathan Tan
  2018-08-14 20:55           ` Junio C Hamano
  0 siblings, 1 reply; 151+ messages in thread
From: Jonathan Tan @ 2018-08-14 20:19 UTC (permalink / raw)
  To: matvore; +Cc: jonathantanmy, git, git, jeffhost, peff, stefanbeller

> - grep -E "tree|blob" objs >trees_and_blobs &&
> - test_line_count = 1 trees_and_blobs
> + grep -E "tree|blob" objs \
> + | awk -f print_1.awk >trees_and_blobs &&
> + git -C r1 rev-parse HEAD: >expected &&
> + test_cmp trees_and_blobs expected

Indent "| awk" (and similar lines in this patch) - although I guess it
is likely that you actually have it indented, and your e-mail client
modified the whitespace so that it looks like there is no indent.

Other than that, this interdiff looks good to me. Thanks.

* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-14 20:19         ` Jonathan Tan
@ 2018-08-14 20:55           ` Junio C Hamano
  2018-08-14 23:30             ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-08-14 20:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: matvore, git, git, jeffhost, peff, stefanbeller

Jonathan Tan <jonathantanmy@google.com> writes:

>> - grep -E "tree|blob" objs >trees_and_blobs &&
>> - test_line_count = 1 trees_and_blobs
>> + grep -E "tree|blob" objs \
>> + | awk -f print_1.awk >trees_and_blobs &&
>> + git -C r1 rev-parse HEAD: >expected &&
>> + test_cmp trees_and_blobs expected
>
> Indent "| awk" (and similar lines in this patch) - although I guess it
> is likely that you actually have it indented, and your e-mail client
> modified the whitespace so that it looks like there is no indent.

No, wrap lines like this

	command1 arg1 arg2 |
	command2 arg1 arg2 &&

That way, you do not need backslash to continue line.

Also think twice when you are seeing yourself piping output from
"grep" to more powerful tools like "perl", "awk" or "sed".  Often
you can lose the upstream "grep".

* Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly
  2018-08-14 18:06     ` Jonathan Tan
@ 2018-08-14 22:43       ` Matthew DeVore
  2018-08-14 22:56         ` Jonathan Tan
  0 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 22:43 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, git, jeffhost, Jeff King, Stefan Beller

On Tue, Aug 14, 2018 at 11:06 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > Previously, we assumed only blob objects could be missing. This patch
> > makes rev-list handle missing trees like missing blobs. A missing tree
> > will cause an error if --missing indicates an error should be caused,
> > and the hash is printed even if the tree is missing.
>
> The last sentence is difficult to understand - probably better to say
> that all --missing= arguments and --exclude-promisor-objects work for
> missing trees like they currently do for blobs (and do not fixate on
> just --missing=error). And also demonstrate this in tests, like in
> t6112.
Fixed the commit message. And for the tests, in t0410 I changed the
--missing=allow-any to --missing=allow-promisor, and in t6112 I added
--missing=allow-any and --missing=print test cases.

>
> > In list-objects.c we no longer print a message to stderr if a tree
> > object is missing (quiet_on_missing is always true). I couldn't find
> > any place where this would matter, or where the caller of
> > traverse_commit_list would need to be fixed to show the error. However,
> > in the future it would be trivial to make the caller show the message if
> > we needed to.
> >
> > This is not tested very thoroughly, since we cannot create promisor
> > objects in tests without using an actual partial clone. t0410 has a
> > promise_and_delete utility function, but the is_promisor_object function
> > does not return 1 for objects deleted in this way. More tests will
> > come in a patch that implements a filter that can be used with git
> > clone.
>
> These two paragraphs are no longer applicable, I think.
Sorry about that. Removed.

>
> > @@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >       init_revisions(&revs, prefix);
> >       revs.abbrev = DEFAULT_ABBREV;
> >       revs.commit_format = CMIT_FMT_UNSPECIFIED;
> > +     revs.do_not_die_on_missing_tree = 1;
>
> Is this correct? I would have expected this to be set only if --missing
> was set.
If --missing is not set, then we want to fetch missing objects
automatically, and then die if we fail to do that, which is what
happens for blobs. So we don't want to die in list-objects.c. If we
fail to fetch, then we will die on line 213 in rev-list.c.

>
> > -     process_tree_contents(ctx, tree, base);
> > +     /*
> > +      * NEEDSWORK: we should not have to process this tree's contents if the
> > +      * filter wants to exclude all its contents AND the filter doesn't need
> > +      * to collect the omitted OIDs. We should add a LOFR_SKIP_TREE bit which
> > +      * allows skipping all children.
> > +      */
> > +     if (parsed)
> > +             process_tree_contents(ctx, tree, base);
>
> I agree with Jeff Hostetler in [1] that a LOFR_SKIP_TREE bit is
> desirable, but I don't think that this patch is the right place to
> introduce this NEEDSWORK. For me, this patch is about skipping iterating
> over the contents of a tree because the tree does not exist; this
> NEEDSWORK is about skipping iterating over the contents of a tree
> because we don't want its contents, and it is quite confusing to
> conflate the two.
I've removed this.

>
> [1] https://public-inbox.org/git/d751d56b-84bb-a03d-5f2a-7dbaf8d947cc@jeffhostetler.com/
>
> > @@ -124,6 +124,7 @@ struct rev_info {
> >                       first_parent_only:1,
> >                       line_level_traverse:1,
> >                       tree_blobs_in_commit_order:1,
> > +                     do_not_die_on_missing_tree:1,
>
> I know that the other flags don't have documentation, but I think it's
> worth documenting this one because it is rather complicated. I have
> provided a sample one in my earlier review - feel free to use that or
> come up with your own.
Added your wording to revision.h without major change.

>
> > diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> > index 4984ca583..74e3c5767 100755
> > --- a/t/t0410-partial-clone.sh
> > +++ b/t/t0410-partial-clone.sh
> > @@ -186,6 +186,72 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
> >       ! grep $FOO out
> >  '
> >
> > +test_expect_success 'show missing tree objects with --missing=print' '
> > +     rm -rf repo &&
> > +     test_create_repo repo &&
> > +     test_commit -C repo foo &&
> > +     test_commit -C repo bar &&
> > +     test_commit -C repo baz &&
> > +
> > +     TREE=$(git -C repo rev-parse bar^{tree}) &&
> > +
> > +     promise_and_delete $TREE &&
> > +
> > +     git -C repo config core.repositoryformatversion 1 &&
> > +     git -C repo config extensions.partialclone "arbitrary string" &&
> > +     git -C repo rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
> > +     echo "?$TREE" >expected &&
> > +     test_cmp expected missing_objs &&
> > +
> > +     # do not complain when a missing tree cannot be parsed
> > +     ! grep -q "Could not read " rev_list_err
> > +'
>
> I think that the --exclude-promisor-objects tests can go in t0410 as you have
> done, but the --missing tests (except for --missing=allow-promisor)
> should go in t6112. (And like the existing --missing tests, they should
> be done without setting extensions.partialclone.)
Done.

>
> As for --missing=allow-promisor, I don't see them being tested anywhere
> :-( so feel free to make a suggestion. I would put them in t6112 for
> easy comparison with the other --missing tests.
Kept my allow-promisor test in t0410 since it requires partial clone
to be turned on in the config, and because it is pretty similar to
--exclude-promisor-objects.

* Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly
  2018-08-14 22:43       ` Matthew DeVore
@ 2018-08-14 22:56         ` Jonathan Tan
  2018-08-14 23:14           ` Jonathan Tan
  0 siblings, 1 reply; 151+ messages in thread
From: Jonathan Tan @ 2018-08-14 22:56 UTC (permalink / raw)
  To: matvore; +Cc: jonathantanmy, git, git, jeffhost, peff, stefanbeller

> > > @@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> > >       init_revisions(&revs, prefix);
> > >       revs.abbrev = DEFAULT_ABBREV;
> > >       revs.commit_format = CMIT_FMT_UNSPECIFIED;
> > > +     revs.do_not_die_on_missing_tree = 1;
> >
> > Is this correct? I would have expected this to be set only if --missing
> > was set.
> If --missing is not set, then we want to fetch missing objects
> automatically, and then die if we fail to do that, which is what
> happens for blobs.

This is true, and should already be handled. Pay attention to when
fetch_if_missing is set in builtin/rev-list.c.

do_not_die_on_missing_tree should probably be set to 1 whenever
fetch_if_missing is set to 0, I think.

(I acknowledge that the usage of this global variable is confusing, but
I couldn't think of a better way to implement this when I did. Perhaps
when the object store refactoring is done, this can be a store-specific
setting instead of a global variable.)

> So we don't want to die in list-objects.c. If we
> fail to fetch, then we will die on line 213 in rev-list.c.

Why don't we want to die in list-objects.c? When --missing=error is
passed, fetch_if_missing retains its default value of 1, so
parse_tree_gently() will attempt to fetch it - and if it fails, I think
it's appropriate to die in list-objects.c (and this should be the
current behavior). On other values, e.g. --missing=allow-any, there is
no autofetch (since fetch_if_missing is 0), so it is correct not to die
in list-objects.c.
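
As a rough sketch of the mapping described above (not the code in
builtin/rev-list.c): only --missing=error keeps autofetch on, and every
other value turns it off. The enum, the struct and toy_parse_missing()
are hypothetical names. Treating "print" and "allow-promisor" the same
way as "allow-any" is an assumption based on the "other values" wording.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy model of the --missing=<action> handling discussed above.
 * All names here are hypothetical; this is not git's actual code.
 */
enum toy_missing_action {
	TOY_MA_ERROR,		/* --missing=error */
	TOY_MA_ALLOW_ANY,	/* --missing=allow-any */
	TOY_MA_PRINT,		/* --missing=print */
	TOY_MA_ALLOW_PROMISOR	/* --missing=allow-promisor */
};

struct toy_missing_config {
	enum toy_missing_action action;
	int fetch_if_missing;	/* 1: lazily fetch missing objects */
};

/* Returns 0 on success, -1 if "value" is not a recognized action. */
static int toy_parse_missing(const char *value,
			     struct toy_missing_config *cfg)
{
	assert(value && cfg);

	if (!strcmp(value, "error")) {
		cfg->action = TOY_MA_ERROR;
		cfg->fetch_if_missing = 1;	/* keep autofetch on */
		return 0;
	}
	if (!strcmp(value, "allow-any"))
		cfg->action = TOY_MA_ALLOW_ANY;
	else if (!strcmp(value, "print"))
		cfg->action = TOY_MA_PRINT;
	else if (!strcmp(value, "allow-promisor"))
		cfg->action = TOY_MA_ALLOW_PROMISOR;
	else
		return -1;

	cfg->fetch_if_missing = 0;	/* no autofetch for these values */
	return 0;
}
```

With this model, whether a missing tree is fatal can be decided from
the parsed action and the fetch result, instead of at the point where
the tree fails to parse.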

> > As for --missing=allow-promisor, I don't see them being tested anywhere
> > :-( so feel free to make a suggestion. I would put them in t6112 for
> > easy comparison with the other --missing tests.
> Kept my allow-promisor test in t0410 since it requires partial clone
> to be turned on in the config, and because it is pretty similar to
> --exclude-promisor-objects.

OK, sounds good to me.

* Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly
  2018-08-14 22:56         ` Jonathan Tan
@ 2018-08-14 23:14           ` Jonathan Tan
  0 siblings, 0 replies; 151+ messages in thread
From: Jonathan Tan @ 2018-08-14 23:14 UTC (permalink / raw)
  To: jonathantanmy; +Cc: matvore, git, git, jeffhost, peff, stefanbeller

> > So we don't want to die in list-objects.c. If we
> > fail to fetch, then we will die on line 213 in rev-list.c.
> 
> Why don't we want to die in list-objects.c? When --missing=error is
> passed, fetch_if_missing retains its default value of 1, so
> parse_tree_gently() will attempt to fetch it - and if it fails, I think
> it's appropriate to die in list-objects.c (and this should be the
> current behavior). On other values, e.g. --missing=allow-any, there is
> no autofetch (since fetch_if_missing is 0), so it is correct not to die
> in list-objects.c.

After some in-office discussion, I realize I should have checked line 213 in
builtin/rev-list.c more thoroughly. Indeed it is OK not to die in
list-objects.c here, since builtin/rev-list.c already knows how to
handle missing objects in the --missing=error circumstance.

* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-14 20:55           ` Junio C Hamano
@ 2018-08-14 23:30             ` Matthew DeVore
  2018-08-15 16:14               ` Junio C Hamano
  0 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 23:30 UTC (permalink / raw)
  To: gitster; +Cc: Jonathan Tan, git, git, jeffhost, Jeff King, Stefan Beller

On Tue, Aug 14, 2018 at 1:55 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jonathan Tan <jonathantanmy@google.com> writes:
>
> >> - grep -E "tree|blob" objs >trees_and_blobs &&
> >> - test_line_count = 1 trees_and_blobs
> >> + grep -E "tree|blob" objs \
> >> + | awk -f print_1.awk >trees_and_blobs &&
> >> + git -C r1 rev-parse HEAD: >expected &&
> >> + test_cmp trees_and_blobs expected
> >
> > Indent "| awk" (and similar lines in this patch) - although I guess it
> > is likely that you actually have it indented, and your e-mail client
> > modified the whitespace so that it looks like there is no indent.
>
> No, wrap lines like this
>
>         command1 arg1 arg2 |
>         command2 arg1 arg2 &&
>
> That way, you do not need backslash to continue line.
>
> Also think twice when you are seeing yourself piping output from
> "grep" to more powerful tools like "perl", "awk" or "sed".  Often
> you can lose the upstream "grep".

Thank you. I changed it to this:
  awk -e "/tree|blob/{print \$1}" objs >trees_and_blobs

About the line wrapping strategy, the files I've edited all did it
with the \ and the | at the start of subsequent lines - so I made my
code match that style. Otherwise I would have liked to use the style
you suggest...

* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-14 20:01     ` Jeff King
@ 2018-08-14 23:55       ` Matthew DeVore
  2018-08-15  1:22         ` Jeff King
  0 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-14 23:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git, git, jeffhost, Stefan Beller, Jonathan Tan

On Tue, Aug 14, 2018 at 1:01 PM Jeff King <peff@peff.net> wrote:
>
> On Tue, Aug 14, 2018 at 10:28:13AM -0700, Matthew DeVore wrote:
>
> > The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> > would filter out all but the root tree and blobs. In order to avoid
> > confusion between 0 and capital O, the documentation was worded in a
> > somewhat round-about way that also hints at this future improvement to
> > the feature.
>
> I'm OK with this as a name, since we're explicitly not supporting deeper
> depths. But I'd note that "depth" is actually a tricky characteristic,
> as it's not a property of the object itself, but rather who refers to
> it. So:
>
>   - it's expensive to compute, because you have to actually walk all of
>     the possible commits and trees that could refer to it. This
>     prohibits a lot of other optimizations like reachability bitmaps
>     (though with some complexity you could cache the depths, too).
I think what the user likely wants is to use the minimum depth based
on the commits in the traversal, not every commit in the repo - is
this what you mean?

>
>   - you have to define it as something like "the minimum depth at which
>     this object is found", since there may be multiple depths
>
> I think you can read that second definition between the lines of:
>
> > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> > +<depth> from the root tree. Currently, only <depth>=0 is supported.
>
> But I wonder if we should be more precise. It doesn't matter now, but it
> may help set expectations if the feature does come later.
>
Makes sense. I changed it like this -

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 0b5f77ad3..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -732,8 +732,10 @@ the requested refs.
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
 +
-The form '--filter=tree:<depth>' omits all blobs and trees deeper than
-<depth> from the root tree. Currently, only <depth>=0 is supported.
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.

 --no-filter::
  Turn off any previous `--filter=` argument.


> -Peff

* [PATCH v5 0/6] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (8 preceding siblings ...)
  2018-08-14 17:28 ` [PATCH v4 0/6] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-08-15  0:22 ` Matthew DeVore
  2018-08-15  0:22   ` [PATCH v5 1/6] list-objects: store common func args in struct Matthew DeVore
                     ` (5 more replies)
  2018-08-15 23:19 ` [PATCH v6 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (6 subsequent siblings)
  16 siblings, 6 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15  0:22 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

Please take a look. I believe I have applied or responded to all suggestions
since the last iteration.

Matthew DeVore (6):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   5 +
 builtin/rev-list.c                     |  11 +-
 list-objects-filter-options.c          |   4 +
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  50 ++++++
 list-objects.c                         | 232 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  25 ++-
 t/t0410-partial-clone.sh               |  45 +++++
 t/t5317-pack-objects-filter-objects.sh |  41 +++++
 t/t5616-partial-clone.sh               |  38 ++++
 t/t6112-rev-list-filters-objects.sh    |  29 ++++
 12 files changed, 364 insertions(+), 118 deletions(-)

-- 
2.18.0.865.gffc8e1a3cd6-goog


* [PATCH v5 1/6] list-objects: store common func args in struct
  2018-08-15  0:22 ` [PATCH v5 0/6] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-08-15  0:22   ` Matthew DeVore
  2018-08-15  0:22   ` [PATCH v5 2/6] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15  0:22 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.18.0.865.gffc8e1a3cd6-goog


* [PATCH v5 2/6] list-objects: refactor to process_tree_contents
  2018-08-15  0:22 ` [PATCH v5 0/6] filter: support for excluding all trees and blobs Matthew DeVore
  2018-08-15  0:22   ` [PATCH v5 1/6] list-objects: store common func args in struct Matthew DeVore
@ 2018-08-15  0:22   ` Matthew DeVore
  2018-08-15  0:22   ` [PATCH v5 3/6] list-objects: always parse trees gently Matthew DeVore
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15  0:22 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally. i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v5 3/6] list-objects: always parse trees gently
  2018-08-15  0:22 ` [PATCH v5 0/6] filter: support for excluding all trees and blobs Matthew DeVore
  2018-08-15  0:22   ` [PATCH v5 1/6] list-objects: store common func args in struct Matthew DeVore
  2018-08-15  0:22   ` [PATCH v5 2/6] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-08-15  0:22   ` Matthew DeVore
  2018-08-15  0:22   ` [PATCH v5 4/6] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15  0:22 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	if (parse_tree_gently(tree, 1) < 0) {
 		if (revs->ignore_missing_links)
 			return;
 
-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v5 4/6] rev-list: handle missing tree objects properly
  2018-08-15  0:22 ` [PATCH v5 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-08-15  0:22   ` [PATCH v5 3/6] list-objects: always parse trees gently Matthew DeVore
@ 2018-08-15  0:22   ` Matthew DeVore
  2018-08-15  0:22   ` [PATCH v5 5/6] revision: mark non-user-given objects instead Matthew DeVore
  2018-08-15  0:22   ` [PATCH v5 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15  0:22 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/rev-list.c                     | 11 ++++---
 list-objects.c                         | 11 +++++--
 revision.h                             | 15 +++++++++
 t/t0410-partial-clone.sh               | 45 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 13 ++++++++
 t/t6112-rev-list-filters-objects.sh    | 17 ++++++++++
 6 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int failed_parse;
 
 	if (!revs->tree_objects)
 		return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, 1) < 0) {
+
+	failed_parse = parse_tree_gently(tree, 1);
+	if (failed_parse) {
 		if (revs->ignore_missing_links)
 			return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
 		    is_promisor_object(&obj->oid))
 			return;
 
-		die("bad tree object %s", oid_to_hex(&obj->oid));
+		if (!revs->do_not_die_on_missing_tree)
+			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (!failed_parse)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index c599c34da..5118aaaa9 100644
--- a/revision.h
+++ b/revision.h
@@ -125,6 +125,21 @@ struct rev_info {
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
 
+			/*
+			 * Blobs are shown without regard for their existence.
+			 * But not so for trees: unless exclude_promisor_objects
+			 * is set and the tree in question is a promisor object;
+			 * OR ignore_missing_links is set, the revision walker
+			 * dies with a "bad tree object HASH" message when
+			 * encountering a missing tree. For callers that can
+			 * handle missing trees and want them to be filterable
+			 * and showable, set this to true. The revision walker
+			 * will filter and show such a missing tree as usual,
+			 * but will not attempt to recurse into this tree
+			 * object.
+			 */
+			do_not_die_on_missing_tree:1,
+
 			/* for internal use only */
 			exclude_promisor_objects:1;
 
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4984ca583..a1b93c72c 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -186,6 +186,51 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	! grep $FOO out
 '
 
+test_expect_success 'missing tree objects with --missing=allow-promisor and --exclude-promisor-objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	promise_and_delete $(git -C repo rev-parse bar^{tree}) &&
+	promise_and_delete $(git -C repo rev-parse foo^{tree}) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err &&
+	test_line_count = 0 rev_list_err &&
+	# 3 commits, 3 blobs, and 1 tree
+	test_line_count = 7 objs &&
+
+	# Do the same for --exclude-promisor-objects, but with all trees gone.
+	promise_and_delete $(git -C repo rev-parse baz^{tree}) &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+	test_line_count = 0 rev_list_err &&
+	# 3 commits, no blobs or trees
+	test_line_count = 3 objs
+'
+
+test_expect_success 'missing non-root tree object and rev-list' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	mkdir repo/dir &&
+	echo foo > repo/dir/foo &&
+	git -C repo add dir/foo &&
+	git -C repo commit -m "commit dir/foo" &&
+
+	promise_and_delete $(git -C repo rev-parse HEAD:dir) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	test_line_count = 0 rev_list_err &&
+	# 1 commit and 1 tree
+	test_line_count = 2 objs
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised tree' '
 	rm -rf repo &&
 	test_create_repo repo &&
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..5e35f33bf 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo > r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep -q "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 0a37dd5f9..fc0f92a16 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -196,6 +196,23 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
+	TREE=$(git -C r3 rev-parse HEAD:dir1) &&
+
+	rm r3/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+	git -C r3 rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
+	echo "?$TREE" >expected &&
+	test_cmp expected missing_objs &&
+
+	# do not complain when a missing tree cannot be parsed
+	test_line_count = 0 rev_list_err &&
+
+	git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	! grep $TREE objs &&
+	test_line_count = 0 rev_list_err
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v5 5/6] revision: mark non-user-given objects instead
  2018-08-15  0:22 ` [PATCH v5 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-08-15  0:22   ` [PATCH v5 4/6] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-08-15  0:22   ` Matthew DeVore
  2018-08-15  0:22   ` [PATCH v5 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15  0:22 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 31 ++++++++++++++++++-------------
 revision.c     |  1 -
 revision.h     | 10 +++++++---
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index 062749437..6d355b43c 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index 5118aaaa9..2d381e636 100644
--- a/revision.h
+++ b/revision.h
@@ -8,7 +8,11 @@
 #include "diff.h"
 #include "commit-slab-decl.h"
 
-/* Remember to update object flag allocation in object.h */
+/* Remember to update object flag allocation in object.h
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
 #define SEEN		(1u<<0)
 #define UNINTERESTING   (1u<<1)
 #define TREESAME	(1u<<2)
@@ -20,9 +24,9 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+#define NOT_USER_GIVEN	(1u<<25) /* tree or blob not given directly by user */
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v5 6/6] list-objects-filter: implement filter tree:0
  2018-08-15  0:22 ` [PATCH v5 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (4 preceding siblings ...)
  2018-08-15  0:22   ` [PATCH v5 5/6] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-08-15  0:22   ` Matthew DeVore
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15  0:22 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but it is inaccurate because
it suggests that annotated tags are omitted, when in fact they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
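Note for reviewers, not part of the commit: the documentation wording
("depth from the root tree is >= <depth>", using the minimum depth when
an object appears at several depths) can be illustrated with the sketch
below. It is only a model of the intended semantics, not how
list-objects computes anything; the helper name kept_by_tree_depth and
the "<oid> <depth>" input format are assumptions, and only a cutoff of 0
is implemented by this patch.

```shell
# Read "<oid> <depth>" pairs on stdin (an object may be listed at several
# depths) and print, sorted, the OIDs that a hypothetical
# --filter=tree:<cutoff> would keep: those whose minimum depth is
# strictly below the cutoff given as $1.
kept_by_tree_depth () {
	awk -v cutoff="$1" '
		# Remember the smallest depth seen for each OID.
		!($1 in min) || $2 + 0 < min[$1] { min[$1] = $2 + 0 }
		END {
			for (oid in min)
				if (min[oid] < cutoff + 0)
					print oid
		}
	' | sort
}

# "ccc" is reachable at depths 2 and 1, so its minimum depth is 1.
printf 'aaa 0\nbbb 1\nccc 2\nccc 1\n' | kept_by_tree_depth 2
```

With a cutoff of 0 nothing is kept, which matches "omits all blobs and
trees".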
 Documentation/rev-list-options.txt     |  5 +++
 list-objects-filter-options.c          |  4 +++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 50 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 28 +++++++++++++++
 t/t5616-partial-clone.sh               | 38 ++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 12 +++++++
 7 files changed, 138 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..a28382940 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (!strcmp(arg, "tree:0")) {
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..8e3caf5bf 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		die("unknown filter_situation");
+		return LOFR_ZERO;
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void* filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -374,6 +423,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 5e35f33bf..985668a07 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" >r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	awk -e "/tree|blob/{print \$1}" objs >trees_and_blobs &&
+	git -C r1 rev-parse HEAD: >expected &&
+	test_cmp trees_and_blobs expected
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..8eeb85fbc 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,6 +154,44 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'use fsck before and after manually fetching a missing subtree' '
+	# push new commit so server has a subtree
+	mkdir src/dir &&
+	echo "in dir" >src/dir/file.txt &&
+	git -C src add dir/file.txt &&
+	git -C src commit -m "file in dir" &&
+	git -C src push -u srv master &&
+	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
+
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst fsck &&
+
+	# Make sure we only have commits, and all trees and blobs are missing.
+	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	awk -f print_1.awk fetched_objects \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	echo commit >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected &&
+
+	# Auto-fetch a tree with cat-file.
+	git -C dst cat-file -p $SUBTREE >tree_contents &&
+	grep file.txt tree_contents &&
+
+	# fsck still works after an auto-fetch of a tree.
+	git -C dst fsck &&
+
+	# Auto-fetch all remaining trees and blobs with --missing=error
+	git -C dst rev-list master --missing=error --objects >fetched_objects &&
+	test_line_count = 70 fetched_objects &&
+	awk -f print_1.awk fetched_objects \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	printf "blob\ncommit\ntree\n" >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index fc0f92a16..30bf1c73e 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -213,6 +213,18 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 	test_line_count = 0 rev_list_err
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
+		| awk -f print_1.awk \
+		| sed s/~// \
+		| xargs -n1 git -C r3 cat-file -t \
+		| sort -u >filtered_types &&
+	printf "blob\ntree\n" > expected &&
+	test_cmp filtered_types expected
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.18.0.865.gffc8e1a3cd6-goog



* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-14 23:55       ` Matthew DeVore
@ 2018-08-15  1:22         ` Jeff King
  2018-08-15 16:17           ` Junio C Hamano
  0 siblings, 1 reply; 151+ messages in thread
From: Jeff King @ 2018-08-15  1:22 UTC (permalink / raw)
  To: Matthew DeVore; +Cc: git, git, jeffhost, Stefan Beller, Jonathan Tan

On Tue, Aug 14, 2018 at 04:55:34PM -0700, Matthew DeVore wrote:

> >   - it's expensive to compute, because you have to actually walk all of
> >     the possible commits and trees that could refer to it. This
> >     prohibits a lot of other optimizations like reachability bitmaps
> >     (though with some complexity you could cache the depths, too).
> I think what the user likely wants is to use the minimum depth based
> on the commits in the traversal, not every commit in the repo - is
> this what you mean?

Right, I'd agree they probably want the minimum for that traversal. And
for `rev-list --filter`, that's probably OK. But keep in mind the main
goal for --filter is using it for fetches, and many servers do not
perform the traversal at all. Instead they use reachability bitmaps to
come up with the set of objects to send. The bitmaps have enough
information to say "remove all trees from the set", but not enough to do
any kind of depth-based calculation (not even "is this a root tree").

> Makes sense. I changed it like this -
> 
> diff --git a/Documentation/rev-list-options.txt
> b/Documentation/rev-list-options.txt
> index 0b5f77ad3..5f1672913 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -732,8 +732,10 @@ the requested refs.
>  The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
>  specification contained in <path>.
>  +
> -The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> -<depth> from the root tree. Currently, only <depth>=0 is supported.
> +The form '--filter=tree:<depth>' omits all blobs and trees whose depth
> +from the root tree is >= <depth> (minimum depth if an object is located
> +at multiple depths in the commits traversed). Currently, only <depth>=0
> +is supported, which omits all blobs and trees.

Yes, I think that makes sense. Thanks!

-Peff


* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-14 23:30             ` Matthew DeVore
@ 2018-08-15 16:14               ` Junio C Hamano
  2018-08-15 16:37                 ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-08-15 16:14 UTC (permalink / raw)
  To: Matthew DeVore; +Cc: Jonathan Tan, git, git, jeffhost, Jeff King, Stefan Beller

Matthew DeVore <matvore@google.com> writes:

> Thank you. I changed it to this:
>   awk -e "/tree|blob/{print \$1}" objs >trees_and_blobs

The "-e" option does not appear in

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

and I think you can safely drop it from your command line.

    If no -f option is specified, the first operand to awk shall be the
    text of the awk program. The application shall supply the program
    operand as a single argument to awk. If the text does not end in a
    <newline>, awk shall interpret the text as if it did.
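
In other words, something along these lines should work with any POSIX
awk. This is only a sketch: the helper name print_tree_and_blob_oids and
the sample listing are made up, the latter loosely following the shape
of "git verify-pack -v" output.

```shell
# Print the first column (the OID) of every line that mentions a tree or
# a blob. The program text is the first operand, so no -e is needed.
print_tree_and_blob_oids () {
	awk '/tree|blob/{print $1}' "$@"
}

# Made-up listing; with no file operand awk reads standard input.
printf '%s\n' \
	"1111111111111111111111111111111111111111 commit 200 150 12" \
	"2222222222222222222222222222222222222222 tree   60 70 162" \
	"3333333333333333333333333333333333333333 blob   10 19 232" |
print_tree_and_blob_oids
```

Only the tree and blob OIDs are printed; the commit line does not match
the pattern.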



* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-15  1:22         ` Jeff King
@ 2018-08-15 16:17           ` Junio C Hamano
  2018-08-15 17:54             ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-08-15 16:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Matthew DeVore, git, git, jeffhost, Stefan Beller, Jonathan Tan

Jeff King <peff@peff.net> writes:

> Right, I'd agree they probably want the minimum for that traversal. And
> for `rev-list --filter`, that's probably OK. But keep in mind the main
> goal for --filter is using it for fetches, and many servers do not
> perform the traversal at all. Instead they use reachability bitmaps to
> come up with the set of objects to send. The bitmaps have enough
> information to say "remove all trees from the set", but not enough to do
> any kind of depth-based calculation (not even "is this a root tree").

If the depth-based cutoff turns out to make sense (on which I
haven't formed an opinion yet), a newer version of pack bitmaps could
store that information ;-)

How are these "filter" expressions negotiated between the fetcher
and uploader?  Does a "fetch-pack" say "am I allowed to ask you to
filter with tree:4?" and refrain from using the option when
"upload-pack" says "no"?

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-15 16:14               ` Junio C Hamano
@ 2018-08-15 16:37                 ` Matthew DeVore
  0 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15 16:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git, git, jeffhost, Jeff King, Stefan Beller

On Wed, Aug 15, 2018 at 9:14 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matthew DeVore <matvore@google.com> writes:
>
> > Thank you. I changed it to this:
> >   awk -e "/tree|blob/{print \$1}" objs >trees_and_blobs
>
> The "-e" option does not appear in
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
>
> and I think you can safely drop it from your command line.
Fixed it, thank you. It will be in the next patchset version.

>
>     If no -f option is specified, the first operand to awk shall be the
>     text of the awk program. The application shall supply the program
>     operand as a single argument to awk. If the text does not end in a
>     <newline>, awk shall interpret the text as if it did.
>

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
  2018-08-15 16:17           ` Junio C Hamano
@ 2018-08-15 17:54             ` Matthew DeVore
  0 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15 17:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, git, jeffhost, Stefan Beller, Jonathan Tan

On Wed, Aug 15, 2018 at 9:17 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > Right, I'd agree they probably want the minimum for that traversal. And
> > for `rev-list --filter`, that's probably OK. But keep in mind the main
> > goal for --filter is using it for fetches, and many servers do not
> > perform the traversal at all. Instead they use reachability bitmaps to
> > come up with the set of objects to send. The bitmaps have enough
> > information to say "remove all trees from the set", but not enough to do
> > any kind of depth-based calculation (not even "is this a root tree").
>
> If the depth-based cutoff turns out to make sense (on which I
> haven't formed an opinion yet), a newer version of pack bitmaps could
> store that information ;-)
>
> How are these "filter" expressions negotiated between the fetcher
> and uploader?  Does a "fetch-pack" say "am I allowed to ask you to
> filter with tree:4?" and refrain from using the option when
> "upload-pack" says "no"?

I couldn't find anything like that for the existing filters, but
adding such a thing seems reasonable to me. Thinking in terms of
protocol v2, there could be a filter-check command which takes a
single argument, the filter string (like "tree:4"), and responds with
a single line: either "ok" or "unsupported".
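
To make the idea concrete, here is a rough sketch of the decision the
server side would make. Nothing here exists in Git today: "filter_check"
is a hypothetical shell function standing in for the proposed command,
and the set of accepted filter strings is an assumption chosen only for
illustration.

```sh
# Hypothetical stand-in for the proposed "filter-check" command.
# Takes one argument (the filter string) and prints one line.
filter_check () {
	case "$1" in
	blob:none|tree:0)
		# Assumed to be supported by this imaginary server.
		echo ok
		;;
	*)
		# Anything else, e.g. tree:4, is reported as unsupported.
		echo unsupported
		;;
	esac
}

filter_check tree:0
filter_check tree:4
```

A client could then fall back to an unfiltered fetch, or to a coarser
filter, when it gets "unsupported" back.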

^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v6 0/6] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (9 preceding siblings ...)
  2018-08-15  0:22 ` [PATCH v5 0/6] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-08-15 23:19 ` Matthew DeVore
  2018-08-15 23:19   ` [PATCH v6 1/6] list-objects: store common func args in struct Matthew DeVore
                     ` (5 more replies)
  2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (5 subsequent siblings)
  16 siblings, 6 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15 23:19 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

Applied suggestion from Junio about removing -e flag from awk invocation.
Sending an updated patchset now since I haven't heard any other comments for a
while, and I don't believe Jonathan, the most active reviewer, has any more
concerns.

Matthew DeVore (6):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   5 +
 builtin/rev-list.c                     |  11 +-
 list-objects-filter-options.c          |   4 +
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  50 ++++++
 list-objects.c                         | 232 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  25 ++-
 t/t0410-partial-clone.sh               |  45 +++++
 t/t5317-pack-objects-filter-objects.sh |  41 +++++
 t/t5616-partial-clone.sh               |  38 ++++
 t/t6112-rev-list-filters-objects.sh    |  29 ++++
 12 files changed, 364 insertions(+), 118 deletions(-)

-- 
2.18.0.865.gffc8e1a3cd6-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v6 1/6] list-objects: store common func args in struct
  2018-08-15 23:19 ` [PATCH v6 0/6] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-08-15 23:19   ` Matthew DeVore
  2018-08-15 23:19   ` [PATCH v6 2/6] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15 23:19 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.18.0.865.gffc8e1a3cd6-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v6 2/6] list-objects: refactor to process_tree_contents
  2018-08-15 23:19 ` [PATCH v6 0/6] filter: support for excluding all trees and blobs Matthew DeVore
  2018-08-15 23:19   ` [PATCH v6 1/6] list-objects: store common func args in struct Matthew DeVore
@ 2018-08-15 23:19   ` Matthew DeVore
  2018-08-15 23:19   ` [PATCH v6 3/6] list-objects: always parse trees gently Matthew DeVore
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15 23:19 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally. i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.18.0.865.gffc8e1a3cd6-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v6 3/6] list-objects: always parse trees gently
  2018-08-15 23:19 ` [PATCH v6 0/6] filter: support for excluding all trees and blobs Matthew DeVore
  2018-08-15 23:19   ` [PATCH v6 1/6] list-objects: store common func args in struct Matthew DeVore
  2018-08-15 23:19   ` [PATCH v6 2/6] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-08-15 23:19   ` Matthew DeVore
  2018-08-15 23:19   ` [PATCH v6 4/6] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15 23:19 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	if (parse_tree_gently(tree, 1) < 0) {
 		if (revs->ignore_missing_links)
 			return;
 
-- 
2.18.0.865.gffc8e1a3cd6-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v6 4/6] rev-list: handle missing tree objects properly
  2018-08-15 23:19 ` [PATCH v6 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-08-15 23:19   ` [PATCH v6 3/6] list-objects: always parse trees gently Matthew DeVore
@ 2018-08-15 23:19   ` Matthew DeVore
  2018-08-15 23:19   ` [PATCH v6 5/6] revision: mark non-user-given objects instead Matthew DeVore
  2018-08-15 23:19   ` [PATCH v6 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15 23:19 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/rev-list.c                     | 11 ++++---
 list-objects.c                         | 11 +++++--
 revision.h                             | 15 +++++++++
 t/t0410-partial-clone.sh               | 45 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 13 ++++++++
 t/t6112-rev-list-filters-objects.sh    | 17 ++++++++++
 6 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int failed_parse;
 
 	if (!revs->tree_objects)
 		return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, 1) < 0) {
+
+	failed_parse = parse_tree_gently(tree, 1);
+	if (failed_parse) {
 		if (revs->ignore_missing_links)
 			return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
 		    is_promisor_object(&obj->oid))
 			return;
 
-		die("bad tree object %s", oid_to_hex(&obj->oid));
+		if (!revs->do_not_die_on_missing_tree)
+			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (!failed_parse)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index c599c34da..5118aaaa9 100644
--- a/revision.h
+++ b/revision.h
@@ -125,6 +125,21 @@ struct rev_info {
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
 
+			/*
+			 * Blobs are shown without regard for their existence.
+			 * But not so for trees: unless exclude_promisor_objects
+			 * is set and the tree in question is a promisor object;
+			 * OR ignore_missing_links is set, the revision walker
+			 * dies with a "bad tree object HASH" message when
+			 * encountering a missing tree. For callers that can
+			 * handle missing trees and want them to be filterable
+			 * and showable, set this to true. The revision walker
+			 * will filter and show such a missing tree as usual,
+			 * but will not attempt to recurse into this tree
+			 * object.
+			 */
+			do_not_die_on_missing_tree:1,
+
 			/* for internal use only */
 			exclude_promisor_objects:1;
 
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4984ca583..a1b93c72c 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -186,6 +186,51 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	! grep $FOO out
 '
 
+test_expect_success 'missing tree objects with --missing=allow-promisor and --exclude-promisor-objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	promise_and_delete $(git -C repo rev-parse bar^{tree}) &&
+	promise_and_delete $(git -C repo rev-parse foo^{tree}) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err &&
+	test_line_count = 0 rev_list_err &&
+	# 3 commits, 3 blobs, and 1 tree
+	test_line_count = 7 objs &&
+
+	# Do the same for --exclude-promisor-objects, but with all trees gone.
+	promise_and_delete $(git -C repo rev-parse baz^{tree}) &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+	test_line_count = 0 rev_list_err &&
+	# 3 commits, no blobs or trees
+	test_line_count = 3 objs
+'
+
+test_expect_success 'missing non-root tree object and rev-list' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	mkdir repo/dir &&
+	echo foo > repo/dir/foo &&
+	git -C repo add dir/foo &&
+	git -C repo commit -m "commit dir/foo" &&
+
+	promise_and_delete $(git -C repo rev-parse HEAD:dir) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	test_line_count = 0 rev_list_err &&
+	# 1 commit and 1 tree
+	test_line_count = 2 objs
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised tree' '
 	rm -rf repo &&
 	test_create_repo repo &&
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..5e35f33bf 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo > r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep -q "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 0a37dd5f9..fc0f92a16 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -196,6 +196,23 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
+	TREE=$(git -C r3 rev-parse HEAD:dir1) &&
+
+	rm r3/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+	git -C r3 rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
+	echo "?$TREE" >expected &&
+	test_cmp expected missing_objs &&
+
+	# do not complain when a missing tree cannot be parsed
+	test_line_count = 0 rev_list_err &&
+
+	git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	! grep $TREE objs &&
+	test_line_count = 0 rev_list_err
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.18.0.865.gffc8e1a3cd6-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v6 5/6] revision: mark non-user-given objects instead
  2018-08-15 23:19 ` [PATCH v6 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-08-15 23:19   ` [PATCH v6 4/6] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-08-15 23:19   ` Matthew DeVore
  2018-08-15 23:19   ` [PATCH v6 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
  5 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15 23:19 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 31 ++++++++++++++++++-------------
 revision.c     |  1 -
 revision.h     | 10 +++++++---
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index 062749437..6d355b43c 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index 5118aaaa9..2d381e636 100644
--- a/revision.h
+++ b/revision.h
@@ -8,7 +8,11 @@
 #include "diff.h"
 #include "commit-slab-decl.h"
 
-/* Remember to update object flag allocation in object.h */
+/* Remember to update object flag allocation in object.h
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
 #define SEEN		(1u<<0)
 #define UNINTERESTING   (1u<<1)
 #define TREESAME	(1u<<2)
@@ -20,9 +24,9 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+#define NOT_USER_GIVEN	(1u<<25) /* tree or blob not given directly by user */
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
-- 
2.18.0.865.gffc8e1a3cd6-goog



* [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-15 23:19 ` [PATCH v6 0/6] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (4 preceding siblings ...)
  2018-08-15 23:19   ` [PATCH v6 5/6] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-08-15 23:19   ` Matthew DeVore
  2018-08-17 21:42     ` Stefan Beller
  2018-08-18 16:17     ` Duy Nguyen
  5 siblings, 2 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-08-15 23:19 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  5 +++
 list-objects-filter-options.c          |  4 +++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 50 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 28 +++++++++++++++
 t/t5616-partial-clone.sh               | 38 ++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 12 +++++++
 7 files changed, 138 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..a28382940 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (!strcmp(arg, "tree:0")) {
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..8e3caf5bf 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		die("unknown filter_situation");
+		return LOFR_ZERO;
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void* filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -374,6 +423,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 5e35f33bf..7a4d49ea1 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" >r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	awk "/tree|blob/{print \$1}" objs >trees_and_blobs &&
+	git -C r1 rev-parse HEAD: >expected &&
+	test_cmp trees_and_blobs expected
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..8eeb85fbc 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,6 +154,44 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'use fsck before and after manually fetching a missing subtree' '
+	# push new commit so server has a subtree
+	mkdir src/dir &&
+	echo "in dir" >src/dir/file.txt &&
+	git -C src add dir/file.txt &&
+	git -C src commit -m "file in dir" &&
+	git -C src push -u srv master &&
+	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
+
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst fsck &&
+
+	# Make sure we only have commits, and all trees and blobs are missing.
+	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	awk -f print_1.awk fetched_objects \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	echo commit >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected &&
+
+	# Auto-fetch a tree with cat-file.
+	git -C dst cat-file -p $SUBTREE >tree_contents &&
+	grep file.txt tree_contents &&
+
+	# fsck still works after an auto-fetch of a tree.
+	git -C dst fsck &&
+
+	# Auto-fetch all remaining trees and blobs with --missing=error
+	git -C dst rev-list master --missing=error --objects >fetched_objects &&
+	test_line_count = 70 fetched_objects &&
+	awk -f print_1.awk fetched_objects \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	printf "blob\ncommit\ntree\n" >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index fc0f92a16..30bf1c73e 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -213,6 +213,18 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 	test_line_count = 0 rev_list_err
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
+		| awk -f print_1.awk \
+		| sed s/~// \
+		| xargs -n1 git -C r3 cat-file -t \
+		| sort -u >filtered_types &&
+	printf "blob\ntree\n" > expected &&
+	test_cmp filtered_types expected
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.18.0.865.gffc8e1a3cd6-goog



* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-15 23:19   ` [PATCH v6 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
@ 2018-08-17 21:42     ` Stefan Beller
  2018-08-17 22:19       ` Matthew DeVore
  2018-08-18 16:17     ` Duy Nguyen
  1 sibling, 1 reply; 151+ messages in thread
From: Stefan Beller @ 2018-08-17 21:42 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: git, Jeff Hostetler, Jeff Hostetler, Jeff King, Stefan Beller,
	Jonathan Tan, Junio C Hamano

On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore <matvore@google.com> wrote:
>
> Teach list-objects the "tree:0" filter which allows for filtering
> out all tree and blob objects (unless other objects are explicitly
> specified by the user). The purpose of this patch is to allow smaller
> partial clones.
>
> The name of this filter - tree:0 - does not explicitly specify that
> it also filters out all blobs, but this should not cause much confusion
> because blobs are not at all useful without the trees that refer to
> them.
>
> I also consider only:commits as a name, but this is inaccurate because
> it suggests that annotated tags are omitted, but actually they are
> included.

Speaking of tag objects, it is possible to tag anything, including blobs.
Would a blob that is tagged (hence reachable without a tree) be not
filtered by tree:0 (or in the future any deeper depth) ?

I found this series a good read, despite my unfamiliarity with
partial cloning.

One situation where I scratched my head for a second was the previous patches
that use "test_line_count = 0 rev_list_err", whereas using test_must_be_empty
would be an equally good choice (I am more used to the latter than the former).

Thanks,
Stefan


* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-17 21:42     ` Stefan Beller
@ 2018-08-17 22:19       ` Matthew DeVore
  2018-08-17 22:28         ` Stefan Beller
  0 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-17 22:19 UTC (permalink / raw)
  To: sbeller
  Cc: git, git, jeffhost, Jeff King, Stefan Beller, Jonathan Tan,
	Junio C Hamano

On Fri, Aug 17, 2018 at 2:42 PM Stefan Beller <sbeller@google.com> wrote:
>
> On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore <matvore@google.com> wrote:
> >
> > Teach list-objects the "tree:0" filter which allows for filtering
> > out all tree and blob objects (unless other objects are explicitly
> > specified by the user). The purpose of this patch is to allow smaller
> > partial clones.
> >
> > The name of this filter - tree:0 - does not explicitly specify that
> > it also filters out all blobs, but this should not cause much confusion
> > because blobs are not at all useful without the trees that refer to
> > them.
> >
> > I also consider only:commits as a name, but this is inaccurate because
> > it suggests that annotated tags are omitted, but actually they are
> > included.
>
> Speaking of tag objects, it is possible to tag anything, including blobs.
> Would a blob that is tagged (hence reachable without a tree) be not
> filtered by tree:0 (or in the future any deeper depth) ?
I think so. If I try to fetch a tagged tree or blob, it should fetch
that object itself, since I'm referring to it explicitly in the git
pack-objects arguments (I mention fetch since git rev-list apparently
doesn't support specifying non-commits on the command line). This is
similar to how I can fetch a commit that would otherwise be filtered
*if* I specify it explicitly (rather than a child commit).

If you're fetching a tagged tree, then for depth=0, it will fetch the
given tree only, and not fetch any referents of an explicitly-given
tree. For depth=1, it will fetch all direct referents.

If you're fetching a commit, then for depth=0, you will not get any
tree objects, and for depth=1, you'll get only the root tree object
and none of its referents. So the commit itself is like a "layer" in
the depth count.
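
The depth rule described above can be sketched as a small predicate. This is
only an illustration under stated assumptions: tree_depth_omits() and its
parameters are hypothetical names that do not appear in list-objects-filter.c,
and the patch itself only implements tree:0.

```c
#include <assert.h>

/*
 * Hypothetical sketch of the "tree:<depth>" rule discussed above.
 *
 * limit:      the <depth> given in --filter=tree:<depth>.
 * layer:      number of tree/blob layers below the starting point;
 *             0 is the root tree of a commit, or a direct referent
 *             of an explicitly given tree.
 * user_given: nonzero if the object was named explicitly by the user.
 *
 * Returns nonzero if the object would be omitted.
 */
int tree_depth_omits(unsigned long limit, unsigned long layer, int user_given)
{
	if (user_given)
		return 0; /* explicitly requested objects are never filtered */
	return layer >= limit;
}
```

With limit 0 every tree and blob reached through a reference is omitted, while
an object named directly is still returned. With limit 1 only the first layer
below the starting point survives.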

>
> I found this series a good read, despite my unfamiliarity of the
> partial cloning.
>
> One situation where I scratched my head for a second were previous patches
> that  use "test_line_count = 0 rev_list_err" whereas using test_must_be_empty
> would be an equally good choice (I am more used to the latter than the former)

Done. Here is an interdiff (sorry, the tab characters are not
maintained by my mail client):

diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a1b93c72c..7e2f7ff26 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -200,14 +200,14 @@ test_expect_success 'missing tree objects with --missing=allow-promisor and --ex
  git -C repo config extensions.partialclone "arbitrary string" &&

  git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err &&
- test_line_count = 0 rev_list_err &&
+ test_must_be_empty rev_list_err &&
  # 3 commits, 3 blobs, and 1 tree
  test_line_count = 7 objs &&

  # Do the same for --exclude-promisor-objects, but with all trees gone.
  promise_and_delete $(git -C repo rev-parse baz^{tree}) &&
  git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
- test_line_count = 0 rev_list_err &&
+ test_must_be_empty rev_list_err &&
  # 3 commits, no blobs or trees
  test_line_count = 3 objs
 '
@@ -226,7 +226,7 @@ test_expect_success 'missing non-root tree object and rev-list' '
  git -C repo config extensions.partialclone "arbitrary string" &&

  git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
- test_line_count = 0 rev_list_err &&
+ test_must_be_empty rev_list_err &&
  # 1 commit and 1 tree
  test_line_count = 2 objs
 '
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 30bf1c73e..27040d73a 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -206,11 +206,11 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
  test_cmp expected missing_objs &&

  # do not complain when a missing tree cannot be parsed
- test_line_count = 0 rev_list_err &&
+ test_must_be_empty rev_list_err &&

  git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
  ! grep $TREE objs &&
- test_line_count = 0 rev_list_err
+ test_must_be_empty rev_list_err
 '

 # Test tree:0 filter.

>
> Thanks,
> Stefan


* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-17 22:19       ` Matthew DeVore
@ 2018-08-17 22:28         ` Stefan Beller
  2018-08-20 23:30           ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Stefan Beller @ 2018-08-17 22:28 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: git, Jeff Hostetler, Jeff Hostetler, Jeff King, Stefan Beller,
	Jonathan Tan, Junio C Hamano

On Fri, Aug 17, 2018 at 3:20 PM Matthew DeVore <matvore@google.com> wrote:
>
> On Fri, Aug 17, 2018 at 2:42 PM Stefan Beller <sbeller@google.com> wrote:
> >
> > On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore <matvore@google.com> wrote:
> > >
> > > Teach list-objects the "tree:0" filter which allows for filtering
> > > out all tree and blob objects (unless other objects are explicitly
> > > specified by the user). The purpose of this patch is to allow smaller
> > > partial clones.
> > >
> > > The name of this filter - tree:0 - does not explicitly specify that
> > > it also filters out all blobs, but this should not cause much confusion
> > > because blobs are not at all useful without the trees that refer to
> > > them.
> > >
> > > I also consider only:commits as a name, but this is inaccurate because
> > > it suggests that annotated tags are omitted, but actually they are
> > > included.
> >
> > Speaking of tag objects, it is possible to tag anything, including blobs.
> > Would a blob that is tagged (hence reachable without a tree) be not
> > filtered by tree:0 (or in the future any deeper depth) ?
> I think so. If I try to fetch a tagged tree or blob, it should fetch
> that object itself, since I'm referring to it explicitly in the git
> pack-objects arguments (I mention fetch since git rev-list apparently
> doesn't support specifying non-commits on the command line). This is
> similar to how I can fetch a commit that would otherwise be filtered
> *if* I specify it explicitly (rather than a child commit).
>
> If you're fetching a tagged tree, then for depth=0, it will fetch the
> given tree only, and not fetch any referents of an explicitly-given
> tree. For depth=1, it will fetch all direct referents.
>
> If you're fetching a commit, then for depth=0, you will not get any
> tree objects, and for depth=1, you'll get only the root tree object
> and none of its referrents. So the commit itself is like a "layer" in
> the depth count.

That seems smart. Thanks!

>
> >
> > I found this series a good read, despite my unfamiliarity of the
> > partial cloning.
> >
> > One situation where I scratched my head for a second were previous patches
> > that  use "test_line_count = 0 rev_list_err" whereas using test_must_be_empty
> > would be an equally good choice (I am more used to the latter than the former)
>
> Done. Here is an interdiff (sorry, the tab characters are not
> maintained by my mail client):

heh. Thanks for switching the style; I should have emphasized that
(after reflection) I found them equally good; I am just more used to
one than the other.

So if that is the only issue brought up, I would not even ask for a resend.

Thanks,
Stefan


* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-15 23:19   ` [PATCH v6 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
  2018-08-17 21:42     ` Stefan Beller
@ 2018-08-18 16:17     ` Duy Nguyen
  2018-08-20 13:04       ` Matthew DeVore
  1 sibling, 1 reply; 151+ messages in thread
From: Duy Nguyen @ 2018-08-18 16:17 UTC (permalink / raw)
  To: matvore
  Cc: Git Mailing List, Jeff Hostetler, Jeff Hostetler, Jeff King,
	Stefan Beller, Jonathan Tan, Junio C Hamano

On Thu, Aug 16, 2018 at 1:54 AM Matthew DeVore <matvore@google.com> wrote:
> diff --git a/list-objects-filter.c b/list-objects-filter.c
> index a0ba78b20..8e3caf5bf 100644
> --- a/list-objects-filter.c
> +++ b/list-objects-filter.c
> @@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
>         return d;
>  }
>
> +/*
> + * A filter for list-objects to omit ALL trees and blobs from the traversal.
> + * Can OPTIONALLY collect a list of the omitted OIDs.
> + */
> +struct filter_trees_none_data {
> +       struct oidset *omits;
> +};
> +
> +static enum list_objects_filter_result filter_trees_none(
> +       enum list_objects_filter_situation filter_situation,
> +       struct object *obj,
> +       const char *pathname,
> +       const char *filename,
> +       void *filter_data_)
> +{
> +       struct filter_trees_none_data *filter_data = filter_data_;
> +
> +       switch (filter_situation) {
> +       default:
> +               die("unknown filter_situation");

This sounds more like BUG() than die() without _(). And you probably want
to report the filter_situation value too.

> +               return LOFR_ZERO;

And neither BUG() nor die() returns.
-- 
Duy


* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-18 16:17     ` Duy Nguyen
@ 2018-08-20 13:04       ` Matthew DeVore
  2018-08-20 18:38         ` Stefan Beller
  0 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-20 13:04 UTC (permalink / raw)
  To: pclouds
  Cc: git, Jeff Hostetler, Jeff Hostetler, Jeff King, Stefan Beller,
	Jonathan Tan, Junio C Hamano

There were many instances in this file where it seemed like BUG would be
better, so I created a new commit before this one to switch them over. The
interdiff is below.

BTW, why are there so many instances of "die" without "_"? I expect all
errors that may be caused by a user to be localized.

I'm going by the output of this: grep -IrE '\Wdie\([^_]' --exclude-dir=t

diff --git a/list-objects-filter.c b/list-objects-filter.c
index 8e3caf5bf..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(

  	switch (filter_situation) {
  	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);

  	case LOFS_BEGIN_TREE:
  		assert(obj->type == OBJ_TREE);
@@ -99,8 +98,7 @@ static enum list_objects_filter_result filter_trees_none(

  	switch (filter_situation) {
  	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);

  	case LOFS_BEGIN_TREE:
  	case LOFS_BLOB:
@@ -151,8 +149,7 @@ static enum list_objects_filter_result filter_blobs_limit(

  	switch (filter_situation) {
  	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);

  	case LOFS_BEGIN_TREE:
  		assert(obj->type == OBJ_TREE);
@@ -257,8 +254,7 @@ static enum list_objects_filter_result filter_sparse(

  	switch (filter_situation) {
  	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);

  	case LOFS_BEGIN_TREE:
  		assert(obj->type == OBJ_TREE);
@@ -439,7 +435,7 @@ void *list_objects_filter__init(
  	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);

  	if (filter_options->choice >= LOFC__COUNT)
-		die("invalid list-objects filter choice: %d",
+		BUG("invalid list-objects filter choice: %d",
  		    filter_options->choice);

  	init_fn = s_filters[filter_options->choice];


* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-20 13:04       ` Matthew DeVore
@ 2018-08-20 18:38         ` Stefan Beller
  2018-08-20 23:20           ` Matthew DeVore
  2018-08-21 15:50           ` Duy Nguyen
  0 siblings, 2 replies; 151+ messages in thread
From: Stefan Beller @ 2018-08-20 18:38 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Duy Nguyen, git, Jeff Hostetler, Jeff Hostetler, Jeff King,
	Stefan Beller, Jonathan Tan, Junio C Hamano

On Mon, Aug 20, 2018 at 6:18 AM Matthew DeVore <matvore@google.com> wrote:
>
> There were many instances in this file where it seemed like BUG would be
> better, so I created a new commit before this one to switch them over. The
> interdiff is below.
>
> BTW, why are there so many instances of "die" without "_"? I expect all
> errors that may be caused by a user to be localized.

Well, there is the porcelain layer to be consumed by a human user
and the plumbing that is good for scripts. And in scripts you might want
to grep for certain errors and react to that, so a non-localized error
message makes the script possible to run in any localisation.

BUG() is strictly for things that are due to Git's internals,
not for problematic user input. Problematic user input
definitely wants a die(...), and depending on the plumbing/porcelain
layer it may need to be _(translatable).

I think BUG() would never go with translated strings.
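
As a rough illustration of that split, here is a self-contained sketch. The
sketch_* names and the SKETCH_BUG macro are hypothetical stand-ins written for
this example; they are not git's real die(), BUG() or _() helpers, and the
error text is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for _(): porcelain code would translate here. */
#define sketch_(msg) (msg)

/* Hypothetical stand-in for BUG(): untranslated, aborts the program. */
#define SKETCH_BUG(...) \
	do { \
		fprintf(stderr, "BUG: " __VA_ARGS__); \
		fputc('\n', stderr); \
		abort(); \
	} while (0)

enum sketch_choice {
	SKETCH_DISABLED = 0,
	SKETCH_BLOB_NONE,
	SKETCH_TREE_NONE,
	SKETCH__COUNT /* must be last */
};

/*
 * User input: an unknown spec is the user's mistake, so it is reported
 * as an ordinary error (a real caller would die(), translated or not
 * depending on whether the command is porcelain or plumbing).
 */
int sketch_parse_filter(const char *arg, enum sketch_choice *out)
{
	if (!strcmp(arg, "blob:none"))
		*out = SKETCH_BLOB_NONE;
	else if (!strcmp(arg, "tree:0"))
		*out = SKETCH_TREE_NONE;
	else {
		fprintf(stderr, "%s: '%s'\n", sketch_("invalid filter-spec"), arg);
		return -1;
	}
	return 0;
}

/*
 * Internal invariant: the choice was validated earlier, so an unknown
 * value here can only come from a programming error.
 */
const char *sketch_choice_name(enum sketch_choice choice)
{
	switch (choice) {
	case SKETCH_DISABLED:
		return "disabled";
	case SKETCH_BLOB_NONE:
		return "blob:none";
	case SKETCH_TREE_NONE:
		return "tree:0";
	default:
		SKETCH_BUG("invalid filter choice: %d", (int)choice);
	}
}
```

The parser returns an error that the caller can present to the user, while the
lookup aborts because no user input should be able to reach its default case.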

> I'm going by the output of this: grep -IrE '\Wdie\([^_]' --exclude-dir=t
>
> diff --git a/list-objects-filter.c b/list-objects-filter.c
> index 8e3caf5bf..09b2b05d5 100644
> --- a/list-objects-filter.c
> +++ b/list-objects-filter.c
> @@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
>
>         switch (filter_situation) {
>         default:
> -               die("unknown filter_situation");
> -               return LOFR_ZERO;
> +               BUG("unknown filter_situation: %d", filter_situation);
>
>         case LOFS_BEGIN_TREE:
>                 assert(obj->type == OBJ_TREE);
> @@ -99,8 +98,7 @@ static enum list_objects_filter_result filter_trees_none(
>
>         switch (filter_situation) {
>         default:
> -               die("unknown filter_situation");
> -               return LOFR_ZERO;
> +               BUG("unknown filter_situation: %d", filter_situation);
>
>         case LOFS_BEGIN_TREE:
>         case LOFS_BLOB:
> @@ -151,8 +149,7 @@ static enum list_objects_filter_result filter_blobs_limit(
>
>         switch (filter_situation) {
>         default:
> -               die("unknown filter_situation");
> -               return LOFR_ZERO;
> +               BUG("unknown filter_situation: %d", filter_situation);
>
>         case LOFS_BEGIN_TREE:
>                 assert(obj->type == OBJ_TREE);
> @@ -257,8 +254,7 @@ static enum list_objects_filter_result filter_sparse(
>
>         switch (filter_situation) {
>         default:
> -               die("unknown filter_situation");
> -               return LOFR_ZERO;
> +               BUG("unknown filter_situation: %d", filter_situation);

Up until here we have just replaced die() with BUG() in the default
case of the state machine switch. (We need the default because of strict
compile flags, although since filter_situation is an enum I would have
thought we would not, as compilers are smart enough to see that all
values of the enum are covered.)

I agree that keeping the defaults and having a BUG() is reasonable.
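
A minimal sketch of such a state-machine switch follows. It loosely mirrors
the shape of filter_trees_none() from the patch, but every sketch_* name is
hypothetical and abort() stands in for BUG(). One likely reason the default is
still wanted: in C an enum object may hold a value that is not one of its
named enumerators, so a compiler can assume the switch might fall through even
when every enumerator has a case.

```c
#include <assert.h>
#include <stdlib.h>

enum sketch_situation {
	SKETCH_BEGIN_TREE,
	SKETCH_END_TREE,
	SKETCH_BLOB
};

enum sketch_result {
	SKETCH_ZERO = 0,
	SKETCH_MARK_SEEN = 1 << 0,
	SKETCH_DO_SHOW = 1 << 1
};

struct sketch_filter_data {
	unsigned long nr_omitted; /* stands in for the optional oidset */
};

enum sketch_result sketch_filter_trees_none(enum sketch_situation situation,
					    struct sketch_filter_data *data)
{
	switch (situation) {
	case SKETCH_BEGIN_TREE:
	case SKETCH_BLOB:
		data->nr_omitted++;
		return SKETCH_MARK_SEEN; /* seen, but never shown */

	case SKETCH_END_TREE:
		return SKETCH_ZERO;

	default:
		/* Unreachable unless a caller passes an invalid value. */
		abort();
	}
}
```

Because abort() does not return, no dummy return statement is needed after
the default case.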


>
>         case LOFS_BEGIN_TREE:
>                 assert(obj->type == OBJ_TREE);
> @@ -439,7 +435,7 @@ void *list_objects_filter__init(
>         assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
>
>         if (filter_options->choice >= LOFC__COUNT)
> -               die("invalid list-objects filter choice: %d",
> +               BUG("invalid list-objects filter choice: %d",
>                     filter_options->choice);

This also makes sense; combined with the assert before, this looks like
really defensive code.

I think this patch is a good idea!

Thanks,
Stefan

* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-20 18:38         ` Stefan Beller
@ 2018-08-20 23:20           ` Matthew DeVore
  2018-08-21  0:36             ` Stefan Beller
  2018-08-21 15:50           ` Duy Nguyen
  1 sibling, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-20 23:20 UTC (permalink / raw)
  To: Stefan Beller
  Cc: pclouds, git, git, jeffhost, Jeff King, Stefan Beller,
	Jonathan Tan, Junio C Hamano

On Mon, Aug 20, 2018 at 11:38 AM Stefan Beller <sbeller@google.com> wrote:
>
> On Mon, Aug 20, 2018 at 6:18 AM Matthew DeVore <matvore@google.com> wrote:
> >
> > There were many instances in this file where it seemed like BUG would be
> > better, so I created a new commit before this one to switch them over. The
> > interdiff is below.
> >
> > BTW, why are there so many instances of "die" without "_"? I expect all
> > errors that may be caused by a user to be localized.
>
> Well, there is the porcelain layer to be consumed by a human user
> and the plumbing that is good for scripts. And in scripts you might want
> to grep for certain errors and react to that, so a non-localized error
> message makes the script possible to run in any localisation.
>
> The BUG is strictly for things that are due to Gits internals,
> not for problematic user input. Problematic user input
> definitely wants a die(...), and depending on the plumbing/porcelain
> layer it may need to be _(translatable).
Ah I see. Plumbing commands are not translated. Makes perfect sense now.

>
> I think BUG() would never go with translated strings.
>
> > I'm going by the output of this: grep -IrE '\Wdie\([^_]' --exclude-dir=t
> >
> > diff --git a/list-objects-filter.c b/list-objects-filter.c
> > index 8e3caf5bf..09b2b05d5 100644
> > --- a/list-objects-filter.c
> > +++ b/list-objects-filter.c
> > @@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
> >
> >         switch (filter_situation) {
> >         default:
> > -               die("unknown filter_situation");
> > -               return LOFR_ZERO;
> > +               BUG("unknown filter_situation: %d", filter_situation);
> >
> >         case LOFS_BEGIN_TREE:
> >                 assert(obj->type == OBJ_TREE);
> > @@ -99,8 +98,7 @@ static enum list_objects_filter_result filter_trees_none(
> >
> >         switch (filter_situation) {
> >         default:
> > -               die("unknown filter_situation");
> > -               return LOFR_ZERO;
> > +               BUG("unknown filter_situation: %d", filter_situation);
> >
> >         case LOFS_BEGIN_TREE:
> >         case LOFS_BLOB:
> > @@ -151,8 +149,7 @@ static enum list_objects_filter_result filter_blobs_limit(
> >
> >         switch (filter_situation) {
> >         default:
> > -               die("unknown filter_situation");
> > -               return LOFR_ZERO;
> > +               BUG("unknown filter_situation: %d", filter_situation);
> >
> >         case LOFS_BEGIN_TREE:
> >                 assert(obj->type == OBJ_TREE);
> > @@ -257,8 +254,7 @@ static enum list_objects_filter_result filter_sparse(
> >
> >         switch (filter_situation) {
> >         default:
> > -               die("unknown filter_situation");
> > -               return LOFR_ZERO;
> > +               BUG("unknown filter_situation: %d", filter_situation);
>
> Up until here we just have replace the die by BUG in the default
> case of the state machine switch. (We need the default due to strict
> compile flags, but as filter_situation is an enum I thought we would not
> as compilers are smart enough to see we got all values of the enum
> covered).
At the risk of going on a tangent, I assumed this was because enums
are really ints, and the "default" is there in case the enum variable
somehow ends up holding an int with no corresponding enumerator, either
because of a cast from an int that was out-of-range, or because
arithmetic or bitwise operations on the declared enum values produced
undeclared values.
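
To illustrate what I mean, here is a minimal sketch. The enum and the
function names are hypothetical (they are not taken from
list-objects-filter.c), and the default branch returns a string only so
that the behavior can be observed; real code would call BUG() there.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum, standing in for the real filter_situation type. */
enum situation {
	SITUATION_BEGIN,
	SITUATION_ITEM,
	SITUATION_END
};

/*
 * Every declared enumerator has a case, yet "default" is still
 * reachable: an enum object can hold an integer that matches no
 * enumerator.
 */
const char *describe(enum situation s)
{
	switch (s) {
	case SITUATION_BEGIN:
		return "begin";
	case SITUATION_ITEM:
		return "item";
	case SITUATION_END:
		return "end";
	default:
		/* Real code would call BUG() here instead of returning. */
		return "unknown";
	}
}

/* Builds an undeclared value by casting a small integer. */
enum situation out_of_range(void)
{
	return (enum situation)7;
}

/* Checks that declared values hit their case and 7 hits "default". */
void self_check(void)
{
	assert(strcmp(describe(SITUATION_ITEM), "item") == 0);
	assert(strcmp(describe(out_of_range()), "unknown") == 0);
}
```

Calling self_check() passes with the compilers I would expect us to
care about, which is the point: the cast compiles quietly, so only the
default branch notices the bad value at run time.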

>
> I agree that keeping the defaults and having a BUG() is reasonable.
>
>
> >
> >         case LOFS_BEGIN_TREE:
> >                 assert(obj->type == OBJ_TREE);
> > @@ -439,7 +435,7 @@ void *list_objects_filter__init(
> >         assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
> >
> >         if (filter_options->choice >= LOFC__COUNT)
> > -               die("invalid list-objects filter choice: %d",
> > +               BUG("invalid list-objects filter choice: %d",
> >                     filter_options->choice);
>
> This also makes sense, combined with the assert before, this looks like
> really defensive code.
>
> I think this patch is a good idea!
Thank you for your feedback :)

>
> Thanks,
> Stefan

* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-17 22:28         ` Stefan Beller
@ 2018-08-20 23:30           ` Matthew DeVore
  2018-08-21  0:29             ` Stefan Beller
  0 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-08-20 23:30 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, git, jeffhost, Jeff King, Stefan Beller, Jonathan Tan,
	Junio C Hamano

On Fri, Aug 17, 2018 at 3:28 PM Stefan Beller <sbeller@google.com> wrote:
>
> On Fri, Aug 17, 2018 at 3:20 PM Matthew DeVore <matvore@google.com> wrote:
> >
> > On Fri, Aug 17, 2018 at 2:42 PM Stefan Beller <sbeller@google.com> wrote:
> > >
> > > On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore <matvore@google.com> wrote:
> > > >
> > > > Teach list-objects the "tree:0" filter which allows for filtering
> > > > out all tree and blob objects (unless other objects are explicitly
> > > > specified by the user). The purpose of this patch is to allow smaller
> > > > partial clones.
> > > >
> > > > The name of this filter - tree:0 - does not explicitly specify that
> > > > it also filters out all blobs, but this should not cause much confusion
> > > > because blobs are not at all useful without the trees that refer to
> > > > them.
> > > >
> > > > I also consider only:commits as a name, but this is inaccurate because
> > > > it suggests that annotated tags are omitted, but actually they are
> > > > included.
> > >
> > > Speaking of tag objects, it is possible to tag anything, including blobs.
> > > Would a blob that is tagged (hence reachable without a tree) be not
> > > filtered by tree:0 (or in the future any deeper depth) ?
> > I think so. If I try to fetch a tagged tree or blob, it should fetch
> > that object itself, since I'm referring to it explicitly in the git
> > pack-objects arguments (I mention fetch since git rev-list apparently
> > doesn't support specifying non-commits on the command line). This is
> > similar to how I can fetch a commit that would otherwise be filtered
> > *if* I specify it explicitly (rather than a child commit).
> >
> > If you're fetching a tagged tree, then for depth=0, it will fetch the
> > given tree only, and not fetch any referents of an explicitly-given
> > tree. For depth=1, it will fetch all direct referents.
> >
> > If you're fetching a commit, then for depth=0, you will not get any
> > tree objects, and for depth=1, you'll get only the root tree object
> > and none of its referrents. So the commit itself is like a "layer" in
> > the depth count.
>
> That seems smart. Thanks!
>
> >
> > >
> > > I found this series a good read, despite my unfamiliarity of the
> > > partial cloning.
> > >
> > > One situation where I scratched my head for a second were previous patches
> > > that  use "test_line_count = 0 rev_list_err" whereas using test_must_be_empty
> > > would be an equally good choice (I am more used to the latter than the former)
> >
> > Done. Here is an interdiff (sorry, the tab characters are not
> > maintained by my mail client):
>
> heh. Thanks for switching the style; I should have emphasized that
> (after reflection) I found them equally good, I am used to one
> over the other more.
It seems marginally better to me. I also noticed a clean-up patch
going by that aggressively switched to test_must_be_empty wherever
possible: https://public-inbox.org/git/20180819215725.29001-1-szeder.dev@gmail.com/

OTOH, if it were up to me I would have just gotten rid of
test_must_be_empty and used an existing function with the right
argument, like `test_cmp /dev/null` - but using some form consistently
is the most important, whatever it is.

>
> So if that is the only issue brought up, I would not even ask for a resend.
>
> Thanks,
> Stefan

* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-20 23:30           ` Matthew DeVore
@ 2018-08-21  0:29             ` Stefan Beller
  2018-08-21 21:46               ` Junio C Hamano
  0 siblings, 1 reply; 151+ messages in thread
From: Stefan Beller @ 2018-08-21  0:29 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: git, Jeff Hostetler, Jeff Hostetler, Jeff King, Stefan Beller,
	Jonathan Tan, Junio C Hamano

> > heh. Thanks for switching the style; I should have emphasized that
> > (after reflection) I found them equally good, I am used to one
> > over the other more.
> It seems marginally better to me. I also noticed a clean-up patch
> going by that aggressively switched to test_must_be_empty wherever
> possible: https://public-inbox.org/git/20180819215725.29001-1-szeder.dev@gmail.com/
>
> OTOH, if it were up to me I would have just gotten rid of
> test_must_be_empty and used an existing function with the right
> argument, like `test_cmp /dev/null` - but using some form consistently
> is the most important, whatever it is.

/dev/null, eh? It shows you don't use Windows on a day to day basis. ;-)
But yeah consistency is really good to have. :)

* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-20 23:20           ` Matthew DeVore
@ 2018-08-21  0:36             ` Stefan Beller
  0 siblings, 0 replies; 151+ messages in thread
From: Stefan Beller @ 2018-08-21  0:36 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Duy Nguyen, git, Jeff Hostetler, Jeff Hostetler, Jeff King,
	Stefan Beller, Jonathan Tan, Junio C Hamano

> At the risk of going on a tangent, I assumed this was because enums
> are really ints, and the "default" is there in case the enum somehow
> got assigned to an int without a corresponding value. Either because
> of a cast from an int that was out-of-range, or new values that were
> obtained from arithmetic or bitwise operations on the declared enum
> values, which created undeclared values.

See
374166cb381 (grep: catch a missing enum in switch statement, 2017-05-25)
or a bit dated, but nevertheless an interesting read:
b8527d5fa61 (wt-status: fix possible use of uninitialized variable, 2013-03-21)

...compilers these days are just too smart to reason about them :-)

* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-20 18:38         ` Stefan Beller
  2018-08-20 23:20           ` Matthew DeVore
@ 2018-08-21 15:50           ` Duy Nguyen
  1 sibling, 0 replies; 151+ messages in thread
From: Duy Nguyen @ 2018-08-21 15:50 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Matthew DeVore, Git Mailing List, Jeff Hostetler, Jeff Hostetler,
	Jeff King, Stefan Beller, Jonathan Tan, Junio C Hamano

On Mon, Aug 20, 2018 at 8:38 PM Stefan Beller <sbeller@google.com> wrote:
>
> On Mon, Aug 20, 2018 at 6:18 AM Matthew DeVore <matvore@google.com> wrote:
> >
> > There were many instances in this file where it seemed like BUG would be
> > better, so I created a new commit before this one to switch them over. The
> > interdiff is below.
> >
> > BTW, why are there so many instances of "die" without "_"? I expect all
> > errors that may be caused by a user to be localized.
>
> Well, there is the porcelain layer to be consumed by a human user
> and the plumbing that is good for scripts. And in scripts you might want
> to grep for certain errors and react to that, so a non-localized error
> message makes the script possible to run in any localisation.

I probably have a different view about this, but strings (as English
sentences) are for humans only and should be translated. For machines
there should be a well-defined format (that might just look like
English), not totally free text. In some cases, this format can be as
simple as the "error/warning/fatal" prefix, which is left
untranslated, but the rest should be. There is no guarantee that these
die() messages will not change in the future, even if left untranslated.
-- 
Duy

* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-21  0:29             ` Stefan Beller
@ 2018-08-21 21:46               ` Junio C Hamano
  2018-08-22 18:00                 ` Stefan Beller
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-08-21 21:46 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Matthew DeVore, git, Jeff Hostetler, Jeff Hostetler, Jeff King,
	Stefan Beller, Jonathan Tan

Stefan Beller <sbeller@google.com> writes:

>> ...
>> OTOH, if it were up to me I would have just gotten rid of
>> test_must_be_empty and used an existing function with the right
>> argument, like `test_cmp /dev/null` - but using some form consistently
>> is the most important, whatever it is.
>
> /dev/null, eh? It shows you don't use Windows on a day to day basis. ;-)
> But yeah consistency is really good to have. :)

Just to make sure we don't give the wrong impression to bystanders, do
you mean that we should discourage using /dev/null in our tests or
scripts due to portability concerns?

I thought they had good enough emulation that writing /dev/null on
the command line in scripts does what we expect the shell to do; the
same thing can be said for calling open(2) on "/dev/null".

Back to the topic from the tangent, but there was a discussion on
choosing between "test_must_be_empty actual" vs "test_cmp empty
actual", and there was even a proposal to trigger an error when an
empty file is given to test_cmp.  You two might want to join the party
there, perhaps?


* Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
  2018-08-21 21:46               ` Junio C Hamano
@ 2018-08-22 18:00                 ` Stefan Beller
  0 siblings, 0 replies; 151+ messages in thread
From: Stefan Beller @ 2018-08-22 18:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthew DeVore, git, Jeff Hostetler, Jeff Hostetler, Jeff King,
	Stefan Beller, Jonathan Tan

On Tue, Aug 21, 2018 at 2:46 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> >> ...
> >> OTOH, if it were up to me I would have just gotten rid of
> >> test_must_be_empty and used an existing function with the right
> >> argument, like `test_cmp /dev/null` - but using some form consistently
> >> is the most important, whatever it is.
> >
> > /dev/null, eh? It shows you don't use Windows on a day to day basis. ;-)
> > But yeah consistency is really good to have. :)
>
> Just to make sure we don't give wrong impression to bystanders, do
> you mean that we should discourage using /dev/null in our tests or
> scripts due to portability concerns?

I would discourage reading /dev/null (as in `test_cmp /dev/null actual`)
in favor of the more specific `test_must_be_empty`, as that is easier to
read. (But I neither encourage nor discourage the use of /dev/null in the
implementation of that function.)

> I thought they had good enough emulation that writing /dev/null on
> the command line in scripts do what we expect the shell to do; the
> same thing can be said for calling open(2) on "/dev/null".

Oh, opening and reading is new to me, thanks!

> Back to the topic from the tangent, but there was a discussion on
> choosing between "test_must_be_empty actual" vs "test_cmp empty
> actual", and was even a proposal to trigger an error when an empty
> file is given to test_cmp.

Oh, that is an interesting way to ensure consistency.

>  You two might want to join the party
> there, perhaps?

I'll read into that.

Stefan

* [PATCH v7 0/7] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (10 preceding siblings ...)
  2018-08-15 23:19 ` [PATCH v6 0/6] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-09-04 18:05 ` Matthew DeVore
  2018-09-04 18:05   ` [PATCH v7 1/7] list-objects: store common func args in struct Matthew DeVore
                     ` (7 more replies)
  2018-09-14  0:55 ` [PATCH v8 " Matthew DeVore
                   ` (4 subsequent siblings)
  16 siblings, 8 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-04 18:05 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

I made the following changes since v6 of the patchset:
 - (suggested by Duy Nguyen) add a new commit which replaces uses of die() with
   BUG() in list-objects-filter.c wherever it corresponds to a coding error.
 - Replace die() with BUG() in new code.
 - Replace test_line_count = 0 with test_must_be_empty in new tests since the
   trend seems to be, based on other RFCs in progress, that we are standardizing
   on that phraseology. See:
   https://public-inbox.org/git/20180819215725.29001-1-szeder.dev@gmail.com/

As asked in the last "What's cooking in git.git" post, the status of this patch
is:
 - The original reviewer, Jonathan Tan, is on vacation and will be back later
   this week.
 - Stefan Beller has been reviewing the patchset in Jonathan's absence, and
   stated that it's a good read despite not being familiar with the code:
   https://public-inbox.org/git/CAGZ79kaWcGbyc2S5gOCU7NdvT4fN46jq4xK9MvTLAFBGhyuo2A@mail.gmail.com/
 - I haven't updated this patch in a while since we have been in RC for a while,
   but after this update I think it's ready. There hasn't been any comment or
   request for change to the patchset recently.

Matthew DeVore (7):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: use BUG rather than die
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   5 +
 builtin/rev-list.c                     |  11 +-
 list-objects-filter-options.c          |   4 +
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  60 ++++++-
 list-objects.c                         | 232 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  25 ++-
 t/t0410-partial-clone.sh               |  45 +++++
 t/t5317-pack-objects-filter-objects.sh |  41 +++++
 t/t5616-partial-clone.sh               |  38 ++++
 t/t6112-rev-list-filters-objects.sh    |  29 ++++
 12 files changed, 367 insertions(+), 125 deletions(-)

-- 
2.19.0.rc1.350.ge57e33dbd1-goog


* [PATCH v7 1/7] list-objects: store common func args in struct
  2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-09-04 18:05   ` Matthew DeVore
  2018-09-04 18:05   ` [PATCH v7 2/7] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-04 18:05 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

This will make utility functions easier to create, as done by the next
patch.
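
As a rough sketch of the pattern (every name below is hypothetical and
much simpler than the real traversal code), bundling the shared
arguments into one struct lets a new helper take just the context plus
whatever is specific to it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; the real code uses show_object_fn etc. */
typedef void (*visit_fn)(int item, void *data);

/* Bundles the arguments that every helper used to take separately. */
struct walk_context {
	visit_fn visit;
	void *visit_data;
	int skip_negative;
};

/* A helper now needs only the context plus its own arguments. */
void walk_one(struct walk_context *ctx, int item)
{
	assert(ctx->visit != NULL);
	if (ctx->skip_negative && item < 0)
		return;
	ctx->visit(item, ctx->visit_data);
}

/* Visits each item in order by delegating to walk_one(). */
void walk_all(struct walk_context *ctx, const int *items, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		walk_one(ctx, items[i]);
}

/* Example callback: adds item to the int that data points to. */
void add_to_sum(int item, void *data)
{
	*(int *)data += item;
}

/* Fills in the context once, then sums the non-negative items. */
int sum_non_negative(const int *items, size_t nr)
{
	int sum = 0;
	struct walk_context ctx;

	ctx.visit = add_to_sum;
	ctx.visit_data = &sum;
	ctx.skip_negative = 1;
	walk_all(&ctx, items, nr);
	return sum;
}
```

Adding another field to the struct later changes only the code that
fills it in and the helpers that read it, rather than every signature
along the call chain.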

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.19.0.rc1.350.ge57e33dbd1-goog


* [PATCH v7 2/7] list-objects: refactor to process_tree_contents
  2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
  2018-09-04 18:05   ` [PATCH v7 1/7] list-objects: store common func args in struct Matthew DeVore
@ 2018-09-04 18:05   ` Matthew DeVore
  2018-09-04 18:05   ` [PATCH v7 3/7] list-objects: always parse trees gently Matthew DeVore
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-04 18:05 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

This will be used in a follow-up patch to reduce the indentation needed
when invoking the logic conditionally, i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.19.0.rc1.350.ge57e33dbd1-goog



* [PATCH v7 3/7] list-objects: always parse trees gently
  2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
  2018-09-04 18:05   ` [PATCH v7 1/7] list-objects: store common func args in struct Matthew DeVore
  2018-09-04 18:05   ` [PATCH v7 2/7] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-09-04 18:05   ` Matthew DeVore
  2018-09-04 18:05   ` [PATCH v7 4/7] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-04 18:05 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	if (parse_tree_gently(tree, 1) < 0) {
 		if (revs->ignore_missing_links)
 			return;
 
-- 
2.19.0.rc1.350.ge57e33dbd1-goog



* [PATCH v7 4/7] rev-list: handle missing tree objects properly
  2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-09-04 18:05   ` [PATCH v7 3/7] list-objects: always parse trees gently Matthew DeVore
@ 2018-09-04 18:05   ` Matthew DeVore
  2018-09-04 18:05   ` [PATCH v7 5/7] revision: mark non-user-given objects instead Matthew DeVore
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-04 18:05 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
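Not part of the patch: a minimal, self-contained sketch of the decision
process_tree() makes after this change, as far as it can be read from
the hunks below. The sketch_* names and the on_tree() helper are
hypothetical stand-ins and do not exist in git; only the three rev_info
bit names are taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the rev_info bits this patch consults. */
struct sketch_revs {
	unsigned ignore_missing_links:1;
	unsigned exclude_promisor_objects:1;
	unsigned do_not_die_on_missing_tree:1;
};

enum sketch_action {
	SKETCH_RECURSE,         /* tree parsed: filter, show, walk entries */
	SKETCH_SKIP,            /* return early: neither shown nor walked */
	SKETCH_SHOW_NO_RECURSE, /* filter and show, but do not walk entries */
	SKETCH_DIE              /* die("bad tree object ...") */
};

/*
 * failed_parse models a nonzero return from parse_tree_gently(tree, 1);
 * is_promisor models is_promisor_object(&obj->oid).
 */
static enum sketch_action on_tree(const struct sketch_revs *revs,
				  int failed_parse, int is_promisor)
{
	if (!failed_parse)
		return SKETCH_RECURSE;
	if (revs->ignore_missing_links)
		return SKETCH_SKIP;
	if (revs->exclude_promisor_objects && is_promisor)
		return SKETCH_SKIP;
	if (!revs->do_not_die_on_missing_tree)
		return SKETCH_DIE;
	return SKETCH_SHOW_NO_RECURSE;
}
```

In this sketch, a caller that leaves do_not_die_on_missing_tree unset
(pack-objects in the new t5317 test) still ends in SKETCH_DIE, while
rev-list, which sets the bit, gets SKETCH_SHOW_NO_RECURSE and leaves the
decision to its --missing handling.
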
 builtin/rev-list.c                     | 11 ++++---
 list-objects.c                         | 11 +++++--
 revision.h                             | 15 +++++++++
 t/t0410-partial-clone.sh               | 45 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 13 ++++++++
 t/t6112-rev-list-filters-objects.sh    | 17 ++++++++++
 6 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int failed_parse;
 
 	if (!revs->tree_objects)
 		return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, 1) < 0) {
+
+	failed_parse = parse_tree_gently(tree, 1);
+	if (failed_parse) {
 		if (revs->ignore_missing_links)
 			return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
 		    is_promisor_object(&obj->oid))
 			return;
 
-		die("bad tree object %s", oid_to_hex(&obj->oid));
+		if (!revs->do_not_die_on_missing_tree)
+			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (!failed_parse)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index c599c34da..5118aaaa9 100644
--- a/revision.h
+++ b/revision.h
@@ -125,6 +125,21 @@ struct rev_info {
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
 
+			/*
+			 * Blobs are shown without regard for their existence.
+			 * But not so for trees: unless exclude_promisor_objects
+			 * is set and the tree in question is a promisor object;
+			 * OR ignore_missing_links is set, the revision walker
+			 * dies with a "bad tree object HASH" message when
+			 * encountering a missing tree. For callers that can
+			 * handle missing trees and want them to be filterable
+			 * and showable, set this to true. The revision walker
+			 * will filter and show such a missing tree as usual,
+			 * but will not attempt to recurse into this tree
+			 * object.
+			 */
+			do_not_die_on_missing_tree:1,
+
 			/* for internal use only */
 			exclude_promisor_objects:1;
 
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4984ca583..7e2f7ff26 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -186,6 +186,51 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	! grep $FOO out
 '
 
+test_expect_success 'missing tree objects with --missing=allow-promisor and --exclude-promisor-objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	promise_and_delete $(git -C repo rev-parse bar^{tree}) &&
+	promise_and_delete $(git -C repo rev-parse foo^{tree}) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, 3 blobs, and 1 tree
+	test_line_count = 7 objs &&
+
+	# Do the same for --exclude-promisor-objects, but with all trees gone.
+	promise_and_delete $(git -C repo rev-parse baz^{tree}) &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, no blobs or trees
+	test_line_count = 3 objs
+'
+
+test_expect_success 'missing non-root tree object and rev-list' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	mkdir repo/dir &&
+	echo foo > repo/dir/foo &&
+	git -C repo add dir/foo &&
+	git -C repo commit -m "commit dir/foo" &&
+
+	promise_and_delete $(git -C repo rev-parse HEAD:dir) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 1 commit and 1 tree
+	test_line_count = 2 objs
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised tree' '
 	rm -rf repo &&
 	test_create_repo repo &&
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..5e35f33bf 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo > r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep -q "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 0a37dd5f9..d3d07975f 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -196,6 +196,23 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
+	TREE=$(git -C r3 rev-parse HEAD:dir1) &&
+
+	rm r3/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+	git -C r3 rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
+	echo "?$TREE" >expected &&
+	test_cmp expected missing_objs &&
+
+	# do not complain when a missing tree cannot be parsed
+	test_must_be_empty rev_list_err &&
+
+	git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	! grep $TREE objs &&
+	test_must_be_empty rev_list_err
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.0.rc1.350.ge57e33dbd1-goog



* [PATCH v7 5/7] revision: mark non-user-given objects instead
  2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-09-04 18:05   ` [PATCH v7 4/7] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-09-04 18:05   ` Matthew DeVore
  2018-09-04 20:31     ` Junio C Hamano
  2018-09-04 18:05   ` [PATCH v7 6/7] list-objects-filter: use BUG rather than die Matthew DeVore
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-09-04 18:05 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

Signed-off-by: Matthew DeVore <matvore@google.com>
---
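Not part of the patch: a minimal sketch of the inverted flag, assuming a
simplified object type. sketch_object, reach_through_parent() and
filter_applies() are hypothetical names used only for illustration; the
bit position (1u<<25) and the shape of the guard are taken from the
hunks below.

```c
#include <assert.h>

#define SKETCH_NOT_USER_GIVEN (1u << 25) /* same bit the patch reuses */

/* Hypothetical, stripped-down stand-in for struct object. */
struct sketch_object {
	unsigned flags;
};

/* Called wherever the walk reaches an object through another object. */
static void reach_through_parent(struct sketch_object *obj)
{
	obj->flags |= SKETCH_NOT_USER_GIVEN;
}

/* Mirrors the condition that guards ctx->filter_fn after the patch. */
static int filter_applies(const struct sketch_object *obj, int have_filter)
{
	return (obj->flags & SKETCH_NOT_USER_GIVEN) && have_filter;
}
```

The point of the inversion, as the sketch tries to show, is that an
object starts out unmarked and is only marked at the few places where
the traversal follows a reference, so anything that never passes through
such a place bypasses the filter by default.
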
 list-objects.c | 31 ++++++++++++++++++-------------
 revision.c     |  1 -
 revision.h     | 10 +++++++---
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index 062749437..6d355b43c 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index 5118aaaa9..2d381e636 100644
--- a/revision.h
+++ b/revision.h
@@ -8,7 +8,11 @@
 #include "diff.h"
 #include "commit-slab-decl.h"
 
-/* Remember to update object flag allocation in object.h */
+/* Remember to update object flag allocation in object.h
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
 #define SEEN		(1u<<0)
 #define UNINTERESTING   (1u<<1)
 #define TREESAME	(1u<<2)
@@ -20,9 +24,9 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+#define NOT_USER_GIVEN	(1u<<25) /* tree or blob not given directly by user */
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
-- 
2.19.0.rc1.350.ge57e33dbd1-goog



* [PATCH v7 6/7] list-objects-filter: use BUG rather than die
  2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (4 preceding siblings ...)
  2018-09-04 18:05   ` [PATCH v7 5/7] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-09-04 18:05   ` Matthew DeVore
  2018-09-04 20:32     ` Junio C Hamano
  2018-09-04 18:05   ` [PATCH v7 7/7] list-objects-filter: implement filter tree:0 Matthew DeVore
  2018-09-04 18:41   ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Stefan Beller
  7 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-09-04 18:05 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

In some cases in this file, BUG makes more sense than die. In such
cases, we get there from a coding error rather than a user error.

'return' has been removed following some instances of BUG since BUG does
not return.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..5f8b1a002 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -102,8 +101,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -208,8 +206,7 @@ static enum list_objects_filter_result filter_sparse(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -389,7 +386,7 @@ void *list_objects_filter__init(
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
 	if (filter_options->choice >= LOFC__COUNT)
-		die("invalid list-objects filter choice: %d",
+		BUG("invalid list-objects filter choice: %d",
 		    filter_options->choice);
 
 	init_fn = s_filters[filter_options->choice];
-- 
2.19.0.rc1.350.ge57e33dbd1-goog



* [PATCH v7 7/7] list-objects-filter: implement filter tree:0
  2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (5 preceding siblings ...)
  2018-09-04 18:05   ` [PATCH v7 6/7] list-objects-filter: use BUG rather than die Matthew DeVore
@ 2018-09-04 18:05   ` Matthew DeVore
  2018-09-04 20:44     ` Junio C Hamano
  2018-09-04 18:41   ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Stefan Beller
  7 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-09-04 18:05 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but it is inaccurate: it
suggests that annotated tags are omitted, when in fact they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
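Not part of the patch: a minimal sketch of what "mark seen but do not
show" means for the new filter callback. The sketch_* enums and their
bit values are assumptions made for illustration (the real LOFR_* and
LOFS_* definitions live in list-objects-filter.h), and the counter
stands in for the optional oidset of omitted objects.

```c
#include <assert.h>

/* Hypothetical result bits, modelled on LOFR_ZERO/MARK_SEEN/DO_SHOW. */
enum sketch_result {
	SKETCH_ZERO      = 0,
	SKETCH_MARK_SEEN = 1 << 0, /* do not visit this object again */
	SKETCH_DO_SHOW   = 1 << 1  /* include the object in the output */
};

/* Hypothetical situations, modelled on LOFS_BEGIN_TREE and friends. */
enum sketch_situation { SKETCH_BEGIN_TREE, SKETCH_END_TREE, SKETCH_BLOB };

/*
 * Omit every tree and blob handed to the filter. "omitted" may be NULL,
 * matching the optional omits set in the patch.
 */
static int sketch_tree_none(enum sketch_situation s, unsigned *omitted)
{
	switch (s) {
	case SKETCH_BEGIN_TREE:
	case SKETCH_BLOB:
		if (omitted)
			(*omitted)++; /* stands in for oidset_insert() */
		return SKETCH_MARK_SEEN; /* no DO_SHOW: hard omit */
	case SKETCH_END_TREE:
		return SKETCH_ZERO;
	}
	return SKETCH_ZERO;
}
```

Because the callback is only consulted for objects reached through
another object, a tree named directly (HEAD: in the t5317 test below) is
still shown, while everything beneath it is omitted.
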
 Documentation/rev-list-options.txt     |  5 +++
 list-objects-filter-options.c          |  4 +++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 49 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 28 +++++++++++++++
 t/t5616-partial-clone.sh               | 38 ++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 12 +++++++
 7 files changed, 137 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..a28382940 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (!strcmp(arg, "tree:0")) {
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 5f8b1a002..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -79,6 +79,54 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void* filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -371,6 +419,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 5e35f33bf..7a4d49ea1 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" >r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	awk "/tree|blob/{print \$1}" objs >trees_and_blobs &&
+	git -C r1 rev-parse HEAD: >expected &&
+	test_cmp trees_and_blobs expected
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..8eeb85fbc 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,6 +154,44 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'use fsck before and after manually fetching a missing subtree' '
+	# push new commit so server has a subtree
+	mkdir src/dir &&
+	echo "in dir" >src/dir/file.txt &&
+	git -C src add dir/file.txt &&
+	git -C src commit -m "file in dir" &&
+	git -C src push -u srv master &&
+	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
+
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst fsck &&
+
+	# Make sure we only have commits, and all trees and blobs are missing.
+	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	awk -f print_1.awk fetched_objects \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	echo commit >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected &&
+
+	# Auto-fetch a tree with cat-file.
+	git -C dst cat-file -p $SUBTREE >tree_contents &&
+	grep file.txt tree_contents &&
+
+	# fsck still works after an auto-fetch of a tree.
+	git -C dst fsck &&
+
+	# Auto-fetch all remaining trees and blobs with --missing=error
+	git -C dst rev-list master --missing=error --objects >fetched_objects &&
+	test_line_count = 70 fetched_objects &&
+	awk -f print_1.awk fetched_objects \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	printf "blob\ncommit\ntree\n" >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index d3d07975f..27040d73a 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -213,6 +213,18 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 	test_must_be_empty rev_list_err
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
+		| awk -f print_1.awk \
+		| sed s/~// \
+		| xargs -n1 git -C r3 cat-file -t \
+		| sort -u >filtered_types &&
+	printf "blob\ntree\n" > expected &&
+	test_cmp filtered_types expected
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.0.rc1.350.ge57e33dbd1-goog



* Re: [PATCH v7 0/7] filter: support for excluding all trees and blobs
  2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (6 preceding siblings ...)
  2018-09-04 18:05   ` [PATCH v7 7/7] list-objects-filter: implement filter tree:0 Matthew DeVore
@ 2018-09-04 18:41   ` Stefan Beller
  7 siblings, 0 replies; 151+ messages in thread
From: Stefan Beller @ 2018-09-04 18:41 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: git, Jeff Hostetler, Jeff Hostetler, Jeff King, Stefan Beller,
	Jonathan Tan, Junio C Hamano, Duy Nguyen

On Tue, Sep 4, 2018 at 11:06 AM Matthew DeVore <matvore@google.com> wrote:
>
> I made the following changes since v6 of the patchset:
>  - (suggested by Duy Nguyen) add a new commit which replaces uses of die() with
>    BUG() in list-objects-filter.c wherever it corresponds to a coding error.
>  - Replace die() with BUG() in new code.
>  - Replace test_line_count = 0 with test_must_be_empty in new tests since the
>    trend seems to be, based on other RFCs in progress, that we are standardizing
>    on that phraseology. See:
>    https://public-inbox.org/git/20180819215725.29001-1-szeder.dev@gmail.com/
>
> As asked in the last "What's cooking in git.git" post, the status of this patch
> is:
>  - The original reviewer, Jonathan Tan, is on vacation and will be back later
>    this week.
>  - Stefan Beller has been reviewing the patchset in Jonathan's absence, and
>    stated that it's a good read despite not being familiar with the code:
>    https://public-inbox.org/git/CAGZ79kaWcGbyc2S5gOCU7NdvT4fN46jq4xK9MvTLAFBGhyuo2A@mail.gmail.com/

and this still holds.

Thanks,
Stefan


* Re: [PATCH v7 5/7] revision: mark non-user-given objects instead
  2018-09-04 18:05   ` [PATCH v7 5/7] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-09-04 20:31     ` Junio C Hamano
  2018-09-05 18:00       ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-09-04 20:31 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: sbeller, git, git, jeffhost, peff, stefanbeller, jonathantanmy, pclouds

Matthew DeVore <matvore@google.com> writes:

> diff --git a/revision.h b/revision.h
> index 5118aaaa9..2d381e636 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -8,7 +8,11 @@
>  #include "diff.h"
>  #include "commit-slab-decl.h"
>  
> -/* Remember to update object flag allocation in object.h */
> +/* Remember to update object flag allocation in object.h
> + * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
> + * filtering trees and blobs, but it may be useful to support filtering commits
> + * in the future.
> + */

Just a minor style nit, but our multi-line comment begins with the
opening "/*" (and closing "*/", too, but you got that right) on its
own line, i.e.

	/*
	 * Remember to update ...

> -#define USER_GIVEN	(1u<<25) /* given directly by the user */
> +#define NOT_USER_GIVEN	(1u<<25) /* tree or blob not given directly by user */

Is "given directly by user" equivalent to "given on the command
line"?  Do objects given via "--stdin" count the same way?  How about
those given via "--branches" or "A^@"?  Does "not given directly by
user" mean roughly the same thing as "discovered by traversal"?

Not a suggestion to change anything in this patch, but if you can
come up with a better phrase that helps new readers' understanding
so that they do not have to ask a question like this, that would be
great.

Thanks.



* Re: [PATCH v7 6/7] list-objects-filter: use BUG rather than die
  2018-09-04 18:05   ` [PATCH v7 6/7] list-objects-filter: use BUG rather than die Matthew DeVore
@ 2018-09-04 20:32     ` Junio C Hamano
  0 siblings, 0 replies; 151+ messages in thread
From: Junio C Hamano @ 2018-09-04 20:32 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: sbeller, git, git, jeffhost, peff, stefanbeller, jonathantanmy, pclouds

Matthew DeVore <matvore@google.com> writes:

> In some cases in this file, BUG makes more sense than die. In such
> cases, we get there from a coding error rather than a user error.
>
> 'return' has been removed following some instances of BUG since BUG does
> not return.
>
> Signed-off-by: Matthew DeVore <matvore@google.com>
> ---

Makes sense.
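
To illustrate the point about dropping 'return' after BUG, here is a
minimal standalone sketch. It is not git's implementation: sketch_bug(),
enum sketch_situation and sketch_is_tree_event() are hypothetical names
made up for this example, and sketch_bug() only assumes that a BUG-style
helper reports the message and then aborts without returning.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a BUG()-style helper: prints the message
 * and aborts. Declared _Noreturn (C11) so the compiler knows that
 * control never continues past a call to it.
 */
static _Noreturn void sketch_bug(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fputs("BUG: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	va_end(ap);
	abort();
}

/* Hypothetical situations, loosely modelled on the LOFS_* values. */
enum sketch_situation {
	SKETCH_BEGIN_TREE,
	SKETCH_BLOB,
	SKETCH_END_TREE
};

/* Returns 1 for tree events and 0 for blobs. */
static int sketch_is_tree_event(enum sketch_situation s)
{
	switch (s) {
	default:
		/* A coding error, not a user error; no 'return' is needed. */
		sketch_bug("unknown situation: %d", (int)s);

	case SKETCH_BEGIN_TREE:
	case SKETCH_END_TREE:
		return 1;
	case SKETCH_BLOB:
		return 0;
	}
}
```

Because the helper is marked as not returning, the default branch needs
neither a dummy return value nor a fall-through annotation.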

>  list-objects-filter.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/list-objects-filter.c b/list-objects-filter.c
> index a0ba78b20..5f8b1a002 100644
> --- a/list-objects-filter.c
> +++ b/list-objects-filter.c
> @@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
>  
>  	switch (filter_situation) {
>  	default:
> -		die("unknown filter_situation");
> -		return LOFR_ZERO;
> +		BUG("unknown filter_situation: %d", filter_situation);
>  
>  	case LOFS_BEGIN_TREE:
>  		assert(obj->type == OBJ_TREE);
> @@ -102,8 +101,7 @@ static enum list_objects_filter_result filter_blobs_limit(
>  
>  	switch (filter_situation) {
>  	default:
> -		die("unknown filter_situation");
> -		return LOFR_ZERO;
> +		BUG("unknown filter_situation: %d", filter_situation);
>  
>  	case LOFS_BEGIN_TREE:
>  		assert(obj->type == OBJ_TREE);
> @@ -208,8 +206,7 @@ static enum list_objects_filter_result filter_sparse(
>  
>  	switch (filter_situation) {
>  	default:
> -		die("unknown filter_situation");
> -		return LOFR_ZERO;
> +		BUG("unknown filter_situation: %d", filter_situation);
>  
>  	case LOFS_BEGIN_TREE:
>  		assert(obj->type == OBJ_TREE);
> @@ -389,7 +386,7 @@ void *list_objects_filter__init(
>  	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
>  
>  	if (filter_options->choice >= LOFC__COUNT)
> -		die("invalid list-objects filter choice: %d",
> +		BUG("invalid list-objects filter choice: %d",
>  		    filter_options->choice);
>  
>  	init_fn = s_filters[filter_options->choice];


* Re: [PATCH v7 7/7] list-objects-filter: implement filter tree:0
  2018-09-04 18:05   ` [PATCH v7 7/7] list-objects-filter: implement filter tree:0 Matthew DeVore
@ 2018-09-04 20:44     ` Junio C Hamano
  2018-09-06  0:08       ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-09-04 20:44 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: sbeller, git, git, jeffhost, peff, stefanbeller, jonathantanmy, pclouds

Matthew DeVore <matvore@google.com> writes:

> @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
>  			return 0;
>  		}
>  
> +	} else if (!strcmp(arg, "tree:0")) {
> +		filter_options->choice = LOFC_TREE_NONE;
> +		return 0;
> +

This is not wrong per-se, but I would have expected to see something
like:

	... else if (skip_prefix(arg, "tree:", &param)) {
		unsigned long depth;
		if (!git_parse_ulong(param, &depth) || depth != 0) {
			err = "only 'tree:0' is supported";
			return -1;
		}
		filter_options->choice = LOFC_TREE_NONE;
		return 0;

so that "tree:1" is rejected not with "invalid filter-spec" but a
bit more descriptive "only tree:0 is".  Accepting "tree:00" or
"tree:0k" is merely an added bogus^wbonus.

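The parsing idea suggested above can be tried out in a standalone
sketch. Everything prefixed with sketch_ is hypothetical and only for
illustration: sketch_skip_prefix() is a simplified stand-in for
skip_prefix(), and plain strtoul() is used in place of
git_parse_ulong(), so unlike the suggestion above this version rejects
unit suffixes such as "tree:0k" (it still accepts "tree:00").

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified stand-in for skip_prefix(): if str starts with prefix,
 * store a pointer to the remainder in *out and return 1; else return 0.
 */
static int sketch_skip_prefix(const char *str, const char *prefix,
			      const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

enum sketch_result {
	SKETCH_OK = 0,			/* "tree:" followed by a zero depth */
	SKETCH_NOT_TREE,		/* not a "tree:" spec at all */
	SKETCH_UNSUPPORTED_DEPTH	/* "tree:" with anything but zero */
};

/* Parse a filter spec; on an unsupported depth, point *err at a message. */
static enum sketch_result sketch_parse_tree_filter(const char *arg,
						   const char **err)
{
	const char *param;
	char *end;
	unsigned long depth;

	assert(arg && err);
	*err = NULL;

	if (!sketch_skip_prefix(arg, "tree:", &param))
		return SKETCH_NOT_TREE;

	errno = 0;
	depth = strtoul(param, &end, 10);
	/* Reject empty input, trailing junk, overflow and non-zero depths. */
	if (end == param || *end != '\0' || errno || depth != 0) {
		*err = "only 'tree:0' is supported";
		return SKETCH_UNSUPPORTED_DEPTH;
	}
	return SKETCH_OK;
}
```

With this shape, "tree:1" produces the more descriptive message instead
of falling through to a generic "invalid filter-spec" error.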


* Re: [PATCH v7 5/7] revision: mark non-user-given objects instead
  2018-09-04 20:31     ` Junio C Hamano
@ 2018-09-05 18:00       ` Matthew DeVore
  0 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-05 18:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, git, git, jeffhost, Jeff King, Stefan Beller,
	Jonathan Tan, pclouds

On Tue, Sep 4, 2018 at 1:31 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matthew DeVore <matvore@google.com> writes:
>
> > diff --git a/revision.h b/revision.h
> > index 5118aaaa9..2d381e636 100644
> > --- a/revision.h
> > +++ b/revision.h
> > @@ -8,7 +8,11 @@
> >  #include "diff.h"
> >  #include "commit-slab-decl.h"
> >
> > -/* Remember to update object flag allocation in object.h */
> > +/* Remember to update object flag allocation in object.h
> > + * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
> > + * filtering trees and blobs, but it may be useful to support filtering commits
> > + * in the future.
> > + */
>
> Just a minor style nit, but our multi-line comment begins with the
> opening "/*" (and closing "*/", too, but you got that right) on its
> own line, i.e.
>
Fixed.

>         /*
>          * Remember to update ...
>
> > -#define USER_GIVEN   (1u<<25) /* given directly by the user */
> > +#define NOT_USER_GIVEN       (1u<<25) /* tree or blob not given directly by user */
>
> Is "given directly by user" equivalent to "given on the command
> line"?  Do objects given via "--stdin" count the same way?  How about
> those given via "--branches" or "A^@"?  Does "not given directly by
> user" mean roughly the same thing as "discovered by traversal"?
Note that --branches and A^@ expand to commits, and commits can't be
filtered, so perhaps these questions will be unlikely for the time
being. I did clarify the comment a bit.

I also noticed that this commit fixes a bug - before this patch, "git
rev-list" would potentially filter objects given on the command
line/--stdin, while "git pack-objects" would not. That was because
each command used a different function to populate the rev_info
struct, and rev-list's code path did not contain the hack which turned
on USER_GIVEN.
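
As a rough illustration of why inverting the flag removes the need for
that hack, here is a standalone sketch. The names are hypothetical
(struct sketch_object, SKETCH_NOT_USER_GIVEN and both helpers), and it
assumes that the traversal code is the only place that sets the bit when
it reaches an object through a tree entry:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SEEN           (1u << 0)
#define SKETCH_NOT_USER_GIVEN (1u << 25)

/* Hypothetical, stripped-down object: only the flag word matters here. */
struct sketch_object {
	unsigned flags;
};

/* Assumed to be called when traversal discovers obj via a tree entry. */
static void sketch_mark_reached_by_traversal(struct sketch_object *obj)
{
	assert(obj != NULL);
	obj->flags |= SKETCH_NOT_USER_GIVEN;
}

/*
 * A filter is consulted only for objects reached by traversal. Objects
 * named on the command line or via stdin never get the bit, whichever
 * code path populated the pending list, so they are never filtered.
 */
static int sketch_filter_applies(const struct sketch_object *obj,
				 int have_filter)
{
	assert(obj != NULL);
	return have_filter && (obj->flags & SKETCH_NOT_USER_GIVEN) != 0;
}
```

With the old polarity, every entry point had to remember to set
USER_GIVEN; with the inverted flag, the zero-initialized default already
means "given by the user".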

So I added a test to check the desired behavior. Here is an interdiff
for this commit only:

diff --git a/revision.h b/revision.h
index fe4ff1fec..83e164039 100644
--- a/revision.h
+++ b/revision.h
@@ -9,11 +9,7 @@
 #include "diff.h"
 #include "commit-slab-decl.h"

-/* Remember to update object flag allocation in object.h
- * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
- * filtering trees and blobs, but it may be useful to support filtering commits
- * in the future.
- */
+/* Remember to update object flag allocation in object.h */
 #define SEEN (1u<<0)
 #define UNINTERESTING   (1u<<1)
 #define TREESAME (1u<<2)
@@ -25,7 +21,14 @@
 #define SYMMETRIC_LEFT (1u<<8)
 #define PATCHSAME (1u<<9)
 #define BOTTOM (1u<<10)
-#define NOT_USER_GIVEN (1u<<25) /* tree or blob not given directly by user */
+/*
+ * Indicates object was reached by traversal. i.e. not given by user on
+ * command-line or stdin.
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
+#define NOT_USER_GIVEN (1u<<25)
 #define TRACK_LINEAR (1u<<26)
 #define ALL_REV_FLAGS (((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)

diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 2e6a6a32e..a989a7082 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -30,6 +30,16 @@ test_expect_success 'verify blob:none omits all 5 blobs' '
  test_cmp observed expected
 '

+test_expect_success 'specify blob explicitly prevents filtering' '
+ file_3=$(git -C r1 ls-files -s file.3 \
+ | awk -f print_2.awk) &&
+ file_4=$(git -C r1 ls-files -s file.4 \
+ | awk -f print_2.awk) &&
+ git -C r1 rev-list HEAD --objects --filter=blob:none HEAD $file_3 >observed &&
+ grep -q "$file_3" observed &&
+ test_must_fail grep -q "$file_4" observed
+'
+
 test_expect_success 'verify emitted+omitted == all' '
  git -C r1 rev-list HEAD --objects \
  | awk -f print_1.awk \

>
> Not a suggestion to change anything in this patch, but if you can
> come up with a better phrase that helps new readers' understanding
> so that they do not have to ask a question like this, that would be
> great.
>
> Thanks.
>


* Re: [PATCH v7 7/7] list-objects-filter: implement filter tree:0
  2018-09-04 20:44     ` Junio C Hamano
@ 2018-09-06  0:08       ` Matthew DeVore
  0 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-06  0:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, git, git, jeffhost, Jeff King, Stefan Beller,
	Jonathan Tan, pclouds

On Tue, Sep 4, 2018 at 1:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matthew DeVore <matvore@google.com> writes:
>
> > @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
> >                       return 0;
> >               }
> >
> > +     } else if (!strcmp(arg, "tree:0")) {
> > +             filter_options->choice = LOFC_TREE_NONE;
> > +             return 0;
> > +
>
> This is not wrong per-se, but I would have expected to see something
> like:
>
>         ... else if (skip_prefix(arg, "tree:", &param)) {
>                 unsigned long depth;
>                 if (!git_parse_ulong(param, &depth) || depth != 0) {
>                         err = "only 'tree:0' is supported";
>                         return -1;
>                 }
>                 filter_options->choice = LOFC_TREE_NONE;
>                 return 0;
>
> so that "tree:1" is rejected not with "invalid filter-spec" but a
> bit more descriptive "only tree:0 is".  Accepting "tree:00" or
> "tree:0k" is merely an added bogus^wbonus.
>
Good idea. An interdiff for my fix is below. I didn't add a test,
since adding a shell test for every trivial error doesn't seem to
scale, but let me know if you disagree. I did of course try provoking
the error manually.

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index a28382940..14f251de4 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,7 +50,17 @@ static int gently_parse_list_objects_filter(
  return 0;
  }

- } else if (!strcmp(arg, "tree:0")) {
+ } else if (skip_prefix(arg, "tree:", &v0)) {
+ unsigned long depth;
+ if (!git_parse_ulong(v0, &depth) || depth != 0) {
+ if (errbuf) {
+ strbuf_init(errbuf, 0);
+ strbuf_addstr(
+ errbuf,
+ _("only 'tree:0' is supported"));
+ }
+ return 1;
+ }
  filter_options->choice = LOFC_TREE_NONE;
  return 0;


* [PATCH v8 0/7] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (11 preceding siblings ...)
  2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-09-14  0:55 ` " Matthew DeVore
  2018-09-14  0:55   ` [PATCH v8 1/7] list-objects: store common func args in struct Matthew DeVore
                     ` (6 more replies)
  2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (3 subsequent siblings)
  16 siblings, 7 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-14  0:55 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

Things seem to have settled down in terms of responses, so here is a re-roll,
some of the changes being Junio's suggestions:
 - show a more helpful error if a positive integer is given after "tree:"
 - added a test for an issue that this patchset inadvertently fixes:
   git rev-list would filter objects given explicitly on the command-line,
   but it should not.
 - improved documentation on the NOT_USER_GIVEN flag in revision.h

Matthew DeVore (7):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: use BUG rather than die
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   5 +
 builtin/rev-list.c                     |  11 +-
 list-objects-filter-options.c          |  14 ++
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  60 ++++++-
 list-objects.c                         | 232 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  26 ++-
 t/t0410-partial-clone.sh               |  45 +++++
 t/t5317-pack-objects-filter-objects.sh |  41 +++++
 t/t5616-partial-clone.sh               |  38 ++++
 t/t6112-rev-list-filters-objects.sh    |  39 +++++
 12 files changed, 389 insertions(+), 124 deletions(-)

-- 
2.19.0.397.gdd90340f6a-goog



* [PATCH v8 1/7] list-objects: store common func args in struct
  2018-09-14  0:55 ` [PATCH v8 " Matthew DeVore
@ 2018-09-14  0:55   ` Matthew DeVore
  2018-09-14  0:55   ` [PATCH v8 2/7] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-14  0:55 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

This will make utility functions easier to create, as done by the next
patch.
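
The general pattern can be sketched in isolation as follows. This is
only an illustration with hypothetical names (struct sketch_context and
the sketch_* functions), not the code in this patch: callbacks and their
data travel together in one struct, so a helper takes a single context
pointer instead of a growing list of arguments.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*sketch_show_fn)(const char *name, void *data);

/* Hypothetical context bundling what every helper needs. */
struct sketch_context {
	sketch_show_fn show;	/* callback invoked for each shown entry */
	void *show_data;	/* opaque data handed back to the callback */
	int show_blobs;		/* non-zero if blobs should be shown */
};

static void sketch_process_blob(struct sketch_context *ctx, const char *name)
{
	assert(ctx && ctx->show);
	if (!ctx->show_blobs)
		return;
	ctx->show(name, ctx->show_data);
}

/* A new helper only needs the context plus its own specific arguments. */
static void sketch_process_tree(struct sketch_context *ctx,
				const char *const *names, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		sketch_process_blob(ctx, names[i]);
}

/* Example callback: counts how many entries were shown. */
static void sketch_count(const char *name, void *data)
{
	(void)name;
	++*(int *)data;
}
```

A caller would fill in a struct sketch_context once, for example with
sketch_count and a pointer to an int counter, and pass its address down
the call chain.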

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.19.0.397.gdd90340f6a-goog



* [PATCH v8 2/7] list-objects: refactor to process_tree_contents
  2018-09-14  0:55 ` [PATCH v8 " Matthew DeVore
  2018-09-14  0:55   ` [PATCH v8 1/7] list-objects: store common func args in struct Matthew DeVore
@ 2018-09-14  0:55   ` Matthew DeVore
  2018-09-14  0:55   ` [PATCH v8 3/7] list-objects: always parse trees gently Matthew DeVore
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-14  0:55 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally. i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.19.0.397.gdd90340f6a-goog



* [PATCH v8 3/7] list-objects: always parse trees gently
  2018-09-14  0:55 ` [PATCH v8 " Matthew DeVore
  2018-09-14  0:55   ` [PATCH v8 1/7] list-objects: store common func args in struct Matthew DeVore
  2018-09-14  0:55   ` [PATCH v8 2/7] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-09-14  0:55   ` Matthew DeVore
  2018-09-14  0:55   ` [PATCH v8 4/7] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-14  0:55 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	if (parse_tree_gently(tree, 1) < 0) {
 		if (revs->ignore_missing_links)
 			return;
 
-- 
2.19.0.397.gdd90340f6a-goog



* [PATCH v8 4/7] rev-list: handle missing tree objects properly
  2018-09-14  0:55 ` [PATCH v8 " Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-09-14  0:55   ` [PATCH v8 3/7] list-objects: always parse trees gently Matthew DeVore
@ 2018-09-14  0:55   ` Matthew DeVore
  2018-09-14  0:55   ` [PATCH v8 5/7] revision: mark non-user-given objects instead Matthew DeVore
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-14  0:55 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/rev-list.c                     | 11 ++++---
 list-objects.c                         | 11 +++++--
 revision.h                             | 15 +++++++++
 t/t0410-partial-clone.sh               | 45 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 13 ++++++++
 t/t6112-rev-list-filters-objects.sh    | 17 ++++++++++
 6 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int failed_parse;
 
 	if (!revs->tree_objects)
 		return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, 1) < 0) {
+
+	failed_parse = parse_tree_gently(tree, 1);
+	if (failed_parse) {
 		if (revs->ignore_missing_links)
 			return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
 		    is_promisor_object(&obj->oid))
 			return;
 
-		die("bad tree object %s", oid_to_hex(&obj->oid));
+		if (!revs->do_not_die_on_missing_tree)
+			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (!failed_parse)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index 007278cc1..5910613cb 100644
--- a/revision.h
+++ b/revision.h
@@ -126,6 +126,21 @@ struct rev_info {
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
 
+			/*
+			 * Blobs are shown without regard for their existence.
+			 * But not so for trees: unless exclude_promisor_objects
+			 * is set and the tree in question is a promisor object;
+			 * OR ignore_missing_links is set, the revision walker
+			 * dies with a "bad tree object HASH" message when
+			 * encountering a missing tree. For callers that can
+			 * handle missing trees and want them to be filterable
+			 * and showable, set this to true. The revision walker
+			 * will filter and show such a missing tree as usual,
+			 * but will not attempt to recurse into this tree
+			 * object.
+			 */
+			do_not_die_on_missing_tree:1,
+
 			/* for internal use only */
 			exclude_promisor_objects:1;
 
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 128130066..f02b9ae37 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -186,6 +186,51 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	! grep $FOO out
 '
 
+test_expect_success 'missing tree objects with --missing=allow-promisor and --exclude-promisor-objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	promise_and_delete $(git -C repo rev-parse bar^{tree}) &&
+	promise_and_delete $(git -C repo rev-parse foo^{tree}) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, 3 blobs, and 1 tree
+	test_line_count = 7 objs &&
+
+	# Do the same for --exclude-promisor-objects, but with all trees gone.
+	promise_and_delete $(git -C repo rev-parse baz^{tree}) &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, no blobs or trees
+	test_line_count = 3 objs
+'
+
+test_expect_success 'missing non-root tree object and rev-list' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	mkdir repo/dir &&
+	echo foo > repo/dir/foo &&
+	git -C repo add dir/foo &&
+	git -C repo commit -m "commit dir/foo" &&
+
+	promise_and_delete $(git -C repo rev-parse HEAD:dir) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 1 commit and 1 tree
+	test_line_count = 2 objs
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised tree' '
 	rm -rf repo &&
 	test_create_repo repo &&
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..5e35f33bf 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo > r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep -q "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index d4ff0b3be..c662c97db 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -195,6 +195,23 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
+	TREE=$(git -C r3 rev-parse HEAD:dir1) &&
+
+	rm r3/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+	git -C r3 rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
+	echo "?$TREE" >expected &&
+	test_cmp expected missing_objs &&
+
+	# do not complain when a missing tree cannot be parsed
+	test_must_be_empty rev_list_err &&
+
+	git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	! grep $TREE objs &&
+	test_must_be_empty rev_list_err
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread
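
The failed-parse handling that patch 4/7 adds to process_tree() can be summarised as a small decision function. This is only a sketch under stated assumptions: `struct walk_opts`, `enum missing_tree_action` and `on_failed_parse()` are hypothetical names invented for illustration, and the three booleans stand in for the rev_info bits of the same names. It is not code from list-objects.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three rev_info bits consulted on a failed parse. */
struct walk_opts {
	bool ignore_missing_links;
	bool exclude_promisor_objects;
	bool do_not_die_on_missing_tree;
};

/* Hypothetical result type: what the walker does with an unparsable tree. */
enum missing_tree_action {
	TREE_SKIP,            /* return early: neither shown nor recursed into */
	TREE_DIE,             /* die("bad tree object ...") */
	TREE_SHOW_NO_RECURSE  /* filter and show the tree, but skip its contents */
};

/*
 * Follows the order of checks in the patched process_tree():
 * the two "return" cases come first, then the die, and only callers
 * that set do_not_die_on_missing_tree reach the show-without-recursing case.
 */
static enum missing_tree_action on_failed_parse(const struct walk_opts *o,
						bool is_promisor)
{
	if (o->ignore_missing_links)
		return TREE_SKIP;
	if (o->exclude_promisor_objects && is_promisor)
		return TREE_SKIP;
	if (!o->do_not_die_on_missing_tree)
		return TREE_DIE;
	return TREE_SHOW_NO_RECURSE;
}
```

With all three bits clear the result is TREE_DIE, which matches what the new t5317 test expects from pack-objects. rev-list sets do_not_die_on_missing_tree, so it reaches TREE_SHOW_NO_RECURSE and leaves the decision to its --missing=* handling.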

* [PATCH v8 5/7] revision: mark non-user-given objects instead
  2018-09-14  0:55 ` [PATCH v8 " Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-09-14  0:55   ` [PATCH v8 4/7] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-09-14  0:55   ` Matthew DeVore
  2018-09-14 17:23     ` Junio C Hamano
  2018-09-14  0:55   ` [PATCH v8 6/7] list-objects-filter: use BUG rather than die Matthew DeVore
  2018-09-14  0:55   ` [PATCH v8 7/7] list-objects-filter: implement filter tree:0 Matthew DeVore
  6 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-09-14  0:55 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

This fixes a bug in which git rev-list behaved differently from git
pack-objects. pack-objects would *not* filter objects given explicitly
on the command line and rev-list would filter. This was because the two
commands used a different function to add objects to the rev_info
struct. This seems to have been an oversight, and pack-objects has the
correct behavior, so I added a test to make sure that rev-list now
behaves properly.

Signed-off-by: Matthew DeVore <matvore@google.com>

fixup of 6defd70de
---
 list-objects.c                      | 31 +++++++++++++++++------------
 revision.c                          |  1 -
 revision.h                          | 11 ++++++++--
 t/t6112-rev-list-filters-objects.sh | 10 ++++++++++
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index de4dce600..72d48a17f 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index 5910613cb..83e164039 100644
--- a/revision.h
+++ b/revision.h
@@ -21,9 +21,16 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+/*
+ * Indicates object was reached by traversal. i.e. not given by user on
+ * command-line or stdin.
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
+#define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c662c97db..2e07dadf0 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -30,6 +30,16 @@ test_expect_success 'verify blob:none omits all 5 blobs' '
 	test_cmp observed expected
 '
 
+test_expect_success 'specify blob explicitly prevents filtering' '
+	file_3=$(git -C r1 ls-files -s file.3 \
+		| awk -f print_2.awk) &&
+	file_4=$(git -C r1 ls-files -s file.4 \
+		| awk -f print_2.awk) &&
+	git -C r1 rev-list HEAD --objects --filter=blob:none HEAD $file_3 >observed &&
+	grep -q "$file_3" observed &&
+	test_must_fail grep -q "$file_4" observed
+'
+
 test_expect_success 'verify emitted+omitted == all' '
 	git -C r1 rev-list HEAD --objects \
 		| awk -f print_1.awk \
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread
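
The inverted flag from patch 5/7 can be illustrated with a reduced model. This is a sketch, not the real code: `struct object` here is a hypothetical one-field stand-in for Git's object header, and `mark_reached_by_traversal()` and `should_consult_filter()` are made-up helper names. Only the NOT_USER_GIVEN bit value and the shape of the condition are taken from the patch.

```c
#include <stdbool.h>

#define NOT_USER_GIVEN (1u << 25) /* same bit the patch defines in revision.h */

/* Hypothetical: reduced to the flags word, which is all this sketch needs. */
struct object {
	unsigned flags;
};

/*
 * Called where the walker reaches an object through a reference
 * (a tree entry, or a commit's root tree), as the patch does inline.
 */
static void mark_reached_by_traversal(struct object *obj)
{
	obj->flags |= NOT_USER_GIVEN;
}

/*
 * Same condition as in the patched process_blob()/process_tree():
 * the filter is consulted only for objects the traversal found itself.
 */
static bool should_consult_filter(const struct object *obj, bool have_filter_fn)
{
	return (obj->flags & NOT_USER_GIVEN) && have_filter_fn;
}
```

An object that was only named on the command line never has the bit set, so should_consult_filter() is false for it and it is shown regardless of --filter. That is the behaviour the new "specify blob explicitly prevents filtering" test checks.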

* [PATCH v8 6/7] list-objects-filter: use BUG rather than die
  2018-09-14  0:55 ` [PATCH v8 " Matthew DeVore
                     ` (4 preceding siblings ...)
  2018-09-14  0:55   ` [PATCH v8 5/7] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-09-14  0:55   ` Matthew DeVore
  2018-09-14  0:55   ` [PATCH v8 7/7] list-objects-filter: implement filter tree:0 Matthew DeVore
  6 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-14  0:55 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

In some cases in this file, BUG makes more sense than die. In such
cases, we get there from a coding error rather than a user error.

'return' has been removed following some instances of BUG since BUG does
not return.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..5f8b1a002 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -102,8 +101,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -208,8 +206,7 @@ static enum list_objects_filter_result filter_sparse(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -389,7 +386,7 @@ void *list_objects_filter__init(
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
 	if (filter_options->choice >= LOFC__COUNT)
-		die("invalid list-objects filter choice: %d",
+		BUG("invalid list-objects filter choice: %d",
 		    filter_options->choice);
 
 	init_fn = s_filters[filter_options->choice];
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v8 7/7] list-objects-filter: implement filter tree:0
  2018-09-14  0:55 ` [PATCH v8 " Matthew DeVore
                     ` (5 preceding siblings ...)
  2018-09-14  0:55   ` [PATCH v8 6/7] list-objects-filter: use BUG rather than die Matthew DeVore
@ 2018-09-14  0:55   ` Matthew DeVore
  2018-09-14 17:39     ` Junio C Hamano
  6 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-09-14  0:55 UTC (permalink / raw)
  To: sbeller, git
  Cc: Matthew DeVore, git, jeffhost, peff, stefanbeller, jonathantanmy,
	gitster, pclouds

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but it is inaccurate: it
suggests that annotated tags are omitted, when in fact they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat roundabout way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  5 +++
 list-objects-filter-options.c          | 14 ++++++++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 49 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 28 +++++++++++++++
 t/t5616-partial-clone.sh               | 38 ++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 12 +++++++
 7 files changed, 147 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..14f251de4 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,20 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (skip_prefix(arg, "tree:", &v0)) {
+		unsigned long depth;
+		if (!git_parse_ulong(v0, &depth) || depth != 0) {
+			if (errbuf) {
+				strbuf_init(errbuf, 0);
+				strbuf_addstr(
+					errbuf,
+					_("only 'tree:0' is supported"));
+			}
+			return 1;
+		}
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 5f8b1a002..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -79,6 +79,54 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void* filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -371,6 +419,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 5e35f33bf..7a4d49ea1 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" >r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	awk "/tree|blob/{print \$1}" objs >trees_and_blobs &&
+	git -C r1 rev-parse HEAD: >expected &&
+	test_cmp trees_and_blobs expected
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..8eeb85fbc 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,6 +154,44 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'use fsck before and after manually fetching a missing subtree' '
+	# push new commit so server has a subtree
+	mkdir src/dir &&
+	echo "in dir" >src/dir/file.txt &&
+	git -C src add dir/file.txt &&
+	git -C src commit -m "file in dir" &&
+	git -C src push -u srv master &&
+	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
+
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst fsck &&
+
+	# Make sure we only have commits, and all trees and blobs are missing.
+	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	awk -f print_1.awk fetched_objects \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	echo commit >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected &&
+
+	# Auto-fetch a tree with cat-file.
+	git -C dst cat-file -p $SUBTREE >tree_contents &&
+	grep file.txt tree_contents &&
+
+	# fsck still works after an auto-fetch of a tree.
+	git -C dst fsck &&
+
+	# Auto-fetch all remaining trees and blobs with --missing=error
+	git -C dst rev-list master --missing=error --objects >fetched_objects &&
+	test_line_count = 70 fetched_objects &&
+	awk -f print_1.awk fetched_objects \
+		| xargs -n1 git -C dst cat-file -t >fetched_types &&
+	sort fetched_types -u >unique_types.observed &&
+	printf "blob\ncommit\ntree\n" >unique_types.expected &&
+	test_cmp unique_types.observed unique_types.expected
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 2e07dadf0..a989a7082 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -222,6 +222,18 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 	test_must_be_empty rev_list_err
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
+		| awk -f print_1.awk \
+		| sed s/~// \
+		| xargs -n1 git -C r3 cat-file -t \
+		| sort -u >filtered_types &&
+	printf "blob\ntree\n" > expected &&
+	test_cmp filtered_types expected
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread
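
The argument check that patch 7/7 adds for "tree:<depth>" can be sketched in standalone C. This is a minimal sketch under assumptions: `parse_tree_filter()` is a hypothetical name, and plain strncmp()/strtoul() stand in for the skip_prefix() and git_parse_ulong() calls the patch uses, so anything git_parse_ulong() accepts beyond plain decimal digits is not modelled here. The return convention (0 for accepted, 1 for rejected) follows the patch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 0 if spec is a supported "tree:<depth>" filter, 1 otherwise.
 * As in the patch, only depth 0 is accepted for now.
 */
static int parse_tree_filter(const char *spec)
{
	const char *prefix = "tree:";
	size_t plen = strlen(prefix);
	unsigned long depth;
	char *end;

	if (strncmp(spec, prefix, plen) != 0)
		return 1;
	spec += plen;

	/* strtoul() alone would accept "", "-1" or leading whitespace. */
	if (*spec < '0' || *spec > '9')
		return 1;

	errno = 0;
	depth = strtoul(spec, &end, 10);
	if (errno || *end != '\0' || depth != 0)
		return 1;
	return 0;
}
```

Parsing a number and then insisting on zero, rather than comparing against the literal string "tree:0", keeps the syntax open for the depth-based filtering ("tree:1" and so on) that the commit message mentions as a later extension.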

* Re: [PATCH v8 5/7] revision: mark non-user-given objects instead
  2018-09-14  0:55   ` [PATCH v8 5/7] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-09-14 17:23     ` Junio C Hamano
  2018-09-14 20:08       ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-09-14 17:23 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: sbeller, git, git, jeffhost, peff, stefanbeller, jonathantanmy, pclouds

Matthew DeVore <matvore@google.com> writes:

> Currently, list-objects.c incorrectly treats all root trees of commits
> as USER_GIVEN. Also, it would be easier to mark objects that are
> non-user-given instead of user-given, since the places in the code
> where we access an object through a reference are more obvious than
> the places where we access an object that was given by the user.
>
> Resolve these two problems by introducing a flag NOT_USER_GIVEN that
> marks blobs and trees that are non-user-given, replacing USER_GIVEN.
> (Only blobs and trees are marked because this mark is only used when
> filtering objects, and filtering of other types of objects is not
> supported yet.)
>
> This fixes a bug in that git rev-list behaved differently from git
> pack-objects. pack-objects would *not* filter objects given explicitly
> on the command line and rev-list would filter. This was because the two
> commands used a different function to add objects to the rev_info
> struct. This seems to have been an oversight, and pack-objects has the
> correct behavior, so I added a test to make sure that rev-list now
> behaves properly.
>
> Signed-off-by: Matthew DeVore <matvore@google.com>
>
> fixup of 6defd70de

That's probably meant to go below "---".

> ---
>  list-objects.c                      | 31 +++++++++++++++++------------
>  revision.c                          |  1 -
>  revision.h                          | 11 ++++++++--
>  t/t6112-rev-list-filters-objects.sh | 10 ++++++++++
>  4 files changed, 37 insertions(+), 16 deletions(-)
>
> diff --git a/list-objects.c b/list-objects.c
> index 243192af5..7a1a0929d 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
>  
>  	pathlen = path->len;
>  	strbuf_addstr(path, name);
> -	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
> +	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
>  		r = ctx->filter_fn(LOFS_BLOB, obj,
>  				   path->buf, &path->buf[pathlen],
>  				   ctx->filter_data);
> @@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
>  				continue;
>  		}
>  
> -		if (S_ISDIR(entry.mode))
> -			process_tree(ctx,
> -				     lookup_tree(the_repository, entry.oid),
> -				     base, entry.path);
> +		if (S_ISDIR(entry.mode)) {
> +			struct tree *t = lookup_tree(the_repository, entry.oid);
> +			t->object.flags |= NOT_USER_GIVEN;
> +			process_tree(ctx, t, base, entry.path);
> +		}
>  		else if (S_ISGITLINK(entry.mode))
>  			process_gitlink(ctx, entry.oid->hash,
>  					base, entry.path);
> -		else
> -			process_blob(ctx,
> -				     lookup_blob(the_repository, entry.oid),
> -				     base, entry.path);
> +		else {
> +			struct blob *b = lookup_blob(the_repository, entry.oid);
> +			b->object.flags |= NOT_USER_GIVEN;
> +			process_blob(ctx, b, base, entry.path);
> +		}
>  	}
>  }
>  
> @@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
>  	}
>  
>  	strbuf_addstr(base, name);
> -	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
> +	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
>  		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
>  				   base->buf, &base->buf[baselen],
>  				   ctx->filter_data);
> @@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
>  	if (!failed_parse)
>  		process_tree_contents(ctx, tree, base);
>  
> -	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
> +	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
>  		r = ctx->filter_fn(LOFS_END_TREE, obj,
>  				   base->buf, &base->buf[baselen],
>  				   ctx->filter_data);
> @@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
>  		 * an uninteresting boundary commit may not have its tree
>  		 * parsed yet, but we are not going to show them anyway
>  		 */
> -		if (get_commit_tree(commit))
> -			add_pending_tree(ctx->revs, get_commit_tree(commit));
> +		if (get_commit_tree(commit)) {
> +			struct tree *tree = get_commit_tree(commit);
> +			tree->object.flags |= NOT_USER_GIVEN;
> +			add_pending_tree(ctx->revs, tree);
> +		}
>  		ctx->show_commit(commit, ctx->show_data);
>  
>  		if (ctx->revs->tree_blobs_in_commit_order)
> diff --git a/revision.c b/revision.c
> index de4dce600..72d48a17f 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
>  		strbuf_release(&buf);
>  		return; /* do not add the commit itself */
>  	}
> -	obj->flags |= USER_GIVEN;
>  	add_object_array_with_path(obj, name, &revs->pending, mode, path);
>  }
>  
> diff --git a/revision.h b/revision.h
> index 5910613cb..83e164039 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -21,9 +21,16 @@
>  #define SYMMETRIC_LEFT	(1u<<8)
>  #define PATCHSAME	(1u<<9)
>  #define BOTTOM		(1u<<10)
> -#define USER_GIVEN	(1u<<25) /* given directly by the user */
> +/*
> + * Indicates object was reached by traversal. i.e. not given by user on
> + * command-line or stdin.
> + * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
> + * filtering trees and blobs, but it may be useful to support filtering commits
> + * in the future.
> + */
> +#define NOT_USER_GIVEN	(1u<<25)
>  #define TRACK_LINEAR	(1u<<26)
> -#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
> +#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
>  
>  #define DECORATE_SHORT_REFS	1
>  #define DECORATE_FULL_REFS	2
> diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
> index c662c97db..2e07dadf0 100755
> --- a/t/t6112-rev-list-filters-objects.sh
> +++ b/t/t6112-rev-list-filters-objects.sh
> @@ -30,6 +30,16 @@ test_expect_success 'verify blob:none omits all 5 blobs' '
>  	test_cmp observed expected
>  '
>  
> +test_expect_success 'specify blob explicitly prevents filtering' '
> +	file_3=$(git -C r1 ls-files -s file.3 \
> +		| awk -f print_2.awk) &&
> +	file_4=$(git -C r1 ls-files -s file.4 \
> +		| awk -f print_2.awk) &&
> +	git -C r1 rev-list HEAD --objects --filter=blob:none HEAD $file_3 >observed &&
> +	grep -q "$file_3" observed &&
> +	test_must_fail grep -q "$file_4" observed
> +'
> +
>  test_expect_success 'verify emitted+omitted == all' '
>  	git -C r1 rev-list HEAD --objects \
>  		| awk -f print_1.awk \


* Re: [PATCH v8 7/7] list-objects-filter: implement filter tree:0
  2018-09-14  0:55   ` [PATCH v8 7/7] list-objects-filter: implement filter tree:0 Matthew DeVore
@ 2018-09-14 17:39     ` Junio C Hamano
  2018-09-14 17:47       ` Junio C Hamano
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-09-14 17:39 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: sbeller, git, git, jeffhost, peff, stefanbeller, jonathantanmy, pclouds

Matthew DeVore <matvore@google.com> writes:

> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index c0e2bd6a0..14f251de4 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -50,6 +50,20 @@ static int gently_parse_list_objects_filter(
>  			return 0;
>  		}
>  
> +	} else if (skip_prefix(arg, "tree:", &v0)) {
> +		unsigned long depth;
> +		if (!git_parse_ulong(v0, &depth) || depth != 0) {
> +			if (errbuf) {
> +				strbuf_init(errbuf, 0);
> +				strbuf_addstr(
> +					errbuf,
> +					_("only 'tree:0' is supported"));

This is not a new issue with this patch, but I think strbuf_init()
at the location of filling done like this is a bad idea.  If the
caller gave you an errbuf that is pre-filled with something, we'd
leak memory and lose information.  It only makes sense to _init() if
the caller gave us an uninitialized garbage (or a strbuf that has
just been initialized and is empty).  

The existing callers seem to do STRBUF_INIT before passing it to
this function, so we probably should not do strbuf_init() here (and
other two places in this function) and simply add to it.

> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index bbbe7537d..8eeb85fbc 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -154,6 +154,44 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
>  	grep "git index-pack.*--fsck-objects" trace
>  '
>  
> +test_expect_success 'use fsck before and after manually fetching a missing subtree' '
> +	# push new commit so server has a subtree
> +	mkdir src/dir &&
> +	echo "in dir" >src/dir/file.txt &&
> +	git -C src add dir/file.txt &&
> +	git -C src commit -m "file in dir" &&
> +	git -C src push -u srv master &&
> +	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
> +
> +	rm -rf dst &&
> +	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> +	git -C dst fsck &&
> +
> +	# Make sure we only have commits, and all trees and blobs are missing.
> +	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
> +	awk -f print_1.awk fetched_objects \
> +		| xargs -n1 git -C dst cat-file -t >fetched_types &&

Break line after pipe "|", not before, and lose the backslash.  You
do not need to over-indent the command on the downstream of the
pipe, i.e.

	awk ... |
	xargs -n1 git -C ... &&

Same comment applies elsewhere in this patch, not limited to this file.
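
As a minimal sketch of the pipe style described above (the function
names and the sample data are made up for illustration, they are not
part of the patch):

```shell
# Hypothetical stand-in for the object-type list produced in the test.
list_types () {
	printf 'commit\ntree\ncommit\nblob\n'
}

# A trailing "|" already tells the shell that the command continues on
# the next line, so no backslash is needed and the downstream stages
# stay at the same indentation level.
count_unique_types () {
	list_types |
	sort -u |
	wc -l
}

count_unique_types
```

The pipeline reads top to bottom, one stage per line, and a stage can
be added or removed without touching line-continuation characters.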

> +	sort fetched_types -u >unique_types.observed &&

Make it a habit not to add dashed options after real arguments, i.e.

	sort -u fetched_types
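
A small sketch of the options-first form, under the assumption that a
temporary file stands in for "fetched_types" (the helper name is
hypothetical).  GNU tools usually tolerate options after operands, but
the POSIX utility conventions put options first, so this ordering is
the safer habit:

```shell
# Hypothetical helper: writes sample data, then sorts it with the
# option placed before the file operand.
unique_types () {
	types=$(mktemp) &&
	printf 'tree\ncommit\ntree\n' >"$types" &&
	sort -u "$types" &&
	rm -f "$types"
}

unique_types
```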

> +	echo commit >unique_types.expected &&
> +	test_cmp unique_types.observed unique_types.expected &&

Always compare "expect" with "actual", not in the reverse order, i.e.

	test_cmp expect actual

not

	test_cmp actual expect

This is important because test_cmp reports failures by showing you
an output of "diff expect actual" and from "sh t5616-part*.sh -v"
you can see what additional/excess things were produced by the test
over what is expected, prefixed with "+", and what your code failed
to produce are shown prefixed with "-".
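
To illustrate the point, here is a rough sketch that uses plain
"diff -u" as a stand-in for test_cmp; the helper name and the file
contents are invented for the example:

```shell
# Hypothetical helper: "expect" lists what the test wants, "actual"
# lists what the code produced.
show_diff_markers () {
	dir=$(mktemp -d) &&
	printf 'blob\ncommit\n' >"$dir/expect" &&
	printf 'commit\ntree\n' >"$dir/actual" &&
	# Keep only the changed lines, dropping the "---"/"+++" headers.
	diff -u "$dir/expect" "$dir/actual" | grep '^[-+][a-z]'
}

show_diff_markers
```

With "expect" given first, the line the code failed to produce (blob)
is shown with "-" and the unexpected extra line (tree) with "+".
Swapping the two arguments would flip the markers and make the
failure output misleading.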

Thanks.



* Re: [PATCH v8 7/7] list-objects-filter: implement filter tree:0
  2018-09-14 17:39     ` Junio C Hamano
@ 2018-09-14 17:47       ` Junio C Hamano
  2018-09-15  0:41         ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-09-14 17:47 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: sbeller, git, git, jeffhost, peff, stefanbeller, jonathantanmy, pclouds

Junio C Hamano <gitster@pobox.com> writes:

>> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
>> index bbbe7537d..8eeb85fbc 100755
>> --- a/t/t5616-partial-clone.sh
>> +++ b/t/t5616-partial-clone.sh
>> ...
>
> Break line after pipe "|", not before, and lose the backslash.  You
> do not need to over-indent the command on the downstream of the
> pipe, i.e.
>
> 	awk ... |
> 	xargs -n1 git -C ... &&
>
> Same comment applies elsewhere in this patch, not limited to this file.
>
>> +	sort fetched_types -u >unique_types.observed &&
>
> Make it a habit not to add dashed options after real arguments, i.e.
>
> 	sort -u fetched_types
>
>> +	echo commit >unique_types.expected &&
>> +	test_cmp unique_types.observed unique_types.expected &&
>
> Always compare "expect" with "actual", not in the reverse order, i.e.
>
> 	test_cmp expect actual
>
> not
>
> 	test_cmp actual expect
>
> This is important because test_cmp reports failures by showing you
> an output of "diff expect actual" and from "sh t5616-part*.sh -v"
> you can see what additional/excess things were produced by the test
> over what is expected, prefixed with "+", and what your code failed
> to produce are shown prefixed with "-".

I notice that patches to other files like 6112 in this series also
spread the above mistakes from existing lines.  Please do not view
what you see in these two test scripts before you start touching as
a good example to follow---rather, treat them as antipattern X-<.
5616 is not as bad as 6112, but they both need to be cleaned up.

We could alternatively do a post clean-up, but ideally we should
first have a clean-up patch before this series to go.

Thanks.


* Re: [PATCH v8 5/7] revision: mark non-user-given objects instead
  2018-09-14 17:23     ` Junio C Hamano
@ 2018-09-14 20:08       ` Matthew DeVore
  0 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-14 20:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, git, git, jeffhost, Jeff King, Stefan Beller,
	Jonathan Tan, pclouds

On Fri, Sep 14, 2018 at 10:23 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matthew DeVore <matvore@google.com> writes:
>
> > Signed-off-by: Matthew DeVore <matvore@google.com>
> >
> > fixup of 6defd70de
>
> That's probably meant to go below "---".
>

That line shouldn't be there at all, sorry!

It came from me putting that text in a commit which was meant to be a
fixup of another commit when I ran rebase -i. I asked rebase to make
it a "squash" so I could edit the commit message of the earlier commit
(6defd70de). Then rebase merged the two descriptions and let me edit
them, but I didn't remember to delete the latter commit's message.

I probably should have made the earlier commit (6defd70de) a "reword",
and the later commit a "fixup", rather than "pick" followed by
"squash".
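
A rough sketch of the difference, assuming a throwaway repository with
made-up file and commit names.  It uses "git commit --fixup" plus
"git rebase -i --autosquash" as one way to end up with a "fixup" line
in the todo list; that is a substitute for editing the todo list by
hand, not what was actually done for this series:

```shell
# Hypothetical demo: the later commit is folded into the earlier one
# and its message is discarded, which is what "fixup" does and what
# "squash" does not.
demo_fixup () {
	repo=$(mktemp -d) &&
	(
		cd "$repo" &&
		git init -q . &&
		git config user.name "A U Thor" &&
		git config user.email author@example.com &&
		echo base >file &&
		git add file &&
		git commit -q -m "base" &&
		echo one >>file &&
		git commit -q -a -m "earlier commit" &&
		echo two >>file &&
		git commit -q -a --fixup=HEAD &&
		# ":" as the sequence editor accepts the generated todo list
		# unchanged; --autosquash has already marked the second
		# commit as "fixup".
		GIT_SEQUENCE_EDITOR=: git rebase -i --autosquash HEAD~2 >/dev/null 2>&1 &&
		git log --format=%s
	)
}

demo_fixup
```

After the rebase only "earlier commit" and "base" remain in the log,
and no stray "fixup of ..." text ends up in the combined message.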


* Re: [PATCH v8 7/7] list-objects-filter: implement filter tree:0
  2018-09-14 17:47       ` Junio C Hamano
@ 2018-09-15  0:41         ` Matthew DeVore
  0 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-15  0:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, git, git, jeffhost, Jeff King, Stefan Beller,
	Jonathan Tan, pclouds

On Fri, Sep 14, 2018 at 10:47 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Break line after pipe "|", not before, and lose the backslash.  You
> > do not need to over-indent the command on the downstream of the
> > pipe, i.e.
> >
> >       awk ... |
> >       xargs -n1 git -C ... &&
> >
> > Same comment applies elsewhere in this patch, not limited to this file.
> >
> >> +    sort fetched_types -u >unique_types.observed &&
> >
> > Make it a habit not to add dashed options after real arguments, i.e.
> >
> >       sort -u fetched_types
> >
Done. I'm not sure why I made this mistake, since I usually prefer to
order flags before positional args. I didn't actually clean this up in
existing code as I did other mistakes, since it is very hard to find
and do thoroughly.

> >> +    echo commit >unique_types.expected &&
> >> +    test_cmp unique_types.observed unique_types.expected &&
> >
> > Always compare "expect" with "actual", not in the reverse order, i.e.
> >
> >       test_cmp expect actual
> >
> > not
> >
> >       test_cmp actual expect
> >
Done.

> > This is important because test_cmp reports failures by showing you
> > an output of "diff expect actual" and from "sh t5616-part*.sh -v"
> > you can see what additional/excess things were produced by the test
> > over what is expected, prefixed with "+", and what your code failed
> > to produce are shown prefixed with "-".
Hmm... I didn't know about the -v flag. That's quite good to know, thanks!

>
> I notice that patches to other files like 6112 in this series also
> spread the above mistakes from existing lines.  Please do not view
> what you see in these two test scripts before you start touching as
> a good example to follow---rather, treat them as antipattern X-<.
> 5616 is not as bad as 6112, but they both need to be cleaned up.
>
> We could alternatively do a post clean-up, but ideally we should
> first have a clean-up patch before this series to go.

I cleaned up existing tests in a new patchset here:
https://public-inbox.org/git/cover.1536969438.git.matvore@google.com/T/#t
- that new patch corrects the pipe placement and test_cmp argument
ordering.

There is no dependency between this patchset and the new one, though I
assume you want to commit the clean-up one first to maintain
consistency.

Here is an interdiff for this particular patch series (I replaced \t
with 8 spaces so it would be readable after my mail client mangles
it):

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 14f251de4..e8da2e858 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -30,7 +30,6 @@ static int gently_parse_list_objects_filter(

         if (filter_options->choice) {
                 if (errbuf) {
-                        strbuf_init(errbuf, 0);
                         strbuf_addstr(
                                 errbuf,
                                 _("multiple filter-specs cannot be combined"));
@@ -54,7 +53,6 @@ static int gently_parse_list_objects_filter(
                 unsigned long depth;
                 if (!git_parse_ulong(v0, &depth) || depth != 0) {
                         if (errbuf) {
-                                strbuf_init(errbuf, 0);
                                 strbuf_addstr(
                                         errbuf,
                                         _("only 'tree:0' is supported"));
@@ -85,10 +83,9 @@ static int gently_parse_list_objects_filter(
                 return 0;
         }

-        if (errbuf) {
-                strbuf_init(errbuf, 0);
+        if (errbuf)
                 strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
-        }
+
         memset(filter_options, 0, sizeof(*filter_options));
         return 1;
 }
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index f02b9ae37..5bc5b4445 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -216,7 +216,7 @@ test_expect_success 'missing non-root tree object and rev-list' '
         rm -rf repo &&
         test_create_repo repo &&
         mkdir repo/dir &&
-        echo foo > repo/dir/foo &&
+        echo foo >repo/dir/foo &&
         git -C repo add dir/foo &&
         git -C repo commit -m "commit dir/foo" &&

diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 7a4d49ea1..510d3537f 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -61,7 +61,7 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre

 test_expect_success 'get an error for missing tree object' '
         git init r5 &&
-        echo foo > r5/foo &&
+        echo foo >r5/foo &&
         git -C r5 add foo &&
         git -C r5 commit -m "foo" &&
         del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
@@ -97,7 +97,7 @@ test_expect_success 'grab tree directly when using tree:0' '
         git -C r1 verify-pack -v ../commitsonly.pack >objs &&
         awk "/tree|blob/{print \$1}" objs >trees_and_blobs &&
         git -C r1 rev-parse HEAD: >expected &&
-        test_cmp trees_and_blobs expected
+        test_cmp expected trees_and_blobs
 '

 # Test blob:limit=<n>[kmg] filter.
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 8eeb85fbc..7b6294ca5 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -169,11 +169,12 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr

         # Make sure we only have commits, and all trees and blobs are missing.
         git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
-        awk -f print_1.awk fetched_objects \
-                | xargs -n1 git -C dst cat-file -t >fetched_types &&
-        sort fetched_types -u >unique_types.observed &&
+        awk -f print_1.awk fetched_objects |
+        xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+        sort -u fetched_types >unique_types.observed &&
         echo commit >unique_types.expected &&
-        test_cmp unique_types.observed unique_types.expected &&
+        test_cmp unique_types.expected unique_types.observed &&

         # Auto-fetch a tree with cat-file.
         git -C dst cat-file -p $SUBTREE >tree_contents &&
@@ -185,11 +186,13 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
         # Auto-fetch all remaining trees and blobs with --missing=error
         git -C dst rev-list master --missing=error --objects >fetched_objects &&
         test_line_count = 70 fetched_objects &&
-        awk -f print_1.awk fetched_objects \
-                | xargs -n1 git -C dst cat-file -t >fetched_types &&
-        sort fetched_types -u >unique_types.observed &&
+
+        awk -f print_1.awk fetched_objects |
+        xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+        sort -u fetched_types >unique_types.observed &&
         printf "blob\ncommit\ntree\n" >unique_types.expected &&
-        test_cmp unique_types.observed unique_types.expected
+        test_cmp unique_types.expected unique_types.observed
 '

 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index a989a7082..6e5c41a68 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -31,11 +31,13 @@ test_expect_success 'verify blob:none omits all 5 blobs' '
 '

 test_expect_success 'specify blob explicitly prevents filtering' '
-        file_3=$(git -C r1 ls-files -s file.3 \
-                | awk -f print_2.awk) &&
-        file_4=$(git -C r1 ls-files -s file.4 \
-                | awk -f print_2.awk) &&
-        git -C r1 rev-list HEAD --objects --filter=blob:none HEAD $file_3 >observed &&
+        file_3=$(git -C r1 ls-files -s file.3 |
+                 awk -f print_2.awk) &&
+
+        file_4=$(git -C r1 ls-files -s file.4 |
+                 awk -f print_2.awk) &&
+
+        git -C r1 rev-list --objects --filter=blob:none HEAD $file_3 >observed &&
         grep -q "$file_3" observed &&
         test_must_fail grep -q "$file_4" observed
 '
@@ -225,13 +227,14 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 # Test tree:0 filter.

 test_expect_success 'verify tree:0 includes trees in "filtered" output' '
-        git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 \
-                | awk -f print_1.awk \
-                | sed s/~// \
-                | xargs -n1 git -C r3 cat-file -t \
-                | sort -u >filtered_types &&
-        printf "blob\ntree\n" > expected &&
-        test_cmp filtered_types expected
+        git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 |
+        awk -f print_1.awk |
+        sed s/~// |
+        xargs -n1 git -C r3 cat-file -t |
+        sort -u >filtered_types &&
+
+        printf "blob\ntree\n" >expected &&
+        test_cmp expected filtered_types
 '

 # Delete some loose objects and use rev-list, but WITHOUT any filtering.


* [PATCH v9 0/8] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (12 preceding siblings ...)
  2018-09-14  0:55 ` [PATCH v8 " Matthew DeVore
@ 2018-09-21 20:31 ` Matthew DeVore
  2018-09-21 20:31   ` [PATCH v9 1/8] list-objects: store common func args in struct Matthew DeVore
                     ` (7 more replies)
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (2 subsequent siblings)
  16 siblings, 8 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-21 20:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Since v8, I cleaned up the test scripts in the following ways:
 - correct order of expect/actual arguments to test_cmp
 - correct pipe placement
 - put flags before positional arguments

Also, removed some junk in the commit message of the 5th patch.

Thank you,

Matthew DeVore (8):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: use BUG rather than die
  list-objects-filter-options: do not over-strbuf_init
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   5 +
 builtin/rev-list.c                     |  11 +-
 list-objects-filter-options.c          |  19 +-
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  60 ++++++-
 list-objects.c                         | 232 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  26 ++-
 t/t0410-partial-clone.sh               |  45 +++++
 t/t5317-pack-objects-filter-objects.sh |  41 +++++
 t/t5616-partial-clone.sh               |  41 +++++
 t/t6112-rev-list-filters-objects.sh    |  42 +++++
 12 files changed, 396 insertions(+), 128 deletions(-)

-- 
2.19.0.444.g18242da7ef-goog



* [PATCH v9 1/8] list-objects: store common func args in struct
  2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-09-21 20:31   ` Matthew DeVore
  2018-09-21 20:31   ` [PATCH v9 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-21 20:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.19.0.444.g18242da7ef-goog



* [PATCH v9 2/8] list-objects: refactor to process_tree_contents
  2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
  2018-09-21 20:31   ` [PATCH v9 1/8] list-objects: store common func args in struct Matthew DeVore
@ 2018-09-21 20:31   ` Matthew DeVore
  2018-09-21 20:31   ` [PATCH v9 3/8] list-objects: always parse trees gently Matthew DeVore
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-21 20:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally. i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.19.0.444.g18242da7ef-goog



* [PATCH v9 3/8] list-objects: always parse trees gently
  2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
  2018-09-21 20:31   ` [PATCH v9 1/8] list-objects: store common func args in struct Matthew DeVore
  2018-09-21 20:31   ` [PATCH v9 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-09-21 20:31   ` Matthew DeVore
  2018-09-21 20:32   ` [PATCH v9 4/8] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-21 20:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	if (parse_tree_gently(tree, 1) < 0) {
 		if (revs->ignore_missing_links)
 			return;
 
-- 
2.19.0.444.g18242da7ef-goog



* [PATCH v9 4/8] rev-list: handle missing tree objects properly
  2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-09-21 20:31   ` [PATCH v9 3/8] list-objects: always parse trees gently Matthew DeVore
@ 2018-09-21 20:32   ` Matthew DeVore
  2018-09-21 20:32   ` [PATCH v9 5/8] revision: mark non-user-given objects instead Matthew DeVore
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-21 20:32 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.
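
For illustration only (this is not code from the patch), the policy for a
missing object can be modeled as a small dispatch that no longer looks at
the object type. The enum and function names below are hypothetical
simplifications of the MA_* handling in builtin/rev-list.c:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified mirror of the --missing=<action> choices. */
enum missing_action {
	MISSING_ERROR,          /* --missing=error */
	MISSING_ALLOW_ANY,      /* --missing=allow-any */
	MISSING_PRINT,          /* --missing=print */
	MISSING_ALLOW_PROMISOR  /* --missing=allow-promisor */
};

enum missing_outcome { OUTCOME_DIE, OUTCOME_SKIP, OUTCOME_PRINT };

/*
 * Decide what to do with an object that is absent from the object store.
 * The object type is deliberately not an input: trees and blobs are
 * treated the same way.
 */
static enum missing_outcome on_missing_object(enum missing_action action,
					      bool is_promisor)
{
	switch (action) {
	case MISSING_ERROR:
		return OUTCOME_DIE;
	case MISSING_ALLOW_ANY:
		return OUTCOME_SKIP;
	case MISSING_PRINT:
		return OUTCOME_PRINT;
	case MISSING_ALLOW_PROMISOR:
		return is_promisor ? OUTCOME_SKIP : OUTCOME_DIE;
	}
	/* Not reachable with a valid enum value; treat it as a coding error. */
	assert(!"unknown missing_action");
	return OUTCOME_DIE;
}
```

Keeping the type out of the decision is what lets the same die() messages
be reused for every object type, with the type name filled in at the call
site.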

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/rev-list.c                     | 11 ++++---
 list-objects.c                         | 11 +++++--
 revision.h                             | 15 +++++++++
 t/t0410-partial-clone.sh               | 45 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 13 ++++++++
 t/t6112-rev-list-filters-objects.sh    | 17 ++++++++++
 6 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int failed_parse;
 
 	if (!revs->tree_objects)
 		return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, 1) < 0) {
+
+	failed_parse = parse_tree_gently(tree, 1);
+	if (failed_parse) {
 		if (revs->ignore_missing_links)
 			return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
 		    is_promisor_object(&obj->oid))
 			return;
 
-		die("bad tree object %s", oid_to_hex(&obj->oid));
+		if (!revs->do_not_die_on_missing_tree)
+			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (!failed_parse)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index 007278cc1..5910613cb 100644
--- a/revision.h
+++ b/revision.h
@@ -126,6 +126,21 @@ struct rev_info {
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
 
+			/*
+			 * Blobs are shown without regard for their existence.
+			 * But not so for trees: unless exclude_promisor_objects
+			 * is set and the tree in question is a promisor object;
+			 * OR ignore_missing_links is set, the revision walker
+			 * dies with a "bad tree object HASH" message when
+			 * encountering a missing tree. For callers that can
+			 * handle missing trees and want them to be filterable
+			 * and showable, set this to true. The revision walker
+			 * will filter and show such a missing tree as usual,
+			 * but will not attempt to recurse into this tree
+			 * object.
+			 */
+			do_not_die_on_missing_tree:1,
+
 			/* for internal use only */
 			exclude_promisor_objects:1;
 
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 128130066..5bc5b4445 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -186,6 +186,51 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	! grep $FOO out
 '
 
+test_expect_success 'missing tree objects with --missing=allow-promisor and --exclude-promisor-objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	promise_and_delete $(git -C repo rev-parse bar^{tree}) &&
+	promise_and_delete $(git -C repo rev-parse foo^{tree}) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, 3 blobs, and 1 tree
+	test_line_count = 7 objs &&
+
+	# Do the same for --exclude-promisor-objects, but with all trees gone.
+	promise_and_delete $(git -C repo rev-parse baz^{tree}) &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, no blobs or trees
+	test_line_count = 3 objs
+'
+
+test_expect_success 'missing non-root tree object and rev-list' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	mkdir repo/dir &&
+	echo foo >repo/dir/foo &&
+	git -C repo add dir/foo &&
+	git -C repo commit -m "commit dir/foo" &&
+
+	promise_and_delete $(git -C repo rev-parse HEAD:dir) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 1 commit and 1 tree
+	test_line_count = 2 objs
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised tree' '
 	rm -rf repo &&
 	test_create_repo repo &&
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..9839b48c1 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo >r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep -q "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index d4ff0b3be..c662c97db 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -195,6 +195,23 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
+	TREE=$(git -C r3 rev-parse HEAD:dir1) &&
+
+	rm r3/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+	git -C r3 rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
+	echo "?$TREE" >expected &&
+	test_cmp expected missing_objs &&
+
+	# do not complain when a missing tree cannot be parsed
+	test_must_be_empty rev_list_err &&
+
+	git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	! grep $TREE objs &&
+	test_must_be_empty rev_list_err
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.0.444.g18242da7ef-goog



* [PATCH v9 5/8] revision: mark non-user-given objects instead
  2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-09-21 20:32   ` [PATCH v9 4/8] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-09-21 20:32   ` Matthew DeVore
  2018-09-21 20:32   ` [PATCH v9 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-21 20:32 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

This fixes a bug in which git rev-list behaved differently from git
pack-objects: pack-objects would *not* filter objects given explicitly
on the command line, but rev-list would. This was because the two
commands used a different function to add objects to the rev_info
struct. This seems to have been an oversight, and pack-objects has the
correct behavior, so I added a test to make sure that rev-list now
behaves properly.
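
As a rough sketch of the inverted flag (hypothetical names; only the bit
position is taken from revision.h), traversal marks what it reaches and
the filter is consulted only for marked objects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the bit position matches revision.h. */
#define SKETCH_NOT_USER_GIVEN (1u << 25)

struct sketch_object {
	unsigned flags;
};

/* Called wherever traversal reaches an object through a reference. */
static void mark_reached_by_traversal(struct sketch_object *obj)
{
	assert(obj != NULL);
	obj->flags |= SKETCH_NOT_USER_GIVEN;
}

/*
 * A filter is consulted only for objects that traversal discovered.
 * Objects named directly by the user never get the flag, so they
 * bypass filtering without any extra bookkeeping.
 */
static bool should_consult_filter(const struct sketch_object *obj,
				  bool have_filter)
{
	assert(obj != NULL);
	return have_filter && (obj->flags & SKETCH_NOT_USER_GIVEN) != 0;
}
```

With this arrangement the default (flag clear) is the safe one: an object
that no code path remembered to mark is shown rather than silently
filtered.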

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c                      | 31 +++++++++++++++++------------
 revision.c                          |  1 -
 revision.h                          | 11 ++++++++--
 t/t6112-rev-list-filters-objects.sh | 12 +++++++++++
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index de4dce600..72d48a17f 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index 5910613cb..83e164039 100644
--- a/revision.h
+++ b/revision.h
@@ -21,9 +21,16 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+/*
+ * Indicates object was reached by traversal. i.e. not given by user on
+ * command-line or stdin.
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
+#define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c662c97db..11186209b 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -30,6 +30,18 @@ test_expect_success 'verify blob:none omits all 5 blobs' '
 	test_cmp observed expected
 '
 
+test_expect_success 'specify blob explicitly prevents filtering' '
+	file_3=$(git -C r1 ls-files -s file.3 |
+		 awk -f print_2.awk) &&
+
+	file_4=$(git -C r1 ls-files -s file.4 |
+		 awk -f print_2.awk) &&
+
+	git -C r1 rev-list --objects --filter=blob:none HEAD $file_3 >observed &&
+	grep -q "$file_3" observed &&
+	test_must_fail grep -q "$file_4" observed
+'
+
 test_expect_success 'verify emitted+omitted == all' '
 	git -C r1 rev-list HEAD --objects \
 		| awk -f print_1.awk \
-- 
2.19.0.444.g18242da7ef-goog



* [PATCH v9 6/8] list-objects-filter: use BUG rather than die
  2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (4 preceding siblings ...)
  2018-09-21 20:32   ` [PATCH v9 5/8] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-09-21 20:32   ` Matthew DeVore
  2018-09-21 20:32   ` [PATCH v9 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
  2018-09-21 20:32   ` [PATCH v9 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-21 20:32 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

In some cases in this file, BUG makes more sense than die. In such
cases, we get there from a coding error rather than a user error.

'return' has been removed following some instances of BUG since BUG does
not return.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..5f8b1a002 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -102,8 +101,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -208,8 +206,7 @@ static enum list_objects_filter_result filter_sparse(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -389,7 +386,7 @@ void *list_objects_filter__init(
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
 	if (filter_options->choice >= LOFC__COUNT)
-		die("invalid list-objects filter choice: %d",
+		BUG("invalid list-objects filter choice: %d",
 		    filter_options->choice);
 
 	init_fn = s_filters[filter_options->choice];
-- 
2.19.0.444.g18242da7ef-goog



* [PATCH v9 7/8] list-objects-filter-options: do not over-strbuf_init
  2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (5 preceding siblings ...)
  2018-09-21 20:32   ` [PATCH v9 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
@ 2018-09-21 20:32   ` Matthew DeVore
  2018-09-21 20:32   ` [PATCH v9 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-21 20:32 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

The function gently_parse_list_objects_filter is called with either
errbuf=STRBUF_INIT or errbuf=NULL, but that function calls strbuf_init
when errbuf is not NULL. strbuf_init is only necessary if errbuf
contains garbage, and risks a memory leak if errbuf already has a
non-STRBUF_INIT state. It should be the caller's responsibility to make
sure errbuf is not garbage, since garbage content is easily avoidable
with STRBUF_INIT.
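
The leak risk can be sketched with a toy buffer. This is a hypothetical
stand-in and not git's strbuf; it only assumes that "init" resets the
pointer without freeing it:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable buffer; a hypothetical stand-in, not git's strbuf. */
struct toy_buf {
	char *data;   /* NULL until something is appended */
	size_t len;
};

#define TOY_BUF_INIT { NULL, 0 }

/*
 * Re-initializing drops the pointer without freeing it. This is harmless
 * on a fresh buffer but leaks if data was already allocated.
 */
static void toy_buf_init(struct toy_buf *b)
{
	b->data = NULL;
	b->len = 0;
}

/* Append a NUL-terminated string, growing the allocation as needed. */
static void toy_buf_addstr(struct toy_buf *b, const char *s)
{
	size_t n = strlen(s);
	char *grown = realloc(b->data, b->len + n + 1);

	assert(grown != NULL); /* keep the sketch short: no recovery path */
	memcpy(grown + b->len, s, n + 1);
	b->data = grown;
	b->len += n;
}

/* The callee only appends; the caller passes an initialized buffer. */
static void report_invalid_spec(struct toy_buf *errbuf, const char *arg)
{
	if (!errbuf)
		return;
	toy_buf_addstr(errbuf, "invalid filter-spec '");
	toy_buf_addstr(errbuf, arg);
	toy_buf_addstr(errbuf, "'");
}
```

Had report_invalid_spec called toy_buf_init first, any text the caller
had already accumulated would be lost and its allocation leaked.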

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter-options.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..d259bdb2c 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -30,7 +30,6 @@ static int gently_parse_list_objects_filter(
 
 	if (filter_options->choice) {
 		if (errbuf) {
-			strbuf_init(errbuf, 0);
 			strbuf_addstr(
 				errbuf,
 				_("multiple filter-specs cannot be combined"));
@@ -71,10 +70,9 @@ static int gently_parse_list_objects_filter(
 		return 0;
 	}
 
-	if (errbuf) {
-		strbuf_init(errbuf, 0);
+	if (errbuf)
 		strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
-	}
+
 	memset(filter_options, 0, sizeof(*filter_options));
 	return 1;
 }
-- 
2.19.0.444.g18242da7ef-goog



* [PATCH v9 8/8] list-objects-filter: implement filter tree:0
  2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (6 preceding siblings ...)
  2018-09-21 20:32   ` [PATCH v9 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
@ 2018-09-21 20:32   ` Matthew DeVore
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-09-21 20:32 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate: it
suggests that annotated tags are omitted, when actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.
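
A minimal stand-alone sketch of the "only depth 0 for now" rule follows.
It is an assumption-laden illustration: the function name is hypothetical
and it uses strtoul instead of git_parse_ulong, so it may reject forms
that the real parser accepts.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-alone parser: returns 0 and stores the depth when
 * arg is "tree:<n>" with n == 0, and returns 1 for everything else.
 */
static int parse_tree_filter(const char *arg, unsigned long *depth)
{
	static const char prefix[] = "tree:";
	const char *value;
	char *end;
	unsigned long n;

	assert(arg != NULL && depth != NULL);
	if (strncmp(arg, prefix, strlen(prefix)) != 0)
		return 1;

	value = arg + strlen(prefix);
	if (*value < '0' || *value > '9')
		return 1; /* reject empty values, signs, and whitespace */

	errno = 0;
	n = strtoul(value, &end, 10);
	if (errno || *end != '\0')
		return 1; /* overflow or trailing junk */
	if (n != 0)
		return 1; /* only depth 0 is supported for now */

	*depth = n;
	return 0;
}
```

Parsing a full number and then rejecting nonzero values, rather than
matching the literal string "tree:0", leaves room to accept larger
depths later without changing the syntax.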

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  5 +++
 list-objects-filter-options.c          | 13 +++++++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 49 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 28 +++++++++++++++
 t/t5616-partial-clone.sh               | 41 +++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 13 +++++++
 7 files changed, 150 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d259bdb2c..e8da2e858 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -49,6 +49,19 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (skip_prefix(arg, "tree:", &v0)) {
+		unsigned long depth;
+		if (!git_parse_ulong(v0, &depth) || depth != 0) {
+			if (errbuf) {
+				strbuf_addstr(
+					errbuf,
+					_("only 'tree:0' is supported"));
+			}
+			return 1;
+		}
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 5f8b1a002..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -79,6 +79,54 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void* filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -371,6 +419,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 9839b48c1..510d3537f 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" >r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	awk "/tree|blob/{print \$1}" objs >trees_and_blobs &&
+	git -C r1 rev-parse HEAD: >expected &&
+	test_cmp expected trees_and_blobs
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..7b6294ca5 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,6 +154,47 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'use fsck before and after manually fetching a missing subtree' '
+	# push new commit so server has a subtree
+	mkdir src/dir &&
+	echo "in dir" >src/dir/file.txt &&
+	git -C src add dir/file.txt &&
+	git -C src commit -m "file in dir" &&
+	git -C src push -u srv master &&
+	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
+
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst fsck &&
+
+	# Make sure we only have commits, and all trees and blobs are missing.
+	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	awk -f print_1.awk fetched_objects |
+	xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+	sort -u fetched_types >unique_types.observed &&
+	echo commit >unique_types.expected &&
+	test_cmp unique_types.expected unique_types.observed &&
+
+	# Auto-fetch a tree with cat-file.
+	git -C dst cat-file -p $SUBTREE >tree_contents &&
+	grep file.txt tree_contents &&
+
+	# fsck still works after an auto-fetch of a tree.
+	git -C dst fsck &&
+
+	# Auto-fetch all remaining trees and blobs with --missing=error
+	git -C dst rev-list master --missing=error --objects >fetched_objects &&
+	test_line_count = 70 fetched_objects &&
+
+	awk -f print_1.awk fetched_objects |
+	xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+	sort -u fetched_types >unique_types.observed &&
+	printf "blob\ncommit\ntree\n" >unique_types.expected &&
+	test_cmp unique_types.expected unique_types.observed
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 11186209b..6e5c41a68 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -224,6 +224,19 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 	test_must_be_empty rev_list_err
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 |
+	awk -f print_1.awk |
+	sed s/~// |
+	xargs -n1 git -C r3 cat-file -t |
+	sort -u >filtered_types &&
+
+	printf "blob\ntree\n" >expected &&
+	test_cmp expected filtered_types
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.0.444.g18242da7ef-goog



* Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0
  2018-08-14 15:13     ` Jeff Hostetler
  2018-08-14 17:25       ` Matthew DeVore
@ 2018-10-03 19:00       ` Matthew DeVore
  1 sibling, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 19:00 UTC (permalink / raw)
  To: git; +Cc: git, jeffhost, Jeff King, Stefan Beller, Jonathan Tan

On Tue, Aug 14, 2018 at 8:13 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> There are a couple of options here:
> [] If we really want to omit all trees and blobs (and we DO NOT want
>     the oidset of everything omitted), then we might be able to
>     shortcut the traversal and speed things up.
>
>     {} add a LOFR_SKIP_TREE bit to list_objects_filter_result
>     {} test this bit in process_tree() and avoid the init_tree_desc() and
>        the while loop and some adjacent setup/tear-down code.
>     {} make this filter something like:
>
>         case LOFS_BEGIN_TREE:
>                 if (filter_data->omits) {
>                         oidset_insert(filter_data->omits, &obj->oid);
>                         return LOFR_MARK_SEEN; /* ... (hard omit) */
>                 } else
>                         return LOFR_SKIP_TREE;
>         case LOFS_BLOB:
>                 if (filter_data->omits) {
>                         oidset_insert(filter_data->omits, &obj->oid);
>                         return LOFR_MARK_SEEN; /* ... (hard omit) */
>                 } else
>                         assert(...should not happen...);
>
> [] Later, if we choose to actually support a depth>0, we'll probably
>     want a different filter function to conditionally include/exclude
>     blobs, include shallow tree[node]s, and do some of the provisional-
>     omit logic on deep tree[nodes] (in case a tree appears at multiple
>     places/depths in the history).  But that can wait.
>
> Jeff
>

Jeff, have you made any progress on depth>0 support for the tree
filter? I'd like to take a stab at it without duplicating work :)

- Matt
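
The LOFR_SKIP_TREE idea quoted above can be sketched in isolation. This is only a rough illustration, not code from the series: the enum values, `struct omit_set`, and `filter_trees_none_sketch()` are hypothetical stand-ins for git's real types, and the "skip tree" bit is the proposed one, which does not exist in this patch series. Where the quoted mail expects the blob case to be unreachable without an omits set, the sketch just hard-omits the blob instead of asserting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for git's filter enums; not the real definitions. */
enum sketch_situation { SIT_BEGIN_TREE, SIT_END_TREE, SIT_BLOB };

enum sketch_result {
	RES_ZERO      = 0,
	RES_MARK_SEEN = 1 << 0,
	RES_DO_SHOW   = 1 << 1,
	RES_SKIP_TREE = 1 << 2 /* proposed bit: do not descend into the tree */
};

/* Toy replacement for an oidset: only counts recorded omissions. */
struct omit_set {
	size_t nr;
};

struct trees_none_data {
	struct omit_set *omits; /* NULL when the caller wants no omit list */
};

static enum sketch_result filter_trees_none_sketch(enum sketch_situation sit,
						   struct trees_none_data *d)
{
	assert(d != NULL);

	switch (sit) {
	case SIT_BEGIN_TREE:
		if (d->omits) {
			/* Must keep walking so every omitted object is recorded. */
			d->omits->nr++;
			return RES_MARK_SEEN; /* hard omit */
		}
		/* Nobody wants the omit list: skip the whole subtree. */
		return RES_SKIP_TREE;

	case SIT_BLOB:
		if (d->omits)
			d->omits->nr++;
		return RES_MARK_SEEN; /* hard omit */

	case SIT_END_TREE:
		return RES_ZERO;
	}
	return RES_ZERO;
}
```

A caller in the style of process_tree() would test the skip bit right after the begin-tree callback and, when it is set, avoid reading the tree's entries at all.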


* [PATCH v10 0/8] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (13 preceding siblings ...)
  2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-10-03 19:52 ` Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 1/8] list-objects: store common func args in struct Matthew DeVore
                     ` (8 more replies)
  2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
  16 siblings, 9 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 19:52 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

This is a minor change to the previous rollup. It moves positional
arguments to the end of git-rev-list invocations. Here is an interdiff
from v9:

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 7b6294ca5..53fbf7db8 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -168,7 +168,8 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
 	git -C dst fsck &&
 
 	# Make sure we only have commits, and all trees and blobs are missing.
-	git -C dst rev-list master --missing=allow-any --objects >fetched_objects &&
+	git -C dst rev-list --missing=allow-any --objects master \
+		>fetched_objects &&
 	awk -f print_1.awk fetched_objects |
 	xargs -n1 git -C dst cat-file -t >fetched_types &&
 
@@ -184,7 +185,7 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
 	git -C dst fsck &&
 
 	# Auto-fetch all remaining trees and blobs with --missing=error
-	git -C dst rev-list master --missing=error --objects >fetched_objects &&
+	git -C dst rev-list --missing=error --objects master >fetched_objects &&
 	test_line_count = 70 fetched_objects &&
 
 	awk -f print_1.awk fetched_objects |
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 6e5c41a68..5a61614b1 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -227,7 +227,8 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 # Test tree:0 filter.
 
 test_expect_success 'verify tree:0 includes trees in "filtered" output' '
-	git -C r3 rev-list HEAD --quiet --objects --filter-print-omitted --filter=tree:0 |
+	git -C r3 rev-list --quiet --objects --filter-print-omitted \
+		--filter=tree:0 HEAD |
 	awk -f print_1.awk |
 	sed s/~// |
 	xargs -n1 git -C r3 cat-file -t |

Thank you,

Matthew DeVore (8):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: use BUG rather than die
  list-objects-filter-options: do not over-strbuf_init
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   5 +
 builtin/rev-list.c                     |  11 +-
 list-objects-filter-options.c          |  19 +-
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  60 ++++++-
 list-objects.c                         | 232 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  26 ++-
 t/t0410-partial-clone.sh               |  45 +++++
 t/t5317-pack-objects-filter-objects.sh |  41 +++++
 t/t5616-partial-clone.sh               |  42 +++++
 t/t6112-rev-list-filters-objects.sh    |  43 +++++
 12 files changed, 398 insertions(+), 128 deletions(-)

-- 
2.19.0.605.g01d371f741-goog



* [PATCH v10 1/8] list-objects: store common func args in struct
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-10-03 19:52   ` Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 19:52 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

This will make utility functions easier to create, as done by the next
patch.
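
As a rough, standalone illustration of the pattern (not the code in this patch), the sketch below bundles callbacks and their data into one context struct so that helpers take a single pointer. All names here (`walk_context`, `visit`, `visit_all`, `count_shown`, `skip_dotfiles`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*show_fn)(const char *name, void *data);
typedef int (*keep_fn)(const char *name, void *data);

/* Hypothetical context struct, in the spirit of traversal_context. */
struct walk_context {
	show_fn show;
	void *show_data;
	keep_fn filter; /* may be NULL, meaning "keep everything" */
	void *filter_data;
};

/* A helper needs only the context plus the item it works on. */
static void visit(struct walk_context *ctx, const char *name)
{
	assert(ctx && ctx->show);
	if (ctx->filter && !ctx->filter(name, ctx->filter_data))
		return;
	ctx->show(name, ctx->show_data);
}

/* New utility functions can be added without widening any signature. */
static void visit_all(struct walk_context *ctx, const char **names, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		visit(ctx, names[i]);
}

/* Example callbacks for the sketch. */
static void count_shown(const char *name, void *data)
{
	(void)name;
	++*(int *)data;
}

static int skip_dotfiles(const char *name, void *data)
{
	(void)data;
	return name[0] != '.';
}
```

Adding another callback later means adding one struct field, rather than touching every function in the call chain.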

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.19.0.605.g01d371f741-goog



* [PATCH v10 2/8] list-objects: refactor to process_tree_contents
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 1/8] list-objects: store common func args in struct Matthew DeVore
@ 2018-10-03 19:52   ` Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 3/8] list-objects: always parse trees gently Matthew DeVore
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 19:52 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

This will be used in a follow-up patch to reduce the indentation needed when
invoking the logic conditionally, i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.19.0.605.g01d371f741-goog



* [PATCH v10 3/8] list-objects: always parse trees gently
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 1/8] list-objects: store common func args in struct Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-10-03 19:52   ` Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 4/8] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 19:52 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.
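
A minimal standalone sketch of the reasoning, under stated assumptions: `struct node`, `parse_node_gently()`, and `process_node()` are hypothetical, and return codes stand in for die(). The parser is always called quietly and the caller emits the one message that matters.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical object: "present" says whether its data can be read. */
struct node {
	int present;
	int parsed;
};

/* Returns 0 on success, -1 on failure; prints only when quiet is 0. */
static int parse_node_gently(struct node *n, int quiet)
{
	assert(n != NULL);
	if (n->parsed)
		return 0;
	if (!n->present) {
		if (!quiet)
			fprintf(stderr, "error: could not read node\n");
		return -1;
	}
	n->parsed = 1;
	return 0;
}

/*
 * Returns 1 if the node was processed, 0 if a missing node was
 * tolerated, and -1 where the real code would die().
 */
static int process_node(struct node *n, int ignore_missing)
{
	/* Always quiet: the caller below reports the failure itself. */
	if (parse_node_gently(n, 1) < 0) {
		if (ignore_missing)
			return 0;
		fprintf(stderr, "fatal: bad node\n");
		return -1;
	}
	return 1;
}
```

With a non-quiet parse, the failing path would print two lines for one problem; the quiet call keeps a single report.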

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	if (parse_tree_gently(tree, 1) < 0) {
 		if (revs->ignore_missing_links)
 			return;
 
-- 
2.19.0.605.g01d371f741-goog



* [PATCH v10 4/8] rev-list: handle missing tree objects properly
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-10-03 19:52   ` [PATCH v10 3/8] list-objects: always parse trees gently Matthew DeVore
@ 2018-10-03 19:52   ` Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 5/8] revision: mark non-user-given objects instead Matthew DeVore
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 19:52 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.
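
The intended walker behavior can be illustrated with a toy model, assuming simplified rules: a missing tree is still handed to the filter/show step but is never recursed into, and without the new flag the walk fails. `struct toy_tree` and `walk_tree()` are hypothetical, a return of -1 stands in for die(), and the ignore_missing_links and promisor checks of the real code are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tree node; a hypothetical stand-in for git's struct tree. */
struct toy_tree {
	int missing;                /* 1 if the object is absent locally */
	struct toy_tree **children; /* entries, only readable when present */
	size_t nr;
};

/*
 * Returns 0 on success and -1 where the real walker would die().
 * *shown counts every tree handed to the "show" step.
 */
static int walk_tree(struct toy_tree *t, int do_not_die_on_missing_tree,
		     int *shown)
{
	size_t i;

	assert(t && shown);
	if (t->missing && !do_not_die_on_missing_tree)
		return -1;

	++*shown; /* filtering and showing happen even for a missing tree */

	if (t->missing)
		return 0; /* contents are unavailable, so do not recurse */

	for (i = 0; i < t->nr; i++)
		if (walk_tree(t->children[i], do_not_die_on_missing_tree,
			      shown) < 0)
			return -1;
	return 0;
}
```

In this model, the caller that receives the shown object (finish_object() in rev-list) is the one that decides what a missing object means, based on the --missing mode.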

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/rev-list.c                     | 11 ++++---
 list-objects.c                         | 11 +++++--
 revision.h                             | 15 +++++++++
 t/t0410-partial-clone.sh               | 45 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 13 ++++++++
 t/t6112-rev-list-filters-objects.sh    | 17 ++++++++++
 6 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int failed_parse;
 
 	if (!revs->tree_objects)
 		return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, 1) < 0) {
+
+	failed_parse = parse_tree_gently(tree, 1);
+	if (failed_parse) {
 		if (revs->ignore_missing_links)
 			return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
 		    is_promisor_object(&obj->oid))
 			return;
 
-		die("bad tree object %s", oid_to_hex(&obj->oid));
+		if (!revs->do_not_die_on_missing_tree)
+			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (!failed_parse)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index 007278cc1..5910613cb 100644
--- a/revision.h
+++ b/revision.h
@@ -126,6 +126,21 @@ struct rev_info {
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
 
+			/*
+			 * Blobs are shown without regard for their existence.
+			 * But not so for trees: unless exclude_promisor_objects
+			 * is set and the tree in question is a promisor object;
+			 * OR ignore_missing_links is set, the revision walker
+			 * dies with a "bad tree object HASH" message when
+			 * encountering a missing tree. For callers that can
+			 * handle missing trees and want them to be filterable
+			 * and showable, set this to true. The revision walker
+			 * will filter and show such a missing tree as usual,
+			 * but will not attempt to recurse into this tree
+			 * object.
+			 */
+			do_not_die_on_missing_tree:1,
+
 			/* for internal use only */
 			exclude_promisor_objects:1;
 
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 128130066..5bc5b4445 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -186,6 +186,51 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	! grep $FOO out
 '
 
+test_expect_success 'missing tree objects with --missing=allow-promisor and --exclude-promisor-objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	promise_and_delete $(git -C repo rev-parse bar^{tree}) &&
+	promise_and_delete $(git -C repo rev-parse foo^{tree}) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, 3 blobs, and 1 tree
+	test_line_count = 7 objs &&
+
+	# Do the same for --exclude-promisor-objects, but with all trees gone.
+	promise_and_delete $(git -C repo rev-parse baz^{tree}) &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, no blobs or trees
+	test_line_count = 3 objs
+'
+
+test_expect_success 'missing non-root tree object and rev-list' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	mkdir repo/dir &&
+	echo foo >repo/dir/foo &&
+	git -C repo add dir/foo &&
+	git -C repo commit -m "commit dir/foo" &&
+
+	promise_and_delete $(git -C repo rev-parse HEAD:dir) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 1 commit and 1 tree
+	test_line_count = 2 objs
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised tree' '
 	rm -rf repo &&
 	test_create_repo repo &&
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..9839b48c1 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo >r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep -q "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index d4ff0b3be..c662c97db 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -195,6 +195,23 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
+	TREE=$(git -C r3 rev-parse HEAD:dir1) &&
+
+	rm r3/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+	git -C r3 rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
+	echo "?$TREE" >expected &&
+	test_cmp expected missing_objs &&
+
+	# do not complain when a missing tree cannot be parsed
+	test_must_be_empty rev_list_err &&
+
+	git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	! grep $TREE objs &&
+	test_must_be_empty rev_list_err
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.0.605.g01d371f741-goog


* [PATCH v10 5/8] revision: mark non-user-given objects instead
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-10-03 19:52   ` [PATCH v10 4/8] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-10-03 19:52   ` Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 19:52 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

This fixes a bug in that git rev-list behaved differently from git
pack-objects. pack-objects would *not* filter objects given explicitly
on the command line and rev-list would filter. This was because the two
commands used a different function to add objects to the rev_info
struct. This seems to have been an oversight, and pack-objects has the
correct behavior, so I added a test to make sure that rev-list now
behaves properly.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c                      | 31 +++++++++++++++++------------
 revision.c                          |  1 -
 revision.h                          | 11 ++++++++--
 t/t6112-rev-list-filters-objects.sh | 12 +++++++++++
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index de4dce600..72d48a17f 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index 5910613cb..83e164039 100644
--- a/revision.h
+++ b/revision.h
@@ -21,9 +21,16 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+/*
+ * Indicates object was reached by traversal. i.e. not given by user on
+ * command-line or stdin.
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
+#define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c662c97db..11186209b 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -30,6 +30,18 @@ test_expect_success 'verify blob:none omits all 5 blobs' '
 	test_cmp observed expected
 '
 
+test_expect_success 'specify blob explicitly prevents filtering' '
+	file_3=$(git -C r1 ls-files -s file.3 |
+		 awk -f print_2.awk) &&
+
+	file_4=$(git -C r1 ls-files -s file.4 |
+		 awk -f print_2.awk) &&
+
+	git -C r1 rev-list --objects --filter=blob:none HEAD $file_3 >observed &&
+	grep -q "$file_3" observed &&
+	test_must_fail grep -q "$file_4" observed
+'
+
 test_expect_success 'verify emitted+omitted == all' '
 	git -C r1 rev-list HEAD --objects \
 		| awk -f print_1.awk \
-- 
2.19.0.605.g01d371f741-goog


* [PATCH v10 6/8] list-objects-filter: use BUG rather than die
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (4 preceding siblings ...)
  2018-10-03 19:52   ` [PATCH v10 5/8] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-10-03 19:52   ` Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 19:52 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

In some cases in this file, BUG makes more sense than die. In such
cases, we get there from a coding error rather than a user error.

'return' has been removed following some instances of BUG since BUG does
not return.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..5f8b1a002 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -102,8 +101,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -208,8 +206,7 @@ static enum list_objects_filter_result filter_sparse(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -389,7 +386,7 @@ void *list_objects_filter__init(
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
 	if (filter_options->choice >= LOFC__COUNT)
-		die("invalid list-objects filter choice: %d",
+		BUG("invalid list-objects filter choice: %d",
 		    filter_options->choice);
 
 	init_fn = s_filters[filter_options->choice];
-- 
2.19.0.605.g01d371f741-goog


* [PATCH v10 7/8] list-objects-filter-options: do not over-strbuf_init
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (5 preceding siblings ...)
  2018-10-03 19:52   ` [PATCH v10 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
@ 2018-10-03 19:52   ` Matthew DeVore
  2018-10-03 19:52   ` [PATCH v10 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
  2018-10-03 23:08   ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 19:52 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

The function gently_parse_list_objects_filter is either called with
errbuf=STRBUF_INIT or errbuf=NULL, but that function calls strbuf_init
when errbuf is not NULL. strbuf_init is only necessary if errbuf
contains garbage, and risks a memory leak if errbuf already has a
non-STRBUF_INIT state. It should be the caller's responsibility to make
sure errbuf is not garbage, since garbage content is easily avoidable
with STRBUF_INIT.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter-options.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..d259bdb2c 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -30,7 +30,6 @@ static int gently_parse_list_objects_filter(
 
 	if (filter_options->choice) {
 		if (errbuf) {
-			strbuf_init(errbuf, 0);
 			strbuf_addstr(
 				errbuf,
 				_("multiple filter-specs cannot be combined"));
@@ -71,10 +70,9 @@ static int gently_parse_list_objects_filter(
 		return 0;
 	}
 
-	if (errbuf) {
-		strbuf_init(errbuf, 0);
+	if (errbuf)
 		strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
-	}
+
 	memset(filter_options, 0, sizeof(*filter_options));
 	return 1;
 }
-- 
2.19.0.605.g01d371f741-goog


* [PATCH v10 8/8] list-objects-filter: implement filter tree:0
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (6 preceding siblings ...)
  2018-10-03 19:52   ` [PATCH v10 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
@ 2018-10-03 19:52   ` Matthew DeVore
  2018-10-03 23:08   ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 19:52 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.
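
For illustration only (this is not part of the patch), the effect of the
filter can be sketched on a throwaway repository. The sketch assumes a Git
build that already understands --filter=tree:0; the repository path, file
names, and committer identity are made up for the example.

```shell
# Throwaway repository; the path and file names are hypothetical.
repo=$(mktemp -d)
git init -q "$repo"
mkdir "$repo/dir"
echo "in dir" >"$repo/dir/file.txt"
git -C "$repo" add dir/file.txt
git -C "$repo" -c user.name=Example -c user.email=example@example.com \
	commit -q -m "file in dir"

# Unfiltered walk: the commit, its root tree, the "dir" tree, and the blob.
git -C "$repo" rev-list --objects HEAD >"$repo/all_objs"

# With tree:0, trees and blobs reached by traversal are omitted,
# so only the commit should remain.
git -C "$repo" rev-list --objects --filter=tree:0 HEAD >"$repo/commits_only"

wc -l "$repo/all_objs" "$repo/commits_only"
```

Comparing the two line counts shows how much of the object graph a
tree:0 partial clone would leave behind on the server.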

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  5 +++
 list-objects-filter-options.c          | 13 +++++++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 49 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 28 +++++++++++++++
 t/t5616-partial-clone.sh               | 42 ++++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 14 ++++++++
 7 files changed, 152 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d259bdb2c..e8da2e858 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -49,6 +49,19 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (skip_prefix(arg, "tree:", &v0)) {
+		unsigned long depth;
+		if (!git_parse_ulong(v0, &depth) || depth != 0) {
+			if (errbuf) {
+				strbuf_addstr(
+					errbuf,
+					_("only 'tree:0' is supported"));
+			}
+			return 1;
+		}
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 5f8b1a002..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -79,6 +79,54 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void *filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -371,6 +419,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 9839b48c1..510d3537f 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" >r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	awk "/tree|blob/{print \$1}" objs >trees_and_blobs &&
+	git -C r1 rev-parse HEAD: >expected &&
+	test_cmp expected trees_and_blobs
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..53fbf7db8 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,6 +154,48 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'use fsck before and after manually fetching a missing subtree' '
+	# push new commit so server has a subtree
+	mkdir src/dir &&
+	echo "in dir" >src/dir/file.txt &&
+	git -C src add dir/file.txt &&
+	git -C src commit -m "file in dir" &&
+	git -C src push -u srv master &&
+	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
+
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst fsck &&
+
+	# Make sure we only have commits, and all trees and blobs are missing.
+	git -C dst rev-list --missing=allow-any --objects master \
+		>fetched_objects &&
+	awk -f print_1.awk fetched_objects |
+	xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+	sort -u fetched_types >unique_types.observed &&
+	echo commit >unique_types.expected &&
+	test_cmp unique_types.expected unique_types.observed &&
+
+	# Auto-fetch a tree with cat-file.
+	git -C dst cat-file -p $SUBTREE >tree_contents &&
+	grep file.txt tree_contents &&
+
+	# fsck still works after an auto-fetch of a tree.
+	git -C dst fsck &&
+
+	# Auto-fetch all remaining trees and blobs with --missing=error
+	git -C dst rev-list --missing=error --objects master >fetched_objects &&
+	test_line_count = 70 fetched_objects &&
+
+	awk -f print_1.awk fetched_objects |
+	xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+	sort -u fetched_types >unique_types.observed &&
+	printf "blob\ncommit\ntree\n" >unique_types.expected &&
+	test_cmp unique_types.expected unique_types.observed
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 11186209b..5a61614b1 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -224,6 +224,20 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 	test_must_be_empty rev_list_err
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list --quiet --objects --filter-print-omitted \
+		--filter=tree:0 HEAD |
+	awk -f print_1.awk |
+	sed s/~// |
+	xargs -n1 git -C r3 cat-file -t |
+	sort -u >filtered_types &&
+
+	printf "blob\ntree\n" >expected &&
+	test_cmp expected filtered_types
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.0.605.g01d371f741-goog


* Re: [PATCH v10 0/8] filter: support for excluding all trees and blobs
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (7 preceding siblings ...)
  2018-10-03 19:52   ` [PATCH v10 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
@ 2018-10-03 23:08   ` Matthew DeVore
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-03 23:08 UTC (permalink / raw)
  To: git
  Cc: Stefan Beller, git, jeffhost, Jeff King, Stefan Beller,
	Jonathan Tan, Junio C Hamano, pclouds

On Wed, Oct 3, 2018 at 12:52 PM Matthew DeVore <matvore@google.com> wrote:
>
> This is a minor change to the previous rollup. It moves positional
> arguments to the end of git-rev-list invocations. Here is an interdiff
> from v9:
...
There is another problem with this patchset related to dropped exit
codes and pipelines. In t6112, we run "git cat-file -t" on an object
which was rm'd without being promised. It was printing an error and
going undetected because it was upstream in a pipeline. The file was
removed in the previous test.

So I fixed the previous test to clone the repository before
manipulating it, and I fixed the latter test to not mask Git exit
codes :) (This is a really insidious pattern and I should have taken
it more seriously.) Below is an interdiff. The two tests are added in
different commits, so each commit had to be fixed up.
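
To make the masking concrete, here is a minimal sketch in plain shell. It
does not use Git at all: failing_producer is a hypothetical stand-in for a
Git command that prints something and then exits non-zero, and the sketch
assumes the shell's default behavior (no "pipefail" option set).

```shell
# Hypothetical stand-in for a Git command that prints output and then fails.
failing_producer () {
	echo "partial output"
	return 1
}

scratch=$(mktemp -d)

# Upstream of a pipe: the pipeline's status is that of the last command
# (sort), so the failure of failing_producer goes unnoticed.
failing_producer | sort -u >"$scratch/piped"
piped_status=$?

# Redirect to a file first: the producer's own status is now visible,
# and the downstream command runs as a separate step.
direct_status=0
failing_producer >"$scratch/raw" || direct_status=$?
sort -u "$scratch/raw" >"$scratch/sorted"

echo "piped: $piped_status, direct: $direct_status"
```

In a test chained with &&, only the second form lets the failure stop the
test, which is why the fixed-up tests write to an intermediate file first.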

I'll send a re-roll in two days or so if there are no more comments.

diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 5a61614b1..c8e3d87c4 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -210,16 +210,21 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
         TREE=$(git -C r3 rev-parse HEAD:dir1) &&

-        rm r3/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+        # Create a spare repo because we will be deleting objects from this one.
+        git clone r3 r3.b &&

-        git -C r3 rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
+        rm r3.b/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+        git -C r3.b rev-list --quiet --missing=print --objects HEAD \
+                >missing_objs 2>rev_list_err &&
         echo "?$TREE" >expected &&
         test_cmp expected missing_objs &&

         # do not complain when a missing tree cannot be parsed
         test_must_be_empty rev_list_err &&

-        git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+        git -C r3.b rev-list --missing=allow-any --objects HEAD \
+                >objs 2>rev_list_err &&
         ! grep $TREE objs &&
         test_must_be_empty rev_list_err
 '
@@ -228,12 +233,13 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre

 test_expect_success 'verify tree:0 includes trees in "filtered" output' '
         git -C r3 rev-list --quiet --objects --filter-print-omitted \
-                --filter=tree:0 HEAD |
-        awk -f print_1.awk |
+                --filter=tree:0 HEAD >revs &&
+
+        awk -f print_1.awk revs |
         sed s/~// |
-        xargs -n1 git -C r3 cat-file -t |
-        sort -u >filtered_types &&
+        xargs -n1 git -C r3 cat-file -t >unsorted_filtered_types &&

+        sort -u unsorted_filtered_types >filtered_types &&
         printf "blob\ntree\n" >expected &&
         test_cmp expected filtered_types
 '

* [PATCH v11 0/8] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (14 preceding siblings ...)
  2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-10-05 21:31 ` " Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 1/8] list-objects: store common func args in struct Matthew DeVore
                     ` (7 more replies)
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
  16 siblings, 8 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-05 21:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Here is a clean re-rollup fixing the issue I found earlier. It fixes a problem
stemming from a discarded exit code, which masked a crash in Git. The crash was
not a bug because an earlier test deleted a loose object file. The fix was to
make that test manipulate a clone rather than the original repo.

An interdiff from v10 is below.

Thank you,
Matt

diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 5a61614b1..c8e3d87c4 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -210,16 +210,21 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
         TREE=$(git -C r3 rev-parse HEAD:dir1) &&
 
-        rm r3/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+        # Create a spare repo because we will be deleting objects from this one.
+        git clone r3 r3.b &&
 
-        git -C r3 rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
+        rm r3.b/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+        git -C r3.b rev-list --quiet --missing=print --objects HEAD \
+                >missing_objs 2>rev_list_err &&
         echo "?$TREE" >expected &&
         test_cmp expected missing_objs &&
 
         # do not complain when a missing tree cannot be parsed
         test_must_be_empty rev_list_err &&
 
-        git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+        git -C r3.b rev-list --missing=allow-any --objects HEAD \
+                >objs 2>rev_list_err &&
         ! grep $TREE objs &&
         test_must_be_empty rev_list_err
 '
@@ -228,12 +233,13 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 
 test_expect_success 'verify tree:0 includes trees in "filtered" output' '
         git -C r3 rev-list --quiet --objects --filter-print-omitted \
-                --filter=tree:0 HEAD |
-        awk -f print_1.awk |
+                --filter=tree:0 HEAD >revs &&
+
+        awk -f print_1.awk revs |
         sed s/~// |
-        xargs -n1 git -C r3 cat-file -t |
-        sort -u >filtered_types &&
+        xargs -n1 git -C r3 cat-file -t >unsorted_filtered_types &&
 
+        sort -u unsorted_filtered_types >filtered_types &&
         printf "blob\ntree\n" >expected &&
         test_cmp expected filtered_types
 '

Matthew DeVore (8):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: use BUG rather than die
  list-objects-filter-options: do not over-strbuf_init
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   5 +
 builtin/rev-list.c                     |  11 +-
 list-objects-filter-options.c          |  19 +-
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  60 ++++++-
 list-objects.c                         | 232 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  26 ++-
 t/t0410-partial-clone.sh               |  45 +++++
 t/t5317-pack-objects-filter-objects.sh |  41 +++++
 t/t5616-partial-clone.sh               |  42 +++++
 t/t6112-rev-list-filters-objects.sh    |  49 ++++++
 12 files changed, 404 insertions(+), 128 deletions(-)

-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v11 1/8] list-objects: store common func args in struct
  2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
@ 2018-10-05 21:31   ` Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-05 21:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Store the arguments shared by the traversal functions in a new struct,
traversal_context. This will make utility functions easier to create, as
is done in the next patch.
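
As a minimal sketch of the pattern (not the code in this patch), the
following bundles callbacks and their data into one context struct so that
helpers take a single pointer instead of a long argument list. All names
here (walk_context, process_item, sum_shown, ...) are hypothetical and
deliberately much simpler than the real list-objects.c types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not Git's real types. */
typedef void (*toy_show_fn)(int item, void *show_data);
typedef int (*toy_filter_fn)(int item, void *filter_data);

struct walk_context {
	const int *items;
	size_t nr;
	toy_show_fn show;
	void *show_data;
	toy_filter_fn filter;	/* may be NULL, meaning "no filtering" */
	void *filter_data;
};

/* A helper only needs the context plus the item it works on. */
static void process_item(struct walk_context *ctx, int item)
{
	if (ctx->filter && !ctx->filter(item, ctx->filter_data))
		return;
	ctx->show(item, ctx->show_data);
}

static void walk(struct walk_context *ctx)
{
	size_t i;

	assert(ctx->show != NULL);
	for (i = 0; i < ctx->nr; i++)
		process_item(ctx, ctx->items[i]);
}

/* Example callbacks: add each shown item to a running sum. */
static void sum_item(int item, void *data)
{
	*(int *)data += item;
}

static int keep_even(int item, void *data)
{
	(void)data;
	return item % 2 == 0;
}

/* Fill in the context once, then hand it to the walker. */
int sum_shown(const int *items, size_t nr, int only_even)
{
	int sum = 0;
	struct walk_context ctx;

	ctx.items = items;
	ctx.nr = nr;
	ctx.show = sum_item;
	ctx.show_data = &sum;
	ctx.filter = only_even ? keep_even : NULL;
	ctx.filter_data = NULL;
	walk(&ctx);
	return sum;
}
```

Adding another helper between walk() and process_item() in this sketch
would only require passing ctx along, which is the property the struct is
meant to provide.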

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v11 2/8] list-objects: refactor to process_tree_contents
  2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 1/8] list-objects: store common func args in struct Matthew DeVore
@ 2018-10-05 21:31   ` Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 3/8] list-objects: always parse trees gently Matthew DeVore
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-05 21:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Move the loop over tree entries into a new function,
process_tree_contents(). A follow-up patch will use it to reduce the
indentation needed when invoking the logic conditionally, i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v11 3/8] list-objects: always parse trees gently
  2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 1/8] list-objects: store common func args in struct Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-10-05 21:31   ` Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 4/8] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-05 21:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	if (parse_tree_gently(tree, 1) < 0) {
 		if (revs->ignore_missing_links)
 			return;
 
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v11 4/8] rev-list: handle missing tree objects properly
  2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-10-05 21:31   ` [PATCH v11 3/8] list-objects: always parse trees gently Matthew DeVore
@ 2018-10-05 21:31   ` Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 5/8] revision: mark non-user-given objects instead Matthew DeVore
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-05 21:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.
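
The control flow can be illustrated with a toy model (a sketch under
simplifying assumptions, not the real walker): a missing tree is still
handed to the "show" step, but its contents are never visited. The type
toy_tree and the function toy_process_tree are hypothetical, and a return
value of -1 stands in for the place where the real code would call die().

```c
#include <stddef.h>

/* Hypothetical model of a tree object; `present` is 0 when it is missing. */
struct toy_tree {
	int present;
	size_t nr;
	struct toy_tree **entries;
};

/*
 * Returns the number of trees that would be shown, or -1 at the point
 * where the real code would die("bad tree object ...").
 */
int toy_process_tree(const struct toy_tree *tree,
		     int do_not_die_on_missing_tree)
{
	int shown = 1;	/* the tree itself is shown even when missing */
	int failed_parse = !tree->present;
	size_t i;

	if (failed_parse && !do_not_die_on_missing_tree)
		return -1;

	/* Only recurse when the tree could actually be parsed. */
	if (!failed_parse) {
		for (i = 0; i < tree->nr; i++) {
			int sub = toy_process_tree(tree->entries[i],
						   do_not_die_on_missing_tree);
			if (sub < 0)
				return -1;
			shown += sub;
		}
	}
	return shown;
}
```

In the toy model the caller of the "show" step is the one that decides
what a missing object means, which mirrors how rev-list applies the
--missing=* action once the walker no longer dies first.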

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/rev-list.c                     | 11 ++++---
 list-objects.c                         | 11 +++++--
 revision.h                             | 15 +++++++++
 t/t0410-partial-clone.sh               | 45 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 13 ++++++++
 t/t6112-rev-list-filters-objects.sh    | 22 +++++++++++++
 6 files changed, 110 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int failed_parse;
 
 	if (!revs->tree_objects)
 		return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, 1) < 0) {
+
+	failed_parse = parse_tree_gently(tree, 1);
+	if (failed_parse) {
 		if (revs->ignore_missing_links)
 			return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
 		    is_promisor_object(&obj->oid))
 			return;
 
-		die("bad tree object %s", oid_to_hex(&obj->oid));
+		if (!revs->do_not_die_on_missing_tree)
+			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (!failed_parse)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index 007278cc1..5910613cb 100644
--- a/revision.h
+++ b/revision.h
@@ -126,6 +126,21 @@ struct rev_info {
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
 
+			/*
+			 * Blobs are shown without regard for their existence.
+			 * But not so for trees: unless exclude_promisor_objects
+			 * is set and the tree in question is a promisor object;
+			 * OR ignore_missing_links is set, the revision walker
+			 * dies with a "bad tree object HASH" message when
+			 * encountering a missing tree. For callers that can
+			 * handle missing trees and want them to be filterable
+			 * and showable, set this to true. The revision walker
+			 * will filter and show such a missing tree as usual,
+			 * but will not attempt to recurse into this tree
+			 * object.
+			 */
+			do_not_die_on_missing_tree:1,
+
 			/* for internal use only */
 			exclude_promisor_objects:1;
 
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 128130066..5bc5b4445 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -186,6 +186,51 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	! grep $FOO out
 '
 
+test_expect_success 'missing tree objects with --missing=allow-promisor and --exclude-promisor-objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	promise_and_delete $(git -C repo rev-parse bar^{tree}) &&
+	promise_and_delete $(git -C repo rev-parse foo^{tree}) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, 3 blobs, and 1 tree
+	test_line_count = 7 objs &&
+
+	# Do the same for --exclude-promisor-objects, but with all trees gone.
+	promise_and_delete $(git -C repo rev-parse baz^{tree}) &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, no blobs or trees
+	test_line_count = 3 objs
+'
+
+test_expect_success 'missing non-root tree object and rev-list' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	mkdir repo/dir &&
+	echo foo >repo/dir/foo &&
+	git -C repo add dir/foo &&
+	git -C repo commit -m "commit dir/foo" &&
+
+	promise_and_delete $(git -C repo rev-parse HEAD:dir) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 1 commit and 1 tree
+	test_line_count = 2 objs
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised tree' '
 	rm -rf repo &&
 	test_create_repo repo &&
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..9839b48c1 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo >r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep -q "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index d4ff0b3be..efe5a2467 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -195,6 +195,28 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
+	TREE=$(git -C r3 rev-parse HEAD:dir1) &&
+
+	# Create a spare repo because we will be deleting objects from this one.
+	git clone r3 r3.b &&
+
+	rm r3.b/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+	git -C r3.b rev-list --quiet --missing=print --objects HEAD \
+		>missing_objs 2>rev_list_err &&
+	echo "?$TREE" >expected &&
+	test_cmp expected missing_objs &&
+
+	# do not complain when a missing tree cannot be parsed
+	test_must_be_empty rev_list_err &&
+
+	git -C r3.b rev-list --missing=allow-any --objects HEAD \
+		>objs 2>rev_list_err &&
+	! grep $TREE objs &&
+	test_must_be_empty rev_list_err
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v11 5/8] revision: mark non-user-given objects instead
  2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-10-05 21:31   ` [PATCH v11 4/8] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-10-05 21:31   ` Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-05 21:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

This fixes a bug in which git rev-list behaved differently from git
pack-objects: pack-objects would *not* filter objects given explicitly
on the command line, whereas rev-list would. This was because the two
commands used a different function to add objects to the rev_info
struct. This seems to have been an oversight, and pack-objects has the
correct behavior, so I added a test to make sure that rev-list now
behaves properly.
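
A small sketch of the inverted flag, using hypothetical names (toy_object,
toy_mark_reached, toy_should_filter) rather than Git's real structures:
objects start with no flag set, the traversal marks whatever it reaches
through a reference, and only marked objects are candidates for filtering.

```c
/* Hypothetical flag bit and object type, for illustration only. */
#define TOY_NOT_USER_GIVEN (1u << 25)

struct toy_object {
	unsigned flags;
};

/* Called at the places where the walker follows a reference. */
void toy_mark_reached(struct toy_object *obj)
{
	obj->flags |= TOY_NOT_USER_GIVEN;
}

/* An object is filtered only if a filter exists and it was reached. */
int toy_should_filter(const struct toy_object *obj, int have_filter)
{
	return (obj->flags & TOY_NOT_USER_GIVEN) && have_filter;
}
```

With this arrangement, an object named directly by the user never gets the
flag regardless of which function added it to the pending list, so it is
not subject to filtering by default.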

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c                      | 31 +++++++++++++++++------------
 revision.c                          |  1 -
 revision.h                          | 11 ++++++++--
 t/t6112-rev-list-filters-objects.sh | 12 +++++++++++
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index de4dce600..72d48a17f 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index 5910613cb..83e164039 100644
--- a/revision.h
+++ b/revision.h
@@ -21,9 +21,16 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+/*
+ * Indicates object was reached by traversal. i.e. not given by user on
+ * command-line or stdin.
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
+#define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index efe5a2467..ccbc64413 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -30,6 +30,18 @@ test_expect_success 'verify blob:none omits all 5 blobs' '
 	test_cmp observed expected
 '
 
+test_expect_success 'specify blob explicitly prevents filtering' '
+	file_3=$(git -C r1 ls-files -s file.3 |
+		 awk -f print_2.awk) &&
+
+	file_4=$(git -C r1 ls-files -s file.4 |
+		 awk -f print_2.awk) &&
+
+	git -C r1 rev-list --objects --filter=blob:none HEAD $file_3 >observed &&
+	grep -q "$file_3" observed &&
+	test_must_fail grep -q "$file_4" observed
+'
+
 test_expect_success 'verify emitted+omitted == all' '
 	git -C r1 rev-list HEAD --objects \
 		| awk -f print_1.awk \
-- 
2.19.0.605.g01d371f741-goog
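
The flag inversion in the patch above can be illustrated with a minimal sketch. It assumes hypothetical stand-in names (`object_sketch`, `omit_all`, `should_show`); only the bit position and the shape of the condition are taken from the patch. It is not the list-objects implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Same bit as in the patch; everything else below is a hypothetical stand-in. */
#define NOT_USER_GIVEN (1u << 25)

struct object_sketch {
	unsigned flags;
};

/* Hypothetical filter that omits every object it is asked about. */
static int omit_all(const struct object_sketch *obj)
{
	(void)obj;
	return 0; /* 0 means "do not show" */
}

/*
 * Returns 1 if the object should be shown.  The filter is consulted only
 * for objects that were reached by traversal, i.e. that carry
 * NOT_USER_GIVEN.  Objects named directly by the user never get the flag
 * and therefore bypass the filter.
 */
static int should_show(const struct object_sketch *obj,
		       int (*filter_fn)(const struct object_sketch *))
{
	assert(obj != NULL);
	if ((obj->flags & NOT_USER_GIVEN) && filter_fn)
		return filter_fn(obj);
	return 1;
}
```

With this shape, an object that starts with zeroed flags is treated as user-given by default, which is why the traversal code has to set the bit on every tree and blob it discovers.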


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v11 6/8] list-objects-filter: use BUG rather than die
  2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
                     ` (4 preceding siblings ...)
  2018-10-05 21:31   ` [PATCH v11 5/8] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-10-05 21:31   ` Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-05 21:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

In some cases in this file, BUG makes more sense than die. In such
cases, we get there because of a coding error rather than a user error.

'return' has been removed following some instances of BUG since BUG does
not return.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..5f8b1a002 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -102,8 +101,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -208,8 +206,7 @@ static enum list_objects_filter_result filter_sparse(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -389,7 +386,7 @@ void *list_objects_filter__init(
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
 	if (filter_options->choice >= LOFC__COUNT)
-		die("invalid list-objects filter choice: %d",
+		BUG("invalid list-objects filter choice: %d",
 		    filter_options->choice);
 
 	init_fn = s_filters[filter_options->choice];
-- 
2.19.0.605.g01d371f741-goog
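
The distinction drawn in the commit message above (user error versus coding error) can be sketched as follows. `die_sketch`, `bug_sketch`, `parse_choice` and `situation_name` are hypothetical stand-ins, not Git's own functions; the exit code and the use of abort() are assumptions about how such helpers typically behave.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdnoreturn.h>
#include <string.h>

enum situation_sketch { SIT_BEGIN_TREE, SIT_END_TREE, SIT_BLOB };

/* Stand-in for die(): the user did something wrong, so exit with an error. */
static noreturn void die_sketch(const char *msg)
{
	fprintf(stderr, "fatal: %s\n", msg);
	exit(128);
}

/* Stand-in for BUG(): the program itself is wrong, so abort loudly. */
static noreturn void bug_sketch(const char *msg, int value)
{
	fprintf(stderr, "BUG: %s: %d\n", msg, value);
	abort();
}

/* The argument comes from the user, so a bad value is a user error. */
static int parse_choice(const char *arg)
{
	assert(arg != NULL);
	if (!strcmp(arg, "blob:none"))
		return 1;
	if (!strcmp(arg, "tree:0"))
		return 2;
	die_sketch("invalid filter-spec");
}

/*
 * The enum value is produced by our own code, so an unknown value can only
 * mean a coding error.  No 'return' follows bug_sketch() because it is
 * declared noreturn.
 */
static const char *situation_name(enum situation_sketch s)
{
	switch (s) {
	case SIT_BEGIN_TREE:
		return "begin-tree";
	case SIT_END_TREE:
		return "end-tree";
	case SIT_BLOB:
		return "blob";
	default:
		bug_sketch("unknown situation", (int)s);
	}
}
```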


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v11 7/8] list-objects-filter-options: do not over-strbuf_init
  2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
                     ` (5 preceding siblings ...)
  2018-10-05 21:31   ` [PATCH v11 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
@ 2018-10-05 21:31   ` Matthew DeVore
  2018-10-05 21:31   ` [PATCH v11 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
  7 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-05 21:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

The function gently_parse_list_objects_filter is either called with
errbuf=STRBUF_INIT or errbuf=NULL, but that function calls strbuf_init
when errbuf is not NULL. strbuf_init is only necessary if errbuf
contains garbage, and risks a memory leak if errbuf already has a
non-STRBUF_INIT state. It should be the caller's responsibility to make
sure errbuf is not garbage, since garbage content is easily avoidable
with STRBUF_INIT.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter-options.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..d259bdb2c 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -30,7 +30,6 @@ static int gently_parse_list_objects_filter(
 
 	if (filter_options->choice) {
 		if (errbuf) {
-			strbuf_init(errbuf, 0);
 			strbuf_addstr(
 				errbuf,
 				_("multiple filter-specs cannot be combined"));
@@ -71,10 +70,9 @@ static int gently_parse_list_objects_filter(
 		return 0;
 	}
 
-	if (errbuf) {
-		strbuf_init(errbuf, 0);
+	if (errbuf)
 		strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
-	}
+
 	memset(filter_options, 0, sizeof(*filter_options));
 	return 1;
 }
-- 
2.19.0.605.g01d371f741-goog
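
The leak risk described in the commit message above can be sketched with a toy buffer type. `struct sbuf` and its helpers are hypothetical simplifications, not Git's strbuf API; the point is only that an init function which overwrites the pointer would drop whatever the caller had already accumulated, so the callee should append and leave initialization to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy growable string; a hypothetical simplification of a strbuf. */
struct sbuf {
	char *buf;
	size_t len;
};
#define SBUF_INIT { NULL, 0 }

/* Overwrites the fields without freeing: unsafe on a buffer that owns memory. */
static void sbuf_init(struct sbuf *sb)
{
	sb->buf = NULL;
	sb->len = 0;
}

static void sbuf_addstr(struct sbuf *sb, const char *s)
{
	size_t n = strlen(s);
	char *p;

	assert(sb != NULL);
	p = realloc(sb->buf, sb->len + n + 1);
	if (!p)
		abort();
	memcpy(p + sb->len, s, n + 1); /* copies the trailing NUL as well */
	sb->buf = p;
	sb->len += n;
}

static void sbuf_release(struct sbuf *sb)
{
	free(sb->buf);
	sbuf_init(sb);
}

/*
 * Callee in the style the patch moves to: it only appends.  The caller is
 * responsible for passing either NULL or an initialized buffer.
 */
static int report_invalid(struct sbuf *errbuf, const char *arg)
{
	if (errbuf) {
		sbuf_addstr(errbuf, "invalid filter-spec '");
		sbuf_addstr(errbuf, arg);
		sbuf_addstr(errbuf, "'");
	}
	return 1;
}
```

Had `report_invalid` called `sbuf_init(errbuf)` first, any text already in the buffer would have been lost and its allocation leaked.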


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v11 8/8] list-objects-filter: implement filter tree:0
  2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
                     ` (6 preceding siblings ...)
  2018-10-05 21:31   ` [PATCH v11 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
@ 2018-10-05 21:31   ` Matthew DeVore
  2018-10-07  0:10     ` Junio C Hamano
  7 siblings, 1 reply; 151+ messages in thread
From: Matthew DeVore @ 2018-10-05 21:31 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but it is inaccurate: it
suggests that annotated tags are omitted, when in fact they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  5 +++
 list-objects-filter-options.c          | 13 +++++++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 49 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 28 +++++++++++++++
 t/t5616-partial-clone.sh               | 42 ++++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 15 ++++++++
 7 files changed, 153 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d259bdb2c..e8da2e858 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -49,6 +49,19 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (skip_prefix(arg, "tree:", &v0)) {
+		unsigned long depth;
+		if (!git_parse_ulong(v0, &depth) || depth != 0) {
+			if (errbuf) {
+				strbuf_addstr(
+					errbuf,
+					_("only 'tree:0' is supported"));
+			}
+			return 1;
+		}
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 5f8b1a002..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -79,6 +79,54 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void* filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -371,6 +419,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 9839b48c1..510d3537f 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' '
 	grep -q "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" >r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	awk "/tree|blob/{print \$1}" objs >trees_and_blobs &&
+	git -C r1 rev-parse HEAD: >expected &&
+	test_cmp expected trees_and_blobs
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..53fbf7db8 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,6 +154,48 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'use fsck before and after manually fetching a missing subtree' '
+	# push new commit so server has a subtree
+	mkdir src/dir &&
+	echo "in dir" >src/dir/file.txt &&
+	git -C src add dir/file.txt &&
+	git -C src commit -m "file in dir" &&
+	git -C src push -u srv master &&
+	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
+
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst fsck &&
+
+	# Make sure we only have commits, and all trees and blobs are missing.
+	git -C dst rev-list --missing=allow-any --objects master \
+		>fetched_objects &&
+	awk -f print_1.awk fetched_objects |
+	xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+	sort -u fetched_types >unique_types.observed &&
+	echo commit >unique_types.expected &&
+	test_cmp unique_types.expected unique_types.observed &&
+
+	# Auto-fetch a tree with cat-file.
+	git -C dst cat-file -p $SUBTREE >tree_contents &&
+	grep file.txt tree_contents &&
+
+	# fsck still works after an auto-fetch of a tree.
+	git -C dst fsck &&
+
+	# Auto-fetch all remaining trees and blobs with --missing=error
+	git -C dst rev-list --missing=error --objects master >fetched_objects &&
+	test_line_count = 70 fetched_objects &&
+
+	awk -f print_1.awk fetched_objects |
+	xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+	sort -u fetched_types >unique_types.observed &&
+	printf "blob\ncommit\ntree\n" >unique_types.expected &&
+	test_cmp unique_types.expected unique_types.observed
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index ccbc64413..c8e3d87c4 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -229,6 +229,21 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 	test_must_be_empty rev_list_err
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list --quiet --objects --filter-print-omitted \
+		--filter=tree:0 HEAD >revs &&
+
+	awk -f print_1.awk revs |
+	sed s/~// |
+	xargs -n1 git -C r3 cat-file -t >unsorted_filtered_types &&
+
+	sort -u unsorted_filtered_types >filtered_types &&
+	printf "blob\ntree\n" >expected &&
+	test_cmp expected filtered_types
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.0.605.g01d371f741-goog
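
The "tree:<depth>" parsing rule from the patch above (accept the prefix, require a number, and reject anything other than 0 for now) can be sketched in isolation. `parse_tree_filter` and `choice_sketch` are hypothetical names, and plain strtoul() is used here as an assumption in place of Git's own number parser, so details such as unit suffixes are deliberately not handled.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum choice_sketch { CHOICE_DISABLED = 0, CHOICE_TREE_NONE };

/*
 * Returns 0 and sets *choice when arg is exactly a supported "tree:<depth>"
 * spec; returns 1 otherwise.  Only depth 0 is accepted, matching the
 * "only 'tree:0' is supported" message in the patch.
 */
static int parse_tree_filter(const char *arg, enum choice_sketch *choice)
{
	static const char prefix[] = "tree:";
	const char *v;
	char *end;
	unsigned long depth;

	assert(arg != NULL && choice != NULL);
	if (strncmp(arg, prefix, strlen(prefix)))
		return 1;
	v = arg + strlen(prefix);

	/* strtoul() tolerates leading spaces and signs; require a digit. */
	if (*v < '0' || *v > '9')
		return 1;

	errno = 0;
	depth = strtoul(v, &end, 10);
	if (errno || *end != '\0' || depth != 0)
		return 1;

	*choice = CHOICE_TREE_NONE;
	return 0;
}
```

Keeping the depth as a parsed number, rather than matching the literal string "tree:0", leaves room for the deeper filters the commit message anticipates.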


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v11 8/8] list-objects-filter: implement filter tree:0
  2018-10-05 21:31   ` [PATCH v11 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
@ 2018-10-07  0:10     ` Junio C Hamano
  2018-10-08 17:23       ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-10-07  0:10 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: git, sbeller, git, jeffhost, peff, stefanbeller, jonathantanmy, pclouds

Matthew DeVore <matvore@google.com> writes:

> The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> would filter out all but the root tree and blobs. In order to avoid
> confusion between 0 and capital O, the documentation was worded in a
> somewhat round-about way that also hints at this future improvement to
> the feature.
>
> Signed-off-by: Matthew DeVore <matvore@google.com>

Thanks.

> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index 7b273635d..5f1672913 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -731,6 +731,11 @@ the requested refs.
>  +
>  The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
>  specification contained in <path>.
> ++
> +The form '--filter=tree:<depth>' omits all blobs and trees whose depth
> +from the root tree is >= <depth> (minimum depth if an object is located
> +at multiple depths in the commits traversed). Currently, only <depth>=0
> +is supported, which omits all blobs and trees.

OK.

> diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
> index 9839b48c1..510d3537f 100755
> --- a/t/t5317-pack-objects-filter-objects.sh
> +++ b/t/t5317-pack-objects-filter-objects.sh
> @@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' '
>  	grep -q "bad tree object" bad_tree
>  '

As output made inside test_expect_{success,failure} is discarded
by default and shown while debugging tests, there is no strong
reason to use "grep -q" in our tests.  I saw a few instances of
"grep -q" added in this series including this one

	test_must_fail grep -q "$file_4" observed

that should probably be

	! grep "$file_4" observed

> +	printf "blob\ncommit\ntree\n" >unique_types.expected &&
> ...
> +	printf "blob\ntree\n" >expected &&

Using test_write_lines is probably easier to read.

Other than these two minor classes of nits, the series looks quite
well cooked to me.

Thanks.

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v11 8/8] list-objects-filter: implement filter tree:0
  2018-10-07  0:10     ` Junio C Hamano
@ 2018-10-08 17:23       ` Matthew DeVore
  0 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-08 17:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Stefan Beller, git, jeffhost, Jeff King, Stefan Beller,
	Jonathan Tan, pclouds

On Sat, Oct 6, 2018 at 5:10 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> As output made inside test_expect_{success,failure} is discarded
> by default and shown while debugging tests, there is no strong
> reason to use "grep -q" in our tests.  I saw a few instances of
> "grep -q" added in this series including this one
>
>         test_must_fail grep -q "$file_4" observed
>
> that should probably be
>
>         ! grep "$file_4" observed
Yeah, I remember I read in the testing guidelines that you should just
use ! for non-Git commands since it's not our job to make sure these
tools are not crashing. Thank you for pointing this out.

>
> > +     printf "blob\ncommit\ntree\n" >unique_types.expected &&
> > ...
> > +     printf "blob\ntree\n" >expected &&
>
> Using test_write_lines is probably easier to read.

Done. Below is an interdiff. Let me know if you want a reroll soon.
Otherwise, I will send one later this week.

- Matt

diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 510d3537f..d9dccf4d4 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -69,7 +69,7 @@ test_expect_success 'get an error for missing tree object' '
         test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
         HEAD
         EOF
-        grep -q "bad tree object" bad_tree
+        grep "bad tree object" bad_tree
 '

 test_expect_success 'setup for tests of tree:0' '
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 53fbf7db8..392caa08f 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -192,7 +192,7 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
         xargs -n1 git -C dst cat-file -t >fetched_types &&

         sort -u fetched_types >unique_types.observed &&
-        printf "blob\ncommit\ntree\n" >unique_types.expected &&
+        test_write_lines blob commit tree >unique_types.expected &&
         test_cmp unique_types.expected unique_types.observed
 '

diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c8e3d87c4..08e0c7db6 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -38,8 +38,8 @@ test_expect_success 'specify blob explicitly prevents filtering' '
                  awk -f print_2.awk) &&

         git -C r1 rev-list --objects --filter=blob:none HEAD $file_3 >observed &&
-        grep -q "$file_3" observed &&
-        test_must_fail grep -q "$file_4" observed
+        grep "$file_3" observed &&
+        ! grep "$file_4" observed
 '

 test_expect_success 'verify emitted+omitted == all' '
@@ -240,7 +240,7 @@ test_expect_success 'verify tree:0 includes trees in "filtered" output' '
         xargs -n1 git -C r3 cat-file -t >unsorted_filtered_types &&

         sort -u unsorted_filtered_types >filtered_types &&
-        printf "blob\ntree\n" >expected &&
+        test_write_lines blob tree >expected &&
         test_cmp expected filtered_types
 '

^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v12 0/8] filter: support for excluding all trees and blobs
  2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
                   ` (15 preceding siblings ...)
  2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
@ 2018-10-12 20:01 ` Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 1/8] list-objects: store common func args in struct Matthew DeVore
                     ` (8 more replies)
  16 siblings, 9 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-12 20:01 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Here is a re-roll, since I haven't received any additional corrections for
almost a week. The changes are very slight and are just clean-up, so the series
should be ready to be promoted to master.

This is the interdiff from last time:

diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 510d3537f..d9dccf4d4 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -69,7 +69,7 @@ test_expect_success 'get an error for missing tree object' '
         test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
         HEAD
         EOF
-        grep -q "bad tree object" bad_tree
+        grep "bad tree object" bad_tree
 '
 
 test_expect_success 'setup for tests of tree:0' '
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 53fbf7db8..392caa08f 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -192,7 +192,7 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
         xargs -n1 git -C dst cat-file -t >fetched_types &&
 
         sort -u fetched_types >unique_types.observed &&
-        printf "blob\ncommit\ntree\n" >unique_types.expected &&
+        test_write_lines blob commit tree >unique_types.expected &&
         test_cmp unique_types.expected unique_types.observed
 '
 
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index c8e3d87c4..08e0c7db6 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -38,8 +38,8 @@ test_expect_success 'specify blob explicitly prevents filtering' '
                  awk -f print_2.awk) &&
 
         git -C r1 rev-list --objects --filter=blob:none HEAD $file_3 >observed &&
-        grep -q "$file_3" observed &&
-        test_must_fail grep -q "$file_4" observed
+        grep "$file_3" observed &&
+        ! grep "$file_4" observed
 '
 
 test_expect_success 'verify emitted+omitted == all' '
@@ -240,7 +240,7 @@ test_expect_success 'verify tree:0 includes trees in "filtered" output' '
         xargs -n1 git -C r3 cat-file -t >unsorted_filtered_types &&
 
         sort -u unsorted_filtered_types >filtered_types &&
-        printf "blob\ntree\n" >expected &&
+        test_write_lines blob tree >expected &&
         test_cmp expected filtered_types
 '
 

Thanks,

Matthew DeVore (8):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: use BUG rather than die
  list-objects-filter-options: do not over-strbuf_init
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   5 +
 builtin/rev-list.c                     |  11 +-
 list-objects-filter-options.c          |  19 +-
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  60 ++++++-
 list-objects.c                         | 232 +++++++++++++------------
 revision.c                             |   1 -
 revision.h                             |  26 ++-
 t/t0410-partial-clone.sh               |  45 +++++
 t/t5317-pack-objects-filter-objects.sh |  41 +++++
 t/t5616-partial-clone.sh               |  42 +++++
 t/t6112-rev-list-filters-objects.sh    |  49 ++++++
 12 files changed, 404 insertions(+), 128 deletions(-)

-- 
2.19.1.331.ge82ca0e54c-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v12 1/8] list-objects: store common func args in struct
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
@ 2018-10-12 20:01   ` Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-12 20:01 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 158 +++++++++++++++++++++++--------------------------
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+	struct rev_info *revs;
+	show_object_fn show_object;
+	show_commit_fn show_commit;
+	void *show_data;
+	filter_object_fn filter_fn;
+	void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
-			 show_object_fn show,
 			 struct strbuf *path,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-	if (!revs->blob_objects)
+	if (!ctx->revs->blob_objects)
 		return;
 	if (!obj)
 		die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
-	if (revs->exclude_promisor_objects &&
+	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BLOB, obj,
-			      path->buf, &path->buf[pathlen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BLOB, obj,
+				   path->buf, &path->buf[pathlen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, path->buf, cb_data);
+		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
 			    const unsigned char *sha1,
-			    show_object_fn show,
 			    struct strbuf *path,
-			    const char *name,
-			    void *cb_data)
+			    const char *name)
 {
 	/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
-			 show_object_fn show,
 			 struct strbuf *base,
-			 const char *name,
-			 void *cb_data,
-			 filter_object_fn filter_fn,
-			 void *filter_data)
+			 const char *name)
 {
 	struct object *obj = &tree->object;
+	struct rev_info *revs = ctx->revs;
 	struct tree_desc desc;
 	struct name_entry entry;
 	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && filter_fn)
-		r = filter_fn(LOFS_BEGIN_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
-		show(obj, base->buf, cb_data);
+		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
 		}
 
 		if (S_ISDIR(entry.mode))
-			process_tree(revs,
+			process_tree(ctx,
 				     lookup_tree(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(revs, entry.oid->hash,
-					show, base, entry.path,
-					cb_data);
+			process_gitlink(ctx, entry.oid->hash, base, entry.path);
 		else
-			process_blob(revs,
+			process_blob(ctx,
 				     lookup_blob(the_repository, entry.oid),
-				     show, base, entry.path,
-				     cb_data, filter_fn, filter_data);
+				     base, entry.path);
 	}
 
-	if (!(obj->flags & USER_GIVEN) && filter_fn) {
-		r = filter_fn(LOFS_END_TREE, obj,
-			      base->buf, &base->buf[baselen],
-			      filter_data);
+	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+		r = ctx->filter_fn(LOFS_END_TREE, obj,
+				   base->buf, &base->buf[baselen],
+				   ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
-			show(obj, base->buf, cb_data);
+			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
@@ -242,19 +238,15 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree)
 	add_pending_object(revs, &tree->object, "");
 }
 
-static void traverse_trees_and_blobs(struct rev_info *revs,
-				     struct strbuf *base,
-				     show_object_fn show_object,
-				     void *show_data,
-				     filter_object_fn filter_fn,
-				     void *filter_data)
+static void traverse_trees_and_blobs(struct traversal_context *ctx,
+				     struct strbuf *base)
 {
 	int i;
 
 	assert(base->len == 0);
 
-	for (i = 0; i < revs->pending.nr; i++) {
-		struct object_array_entry *pending = revs->pending.objects + i;
+	for (i = 0; i < ctx->revs->pending.nr; i++) {
+		struct object_array_entry *pending = ctx->revs->pending.objects + i;
 		struct object *obj = pending->item;
 		const char *name = pending->name;
 		const char *path = pending->path;
@@ -262,62 +254,49 @@ static void traverse_trees_and_blobs(struct rev_info *revs,
 			continue;
 		if (obj->type == OBJ_TAG) {
 			obj->flags |= SEEN;
-			show_object(obj, name, show_data);
+			ctx->show_object(obj, name, ctx->show_data);
 			continue;
 		}
 		if (!path)
 			path = "";
 		if (obj->type == OBJ_TREE) {
-			process_tree(revs, (struct tree *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_tree(ctx, (struct tree *)obj, base, path);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-			process_blob(revs, (struct blob *)obj, show_object,
-				     base, path, show_data,
-				     filter_fn, filter_data);
+			process_blob(ctx, (struct blob *)obj, base, path);
 			continue;
 		}
 		die("unknown pending object %s (%s)",
 		    oid_to_hex(&obj->oid), name);
 	}
-	object_array_clear(&revs->pending);
+	object_array_clear(&ctx->revs->pending);
 }
 
-static void do_traverse(struct rev_info *revs,
-			show_commit_fn show_commit,
-			show_object_fn show_object,
-			void *show_data,
-			filter_object_fn filter_fn,
-			void *filter_data)
+static void do_traverse(struct traversal_context *ctx)
 {
 	struct commit *commit;
 	struct strbuf csp; /* callee's scratch pad */
 	strbuf_init(&csp, PATH_MAX);
 
-	while ((commit = get_revision(revs)) != NULL) {
+	while ((commit = get_revision(ctx->revs)) != NULL) {
 		/*
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
 		if (get_commit_tree(commit))
-			add_pending_tree(revs, get_commit_tree(commit));
-		show_commit(commit, show_data);
+			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		ctx->show_commit(commit, ctx->show_data);
 
-		if (revs->tree_blobs_in_commit_order)
+		if (ctx->revs->tree_blobs_in_commit_order)
 			/*
 			 * NEEDSWORK: Adding the tree and then flushing it here
 			 * needs a reallocation for each commit. Can we pass the
 			 * tree directory without allocation churn?
 			 */
-			traverse_trees_and_blobs(revs, &csp,
-						 show_object, show_data,
-						 filter_fn, filter_data);
+			traverse_trees_and_blobs(ctx, &csp);
 	}
-	traverse_trees_and_blobs(revs, &csp,
-				 show_object, show_data,
-				 filter_fn, filter_data);
+	traverse_trees_and_blobs(ctx, &csp);
 	strbuf_release(&csp);
 }
 
@@ -326,7 +305,14 @@ void traverse_commit_list(struct rev_info *revs,
 			  show_object_fn show_object,
 			  void *show_data)
 {
-	do_traverse(revs, show_commit, show_object, show_data, NULL, NULL);
+	struct traversal_context ctx;
+	ctx.revs = revs;
+	ctx.show_commit = show_commit;
+	ctx.show_object = show_object;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+	ctx.filter_data = NULL;
+	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
@@ -337,14 +323,18 @@ void traverse_commit_list_filtered(
 	void *show_data,
 	struct oidset *omitted)
 {
-	filter_object_fn filter_fn = NULL;
+	struct traversal_context ctx;
 	filter_free_fn filter_free_fn = NULL;
-	void *filter_data = NULL;
-
-	filter_data = list_objects_filter__init(omitted, filter_options,
-						&filter_fn, &filter_free_fn);
-	do_traverse(revs, show_commit, show_object, show_data,
-		    filter_fn, filter_data);
-	if (filter_data && filter_free_fn)
-		filter_free_fn(filter_data);
+
+	ctx.revs = revs;
+	ctx.show_object = show_object;
+	ctx.show_commit = show_commit;
+	ctx.show_data = show_data;
+	ctx.filter_fn = NULL;
+
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
+						    &ctx.filter_fn, &filter_free_fn);
+	do_traverse(&ctx);
+	if (ctx.filter_data && filter_free_fn)
+		filter_free_fn(ctx.filter_data);
 }
-- 
2.19.1.331.ge82ca0e54c-goog



* [PATCH v12 2/8] list-objects: refactor to process_tree_contents
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 1/8] list-objects: store common func args in struct Matthew DeVore
@ 2018-10-12 20:01   ` Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 3/8] list-objects: always parse trees gently Matthew DeVore
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-12 20:01 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

This will be used in a follow-up patch to reduce the indentation needed
when invoking the logic conditionally, i.e. rather than:

if (foo) {
	while (...) {
		/* this is very indented */
	}
}

we will have:

if (foo)
	process_tree_contents(...);

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 68 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
 	/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+			 struct tree *tree,
+			 struct strbuf *base,
+			 const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+				  struct tree *tree,
+				  struct strbuf *base)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+		all_entries_interesting : entry_not_interesting;
+
+	init_tree_desc(&desc, tree->buffer, tree->size);
+
+	while (tree_entry(&desc, &entry)) {
+		if (match != all_entries_interesting) {
+			match = tree_entry_interesting(&entry, base, 0,
+						       &ctx->revs->diffopt.pathspec);
+			if (match == all_entries_not_interesting)
+				break;
+			if (match == entry_not_interesting)
+				continue;
+		}
+
+		if (S_ISDIR(entry.mode))
+			process_tree(ctx,
+				     lookup_tree(the_repository, entry.oid),
+				     base, entry.path);
+		else if (S_ISGITLINK(entry.mode))
+			process_gitlink(ctx, entry.oid->hash,
+					base, entry.path);
+		else
+			process_blob(ctx,
+				     lookup_blob(the_repository, entry.oid),
+				     base, entry.path);
+	}
+}
+
 static void process_tree(struct traversal_context *ctx,
 			 struct tree *tree,
 			 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
 	struct object *obj = &tree->object;
 	struct rev_info *revs = ctx->revs;
-	struct tree_desc desc;
-	struct name_entry entry;
-	enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-		all_entries_interesting: entry_not_interesting;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 	int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
-
-	while (tree_entry(&desc, &entry)) {
-		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, 0,
-						       &revs->diffopt.pathspec);
-			if (match == all_entries_not_interesting)
-				break;
-			if (match == entry_not_interesting)
-				continue;
-		}
-
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
-		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash, base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
-	}
+	process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.19.1.331.ge82ca0e54c-goog



* [PATCH v12 3/8] list-objects: always parse trees gently
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 1/8] list-objects: store common func args in struct Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
@ 2018-10-12 20:01   ` Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 4/8] rev-list: handle missing tree objects properly Matthew DeVore
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-12 20:01 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;
 
 	if (!revs->tree_objects)
 		return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	if (parse_tree_gently(tree, 1) < 0) {
 		if (revs->ignore_missing_links)
 			return;
 
-- 
2.19.1.331.ge82ca0e54c-goog



* [PATCH v12 4/8] rev-list: handle missing tree objects properly
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (2 preceding siblings ...)
  2018-10-12 20:01   ` [PATCH v12 3/8] list-objects: always parse trees gently Matthew DeVore
@ 2018-10-12 20:01   ` Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 5/8] revision: mark non-user-given objects instead Matthew DeVore
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-12 20:01 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.
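
The decision that process_tree() makes after a failed parse can be
sketched as a small standalone function. This is only an illustration
of the control flow in this patch: the toy_* names, the outcome enum and
the plain int arguments are hypothetical stand-ins, not git's real types
or API.

```c
#include <assert.h>

/* Hypothetical outcome labels, for illustration only. */
enum toy_outcome {
	TOY_SKIP,            /* return early: tree is neither filtered nor shown */
	TOY_DIE,             /* corresponds to die("bad tree object ...") */
	TOY_SHOW_NO_RECURSE, /* filter and show the tree, but skip its entries */
	TOY_SHOW_AND_RECURSE /* normal case: the tree was parsed */
};

/* Hypothetical subset of the rev_info bits consulted by process_tree(). */
struct toy_opts {
	unsigned ignore_missing_links:1;
	unsigned exclude_promisor_objects:1;
	unsigned do_not_die_on_missing_tree:1;
};

/*
 * Mirrors the order of the checks in the patched process_tree():
 * a parsed tree is handled normally; otherwise the two "return early"
 * cases come first, then the die, and only callers that opted in via
 * do_not_die_on_missing_tree get the tree shown without recursion.
 */
static enum toy_outcome toy_process_tree(const struct toy_opts *o,
					 int failed_parse, int is_promisor)
{
	if (!failed_parse)
		return TOY_SHOW_AND_RECURSE;
	if (o->ignore_missing_links)
		return TOY_SKIP;
	if (o->exclude_promisor_objects && is_promisor)
		return TOY_SKIP;
	if (!o->do_not_die_on_missing_tree)
		return TOY_DIE;
	return TOY_SHOW_NO_RECURSE;
}
```

In this model rev-list corresponds to do_not_die_on_missing_tree = 1,
so a missing tree reaches the show callback (where --missing=* is
applied), while a caller that leaves the bit unset still dies, as the
pack-objects test in t5317 expects.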

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 builtin/rev-list.c                     | 11 ++++---
 list-objects.c                         | 11 +++++--
 revision.h                             | 15 +++++++++
 t/t0410-partial-clone.sh               | 45 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 13 ++++++++
 t/t6112-rev-list-filters-objects.sh    | 22 +++++++++++++
 6 files changed, 110 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-		die("unexpected missing blob object '%s'",
-		    oid_to_hex(&obj->oid));
+		die("unexpected missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+	if (!has_object_file(&obj->oid)) {
 		finish_object__ma(obj);
 		return 1;
 	}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	init_revisions(&revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+	int failed_parse;
 
 	if (!revs->tree_objects)
 		return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, 1) < 0) {
+
+	failed_parse = parse_tree_gently(tree, 1);
+	if (failed_parse) {
 		if (revs->ignore_missing_links)
 			return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
 		    is_promisor_object(&obj->oid))
 			return;
 
-		die("bad tree object %s", oid_to_hex(&obj->oid));
+		if (!revs->do_not_die_on_missing_tree)
+			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
 	if (base->len)
 		strbuf_addch(base, '/');
 
-	process_tree_contents(ctx, tree, base);
+	if (!failed_parse)
+		process_tree_contents(ctx, tree, base);
 
 	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index 007278cc1..5910613cb 100644
--- a/revision.h
+++ b/revision.h
@@ -126,6 +126,21 @@ struct rev_info {
 			line_level_traverse:1,
 			tree_blobs_in_commit_order:1,
 
+			/*
+			 * Blobs are shown without regard for their existence.
+			 * But not so for trees: unless exclude_promisor_objects
+			 * is set and the tree in question is a promisor object;
+			 * OR ignore_missing_links is set, the revision walker
+			 * dies with a "bad tree object HASH" message when
+			 * encountering a missing tree. For callers that can
+			 * handle missing trees and want them to be filterable
+			 * and showable, set this to true. The revision walker
+			 * will filter and show such a missing tree as usual,
+			 * but will not attempt to recurse into this tree
+			 * object.
+			 */
+			do_not_die_on_missing_tree:1,
+
 			/* for internal use only */
 			exclude_promisor_objects:1;
 
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 128130066..5bc5b4445 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -186,6 +186,51 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	! grep $FOO out
 '
 
+test_expect_success 'missing tree objects with --missing=allow-promisor and --exclude-promisor-objects' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	test_commit -C repo foo &&
+	test_commit -C repo bar &&
+	test_commit -C repo baz &&
+
+	promise_and_delete $(git -C repo rev-parse bar^{tree}) &&
+	promise_and_delete $(git -C repo rev-parse foo^{tree}) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, 3 blobs, and 1 tree
+	test_line_count = 7 objs &&
+
+	# Do the same for --exclude-promisor-objects, but with all trees gone.
+	promise_and_delete $(git -C repo rev-parse baz^{tree}) &&
+	git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 3 commits, no blobs or trees
+	test_line_count = 3 objs
+'
+
+test_expect_success 'missing non-root tree object and rev-list' '
+	rm -rf repo &&
+	test_create_repo repo &&
+	mkdir repo/dir &&
+	echo foo >repo/dir/foo &&
+	git -C repo add dir/foo &&
+	git -C repo commit -m "commit dir/foo" &&
+
+	promise_and_delete $(git -C repo rev-parse HEAD:dir) &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "arbitrary string" &&
+
+	git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err &&
+	test_must_be_empty rev_list_err &&
+	# 1 commit and 1 tree
+	test_line_count = 2 objs
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised tree' '
 	rm -rf repo &&
 	test_create_repo repo &&
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 6710c8bc8..ba83f3829 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -59,6 +59,19 @@ test_expect_success 'verify normal and blob:none packfiles have same commits/tre
 	test_cmp observed expected
 '
 
+test_expect_success 'get an error for missing tree object' '
+	git init r5 &&
+	echo foo >r5/foo &&
+	git -C r5 add foo &&
+	git -C r5 commit -m "foo" &&
+	del=$(git -C r5 rev-parse HEAD^{tree} | sed "s|..|&/|") &&
+	rm r5/.git/objects/$del &&
+	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
+	HEAD
+	EOF
+	grep "bad tree object" bad_tree
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index d4ff0b3be..efe5a2467 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -195,6 +195,28 @@ test_expect_success 'verify sparse:oid=oid-ish omits top-level files' '
 	test_cmp observed expected
 '
 
+test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for trees' '
+	TREE=$(git -C r3 rev-parse HEAD:dir1) &&
+
+	# Create a spare repo because we will be deleting objects from this one.
+	git clone r3 r3.b &&
+
+	rm r3.b/.git/objects/$(echo $TREE | sed "s|^..|&/|") &&
+
+	git -C r3.b rev-list --quiet --missing=print --objects HEAD \
+		>missing_objs 2>rev_list_err &&
+	echo "?$TREE" >expected &&
+	test_cmp expected missing_objs &&
+
+	# do not complain when a missing tree cannot be parsed
+	test_must_be_empty rev_list_err &&
+
+	git -C r3.b rev-list --missing=allow-any --objects HEAD \
+		>objs 2>rev_list_err &&
+	! grep $TREE objs &&
+	test_must_be_empty rev_list_err
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.1.331.ge82ca0e54c-goog



* [PATCH v12 5/8] revision: mark non-user-given objects instead
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (3 preceding siblings ...)
  2018-10-12 20:01   ` [PATCH v12 4/8] rev-list: handle missing tree objects properly Matthew DeVore
@ 2018-10-12 20:01   ` Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-12 20:01 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

This fixes a bug in which git rev-list behaved differently from git
pack-objects: pack-objects would *not* filter objects given explicitly
on the command line, whereas rev-list would filter them. This was because the two
commands used a different function to add objects to the rev_info
struct. This seems to have been an oversight, and pack-objects has the
correct behavior, so I added a test to make sure that rev-list now
behaves properly.
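
As a rough sketch of the idea (not the code in this patch), the flag
can be modelled with a toy object type. Everything prefixed toy_/TOY_ is
a hypothetical stand-in; only the shape of the condition,
(flags & NOT_USER_GIVEN) && filter_fn, is taken from the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the NOT_USER_GIVEN bit; not git's value. */
#define TOY_NOT_USER_GIVEN (1u << 0)

/* Hypothetical minimal object: only the flags word matters here. */
struct toy_object {
	unsigned flags;
};

/* Pretend filter that asks to omit every object it is consulted about. */
static int toy_filter_omits(const struct toy_object *obj)
{
	(void)obj;
	return 1;
}

/*
 * The traversal sets the mark at the point where it follows a reference
 * (a tree entry, or a commit's root tree). Objects named by the user
 * never pass through here, so they stay unmarked.
 */
static void toy_reach_via_reference(struct toy_object *obj)
{
	obj->flags |= TOY_NOT_USER_GIVEN;
}

/*
 * Returns 1 if the object would be shown. The filter is consulted only
 * for marked objects; unmarked (user-given) objects bypass it.
 */
static int toy_should_show(const struct toy_object *obj,
			   int (*filter_fn)(const struct toy_object *))
{
	if ((obj->flags & TOY_NOT_USER_GIVEN) && filter_fn)
		return !filter_fn(obj);
	return 1;
}
```

With this arrangement an object that is never marked is shown even when
a filter is active, which is the behavior the new "specify blob
explicitly prevents filtering" test checks for rev-list.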

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects.c                      | 31 +++++++++++++++++------------
 revision.c                          |  1 -
 revision.h                          | 11 ++++++++--
 t/t6112-rev-list-filters-objects.sh | 12 +++++++++++
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BLOB, obj,
 				   path->buf, &path->buf[pathlen],
 				   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
 				continue;
 		}
 
-		if (S_ISDIR(entry.mode))
-			process_tree(ctx,
-				     lookup_tree(the_repository, entry.oid),
-				     base, entry.path);
+		if (S_ISDIR(entry.mode)) {
+			struct tree *t = lookup_tree(the_repository, entry.oid);
+			t->object.flags |= NOT_USER_GIVEN;
+			process_tree(ctx, t, base, entry.path);
+		}
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(ctx, entry.oid->hash,
 					base, entry.path);
-		else
-			process_blob(ctx,
-				     lookup_blob(the_repository, entry.oid),
-				     base, entry.path);
+		else {
+			struct blob *b = lookup_blob(the_repository, entry.oid);
+			b->object.flags |= NOT_USER_GIVEN;
+			process_blob(ctx, b, base, entry.path);
+		}
 	}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
 	}
 
 	strbuf_addstr(base, name);
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
 		r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
 	if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
 		r = ctx->filter_fn(LOFS_END_TREE, obj,
 				   base->buf, &base->buf[baselen],
 				   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 		 * an uninteresting boundary commit may not have its tree
 		 * parsed yet, but we are not going to show them anyway
 		 */
-		if (get_commit_tree(commit))
-			add_pending_tree(ctx->revs, get_commit_tree(commit));
+		if (get_commit_tree(commit)) {
+			struct tree *tree = get_commit_tree(commit);
+			tree->object.flags |= NOT_USER_GIVEN;
+			add_pending_tree(ctx->revs, tree);
+		}
 		ctx->show_commit(commit, ctx->show_data);
 
 		if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index de4dce600..72d48a17f 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
 		strbuf_release(&buf);
 		return; /* do not add the commit itself */
 	}
-	obj->flags |= USER_GIVEN;
 	add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index 5910613cb..83e164039 100644
--- a/revision.h
+++ b/revision.h
@@ -21,9 +21,16 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
-#define USER_GIVEN	(1u<<25) /* given directly by the user */
+/*
+ * Indicates object was reached by traversal. i.e. not given by user on
+ * command-line or stdin.
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
+ * filtering trees and blobs, but it may be useful to support filtering commits
+ * in the future.
+ */
+#define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index efe5a2467..110d4f74d 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -30,6 +30,18 @@ test_expect_success 'verify blob:none omits all 5 blobs' '
 	test_cmp observed expected
 '
 
+test_expect_success 'specify blob explicitly prevents filtering' '
+	file_3=$(git -C r1 ls-files -s file.3 |
+		 awk -f print_2.awk) &&
+
+	file_4=$(git -C r1 ls-files -s file.4 |
+		 awk -f print_2.awk) &&
+
+	git -C r1 rev-list --objects --filter=blob:none HEAD $file_3 >observed &&
+	grep "$file_3" observed &&
+	! grep "$file_4" observed
+'
+
 test_expect_success 'verify emitted+omitted == all' '
 	git -C r1 rev-list HEAD --objects \
 		| awk -f print_1.awk \
-- 
2.19.1.331.ge82ca0e54c-goog



* [PATCH v12 6/8] list-objects-filter: use BUG rather than die
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (4 preceding siblings ...)
  2018-10-12 20:01   ` [PATCH v12 5/8] revision: mark non-user-given objects instead Matthew DeVore
@ 2018-10-12 20:01   ` Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-12 20:01 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

In some cases in this file, BUG makes more sense than die. In such
cases, we get there from a coding error rather than a user error.

'return' has been removed following some instances of BUG since BUG does
not return.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..5f8b1a002 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -102,8 +101,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -208,8 +206,7 @@ static enum list_objects_filter_result filter_sparse(
 
 	switch (filter_situation) {
 	default:
-		die("unknown filter_situation");
-		return LOFR_ZERO;
+		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
@@ -389,7 +386,7 @@ void *list_objects_filter__init(
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
 	if (filter_options->choice >= LOFC__COUNT)
-		die("invalid list-objects filter choice: %d",
+		BUG("invalid list-objects filter choice: %d",
 		    filter_options->choice);
 
 	init_fn = s_filters[filter_options->choice];
-- 
2.19.1.331.ge82ca0e54c-goog



* [PATCH v12 7/8] list-objects-filter-options: do not over-strbuf_init
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (5 preceding siblings ...)
  2018-10-12 20:01   ` [PATCH v12 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
@ 2018-10-12 20:01   ` Matthew DeVore
  2018-10-12 20:01   ` [PATCH v12 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
  2018-10-15  2:37   ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Junio C Hamano
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-12 20:01 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

The function gently_parse_list_objects_filter is either called with
errbuf=STRBUF_INIT or errbuf=NULL, but that function calls strbuf_init
when errbuf is not NULL. strbuf_init is only necessary if errbuf
contains garbage, and risks a memory leak if errbuf already has a
non-STRBUF_INIT state. It should be the caller's responsibility to make
sure errbuf is not garbage, since garbage content is easily avoidable
with STRBUF_INIT.
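
To illustrate why re-initializing in the callee is risky, here is a
minimal sketch using a toy buffer. struct toy_buf and the toy_buf_*
helpers are hypothetical simplifications and do not reproduce the real
strbuf API; the point is only that an init which resets the pointer
without freeing would drop any earlier allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy buffer; a simplified stand-in, not git's strbuf. */
struct toy_buf {
	char *buf;  /* NULL until something has been appended */
	size_t len;
};

#define TOY_BUF_INIT { NULL, 0 }

/* Resets the fields without freeing: a previous allocation is lost. */
static void toy_buf_init(struct toy_buf *b)
{
	b->buf = NULL;
	b->len = 0;
}

/* Appends s, growing the allocation; returns 0 on success, -1 on failure. */
static int toy_buf_addstr(struct toy_buf *b, const char *s)
{
	size_t n = strlen(s);
	char *p = realloc(b->buf, b->len + n + 1);

	if (!p)
		return -1;
	memcpy(p + b->len, s, n + 1);
	b->buf = p;
	b->len += n;
	return 0;
}

/* Frees the allocation and returns the buffer to its initial state. */
static void toy_buf_release(struct toy_buf *b)
{
	free(b->buf);
	toy_buf_init(b);
}

/*
 * Callee in the style this patch moves to: append to a buffer the
 * caller already initialized, and never re-initialize it here.
 */
static int toy_report_error(struct toy_buf *errbuf, const char *msg)
{
	if (errbuf)
		return toy_buf_addstr(errbuf, msg);
	return 0;
}
```

Had toy_report_error() called toy_buf_init() first, any text the caller
had already accumulated in errbuf would have become unreachable without
being freed.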

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter-options.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..d259bdb2c 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -30,7 +30,6 @@ static int gently_parse_list_objects_filter(
 
 	if (filter_options->choice) {
 		if (errbuf) {
-			strbuf_init(errbuf, 0);
 			strbuf_addstr(
 				errbuf,
 				_("multiple filter-specs cannot be combined"));
@@ -71,10 +70,9 @@ static int gently_parse_list_objects_filter(
 		return 0;
 	}
 
-	if (errbuf) {
-		strbuf_init(errbuf, 0);
+	if (errbuf)
 		strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
-	}
+
 	memset(filter_options, 0, sizeof(*filter_options));
 	return 1;
 }
-- 
2.19.1.331.ge82ca0e54c-goog



* [PATCH v12 8/8] list-objects-filter: implement filter tree:0
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (6 preceding siblings ...)
  2018-10-12 20:01   ` [PATCH v12 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
@ 2018-10-12 20:01   ` Matthew DeVore
  2018-10-15  2:37   ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Junio C Hamano
  8 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-12 20:01 UTC (permalink / raw)
  To: git
  Cc: Matthew DeVore, sbeller, git, jeffhost, peff, stefanbeller,
	jonathantanmy, gitster, pclouds

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but it is inaccurate: it
suggests that annotated tags are omitted, when they are actually
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
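
For readers who want the rule in isolation, below is a rough sketch of
the "tree:<depth>" handling as described in the commit message and in
the documentation hunk. It is not the code in this patch: the patch
uses skip_prefix and git_parse_ulong, while this sketch swaps in plain
strncmp and strtoul so that it stands alone. parse_tree_filter and
tree_filter_omits are hypothetical names, and the depth comparison
only models the documented wording (root tree at depth 0, omit when
depth >= <depth>), not the traversal itself.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parses a "tree:<depth>" filter spec. Returns 0 and stores the depth
 * on success, 1 on any error. Only depth 0 is accepted, matching the
 * "only 'tree:0' is supported" restriction.
 */
int parse_tree_filter(const char *arg, unsigned long *depth)
{
	static const char prefix[] = "tree:";
	const char *v;
	char *end;
	unsigned long n;

	if (strncmp(arg, prefix, strlen(prefix)))
		return 1;
	v = arg + strlen(prefix);

	/* Require a leading digit: rejects "", "-1", " 0" and "O". */
	if (*v < '0' || *v > '9')
		return 1;

	errno = 0;
	n = strtoul(v, &end, 10);
	if (errno || *end != '\0')
		return 1;
	if (n != 0)
		return 1;	/* deeper filters are not supported yet */

	*depth = n;
	return 0;
}

/*
 * Models the documented rule: an object whose minimum depth from the
 * root tree is >= filter_depth is omitted. The root tree has depth 0.
 */
int tree_filter_omits(unsigned long object_depth, unsigned long filter_depth)
{
	return object_depth >= filter_depth;
}
```

With filter_depth 0 every tree and blob is omitted, including the root
tree. With a hypothetical filter_depth of 1, only the root tree would
be kept. Note that "tree:O" (capital letter O) is rejected by the
leading-digit check.
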
 Documentation/rev-list-options.txt     |  5 +++
 list-objects-filter-options.c          | 13 +++++++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 49 ++++++++++++++++++++++++++
 t/t5317-pack-objects-filter-objects.sh | 28 +++++++++++++++
 t/t5616-partial-clone.sh               | 42 ++++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 15 ++++++++
 7 files changed, 153 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d259bdb2c..e8da2e858 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -49,6 +49,19 @@ static int gently_parse_list_objects_filter(
 			return 0;
 		}
 
+	} else if (skip_prefix(arg, "tree:", &v0)) {
+		unsigned long depth;
+		if (!git_parse_ulong(v0, &depth) || depth != 0) {
+			if (errbuf) {
+				strbuf_addstr(
+					errbuf,
+					_("only 'tree:0' is supported"));
+			}
+			return 1;
+		}
+		filter_options->choice = LOFC_TREE_NONE;
+		return 0;
+
 	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
 		struct object_context oc;
 		struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 0000a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
+	LOFC_TREE_NONE,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
 	LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 5f8b1a002..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -79,6 +79,54 @@ static void *filter_blobs_none__init(
 	return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+	struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_trees_none_data *filter_data = filter_data_;
+
+	switch (filter_situation) {
+	default:
+		BUG("unknown filter_situation: %d", filter_situation);
+
+	case LOFS_BEGIN_TREE:
+	case LOFS_BLOB:
+		if (filter_data->omits)
+			oidset_insert(filter_data->omits, &obj->oid);
+		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+	case LOFS_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	}
+}
+
+static void* filter_trees_none__init(
+	struct oidset *omitted,
+	struct list_objects_filter_options *filter_options,
+	filter_object_fn *filter_fn,
+	filter_free_fn *filter_free_fn)
+{
+	struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+	d->omits = omitted;
+
+	*filter_fn = filter_trees_none;
+	*filter_free_fn = free;
+	return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -371,6 +419,7 @@ static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
+	filter_trees_none__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index ba83f3829..d9dccf4d4 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -72,6 +72,34 @@ test_expect_success 'get an error for missing tree object' '
 	grep "bad tree object" bad_tree
 '
 
+test_expect_success 'setup for tests of tree:0' '
+	mkdir r1/subtree &&
+	echo "This is a file in a subtree" >r1/subtree/file &&
+	git -C r1 add subtree/file &&
+	git -C r1 commit -m subtree
+'
+
+test_expect_success 'verify tree:0 packfile has no blobs or trees' '
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	! grep -E "tree|blob" objs
+'
+
+test_expect_success 'grab tree directly when using tree:0' '
+	# We should get the tree specified directly but not its blobs or subtrees.
+	git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF &&
+	HEAD:
+	EOF
+	git -C r1 index-pack ../commitsonly.pack &&
+	git -C r1 verify-pack -v ../commitsonly.pack >objs &&
+	awk "/tree|blob/{print \$1}" objs >trees_and_blobs &&
+	git -C r1 rev-parse HEAD: >expected &&
+	test_cmp expected trees_and_blobs
+'
+
 # Test blob:limit=<n>[kmg] filter.
 # We boundary test around the size parameter.  The filter is strictly less than
 # the value, so size 500 and 1000 should have the same results, but 1001 should
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..392caa08f 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -154,6 +154,48 @@ test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack -
 	grep "git index-pack.*--fsck-objects" trace
 '
 
+test_expect_success 'use fsck before and after manually fetching a missing subtree' '
+	# push new commit so server has a subtree
+	mkdir src/dir &&
+	echo "in dir" >src/dir/file.txt &&
+	git -C src add dir/file.txt &&
+	git -C src commit -m "file in dir" &&
+	git -C src push -u srv master &&
+	SUBTREE=$(git -C src rev-parse HEAD:dir) &&
+
+	rm -rf dst &&
+	git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
+	git -C dst fsck &&
+
+	# Make sure we only have commits, and all trees and blobs are missing.
+	git -C dst rev-list --missing=allow-any --objects master \
+		>fetched_objects &&
+	awk -f print_1.awk fetched_objects |
+	xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+	sort -u fetched_types >unique_types.observed &&
+	echo commit >unique_types.expected &&
+	test_cmp unique_types.expected unique_types.observed &&
+
+	# Auto-fetch a tree with cat-file.
+	git -C dst cat-file -p $SUBTREE >tree_contents &&
+	grep file.txt tree_contents &&
+
+	# fsck still works after an auto-fetch of a tree.
+	git -C dst fsck &&
+
+	# Auto-fetch all remaining trees and blobs with --missing=error
+	git -C dst rev-list --missing=error --objects master >fetched_objects &&
+	test_line_count = 70 fetched_objects &&
+
+	awk -f print_1.awk fetched_objects |
+	xargs -n1 git -C dst cat-file -t >fetched_types &&
+
+	sort -u fetched_types >unique_types.observed &&
+	test_write_lines blob commit tree >unique_types.expected &&
+	test_cmp unique_types.expected unique_types.observed
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 110d4f74d..08e0c7db6 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -229,6 +229,21 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre
 	test_must_be_empty rev_list_err
 '
 
+# Test tree:0 filter.
+
+test_expect_success 'verify tree:0 includes trees in "filtered" output' '
+	git -C r3 rev-list --quiet --objects --filter-print-omitted \
+		--filter=tree:0 HEAD >revs &&
+
+	awk -f print_1.awk revs |
+	sed s/~// |
+	xargs -n1 git -C r3 cat-file -t >unsorted_filtered_types &&
+
+	sort -u unsorted_filtered_types >filtered_types &&
+	test_write_lines blob tree >expected &&
+	test_cmp expected filtered_types
+'
+
 # Delete some loose objects and use rev-list, but WITHOUT any filtering.
 # This models previously omitted objects that we did not receive.
 
-- 
2.19.1.331.ge82ca0e54c-goog


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v12 0/8] filter: support for excluding all trees and blobs
  2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
                     ` (7 preceding siblings ...)
  2018-10-12 20:01   ` [PATCH v12 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
@ 2018-10-15  2:37   ` Junio C Hamano
  2018-10-15  3:42     ` Junio C Hamano
  8 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-10-15  2:37 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: git, sbeller, git, jeffhost, peff, stefanbeller, jonathantanmy, pclouds

Matthew DeVore <matvore@google.com> writes:

> Here is a re-roll-up since I haven't received any additional corrections for
> almost a week.

Sorry, but doesn't this topic already sit in 'next'?  If so, please make
these small clean-ups as incremental patches.

Thanks.

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v12 0/8] filter: support for excluding all trees and blobs
  2018-10-15  2:37   ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Junio C Hamano
@ 2018-10-15  3:42     ` Junio C Hamano
  2018-10-16 15:00       ` Matthew DeVore
  0 siblings, 1 reply; 151+ messages in thread
From: Junio C Hamano @ 2018-10-15  3:42 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: git, sbeller, git, jeffhost, peff, stefanbeller, jonathantanmy, pclouds

Junio C Hamano <gitster@pobox.com> writes:

> Matthew DeVore <matvore@google.com> writes:
>
>> Here is a re-roll-up since I haven't received any additional corrections for
>> almost a week.
>
> Sorry, but doesn't this topic already sit in 'next'?  If so, please make
> these small clean-ups as incremental patches.

Here is what I'd queue for now, with forged s-o-by from you ;-).

Thanks.

-- >8 --
From: Matthew DeVore <matvore@google.com>
Date: Fri, 12 Oct 2018 13:01:41 -0700
Subject: [PATCH] filter-trees: code clean-up of tests

A few trivial updates to the tests to match the current best practices.

 - avoid "grep -q", which strips potentially useful output when running
   tests under "-v".

 - use test_write_lines to prepare multi-line expected output files.

 - reserve use of test_must_fail for "git" commands.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t5317-pack-objects-filter-objects.sh | 2 +-
 t/t5616-partial-clone.sh               | 2 +-
 t/t6112-rev-list-filters-objects.sh    | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 510d3537f6..d9dccf4d4d 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -69,7 +69,7 @@ test_expect_success 'get an error for missing tree object' '
 	test_must_fail git -C r5 pack-objects --rev --stdout 2>bad_tree <<-EOF &&
 	HEAD
 	EOF
-	grep -q "bad tree object" bad_tree
+	grep "bad tree object" bad_tree
 '
 
 test_expect_success 'setup for tests of tree:0' '
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 53fbf7db88..392caa08fd 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -192,7 +192,7 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
 	xargs -n1 git -C dst cat-file -t >fetched_types &&
 
 	sort -u fetched_types >unique_types.observed &&
-	printf "blob\ncommit\ntree\n" >unique_types.expected &&
+	test_write_lines blob commit tree >unique_types.expected &&
 	test_cmp unique_types.expected unique_types.observed
 '
 
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 2cbb81d3bb..d24f9d5b5a 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -38,8 +38,8 @@ test_expect_success 'specify blob explicitly prevents filtering' '
 		 awk -f print_2.awk) &&
 
 	git -C r1 rev-list --objects --filter=blob:none HEAD $file_3 >observed &&
-	grep -q "$file_3" observed &&
-	test_must_fail grep -q "$file_4" observed
+	grep "$file_3" observed &&
+	! grep "$file_4" observed
 '
 
 test_expect_success 'verify emitted+omitted == all' '
@@ -241,7 +241,7 @@ test_expect_success 'verify tree:0 includes trees in "filtered" output' '
 	xargs -n1 git -C r3 cat-file -t >unsorted_filtered_types &&
 
 	sort -u unsorted_filtered_types >filtered_types &&
-	printf "blob\ntree\n" >expected &&
+	test_write_lines blob tree >expected &&
 	test_cmp expected filtered_types
 '
 
-- 
2.19.1-328-g5a0cc8aca7


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v12 0/8] filter: support for excluding all trees and blobs
  2018-10-15  3:42     ` Junio C Hamano
@ 2018-10-16 15:00       ` Matthew DeVore
  0 siblings, 0 replies; 151+ messages in thread
From: Matthew DeVore @ 2018-10-16 15:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Stefan Beller, git, jeffhost, Jeff King, Stefan Beller,
	Jonathan Tan, pclouds

On Sun, Oct 14, 2018 at 8:43 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Matthew DeVore <matvore@google.com> writes:
> >
> >> Here is a re-roll-up since I haven't received any additional corrections for
> >> almost a week.
> >
> > Sorry, but doesn't this topic already sit in 'next'?  If so, please make
> > these small clean-ups as incremental patches.
>
> Here is what I'd queue for now, with forged s-o-by from you ;-).
>

Yes, this is fine, thank you! I've reapplied the patch in my own repo
on top of "next" in case I need to fix it and re-send, but please
queue what you have as-is.

^ permalink raw reply	[flat|nested] 151+ messages in thread

Thread overview: 151+ messages
2018-08-09 22:44 [RFC PATCH 0/5] filter: support for excluding all trees and blobs Matthew DeVore
2018-08-09 22:45 ` [PATCH 1/5] revision: invert meaning of the USER_GIVEN flag Matthew DeVore
2018-08-10 18:43   ` Jonathan Tan
2018-08-09 22:45 ` [PATCH 2/5] list-objects-filter: implement filter only:commits Matthew DeVore
2018-08-10  0:14   ` Jonathan Tan
2018-08-09 22:45 ` [PATCH 3/5] list-objects: store common func args in struct Matthew DeVore
2018-08-09 22:45 ` [PATCH 4/5] list-objects: refactor to process_tree_contents Matthew DeVore
2018-08-09 22:45 ` [PATCH 5/5] rev-list: handle missing tree objects properly Matthew DeVore
2018-08-10  0:24   ` Jonathan Tan
2018-08-10 19:03 ` [RFC PATCH 0/5] filter: support for excluding all trees and blobs Jonathan Tan
2018-08-10 23:06 ` [PATCH v2 " Matthew DeVore
2018-08-10 23:06   ` [PATCH v2 1/5] list-objects: store common func args in struct Matthew DeVore
2018-08-10 23:06   ` [PATCH v2 2/5] list-objects: refactor to process_tree_contents Matthew DeVore
2018-08-10 23:06   ` [PATCH v2 3/5] rev-list: handle missing tree objects properly Matthew DeVore
2018-08-13 18:20     ` Jonathan Tan
2018-08-14  0:22       ` Matthew DeVore
2018-08-14 16:03         ` Jonathan Tan
2018-08-10 23:06   ` [PATCH v2 4/5] revision: mark non-user-given objects instead Matthew DeVore
2018-08-10 23:06   ` [PATCH v2 5/5] list-objects-filter: implement filter tree:none Matthew DeVore
2018-08-13 16:38     ` Jeff Hostetler
2018-08-14  0:57       ` Matthew DeVore
2018-08-13 18:29     ` Jonathan Tan
2018-08-14  0:55       ` Matthew DeVore
2018-08-13 18:14 ` [PATCH v3 0/5] filter: support for excluding all trees and blobs Matthew DeVore
2018-08-13 18:14   ` [PATCH v3 1/5] list-objects: store common func args in struct Matthew DeVore
2018-08-13 18:14   ` [PATCH v3 2/5] list-objects: refactor to process_tree_contents Matthew DeVore
2018-08-13 18:14   ` [PATCH v3 3/5] rev-list: handle missing tree objects properly Matthew DeVore
2018-08-13 18:14   ` [PATCH v3 4/5] revision: mark non-user-given objects instead Matthew DeVore
2018-08-13 18:14   ` [PATCH v3 5/5] list-objects-filter: implement filter tree:0 Matthew DeVore
2018-08-14 15:13     ` Jeff Hostetler
2018-08-14 17:25       ` Matthew DeVore
2018-10-03 19:00       ` Matthew DeVore
2018-08-14 17:28 ` [PATCH v4 0/6] filter: support for excluding all trees and blobs Matthew DeVore
2018-08-14 17:28   ` [PATCH v4 1/6] list-objects: store common func args in struct Matthew DeVore
2018-08-14 17:28   ` [PATCH v4 2/6] list-objects: refactor to process_tree_contents Matthew DeVore
2018-08-14 17:28   ` [PATCH v4 3/6] list-objects: always parse trees gently Matthew DeVore
2018-08-14 17:28   ` [PATCH v4 4/6] rev-list: handle missing tree objects properly Matthew DeVore
2018-08-14 18:06     ` Jonathan Tan
2018-08-14 22:43       ` Matthew DeVore
2018-08-14 22:56         ` Jonathan Tan
2018-08-14 23:14           ` Jonathan Tan
2018-08-14 17:28   ` [PATCH v4 5/6] revision: mark non-user-given objects instead Matthew DeVore
2018-08-14 17:28   ` [PATCH v4 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
2018-08-14 18:18     ` Jonathan Tan
2018-08-14 20:00       ` Matthew DeVore
2018-08-14 20:19         ` Jonathan Tan
2018-08-14 20:55           ` Junio C Hamano
2018-08-14 23:30             ` Matthew DeVore
2018-08-15 16:14               ` Junio C Hamano
2018-08-15 16:37                 ` Matthew DeVore
2018-08-14 20:01     ` Jeff King
2018-08-14 23:55       ` Matthew DeVore
2018-08-15  1:22         ` Jeff King
2018-08-15 16:17           ` Junio C Hamano
2018-08-15 17:54             ` Matthew DeVore
2018-08-15  0:22 ` [PATCH v5 0/6] filter: support for excluding all trees and blobs Matthew DeVore
2018-08-15  0:22   ` [PATCH v5 1/6] list-objects: store common func args in struct Matthew DeVore
2018-08-15  0:22   ` [PATCH v5 2/6] list-objects: refactor to process_tree_contents Matthew DeVore
2018-08-15  0:22   ` [PATCH v5 3/6] list-objects: always parse trees gently Matthew DeVore
2018-08-15  0:22   ` [PATCH v5 4/6] rev-list: handle missing tree objects properly Matthew DeVore
2018-08-15  0:22   ` [PATCH v5 5/6] revision: mark non-user-given objects instead Matthew DeVore
2018-08-15  0:22   ` [PATCH v5 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
2018-08-15 23:19 ` [PATCH v6 0/6] filter: support for excluding all trees and blobs Matthew DeVore
2018-08-15 23:19   ` [PATCH v6 1/6] list-objects: store common func args in struct Matthew DeVore
2018-08-15 23:19   ` [PATCH v6 2/6] list-objects: refactor to process_tree_contents Matthew DeVore
2018-08-15 23:19   ` [PATCH v6 3/6] list-objects: always parse trees gently Matthew DeVore
2018-08-15 23:19   ` [PATCH v6 4/6] rev-list: handle missing tree objects properly Matthew DeVore
2018-08-15 23:19   ` [PATCH v6 5/6] revision: mark non-user-given objects instead Matthew DeVore
2018-08-15 23:19   ` [PATCH v6 6/6] list-objects-filter: implement filter tree:0 Matthew DeVore
2018-08-17 21:42     ` Stefan Beller
2018-08-17 22:19       ` Matthew DeVore
2018-08-17 22:28         ` Stefan Beller
2018-08-20 23:30           ` Matthew DeVore
2018-08-21  0:29             ` Stefan Beller
2018-08-21 21:46               ` Junio C Hamano
2018-08-22 18:00                 ` Stefan Beller
2018-08-18 16:17     ` Duy Nguyen
2018-08-20 13:04       ` Matthew DeVore
2018-08-20 18:38         ` Stefan Beller
2018-08-20 23:20           ` Matthew DeVore
2018-08-21  0:36             ` Stefan Beller
2018-08-21 15:50           ` Duy Nguyen
2018-09-04 18:05 ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Matthew DeVore
2018-09-04 18:05   ` [PATCH v7 1/7] list-objects: store common func args in struct Matthew DeVore
2018-09-04 18:05   ` [PATCH v7 2/7] list-objects: refactor to process_tree_contents Matthew DeVore
2018-09-04 18:05   ` [PATCH v7 3/7] list-objects: always parse trees gently Matthew DeVore
2018-09-04 18:05   ` [PATCH v7 4/7] rev-list: handle missing tree objects properly Matthew DeVore
2018-09-04 18:05   ` [PATCH v7 5/7] revision: mark non-user-given objects instead Matthew DeVore
2018-09-04 20:31     ` Junio C Hamano
2018-09-05 18:00       ` Matthew DeVore
2018-09-04 18:05   ` [PATCH v7 6/7] list-objects-filter: use BUG rather than die Matthew DeVore
2018-09-04 20:32     ` Junio C Hamano
2018-09-04 18:05   ` [PATCH v7 7/7] list-objects-filter: implement filter tree:0 Matthew DeVore
2018-09-04 20:44     ` Junio C Hamano
2018-09-06  0:08       ` Matthew DeVore
2018-09-04 18:41   ` [PATCH v7 0/7] filter: support for excluding all trees and blobs Stefan Beller
2018-09-14  0:55 ` [PATCH v8 " Matthew DeVore
2018-09-14  0:55   ` [PATCH v8 1/7] list-objects: store common func args in struct Matthew DeVore
2018-09-14  0:55   ` [PATCH v8 2/7] list-objects: refactor to process_tree_contents Matthew DeVore
2018-09-14  0:55   ` [PATCH v8 3/7] list-objects: always parse trees gently Matthew DeVore
2018-09-14  0:55   ` [PATCH v8 4/7] rev-list: handle missing tree objects properly Matthew DeVore
2018-09-14  0:55   ` [PATCH v8 5/7] revision: mark non-user-given objects instead Matthew DeVore
2018-09-14 17:23     ` Junio C Hamano
2018-09-14 20:08       ` Matthew DeVore
2018-09-14  0:55   ` [PATCH v8 6/7] list-objects-filter: use BUG rather than die Matthew DeVore
2018-09-14  0:55   ` [PATCH v8 7/7] list-objects-filter: implement filter tree:0 Matthew DeVore
2018-09-14 17:39     ` Junio C Hamano
2018-09-14 17:47       ` Junio C Hamano
2018-09-15  0:41         ` Matthew DeVore
2018-09-21 20:31 ` [PATCH v9 0/8] filter: support for excluding all trees and blobs Matthew DeVore
2018-09-21 20:31   ` [PATCH v9 1/8] list-objects: store common func args in struct Matthew DeVore
2018-09-21 20:31   ` [PATCH v9 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
2018-09-21 20:31   ` [PATCH v9 3/8] list-objects: always parse trees gently Matthew DeVore
2018-09-21 20:32   ` [PATCH v9 4/8] rev-list: handle missing tree objects properly Matthew DeVore
2018-09-21 20:32   ` [PATCH v9 5/8] revision: mark non-user-given objects instead Matthew DeVore
2018-09-21 20:32   ` [PATCH v9 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
2018-09-21 20:32   ` [PATCH v9 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
2018-09-21 20:32   ` [PATCH v9 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
2018-10-03 19:52 ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
2018-10-03 19:52   ` [PATCH v10 1/8] list-objects: store common func args in struct Matthew DeVore
2018-10-03 19:52   ` [PATCH v10 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
2018-10-03 19:52   ` [PATCH v10 3/8] list-objects: always parse trees gently Matthew DeVore
2018-10-03 19:52   ` [PATCH v10 4/8] rev-list: handle missing tree objects properly Matthew DeVore
2018-10-03 19:52   ` [PATCH v10 5/8] revision: mark non-user-given objects instead Matthew DeVore
2018-10-03 19:52   ` [PATCH v10 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
2018-10-03 19:52   ` [PATCH v10 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
2018-10-03 19:52   ` [PATCH v10 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
2018-10-03 23:08   ` [PATCH v10 0/8] filter: support for excluding all trees and blobs Matthew DeVore
2018-10-05 21:31 ` [PATCH v11 " Matthew DeVore
2018-10-05 21:31   ` [PATCH v11 1/8] list-objects: store common func args in struct Matthew DeVore
2018-10-05 21:31   ` [PATCH v11 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
2018-10-05 21:31   ` [PATCH v11 3/8] list-objects: always parse trees gently Matthew DeVore
2018-10-05 21:31   ` [PATCH v11 4/8] rev-list: handle missing tree objects properly Matthew DeVore
2018-10-05 21:31   ` [PATCH v11 5/8] revision: mark non-user-given objects instead Matthew DeVore
2018-10-05 21:31   ` [PATCH v11 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
2018-10-05 21:31   ` [PATCH v11 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
2018-10-05 21:31   ` [PATCH v11 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
2018-10-07  0:10     ` Junio C Hamano
2018-10-08 17:23       ` Matthew DeVore
2018-10-12 20:01 ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Matthew DeVore
2018-10-12 20:01   ` [PATCH v12 1/8] list-objects: store common func args in struct Matthew DeVore
2018-10-12 20:01   ` [PATCH v12 2/8] list-objects: refactor to process_tree_contents Matthew DeVore
2018-10-12 20:01   ` [PATCH v12 3/8] list-objects: always parse trees gently Matthew DeVore
2018-10-12 20:01   ` [PATCH v12 4/8] rev-list: handle missing tree objects properly Matthew DeVore
2018-10-12 20:01   ` [PATCH v12 5/8] revision: mark non-user-given objects instead Matthew DeVore
2018-10-12 20:01   ` [PATCH v12 6/8] list-objects-filter: use BUG rather than die Matthew DeVore
2018-10-12 20:01   ` [PATCH v12 7/8] list-objects-filter-options: do not over-strbuf_init Matthew DeVore
2018-10-12 20:01   ` [PATCH v12 8/8] list-objects-filter: implement filter tree:0 Matthew DeVore
2018-10-15  2:37   ` [PATCH v12 0/8] filter: support for excluding all trees and blobs Junio C Hamano
2018-10-15  3:42     ` Junio C Hamano
2018-10-16 15:00       ` Matthew DeVore
