git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v1 0/5] Filter combination
@ 2019-05-22  0:21 Matthew DeVore
  2019-05-22  0:21 ` [PATCH v1 1/5] list-objects-filter: refactor into a context struct Matthew DeVore
                   ` (5 more replies)
  0 siblings, 6 replies; 41+ messages in thread
From: Matthew DeVore @ 2019-05-22  0:21 UTC (permalink / raw)
  To: jonathantanmy, jrn, git, dstolee, jeffhost, jrnieder, pclouds
  Cc: Matthew DeVore, matvore

This implements combining of filters. In any command which accepts the --filter
flag, this patch allows specifying multiple filter flags or using the
"combine:..." filter-spec form.

Combining filters means that only objects which are accepted by all filters get
shown or included.

Compared to the RFC version of this patch set, the following notable changes
and additions were made:

 - Simplification of the logic to execute combined filters.
 - Addition of test cases for existing logic and new logic.
 - Allowing to specify multiple --filter flags rather than requiring the
   combine: filter-spec form.
 - Require escaping a large number of reserved characters in "combine:..."
   filter specs in case we decide to do anything interesting with the filter
   language later.

Thank you,

Matthew DeVore (5):
  list-objects-filter: refactor into a context struct
  list-objects-filter-options: error is localizeable
  list-objects-filter: implement composite filters
  list-objects-filter-options: move error check up
  list-objects-filter-options: allow mult. --filter

 Documentation/rev-list-options.txt     |  14 ++
 builtin/fetch-pack.c                   |   5 +-
 builtin/rev-list.c                     |   5 +-
 contrib/completion/git-completion.bash |   2 +-
 fetch-pack.c                           |   5 +-
 list-objects-filter-options.c          | 252 +++++++++++++++++++++--
 list-objects-filter-options.h          |  17 +-
 list-objects-filter.c                  | 270 ++++++++++++++++---------
 list-objects-filter.h                  |  31 ++-
 list-objects.c                         |  45 ++---
 t/t5616-partial-clone.sh               |  19 ++
 t/t6112-rev-list-filters-objects.sh    | 193 +++++++++++++++++-
 transport.c                            |   5 +-
 upload-pack.c                          |  10 +-
 14 files changed, 708 insertions(+), 165 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v1 1/5] list-objects-filter: refactor into a context struct
  2019-05-22  0:21 [PATCH v1 0/5] Filter combination Matthew DeVore
@ 2019-05-22  0:21 ` Matthew DeVore
  2019-05-24  0:49   ` Emily Shaffer
  2019-05-22  0:21 ` [PATCH v1 2/5] list-objects-filter-options: error is localizeable Matthew DeVore
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-05-22  0:21 UTC (permalink / raw)
  To: jonathantanmy, jrn, git, dstolee, jeffhost, jrnieder, pclouds
  Cc: Matthew DeVore, matvore

The next patch will create and manage filters in a new way, which means
that this bundle of data will have to be managed at a new callsite. Make
this bundle of data more manageable by putting it in a struct and
making it part of the list-objects-filter module's API.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter.c | 156 +++++++++++++++---------------------------
 list-objects-filter.h |  31 ++++++---
 list-objects.c        |  45 ++++++------
 3 files changed, 96 insertions(+), 136 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index ee449de3f7..8e8616b9b8 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -19,82 +19,60 @@
  * FILTER_SHOWN_BUT_REVISIT -- we set this bit on tree objects
  * that have been shown, but should be revisited if they appear
  * in the traversal (until we mark it SEEN).  This is a way to
  * let us silently de-dup calls to show() in the caller.  This
  * is subtly different from the "revision.h:SHOWN" and the
  * "sha1-name.c:ONELINE_SEEN" bits.  And also different from
  * the non-de-dup usage in pack-bitmap.c
  */
 #define FILTER_SHOWN_BUT_REVISIT (1<<21)
 
-/*
- * A filter for list-objects to omit ALL blobs from the traversal.
- * And to OPTIONALLY collect a list of the omitted OIDs.
- */
-struct filter_blobs_none_data {
-	struct oidset *omits;
-};
-
 static enum list_objects_filter_result filter_blobs_none(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_context *ctx)
 {
-	struct filter_blobs_none_data *filter_data = filter_data_;
-
 	switch (filter_situation) {
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
 		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
 	case LOFS_END_TREE:
 		assert(obj->type == OBJ_TREE);
 		return LOFR_ZERO;
 
 	case LOFS_BLOB:
 		assert(obj->type == OBJ_BLOB);
 		assert((obj->flags & SEEN) == 0);
 
-		if (filter_data->omits)
-			oidset_insert(filter_data->omits, &obj->oid);
+		if (ctx->omits)
+			oidset_insert(ctx->omits, &obj->oid);
 		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
 	}
 }
 
-static void *filter_blobs_none__init(
-	struct oidset *omitted,
+static void filter_blobs_none__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_context *ctx)
 {
-	struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
-
-	*filter_fn = filter_blobs_none;
-	*filter_free_fn = free;
-	return d;
+	ctx->filter_fn = filter_blobs_none;
 }
 
-/*
- * A filter for list-objects to omit ALL trees and blobs from the traversal.
- * Can OPTIONALLY collect a list of the omitted OIDs.
- */
+/* A filter for list-objects to omit ALL trees and blobs from the traversal. */
 struct filter_trees_depth_data {
-	struct oidset *omits;
-
 	/*
 	 * Maps trees to the minimum depth at which they were seen. It is not
 	 * necessary to re-traverse a tree at deeper or equal depths than it has
 	 * already been traversed.
 	 *
 	 * We can't use LOFR_MARK_SEEN for tree objects since this will prevent
 	 * it from being traversed at shallower depths.
 	 */
 	struct oidmap seen_at_depth;
 
@@ -103,41 +81,41 @@ struct filter_trees_depth_data {
 };
 
 struct seen_map_entry {
 	struct oidmap_entry base;
 	size_t depth;
 };
 
 /* Returns 1 if the oid was in the omits set before it was invoked. */
 static int filter_trees_update_omits(
 	struct object *obj,
-	struct filter_trees_depth_data *filter_data,
+	struct filter_context *ctx,
 	int include_it)
 {
-	if (!filter_data->omits)
+	if (!ctx->omits)
 		return 0;
 
 	if (include_it)
-		return oidset_remove(filter_data->omits, &obj->oid);
+		return oidset_remove(ctx->omits, &obj->oid);
 	else
-		return oidset_insert(filter_data->omits, &obj->oid);
+		return oidset_insert(ctx->omits, &obj->oid);
 }
 
 static enum list_objects_filter_result filter_trees_depth(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_context *ctx)
 {
-	struct filter_trees_depth_data *filter_data = filter_data_;
+	struct filter_trees_depth_data *filter_data = ctx->data;
 	struct seen_map_entry *seen_info;
 	int include_it = filter_data->current_depth <
 		filter_data->exclude_depth;
 	int filter_res;
 	int already_seen;
 
 	/*
 	 * Note that we do not use _MARK_SEEN in order to allow re-traversal in
 	 * case we encounter a tree or blob again at a shallower depth.
 	 */
@@ -145,47 +123,47 @@ static enum list_objects_filter_result filter_trees_depth(
 	switch (filter_situation) {
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_END_TREE:
 		assert(obj->type == OBJ_TREE);
 		filter_data->current_depth--;
 		return LOFR_ZERO;
 
 	case LOFS_BLOB:
-		filter_trees_update_omits(obj, filter_data, include_it);
+		filter_trees_update_omits(obj, ctx, include_it);
 		return include_it ? LOFR_MARK_SEEN | LOFR_DO_SHOW : LOFR_ZERO;
 
 	case LOFS_BEGIN_TREE:
 		seen_info = oidmap_get(
 			&filter_data->seen_at_depth, &obj->oid);
 		if (!seen_info) {
 			seen_info = xcalloc(1, sizeof(*seen_info));
 			oidcpy(&seen_info->base.oid, &obj->oid);
 			seen_info->depth = filter_data->current_depth;
 			oidmap_put(&filter_data->seen_at_depth, seen_info);
 			already_seen = 0;
 		} else {
 			already_seen =
 				filter_data->current_depth >= seen_info->depth;
 		}
 
 		if (already_seen) {
 			filter_res = LOFR_SKIP_TREE;
 		} else {
 			int been_omitted = filter_trees_update_omits(
-				obj, filter_data, include_it);
+				obj, ctx, include_it);
 			seen_info->depth = filter_data->current_depth;
 
 			if (include_it)
 				filter_res = LOFR_DO_SHOW;
-			else if (filter_data->omits && !been_omitted)
+			else if (ctx->omits && !been_omitted)
 				/*
 				 * Must update omit information of children
 				 * recursively; they have not been omitted yet.
 				 */
 				filter_res = LOFR_ZERO;
 			else
 				filter_res = LOFR_SKIP_TREE;
 		}
 
 		filter_data->current_depth++;
@@ -194,55 +172,48 @@ static enum list_objects_filter_result filter_trees_depth(
 }
 
 static void filter_trees_free(void *filter_data) {
 	struct filter_trees_depth_data *d = filter_data;
 	if (!d)
 		return;
 	oidmap_free(&d->seen_at_depth, 1);
 	free(d);
 }
 
-static void *filter_trees_depth__init(
-	struct oidset *omitted,
+static void filter_trees_depth__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_context *ctx)
 {
 	struct filter_trees_depth_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	oidmap_init(&d->seen_at_depth, 0);
 	d->exclude_depth = filter_options->tree_exclude_depth;
 	d->current_depth = 0;
 
-	*filter_fn = filter_trees_depth;
-	*filter_free_fn = filter_trees_free;
-	return d;
+	ctx->filter_fn = filter_trees_depth;
+	ctx->free_fn = filter_trees_free;
+	ctx->data = d;
 }
 
-/*
- * A filter for list-objects to omit large blobs.
- * And to OPTIONALLY collect a list of the omitted OIDs.
- */
+/* A filter for list-objects to omit large blobs. */
 struct filter_blobs_limit_data {
-	struct oidset *omits;
 	unsigned long max_bytes;
 };
 
 static enum list_objects_filter_result filter_blobs_limit(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_context *ctx)
 {
-	struct filter_blobs_limit_data *filter_data = filter_data_;
+	struct filter_blobs_limit_data *filter_data = ctx->data;
 	unsigned long object_length;
 	enum object_type t;
 
 	switch (filter_situation) {
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -263,44 +234,41 @@ static enum list_objects_filter_result filter_blobs_limit(
 			 * apply the size filter criteria.  Be conservative
 			 * and force show it (and let the caller deal with
 			 * the ambiguity).
 			 */
 			goto include_it;
 		}
 
 		if (object_length < filter_data->max_bytes)
 			goto include_it;
 
-		if (filter_data->omits)
-			oidset_insert(filter_data->omits, &obj->oid);
+		if (ctx->omits)
+			oidset_insert(ctx->omits, &obj->oid);
 		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
 	}
 
 include_it:
-	if (filter_data->omits)
-		oidset_remove(filter_data->omits, &obj->oid);
+	if (ctx->omits)
+		oidset_remove(ctx->omits, &obj->oid);
 	return LOFR_MARK_SEEN | LOFR_DO_SHOW;
 }
 
-static void *filter_blobs_limit__init(
-	struct oidset *omitted,
+static void filter_blobs_limit__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_context *ctx)
 {
 	struct filter_blobs_limit_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	d->max_bytes = filter_options->blob_limit_value;
 
-	*filter_fn = filter_blobs_limit;
-	*filter_free_fn = free;
-	return d;
+	ctx->filter_fn = filter_blobs_limit;
+	ctx->free_fn = free;
+	ctx->data = d;
 }
 
 /*
  * A filter driven by a sparse-checkout specification to only
  * include blobs that a sparse checkout would populate.
  *
  * The sparse-checkout spec can be loaded from a blob with the
  * given OID or from a local pathname.  We allow an OID because
  * the repo may be bare or we may be doing the filtering on the
  * server.
@@ -319,36 +287,35 @@ struct frame {
 	 * omitted objects.
 	 *
 	 * 0 if everything (recursively) contained in this directory
 	 * has been explicitly included (SHOWN) in the result and
 	 * the directory may be short-cut later in the traversal.
 	 */
 	unsigned child_prov_omit : 1;
 };
 
 struct filter_sparse_data {
-	struct oidset *omits;
 	struct exclude_list el;
 
 	size_t nr, alloc;
 	struct frame *array_frame;
 };
 
 static enum list_objects_filter_result filter_sparse(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_context *ctx)
 {
-	struct filter_sparse_data *filter_data = filter_data_;
+	struct filter_sparse_data *filter_data = ctx->data;
 	int val, dtype;
 	struct frame *frame;
 
 	switch (filter_situation) {
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		dtype = DT_DIR;
@@ -414,128 +381,117 @@ static enum list_objects_filter_result filter_sparse(
 
 		frame = &filter_data->array_frame[filter_data->nr];
 
 		dtype = DT_REG;
 		val = is_excluded_from_list(pathname, strlen(pathname),
 					    filename, &dtype, &filter_data->el,
 					    r->index);
 		if (val < 0)
 			val = frame->defval;
 		if (val > 0) {
-			if (filter_data->omits)
-				oidset_remove(filter_data->omits, &obj->oid);
+			if (ctx->omits)
+				oidset_remove(ctx->omits, &obj->oid);
 			return LOFR_MARK_SEEN | LOFR_DO_SHOW;
 		}
 
 		/*
 		 * Provisionally omit it.  We've already established that
 		 * this pathname is not in the sparse-checkout specification
 		 * with the CURRENT pathname, so we *WANT* to omit this blob.
 		 *
 		 * However, a pathname elsewhere in the tree may also
 		 * reference this same blob, so we cannot reject it yet.
 		 * Leave the LOFR_ bits unset so that if the blob appears
 		 * again in the traversal, we will be asked again.
 		 */
-		if (filter_data->omits)
-			oidset_insert(filter_data->omits, &obj->oid);
+		if (ctx->omits)
+			oidset_insert(ctx->omits, &obj->oid);
 
 		/*
 		 * Remember that at least 1 blob in this tree was
 		 * provisionally omitted.  This prevents us from short
 		 * cutting the tree in future iterations.
 		 */
 		frame->child_prov_omit = 1;
 		return LOFR_ZERO;
 	}
 }
 
 
 static void filter_sparse_free(void *filter_data)
 {
 	struct filter_sparse_data *d = filter_data;
 	/* TODO free contents of 'd' */
 	free(d);
 }
 
-static void *filter_sparse_oid__init(
-	struct oidset *omitted,
+static void filter_sparse_oid__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_context *ctx)
 {
 	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	if (add_excludes_from_blob_to_list(filter_options->sparse_oid_value,
 					   NULL, 0, &d->el) < 0)
 		die("could not load filter specification");
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
 	d->array_frame[d->nr].defval = 0; /* default to include */
 	d->array_frame[d->nr].child_prov_omit = 0;
 
-	*filter_fn = filter_sparse;
-	*filter_free_fn = filter_sparse_free;
-	return d;
+	ctx->filter_fn = filter_sparse;
+	ctx->free_fn = filter_sparse_free;
+	ctx->data = d;
 }
 
-static void *filter_sparse_path__init(
-	struct oidset *omitted,
+static void filter_sparse_path__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_context *ctx)
 {
 	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	if (add_excludes_from_file_to_list(filter_options->sparse_path_value,
 					   NULL, 0, &d->el, NULL) < 0)
 		die("could not load filter specification");
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
 	d->array_frame[d->nr].defval = 0; /* default to include */
 	d->array_frame[d->nr].child_prov_omit = 0;
 
-	*filter_fn = filter_sparse;
-	*filter_free_fn = filter_sparse_free;
-	return d;
+	ctx->filter_fn = filter_sparse;
+	ctx->free_fn = filter_sparse_free;
+	ctx->data = d;
 }
 
-typedef void *(*filter_init_fn)(
-	struct oidset *omitted,
+typedef void (*filter_init_fn)(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn);
+	struct filter_context *ctx);
 
 /*
  * Must match "enum list_objects_filter_choice".
  */
 static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
 	filter_trees_depth__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
 
-void *list_objects_filter__init(
+void list_objects_filter__init(
 	struct oidset *omitted,
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_context *ctx)
 {
 	filter_init_fn init_fn;
 
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
 	if (filter_options->choice >= LOFC__COUNT)
 		BUG("invalid list-objects filter choice: %d",
 		    filter_options->choice);
 
+	memset(ctx, 0, sizeof(*ctx));
+	ctx->omits = omitted;
 	init_fn = s_filters[filter_options->choice];
 	if (init_fn)
-		return init_fn(omitted, filter_options,
-			       filter_fn, filter_free_fn);
-	*filter_fn = NULL;
-	*filter_free_fn = NULL;
-	return NULL;
+		init_fn(filter_options, ctx);
 }
diff --git a/list-objects-filter.h b/list-objects-filter.h
index 1d45a4ad57..ee807f5d9b 100644
--- a/list-objects-filter.h
+++ b/list-objects-filter.h
@@ -53,37 +53,46 @@ enum list_objects_filter_result {
 	LOFR_DO_SHOW   = 1<<1,
 	LOFR_SKIP_TREE = 1<<2,
 };
 
 enum list_objects_filter_situation {
 	LOFS_BEGIN_TREE,
 	LOFS_END_TREE,
 	LOFS_BLOB
 };
 
-typedef enum list_objects_filter_result (*filter_object_fn)(
-	struct repository *r,
-	enum list_objects_filter_situation filter_situation,
-	struct object *obj,
-	const char *pathname,
-	const char *filename,
-	void *filter_data);
+struct filter_context {
+	enum list_objects_filter_result (*filter_fn)(
+		struct repository *r,
+		enum list_objects_filter_situation filter_situation,
+		struct object *obj,
+		const char *pathname,
+		const char *filename,
+		struct filter_context *ctx);
+	void (*free_fn)(void *filter_data);
 
-typedef void (*filter_free_fn)(void *filter_data);
+	struct oidset *omits;
+	void *data;
+};
 
 /*
  * Constructor for the set of defined list-objects filters.
  * Returns a generic "void *filter_data".
  *
  * The returned "filter_fn" will be used by traverse_commit_list()
  * to filter the results.
  *
  * The returned "filter_free_fn" is a destructor for the
  * filter_data.
  */
-void *list_objects_filter__init(
+void list_objects_filter__init(
 	struct oidset *omitted,
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn);
+	struct filter_context *ctx);
+
+static inline void list_objects_filter__release(struct filter_context *ctx) {
+	if (ctx->data && ctx->free_fn)
+		ctx->free_fn(ctx->data);
+	memset(ctx, 0, sizeof(*ctx));
+}
 
 #endif /* LIST_OBJECTS_FILTER_H */
diff --git a/list-objects.c b/list-objects.c
index b5651ddd5b..7a73f7deee 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -11,22 +11,21 @@
 #include "list-objects-filter-options.h"
 #include "packfile.h"
 #include "object-store.h"
 #include "trace.h"
 
 struct traversal_context {
 	struct rev_info *revs;
 	show_object_fn show_object;
 	show_commit_fn show_commit;
 	void *show_data;
-	filter_object_fn filter_fn;
-	void *filter_data;
+	struct filter_context filter_ctx;
 };
 
 static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
 			 struct strbuf *path,
 			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
@@ -47,25 +46,25 @@ static void process_blob(struct traversal_context *ctx,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
 	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
-		r = ctx->filter_fn(ctx->revs->repo,
-				   LOFS_BLOB, obj,
-				   path->buf, &path->buf[pathlen],
-				   ctx->filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_ctx.filter_fn)
+		r = ctx->filter_ctx.filter_fn(ctx->revs->repo,
+					      LOFS_BLOB, obj,
+					      path->buf, &path->buf[pathlen],
+					      &ctx->filter_ctx);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
 		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
 /*
  * Processing a gitlink entry currently does nothing, since
  * we do not recurse into the subproject.
@@ -179,42 +178,42 @@ static void process_tree(struct traversal_context *ctx,
 		 */
 		if (revs->exclude_promisor_objects &&
 		    is_promisor_object(&obj->oid))
 			return;
 
 		if (!revs->do_not_die_on_missing_tree)
 			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
-	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
-		r = ctx->filter_fn(ctx->revs->repo,
-				   LOFS_BEGIN_TREE, obj,
-				   base->buf, &base->buf[baselen],
-				   ctx->filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_ctx.filter_fn)
+		r = ctx->filter_ctx.filter_fn(ctx->revs->repo,
+					      LOFS_BEGIN_TREE, obj,
+					      base->buf, &base->buf[baselen],
+					      &ctx->filter_ctx);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
 		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
 	if (r & LOFR_SKIP_TREE)
 		trace_printf("Skipping contents of tree %s...\n", base->buf);
 	else if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
-		r = ctx->filter_fn(ctx->revs->repo,
-				   LOFS_END_TREE, obj,
-				   base->buf, &base->buf[baselen],
-				   ctx->filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_ctx.filter_fn) {
+		r = ctx->filter_ctx.filter_fn(ctx->revs->repo,
+					      LOFS_END_TREE, obj,
+					      base->buf, &base->buf[baselen],
+					      &ctx->filter_ctx);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
 			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
 	free_tree_buffer(tree);
 }
 
@@ -395,38 +394,34 @@ static void do_traverse(struct traversal_context *ctx)
 void traverse_commit_list(struct rev_info *revs,
 			  show_commit_fn show_commit,
 			  show_object_fn show_object,
 			  void *show_data)
 {
 	struct traversal_context ctx;
 	ctx.revs = revs;
 	ctx.show_commit = show_commit;
 	ctx.show_object = show_object;
 	ctx.show_data = show_data;
-	ctx.filter_fn = NULL;
-	ctx.filter_data = NULL;
+	memset(&ctx.filter_ctx, 0, sizeof(ctx.filter_ctx));
 	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
 	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
 	void *show_data,
 	struct oidset *omitted)
 {
 	struct traversal_context ctx;
-	filter_free_fn filter_free_fn = NULL;
+	memset(&ctx, 0, sizeof(ctx));
 
 	ctx.revs = revs;
 	ctx.show_object = show_object;
 	ctx.show_commit = show_commit;
 	ctx.show_data = show_data;
-	ctx.filter_fn = NULL;
 
-	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
-						    &ctx.filter_fn, &filter_free_fn);
+	list_objects_filter__init(omitted, filter_options, &ctx.filter_ctx);
 	do_traverse(&ctx);
-	if (ctx.filter_data && filter_free_fn)
-		filter_free_fn(ctx.filter_data);
+	list_objects_filter__release(&ctx.filter_ctx);
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 2/5] list-objects-filter-options: error is localizeable
  2019-05-22  0:21 [PATCH v1 0/5] Filter combination Matthew DeVore
  2019-05-22  0:21 ` [PATCH v1 1/5] list-objects-filter: refactor into a context struct Matthew DeVore
@ 2019-05-22  0:21 ` Matthew DeVore
  2019-05-24  0:55   ` Emily Shaffer
  2019-05-22  0:21 ` [PATCH v1 3/5] list-objects-filter: implement composite filters Matthew DeVore
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-05-22  0:21 UTC (permalink / raw)
  To: jonathantanmy, jrn, git, dstolee, jeffhost, jrnieder, pclouds
  Cc: Matthew DeVore, matvore

The "invalid filter-spec" message is user-facing and not a BUG, so make
it localizeable.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter-options.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0036f7378..e46ea467bc 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -81,21 +81,21 @@ static int gently_parse_list_objects_filter(
 		filter_options->choice = LOFC_SPARSE_PATH;
 		filter_options->sparse_path_value = strdup(v0);
 		return 0;
 	}
 	/*
 	 * Please update _git_fetch() in git-completion.bash when you
 	 * add new filters
 	 */
 
 	if (errbuf)
-		strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
+		strbuf_addf(errbuf, _("invalid filter-spec '%s'"), arg);
 
 	memset(filter_options, 0, sizeof(*filter_options));
 	return 1;
 }
 
 int parse_list_objects_filter(struct list_objects_filter_options *filter_options,
 			      const char *arg)
 {
 	struct strbuf buf = STRBUF_INIT;
 	if (gently_parse_list_objects_filter(filter_options, arg, &buf))
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-22  0:21 [PATCH v1 0/5] Filter combination Matthew DeVore
  2019-05-22  0:21 ` [PATCH v1 1/5] list-objects-filter: refactor into a context struct Matthew DeVore
  2019-05-22  0:21 ` [PATCH v1 2/5] list-objects-filter-options: error is localizeable Matthew DeVore
@ 2019-05-22  0:21 ` Matthew DeVore
  2019-05-24 21:01   ` Jeff Hostetler
  2019-05-28 21:53   ` Emily Shaffer
  2019-05-22  0:21 ` [PATCH v1 4/5] list-objects-filter-options: move error check up Matthew DeVore
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 41+ messages in thread
From: Matthew DeVore @ 2019-05-22  0:21 UTC (permalink / raw)
  To: jonathantanmy, jrn, git, dstolee, jeffhost, jrnieder, pclouds
  Cc: Matthew DeVore, matvore

Allow combining filters such that only objects accepted by all filters
are shown. The motivation for this is to allow getting directory
listings without also fetching blobs. This can be done by combining
blob:none with tree:<depth>. There are massive repositories that have
larger-than-expected trees - even if you include only a single commit.

The current usage requires passing the filter to rev-list, or sending
it over the wire, as:

	combine:<FILTER1>+<FILTER2>

(i.e.: git rev-list --filter=combine:tree:2+blob:limit=32k). This is
potentially awkward because individual filters must be URL-encoded if
they contain + or %. This can potentially be improved by supporting a
repeated flag syntax, e.g.:

	$ git rev-list --filter=tree:2 --filter=blob:limit=32k

Such usage is currently an error, so giving it a meaning is backwards-
compatible.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt     |  12 ++
 contrib/completion/git-completion.bash |   2 +-
 list-objects-filter-options.c          | 161 ++++++++++++++++++++++++-
 list-objects-filter-options.h          |  14 ++-
 list-objects-filter.c                  | 114 +++++++++++++++++
 t/t6112-rev-list-filters-objects.sh    | 159 +++++++++++++++++++++++-
 6 files changed, 455 insertions(+), 7 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index ddbc1de43f..4fb0c4fbb0 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -730,20 +730,32 @@ specification contained in <path>.
 +
 The form '--filter=tree:<depth>' omits all blobs and trees whose depth
 from the root tree is >= <depth> (minimum depth if an object is located
 at multiple depths in the commits traversed). <depth>=0 will not include
 any trees or blobs unless included explicitly in the command-line (or
 standard input when --stdin is used). <depth>=1 will include only the
 tree and blobs which are referenced directly by a commit reachable from
 <commit> or an explicitly-given object. <depth>=2 is like <depth>=1
 while also including trees and blobs one more level removed from an
 explicitly-given commit or tree.
++
+The form '--filter=combine:<filter1>+<filter2>+...<filterN>' combines
+several filters. Only objects which are accepted by every filter are
+included. Filters are joined by '{plus}' and individual filters are %-encoded
+(i.e. URL-encoded). Besides the '{plus}' and '%' characters, the following
+characters are reserved and also must be encoded:
+`~!@#$^&*()[]{}\;",<>?`+&#39;&#96;+ as well as all characters with ASCII code
+&lt;= `0x20`, which includes space and newline.
++
+Other arbitrary characters can also be encoded. For instance,
+'combine:tree:3+blob:none' and 'combine:tree%3A2+blob%3Anone' are
+equivalent.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
 
 --filter-print-omitted::
 	Only useful with `--filter=`; prints a list of the objects omitted
 	by the filter.  Object IDs are prefixed with a ``~'' character.
 
 --missing=<missing-action>::
 	A debug option to help with future "partial clone" development.
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 3eefbabdb1..0fd0a10d0c 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1529,21 +1529,21 @@ _git_difftool ()
 __git_fetch_recurse_submodules="yes on-demand no"
 
 _git_fetch ()
 {
 	case "$cur" in
 	--recurse-submodules=*)
 		__gitcomp "$__git_fetch_recurse_submodules" "" "${cur##--recurse-submodules=}"
 		return
 		;;
 	--filter=*)
-		__gitcomp "blob:none blob:limit= sparse:oid= sparse:path=" "" "${cur##--filter=}"
+		__gitcomp "blob:none blob:limit= sparse:oid= sparse:path= combine: tree:" "" "${cur##--filter=}"
 		return
 		;;
 	--*)
 		__gitcomp_builtin fetch
 		return
 		;;
 	esac
 	__git_complete_remote_or_refspec
 }
 
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index e46ea467bc..d7a1516188 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -1,19 +1,24 @@
 #include "cache.h"
 #include "commit.h"
 #include "config.h"
 #include "revision.h"
 #include "argv-array.h"
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
 
+static int parse_combine_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf);
+
 /*
  * Parse value of the argument to the "filter" keyword.
  * On the command line this looks like:
  *       --filter=<arg>
  * and in the pack protocol as:
  *       "filter" SP <arg>
  *
  * The filter keyword will be used by many commands.
  * See Documentation/rev-list-options.txt for allowed values for <arg>.
  *
@@ -31,22 +36,20 @@ static int gently_parse_list_objects_filter(
 
 	if (filter_options->choice) {
 		if (errbuf) {
 			strbuf_addstr(
 				errbuf,
 				_("multiple filter-specs cannot be combined"));
 		}
 		return 1;
 	}
 
-	filter_options->filter_spec = strdup(arg);
-
 	if (!strcmp(arg, "blob:none")) {
 		filter_options->choice = LOFC_BLOB_NONE;
 		return 0;
 
 	} else if (skip_prefix(arg, "blob:limit=", &v0)) {
 		if (git_parse_ulong(v0, &filter_options->blob_limit_value)) {
 			filter_options->choice = LOFC_BLOB_LIMIT;
 			return 0;
 		}
 
@@ -74,37 +77,183 @@ static int gently_parse_list_objects_filter(
 		if (!get_oid_with_context(the_repository, v0, GET_OID_BLOB,
 					  &sparse_oid, &oc))
 			filter_options->sparse_oid_value = oiddup(&sparse_oid);
 		filter_options->choice = LOFC_SPARSE_OID;
 		return 0;
 
 	} else if (skip_prefix(arg, "sparse:path=", &v0)) {
 		filter_options->choice = LOFC_SPARSE_PATH;
 		filter_options->sparse_path_value = strdup(v0);
 		return 0;
+
+	} else if (skip_prefix(arg, "combine:", &v0)) {
+		int sub_parse_res = parse_combine_filter(
+			filter_options, v0, errbuf);
+		if (sub_parse_res)
+			return sub_parse_res;
+		return 0;
+
 	}
 	/*
 	 * Please update _git_fetch() in git-completion.bash when you
 	 * add new filters
 	 */
 
 	if (errbuf)
 		strbuf_addf(errbuf, _("invalid filter-spec '%s'"), arg);
 
 	memset(filter_options, 0, sizeof(*filter_options));
 	return 1;
 }
 
+static int digit_value(int c, struct strbuf *errbuf) {
+	if (c >= '0' && c <= '9')
+		return c - '0';
+	if (c >= 'a' && c <= 'f')
+		return c - 'a' + 10;
+	if (c >= 'A' && c <= 'F')
+		return c - 'A' + 10;
+
+	if (!errbuf)
+		return -1;
+
+	strbuf_addf(errbuf, _("error in filter-spec - "));
+	if (c)
+		strbuf_addf(
+			errbuf,
+			_("expect two hex digits after %%, but got: '%c'"),
+			c);
+	else
+		strbuf_addf(
+			errbuf,
+			_("not enough hex digits after %%; expected two"));
+
+	return -1;
+}
+
+static int url_decode(struct strbuf *s, struct strbuf *errbuf) {
+	char *dest = s->buf;
+	char *src = s->buf;
+	size_t new_len;
+
+	while (*src) {
+		int digit_value_0, digit_value_1;
+
+		if (src[0] != '%') {
+			*dest++ = *src++;
+			continue;
+		}
+		src++;
+
+		digit_value_0 = digit_value(*src++, errbuf);
+		if (digit_value_0 < 0)
+			return 1;
+		digit_value_1 = digit_value(*src++, errbuf);
+		if (digit_value_1 < 0)
+			return 1;
+		*dest++ = digit_value_0 * 16 + digit_value_1;
+	}
+	new_len = dest - s->buf;
+	strbuf_remove(s, new_len, s->len - new_len);
+
+	return 0;
+}
+
+static const char *RESERVED_NON_WS = "~`!@#$^&*()[]{}\\;'\",<>?";
+
+static int has_reserved_character(
+	struct strbuf *sub_spec, struct strbuf *errbuf)
+{
+	const char *c = sub_spec->buf;
+	while (*c) {
+		if (*c <= ' ' || strchr(RESERVED_NON_WS, *c))
+			goto found_reserved;
+		c++;
+	}
+
+	return 0;
+
+found_reserved:
+	if (errbuf)
+		strbuf_addf(errbuf,
+			    "must escape char in sub-filter-spec: '%c'",
+			    *c);
+	return 1;
+}
+
+static int parse_combine_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	struct strbuf *errbuf)
+{
+	struct strbuf **sub_specs = strbuf_split_str(arg, '+', 2);
+	int result;
+
+	if (!sub_specs[0]) {
+		if (errbuf)
+			strbuf_addf(errbuf,
+				    _("expected something after combine:"));
+		result = 1;
+		goto cleanup;
+	}
+
+	result = has_reserved_character(sub_specs[0], errbuf);
+	if (result)
+		goto cleanup;
+
+	/*
+	 * Only decode the first sub-filter, since the rest will be decoded on
+	 * the recursive call.
+	 */
+	result = url_decode(sub_specs[0], errbuf);
+	if (result)
+		goto cleanup;
+
+	if (!sub_specs[1]) {
+		/*
+		 * There is only one sub-filter, so we don't need the
+		 * combine: - just parse it as a non-composite filter.
+		 */
+		result = gently_parse_list_objects_filter(
+			filter_options, sub_specs[0]->buf, errbuf);
+		goto cleanup;
+	}
+
+	/* Remove trailing "+" so we can parse it. */
+	assert(sub_specs[0]->buf[sub_specs[0]->len - 1] == '+');
+	strbuf_remove(sub_specs[0], sub_specs[0]->len - 1, 1);
+
+	filter_options->choice = LOFC_COMBINE;
+	filter_options->lhs = xcalloc(1, sizeof(*filter_options->lhs));
+	filter_options->rhs = xcalloc(1, sizeof(*filter_options->rhs));
+
+	result = gently_parse_list_objects_filter(filter_options->lhs,
+						  sub_specs[0]->buf,
+						  errbuf) ||
+		parse_combine_filter(filter_options->rhs,
+				      sub_specs[1]->buf,
+				      errbuf);
+
+cleanup:
+	strbuf_list_free(sub_specs);
+	if (result) {
+		list_objects_filter_release(filter_options);
+		memset(filter_options, 0, sizeof(*filter_options));
+	}
+	return result;
+}
+
 int parse_list_objects_filter(struct list_objects_filter_options *filter_options,
 			      const char *arg)
 {
 	struct strbuf buf = STRBUF_INIT;
+	filter_options->filter_spec = strdup(arg);
 	if (gently_parse_list_objects_filter(filter_options, arg, &buf))
 		die("%s", buf.buf);
 	return 0;
 }
 
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset)
 {
 	struct list_objects_filter_options *filter_options = opt->value;
 
@@ -127,23 +276,29 @@ void expand_list_objects_filter_spec(
 	else if (filter->choice == LOFC_TREE_DEPTH)
 		strbuf_addf(expanded_spec, "tree:%lu",
 			    filter->tree_exclude_depth);
 	else
 		strbuf_addstr(expanded_spec, filter->filter_spec);
 }
 
 void list_objects_filter_release(
 	struct list_objects_filter_options *filter_options)
 {
+	if (!filter_options)
+		return;
 	free(filter_options->filter_spec);
 	free(filter_options->sparse_oid_value);
 	free(filter_options->sparse_path_value);
+	list_objects_filter_release(filter_options->lhs);
+	free(filter_options->lhs);
+	list_objects_filter_release(filter_options->rhs);
+	free(filter_options->rhs);
 	memset(filter_options, 0, sizeof(*filter_options));
 }
 
 void partial_clone_register(
 	const char *remote,
 	const struct list_objects_filter_options *filter_options)
 {
 	/*
 	 * Record the name of the partial clone remote in the
 	 * config and in the global variable -- the latter is
@@ -171,14 +326,16 @@ void partial_clone_register(
 }
 
 void partial_clone_get_default_filter_spec(
 	struct list_objects_filter_options *filter_options)
 {
 	/*
 	 * Parse default value, but silently ignore it if it is invalid.
 	 */
 	if (!core_partial_clone_filter_default)
 		return;
+
+	filter_options->filter_spec = strdup(core_partial_clone_filter_default);
 	gently_parse_list_objects_filter(filter_options,
 					 core_partial_clone_filter_default,
 					 NULL);
 }
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index e3adc78ebf..6c0f0ecd08 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -7,20 +7,21 @@
 /*
  * The list of defined filters for list-objects.
  */
 enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
 	LOFC_BLOB_NONE,
 	LOFC_BLOB_LIMIT,
 	LOFC_TREE_DEPTH,
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
+	LOFC_COMBINE,
 	LOFC__COUNT /* must be last */
 };
 
 struct list_objects_filter_options {
 	/*
 	 * 'filter_spec' is the raw argument value given on the command line
 	 * or protocol request.  (The part after the "--keyword=".)  For
 	 * commands that launch filtering sub-processes, or for communication
 	 * over the network, don't use this value; use the result of
 	 * expand_list_objects_filter_spec() instead.
@@ -32,28 +33,35 @@ struct list_objects_filter_options {
 	 * the filtering algorithm to use.
 	 */
 	enum list_objects_filter_choice choice;
 
 	/*
 	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
 	 */
 	unsigned int no_filter : 1;
 
 	/*
-	 * Parsed values (fields) from within the filter-spec.  These are
-	 * choice-specific; not all values will be defined for any given
-	 * choice.
+	 * BEGIN choice-specific parsed values from within the filter-spec. Only
+	 * some values will be defined for any given choice.
 	 */
+
 	struct object_id *sparse_oid_value;
 	char *sparse_path_value;
 	unsigned long blob_limit_value;
 	unsigned long tree_exclude_depth;
+
+	/* LOFC_COMBINE values */
+	struct list_objects_filter_options *lhs, *rhs;
+
+	/*
+	 * END choice-specific parsed values.
+	 */
 };
 
 /* Normalized command line arguments */
 #define CL_ARG__FILTER "filter"
 
 int parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg);
 
 int opt_parse_list_objects_filter(const struct option *opt,
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 8e8616b9b8..b97277a46f 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -453,34 +453,148 @@ static void filter_sparse_path__init(
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
 	d->array_frame[d->nr].defval = 0; /* default to include */
 	d->array_frame[d->nr].child_prov_omit = 0;
 
 	ctx->filter_fn = filter_sparse;
 	ctx->free_fn = filter_sparse_free;
 	ctx->data = d;
 }
 
+struct filter_combine_data {
+	/* sub[0] corresponds to lhs, sub[1] to rhs. */
+	struct {
+		struct filter_context ctx;
+		struct oidset seen;
+		struct object_id skip_tree;
+		unsigned is_skipping_tree : 1;
+	} sub[2];
+
+	struct oidset rhs_omits;
+};
+
+static void add_all(struct oidset *dest, struct oidset *src) {
+	struct oidset_iter iter;
+	struct object_id *src_oid;
+
+	oidset_iter_init(src, &iter);
+	while ((src_oid = oidset_iter_next(&iter)) != NULL)
+		oidset_insert(dest, src_oid);
+}
+
+static void filter_combine_free(void *filter_data)
+{
+	struct filter_combine_data *d = filter_data;
+	int i;
+
+	/* Anything omitted by rhs should be added to the overall omits set. */
+	if (d->sub[0].ctx.omits)
+		add_all(d->sub[0].ctx.omits, d->sub[1].ctx.omits);
+
+	for (i = 0; i < 2; i++) {
+		list_objects_filter__release(&d->sub[i].ctx);
+		oidset_clear(&d->sub[i].seen);
+	}
+	oidset_clear(&d->rhs_omits);
+	free(d);
+}
+
+static int should_delegate(enum list_objects_filter_situation filter_situation,
+			   struct object *obj,
+			   struct filter_combine_data *d,
+			   int side)
+{
+	if (!d->sub[side].is_skipping_tree)
+		return 1;
+	if (filter_situation == LOFS_END_TREE &&
+		oideq(&obj->oid, &d->sub[side].skip_tree)) {
+		d->sub[side].is_skipping_tree = 0;
+		return 1;
+	}
+	return 0;
+}
+
+static enum list_objects_filter_result filter_combine(
+	struct repository *r,
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	struct filter_context *ctx)
+{
+	struct filter_combine_data *d = ctx->data;
+	enum list_objects_filter_result result[2];
+	enum list_objects_filter_result combined_result = LOFR_ZERO;
+	int i;
+
+	for (i = 0; i < 2; i++) {
+		if (oidset_contains(&d->sub[i].seen, &obj->oid) ||
+			!should_delegate(filter_situation, obj, d, i)) {
+			result[i] = LOFR_ZERO;
+			continue;
+		}
+
+		result[i] = d->sub[i].ctx.filter_fn(
+			r, filter_situation, obj, pathname, filename,
+			&d->sub[i].ctx);
+
+		if (result[i] & LOFR_MARK_SEEN)
+			oidset_insert(&d->sub[i].seen, &obj->oid);
+
+		if (result[i] & LOFR_SKIP_TREE) {
+			d->sub[i].is_skipping_tree = 1;
+			d->sub[i].skip_tree = obj->oid;
+		}
+	}
+
+	if ((result[0] & LOFR_DO_SHOW) && (result[1] & LOFR_DO_SHOW))
+		combined_result |= LOFR_DO_SHOW;
+	if (d->sub[0].is_skipping_tree && d->sub[1].is_skipping_tree)
+		combined_result |= LOFR_SKIP_TREE;
+
+	return combined_result;
+}
+
+static void filter_combine__init(
+	struct list_objects_filter_options *filter_options,
+	struct filter_context *ctx)
+{
+	struct filter_combine_data *d = xcalloc(1, sizeof(*d));
+
+	if (ctx->omits)
+		oidset_init(&d->rhs_omits, 16);
+
+	list_objects_filter__init(ctx->omits, filter_options->lhs,
+				  &d->sub[0].ctx);
+	list_objects_filter__init(&d->rhs_omits, filter_options->rhs,
+				  &d->sub[1].ctx);
+
+	ctx->filter_fn = filter_combine;
+	ctx->free_fn = filter_combine_free;
+	ctx->data = d;
+}
+
 typedef void (*filter_init_fn)(
 	struct list_objects_filter_options *filter_options,
 	struct filter_context *ctx);
 
 /*
  * Must match "enum list_objects_filter_choice".
  */
 static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
 	filter_trees_depth__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
+	filter_combine__init,
 };
 
 void list_objects_filter__init(
 	struct oidset *omitted,
 	struct list_objects_filter_options *filter_options,
 	struct filter_context *ctx)
 {
 	filter_init_fn init_fn;
 
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 9c11427719..ddfacb1a1a 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -284,21 +284,33 @@ test_expect_success 'verify tree:0 includes trees in "filtered" output' '
 # Make sure tree:0 does not iterate through any trees.
 
 test_expect_success 'verify skipping tree iteration when not collecting omits' '
 	GIT_TRACE=1 git -C r3 rev-list \
 		--objects --filter=tree:0 HEAD 2>filter_trace &&
 	grep "Skipping contents of tree [.][.][.]" filter_trace >actual &&
 	# One line for each commit traversed.
 	test_line_count = 2 actual &&
 
 	# Make sure no other trees were considered besides the root.
-	! grep "Skipping contents of tree [^.]" filter_trace
+	! grep "Skipping contents of tree [^.]" filter_trace &&
+
+	# Try this again with "combine:". If both sub-filters are skipping
+	# trees, the composite filter should also skip trees. This is not
+	# important unless the user does combine:tree:X+tree:Y or another filter
+	# besides "tree:" is implemented in the future which can skip trees.
+	GIT_TRACE=1 git -C r3 rev-list \
+		--objects --filter=combine:tree:1+tree:3 HEAD 2>filter_trace &&
+
+	# Only skip the dir1/ tree, which is shared between the two commits.
+	grep "Skipping contents of tree " filter_trace >actual &&
+	test_write_lines "Skipping contents of tree dir1/..." >expected &&
+	test_cmp expected actual
 '
 
 # Test tree:# filters.
 
 expect_has () {
 	commit=$1 &&
 	name=$2 &&
 
 	hash=$(git -C r3 rev-parse $commit:$name) &&
 	grep "^$hash $name$" actual
@@ -336,20 +348,134 @@ test_expect_success 'verify tree:3 includes everything expected' '
 	expect_has HEAD dir1/sparse1 &&
 	expect_has HEAD dir1/sparse2 &&
 	expect_has HEAD pattern &&
 	expect_has HEAD sparse1 &&
 	expect_has HEAD sparse2 &&
 
 	# There are also 2 commit objects
 	test_line_count = 10 actual
 '
 
+test_expect_success 'combine:... for a simple combination' '
+	git -C r3 rev-list --objects --filter=combine:tree:2+blob:none HEAD \
+		>actual &&
+
+	expect_has HEAD "" &&
+	expect_has HEAD~1 "" &&
+	expect_has HEAD dir1 &&
+
+	# There are also 2 commit objects
+	test_line_count = 5 actual
+'
+
+test_expect_success 'combine:... with URL encoding' '
+	git -C r3 rev-list --objects \
+		--filter=combine:tree%3a2+blob:%6Eon%65 HEAD >actual &&
+
+	expect_has HEAD "" &&
+	expect_has HEAD~1 "" &&
+	expect_has HEAD dir1 &&
+
+	# There are also 2 commit objects
+	test_line_count = 5 actual
+'
+
+expect_invalid_filter_spec () {
+	spec="$1" &&
+	err="$2" &&
+
+	test_must_fail git -C r3 rev-list --objects --filter="$spec" HEAD \
+		>actual 2>actual_stderr &&
+	test_must_be_empty actual &&
+	test_i18ngrep "$err" actual_stderr
+}
+
+test_expect_success 'combine:... while URL-encoding things that should not be' '
+	expect_invalid_filter_spec combine%3Atree:2+blob:none \
+		"invalid filter-spec"
+'
+
+test_expect_success 'combine: with nothing after the :' '
+	expect_invalid_filter_spec combine: "expected something after combine:"
+'
+
+test_expect_success 'parse error in first sub-filter in combine:' '
+	expect_invalid_filter_spec combine:tree:asdf+blob:none \
+		"expected .tree:<depth>."
+'
+
+test_expect_success 'combine:... with invalid URL-encoded sequences' '
+	expect_invalid_filter_spec combine:tree:2+blob:non%a \
+		"error in filter-spec - not enough hex digits after %" &&
+	# Edge cases for non-hex chars: "Gg/:"
+	expect_invalid_filter_spec combine:tree:2+blob%G5none \
+		"error in filter-spec - expect two hex digits .*: .G." &&
+	expect_invalid_filter_spec combine:tree:2+blob%g5none \
+		"error in filter-spec - expect two hex digits .*: .g." &&
+	expect_invalid_filter_spec combine:tree:2+blob%5/none \
+		"error in filter-spec - expect two hex digits .*: ./." &&
+	expect_invalid_filter_spec combine:%:5tree:2+blob:none \
+		"error in filter-spec - expect two hex digits .*: .:."
+'
+
+test_expect_success 'combine:... with non-encoded reserved chars' '
+	expect_invalid_filter_spec combine:tree:2+sparse:@xyz \
+		"must escape char in sub-filter-spec: .@." &&
+	expect_invalid_filter_spec combine:tree:2+sparse:\` \
+		"must escape char in sub-filter-spec: .\`." &&
+	expect_invalid_filter_spec combine:tree:2+sparse:~abc \
+		"must escape char in sub-filter-spec: .\~."
+'
+
+test_expect_success 'validate err msg for "combine:<valid-filter>+"' '
+	expect_invalid_filter_spec combine:tree:2+ "expected .tree:<depth>."
+'
+
+test_expect_success 'combine:... with edge-case hex digits: Ff Aa 0 9' '
+	git -C r3 rev-list --objects --filter="combine:tree:2+bl%6Fb:n%6fne" \
+		HEAD >actual &&
+	test_line_count = 5 actual &&
+	git -C r3 rev-list --objects --filter="combine:tree%3A2+blob%3anone" \
+		HEAD >actual &&
+	test_line_count = 5 actual &&
+	git -C r3 rev-list --objects --filter="combine:tree:%30" HEAD >actual &&
+	test_line_count = 2 actual &&
+	git -C r3 rev-list --objects --filter="combine:tree:%39+blob:none" \
+		HEAD >actual &&
+	test_line_count = 5 actual
+'
+
+test_expect_success 'combine:... with more than two sub-filters' '
+	git -C r3 rev-list --objects \
+		--filter=combine:tree:3+blob:limit=40+sparse:path=../pattern1 \
+		HEAD >actual &&
+
+	expect_has HEAD "" &&
+	expect_has HEAD~1 "" &&
+	expect_has HEAD dir1 &&
+	expect_has HEAD dir1/sparse1 &&
+	expect_has HEAD dir1/sparse2 &&
+
+	# Should also have 2 commits
+	test_line_count = 7 actual &&
+
+	# Try again, this time making sure the last sub-filter is only
+	# URL-decoded once.
+	cp pattern1 pattern1+renamed% &&
+	cp actual expect &&
+
+	git -C r3 rev-list --objects \
+		--filter=combine:tree:3+blob:limit=40+sparse:path=../pattern1%2brenamed%25 \
+		HEAD >actual &&
+	test_cmp expect actual
+'
+
 # Test provisional omit collection logic with a repo that has objects appearing
 # at multiple depths - first deeper than the filter's threshold, then shallow.
 
 test_expect_success 'setup r4' '
 	git init r4 &&
 
 	echo foo > r4/foo &&
 	mkdir r4/subdir &&
 	echo bar > r4/subdir/bar &&
 
@@ -379,20 +505,51 @@ test_expect_success 'test tree:# filter provisional omit for blob and tree' '
 
 test_expect_success 'verify skipping tree iteration when collecting omits' '
 	GIT_TRACE=1 git -C r4 rev-list --filter-print-omitted \
 		--objects --filter=tree:0 HEAD 2>filter_trace &&
 	grep "^Skipping contents of tree " filter_trace >actual &&
 
 	echo "Skipping contents of tree subdir/..." >expect &&
 	test_cmp expect actual
 '
 
+test_expect_success 'setup r5' '
+	git init r5 &&
+	mkdir -p r5/subdir &&
+
+	echo 1     >r5/short-root          &&
+	echo 12345 >r5/long-root           &&
+	echo a     >r5/subdir/short-subdir &&
+	echo abcde >r5/subdir/long-subdir  &&
+
+	git -C r5 add short-root long-root subdir &&
+	git -C r5 commit -m "commit msg"
+'
+
+test_expect_success 'verify collecting omits in combined: filter' '
+	# Note that this test guards against the naive implementation of simply
+	# giving both filters the same "omits" set and expecting it to
+	# automatically merge them.
+	git -C r5 rev-list --objects --quiet --filter-print-omitted \
+		--filter=combine:tree:2+blob:limit=3 HEAD >actual &&
+
+	# Expect 0 trees/commits, 3 blobs omitted (all blobs except short-root)
+	omitted_1=$(echo 12345 | git hash-object --stdin) &&
+	omitted_2=$(echo a     | git hash-object --stdin) &&
+	omitted_3=$(echo abcde | git hash-object --stdin) &&
+
+	grep ~$omitted_1 actual &&
+	grep ~$omitted_2 actual &&
+	grep ~$omitted_3 actual &&
+	test_line_count = 3 actual
+'
+
 # Test tree:<depth> where a tree is iterated to twice - once where a subentry is
 # too deep to be included, and again where the blob inside it is shallow enough
 # to be included. This makes sure we don't use LOFR_MARK_SEEN incorrectly (we
 # can't use it because a tree can be iterated over again at a lower depth).
 
 test_expect_success 'tree:<depth> where we iterate over tree at two levels' '
 	git init r5 &&
 
 	mkdir -p r5/a/subdir/b &&
 	echo foo > r5/a/subdir/b/foo &&
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 4/5] list-objects-filter-options: move error check up
  2019-05-22  0:21 [PATCH v1 0/5] Filter combination Matthew DeVore
                   ` (2 preceding siblings ...)
  2019-05-22  0:21 ` [PATCH v1 3/5] list-objects-filter: implement composite filters Matthew DeVore
@ 2019-05-22  0:21 ` Matthew DeVore
  2019-05-22  0:21 ` [PATCH v1 5/5] list-objects-filter-options: allow mult. --filter Matthew DeVore
  2019-06-06 22:44 ` [PATCH v1 0/5] Filter combination Matthew DeVore
  5 siblings, 0 replies; 41+ messages in thread
From: Matthew DeVore @ 2019-05-22  0:21 UTC (permalink / raw)
  To: jonathantanmy, jrn, git, dstolee, jeffhost, jrnieder, pclouds
  Cc: Matthew DeVore, matvore

Move the check that filter_options->choice is set to higher in the call
stack. This can only be set when the gentle parse function is called
from one of the two call sites.

This is important because in an upcoming patch this may or may not be an
error, and whether it is an error is only known to the
parse_list_objects_filter function.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter-options.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index d7a1516188..647b2b220e 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -27,28 +27,22 @@ static int parse_combine_filter(
  * expand_list_objects_filter_spec() first).  We also "intern" the arg for the
  * convenience of the current command.
  */
 static int gently_parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg,
 	struct strbuf *errbuf)
 {
 	const char *v0;
 
-	if (filter_options->choice) {
-		if (errbuf) {
-			strbuf_addstr(
-				errbuf,
-				_("multiple filter-specs cannot be combined"));
-		}
-		return 1;
-	}
+	if (filter_options->choice)
+		BUG("filter_options already populated");
 
 	if (!strcmp(arg, "blob:none")) {
 		filter_options->choice = LOFC_BLOB_NONE;
 		return 0;
 
 	} else if (skip_prefix(arg, "blob:limit=", &v0)) {
 		if (git_parse_ulong(v0, &filter_options->blob_limit_value)) {
 			filter_options->choice = LOFC_BLOB_LIMIT;
 			return 0;
 		}
@@ -239,20 +233,22 @@ cleanup:
 		list_objects_filter_release(filter_options);
 		memset(filter_options, 0, sizeof(*filter_options));
 	}
 	return result;
 }
 
 int parse_list_objects_filter(struct list_objects_filter_options *filter_options,
 			      const char *arg)
 {
 	struct strbuf buf = STRBUF_INIT;
+	if (filter_options->choice)
+		die(_("multiple filter-specs cannot be combined"));
 	filter_options->filter_spec = strdup(arg);
 	if (gently_parse_list_objects_filter(filter_options, arg, &buf))
 		die("%s", buf.buf);
 	return 0;
 }
 
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset)
 {
 	struct list_objects_filter_options *filter_options = opt->value;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 5/5] list-objects-filter-options: allow mult. --filter
  2019-05-22  0:21 [PATCH v1 0/5] Filter combination Matthew DeVore
                   ` (3 preceding siblings ...)
  2019-05-22  0:21 ` [PATCH v1 4/5] list-objects-filter-options: move error check up Matthew DeVore
@ 2019-05-22  0:21 ` Matthew DeVore
  2019-06-06 22:44 ` [PATCH v1 0/5] Filter combination Matthew DeVore
  5 siblings, 0 replies; 41+ messages in thread
From: Matthew DeVore @ 2019-05-22  0:21 UTC (permalink / raw)
  To: jonathantanmy, jrn, git, dstolee, jeffhost, jrnieder, pclouds
  Cc: Matthew DeVore, matvore

Allow combining of multiple filters by simply repeating the --filter
flag. Before this patch, the user had to combine them in a single flag
somewhat awkwardly (e.g. --filter=combine:FOO+BAR), including
URL-encoding the individual filters.

To make this work, in the --filter flag parsing callback, rather than
error out when we detect that the filter_options struct is already
populated, we modify it in-place to contain the added sub-filter. The
existing sub-filter becomes the lhs of the combined filter, and the
next sub-filter becomes the rhs. We also have to URL-encode the LHS and
RHS sub-filters.

We can simplify the operation if the LHS is already a combine: filter.
In that case, we just append the URL-encoded RHS sub-filter to the LHS
spec to get the new spec.

Signed-off-by: Matthew DeVore <matvore@google.com>
---
 Documentation/rev-list-options.txt  | 16 +++---
 builtin/fetch-pack.c                |  5 +-
 builtin/rev-list.c                  |  5 +-
 fetch-pack.c                        |  5 +-
 list-objects-filter-options.c       | 83 +++++++++++++++++++++++++----
 list-objects-filter-options.h       |  3 +-
 t/t5616-partial-clone.sh            | 19 +++++++
 t/t6112-rev-list-filters-objects.sh | 38 ++++++++++++-
 transport.c                         |  5 +-
 upload-pack.c                       | 10 +++-
 10 files changed, 164 insertions(+), 25 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 4fb0c4fbb0..2be5f3a6d1 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,27 +731,29 @@ specification contained in <path>.
 The form '--filter=tree:<depth>' omits all blobs and trees whose depth
 from the root tree is >= <depth> (minimum depth if an object is located
 at multiple depths in the commits traversed). <depth>=0 will not include
 any trees or blobs unless included explicitly in the command-line (or
 standard input when --stdin is used). <depth>=1 will include only the
 tree and blobs which are referenced directly by a commit reachable from
 <commit> or an explicitly-given object. <depth>=2 is like <depth>=1
 while also including trees and blobs one more level removed from an
 explicitly-given commit or tree.
 +
-The form '--filter=combine:<filter1>+<filter2>+...<filterN>' combines
-several filters. Only objects which are accepted by every filter are
-included. Filters are joined by '{plus}' and individual filters are %-encoded
-(i.e. URL-encoded). Besides the '{plus}' and '%' characters, the following
-characters are reserved and also must be encoded:
-`~!@#$^&*()[]{}\;",<>?`+&#39;&#96;+ as well as all characters with ASCII code
-&lt;= `0x20`, which includes space and newline.
+Multiple '--filter=' flags can be specified to combine filters. Only
+objects which are accepted by every filter are included.
++
+The form '--filter=combine:<filter1>+<filter2>+...<filterN>' can also be
+used to combine filters. Filters are joined by '{plus}' and individual
+filters are %-encoded (i.e. URL-encoded). Besides the '{plus}' and '%'
+characters, the following characters are reserved and also must be
+encoded: `~!@#$^&*()[]{}\;",<>?`+&#39;&#96;+ as well as all characters
+with ASCII code &lt;= `0x20`, which includes space and newline.
 +
 Other arbitrary characters can also be encoded. For instance,
 'combine:tree:3+blob:none' and 'combine:tree%3A2+blob%3Anone' are
 equivalent.
 
 --no-filter::
 	Turn off any previous `--filter=` argument.
 
 --filter-print-omitted::
 	Only useful with `--filter=`; prints a list of the objects omitted
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index dc1485c8aa..cadcb2b915 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -151,21 +151,24 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 		}
 		if (!strcmp("--from-promisor", arg)) {
 			args.from_promisor = 1;
 			continue;
 		}
 		if (!strcmp("--no-dependents", arg)) {
 			args.no_dependents = 1;
 			continue;
 		}
 		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
-			parse_list_objects_filter(&args.filter_options, arg);
+			parse_list_objects_filter(
+				&args.filter_options,
+				arg,
+				/*allow_implicit_combine=*/1);
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
 			list_objects_filter_set_no_filter(&args.filter_options);
 			continue;
 		}
 		usage(fetch_pack_usage);
 	}
 	if (deepen_not.nr)
 		args.deepen_not = &deepen_not;
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 9f31837d30..e584e7d1ac 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -454,21 +454,24 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		if (!strcmp(arg, "--test-bitmap")) {
 			test_bitmap_walk(&revs);
 			return 0;
 		}
 		if (skip_prefix(arg, "--progress=", &arg)) {
 			show_progress = arg;
 			continue;
 		}
 
 		if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
-			parse_list_objects_filter(&filter_options, arg);
+			parse_list_objects_filter(
+				&filter_options,
+				arg,
+				/*allow_implicit_combine=*/1);
 			if (filter_options.choice && !revs.blob_objects)
 				die(_("object filtering requires --objects"));
 			if (filter_options.choice == LOFC_SPARSE_OID &&
 			    !filter_options.sparse_oid_value)
 				die(_("invalid sparse value '%s'"),
 				    filter_options.filter_spec);
 			continue;
 		}
 		if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
 			list_objects_filter_set_no_filter(&filter_options);
diff --git a/fetch-pack.c b/fetch-pack.c
index 3f24d0c8a6..c58fd9148a 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1661,21 +1661,24 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 		/*
 		 * The protocol does not support requesting that only the
 		 * wanted objects be sent, so approximate this by setting a
 		 * "blob:none" filter if no filter is already set. This works
 		 * for all object types: note that wanted blobs will still be
 		 * sent because they are directly specified as a "want".
 		 *
 		 * NEEDSWORK: Add an option in the protocol to request that
 		 * only the wanted objects be sent, and implement it.
 		 */
-		parse_list_objects_filter(&args->filter_options, "blob:none");
+		parse_list_objects_filter(
+			&args->filter_options,
+			"blob:none",
+			/*allow_implicit_combine=*/0);
 	}
 
 	if (version != protocol_v2 && !ref) {
 		packet_flush(fd[1]);
 		die(_("no matching remote head"));
 	}
 	if (version == protocol_v2) {
 		if (shallow->nr)
 			BUG("Protocol V2 does not provide shallows at this point in the fetch");
 		memset(&si, 0, sizeof(si));
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 647b2b220e..a0cc87c62b 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -1,18 +1,19 @@
 #include "cache.h"
 #include "commit.h"
 #include "config.h"
 #include "revision.h"
 #include "argv-array.h"
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "trace.h"
 
 static int parse_combine_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg,
 	struct strbuf *errbuf);
 
 /*
  * Parse value of the argument to the "filter" keyword.
  * On the command line this looks like:
  *       --filter=<arg>
@@ -229,43 +230,107 @@ static int parse_combine_filter(
 
 cleanup:
 	strbuf_list_free(sub_specs);
 	if (result) {
 		list_objects_filter_release(filter_options);
 		memset(filter_options, 0, sizeof(*filter_options));
 	}
 	return result;
 }
 
-int parse_list_objects_filter(struct list_objects_filter_options *filter_options,
-			      const char *arg)
+static void add_url_encoded(struct strbuf *dest, const char *s)
 {
-	struct strbuf buf = STRBUF_INIT;
-	if (filter_options->choice)
-		die(_("multiple filter-specs cannot be combined"));
-	filter_options->filter_spec = strdup(arg);
-	if (gently_parse_list_objects_filter(filter_options, arg, &buf))
-		die("%s", buf.buf);
+	while (*s) {
+		if (*s <= ' ' || strchr(RESERVED_NON_WS, *s) ||
+			*s == '%' || *s == '+')
+			strbuf_addf(dest, "%%%02X", (int)*s);
+		else
+			strbuf_addf(dest, "%c", *s);
+		s++;
+	}
+}
+
+/*
+ * Returns a new filter-spec string by combining (with combine:) the two
+ * sub-specs. The caller gains ownership of a new string, and lhs and rhs are
+ * not freed.
+ */
+static char *combine_specs(const char *lhs, const char *rhs)
+{
+	struct strbuf combined = STRBUF_INIT;
+	if (starts_with(lhs, "combine:")) {
+		strbuf_addf(&combined, "%s", lhs);
+	} else {
+		strbuf_addf(&combined, "combine:");
+		add_url_encoded(&combined, lhs);
+	}
+	strbuf_addf(&combined, "+");
+
+	add_url_encoded(&combined, rhs);
+	trace_printf("Generated composite filter-spec: %s\n", combined.buf);
+	return strbuf_detach(&combined, NULL);
+}
+
+int parse_list_objects_filter(
+	struct list_objects_filter_options *filter_options,
+	const char *arg,
+	int allow_implicit_combine)
+{
+	struct strbuf errbuf = STRBUF_INIT;
+	if (filter_options->choice) {
+		struct list_objects_filter_options *lhs;
+
+		if (!allow_implicit_combine)
+			die(_("multiple filter-specs cannot be combined"));
+
+		lhs = xcalloc(1, sizeof(*lhs));
+		*lhs = *filter_options;
+		memset(filter_options, 0, sizeof(*filter_options));
+
+		filter_options->lhs = lhs;
+		filter_options->rhs = xcalloc(1, sizeof(*filter_options->rhs));
+		filter_options->choice = LOFC_COMBINE;
+
+		/*
+		 * Build up the filter-spec string using the already-parsed
+		 * portion (the lhs) and the to-be-parsed portion (the rhs).
+		 */
+		filter_options->filter_spec = combine_specs(
+			lhs->filter_spec, arg);
+		FREE_AND_NULL(lhs->filter_spec);
+
+		/*
+		 * The gentle parse function below will populate the rhs of the
+		 * combined filter. But the caller of *this* function sees
+		 * filter_options as the combined filter.
+		 */
+		filter_options = filter_options->rhs;
+	} else {
+		filter_options->filter_spec = strdup(arg);
+	}
+	if (gently_parse_list_objects_filter(filter_options, arg, &errbuf))
+		die("%s", errbuf.buf);
 	return 0;
 }
 
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset)
 {
 	struct list_objects_filter_options *filter_options = opt->value;
 
 	if (unset || !arg) {
 		list_objects_filter_set_no_filter(filter_options);
 		return 0;
 	}
 
-	return parse_list_objects_filter(filter_options, arg);
+	return parse_list_objects_filter(
+		filter_options, arg, /*allow_implicit_combine=*/1);
 }
 
 void expand_list_objects_filter_spec(
 	const struct list_objects_filter_options *filter,
 	struct strbuf *expanded_spec)
 {
 	strbuf_init(expanded_spec, strlen(filter->filter_spec));
 	if (filter->choice == LOFC_BLOB_LIMIT)
 		strbuf_addf(expanded_spec, "blob:limit=%lu",
 			    filter->blob_limit_value);
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 6c0f0ecd08..a3ad00fde2 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -55,21 +55,22 @@ struct list_objects_filter_options {
 	/*
 	 * END choice-specific parsed values.
 	 */
 };
 
 /* Normalized command line arguments */
 #define CL_ARG__FILTER "filter"
 
 int parse_list_objects_filter(
 	struct list_objects_filter_options *filter_options,
-	const char *arg);
+	const char *arg,
+	int allow_implicit_combine);
 
 int opt_parse_list_objects_filter(const struct option *opt,
 				  const char *arg, int unset);
 
 #define OPT_PARSE_LIST_OBJECTS_FILTER(fo) \
 	{ OPTION_CALLBACK, 0, CL_ARG__FILTER, fo, N_("args"), \
 	  N_("object filtering"), 0, \
 	  opt_parse_list_objects_filter }
 
 /*
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 9a8f9886b3..11536f4028 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -201,20 +201,39 @@ test_expect_success 'use fsck before and after manually fetching a missing subtr
 	test_line_count = 70 fetched_objects &&
 
 	awk -f print_1.awk fetched_objects |
 	xargs -n1 git -C dst cat-file -t >fetched_types &&
 
 	sort -u fetched_types >unique_types.observed &&
 	test_write_lines blob commit tree >unique_types.expected &&
 	test_cmp unique_types.expected unique_types.observed
 '
 
+test_expect_success 'implicitly construct combine: filter with repeated flags' '
+	GIT_TRACE=$(pwd)/trace git clone --bare \
+		--filter=blob:none --filter=tree:1 \
+		"file://$(pwd)/srv.bare" pc2 &&
+	grep "trace:.* git pack-objects .*--filter=combine:blob:none+tree:1" \
+		trace &&
+	git -C pc2 rev-list --objects --missing=allow-any HEAD >objects &&
+
+	# We should have gotten some root trees.
+	grep " $" objects &&
+	# Should not have gotten any non-root trees or blobs.
+	! grep " ." objects &&
+
+	xargs -n 1 git -C pc2 cat-file -t <objects >types &&
+	sort -u types >unique_types.actual &&
+	test_write_lines commit tree >unique_types.expected &&
+	test_cmp unique_types.expected unique_types.actual
+'
+
 test_expect_success 'partial clone fetches blobs pointed to by refs even if normally filtered out' '
 	rm -rf src dst &&
 	git init src &&
 	test_commit -C src x &&
 	test_config -C src uploadpack.allowfilter 1 &&
 	test_config -C src uploadpack.allowanysha1inwant 1 &&
 
 	# Create a tag pointing to a blob.
 	BLOB=$(echo blob-contents | git -C src hash-object --stdin -w) &&
 	git -C src tag myblob "$BLOB" &&
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index ddfacb1a1a..104248c73d 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -357,21 +357,30 @@ test_expect_success 'verify tree:3 includes everything expected' '
 
 test_expect_success 'combine:... for a simple combination' '
 	git -C r3 rev-list --objects --filter=combine:tree:2+blob:none HEAD \
 		>actual &&
 
 	expect_has HEAD "" &&
 	expect_has HEAD~1 "" &&
 	expect_has HEAD dir1 &&
 
 	# There are also 2 commit objects
-	test_line_count = 5 actual
+	test_line_count = 5 actual &&
+
+	cp actual expected &&
+
+	# Try again using repeated --filter - this is equivalent to a manual
+	# combine with "combine:...+..."
+	git -C r3 rev-list --objects --filter=combine:tree:2 \
+		--filter=blob:none HEAD >actual &&
+
+	test_cmp expected actual
 '
 
 test_expect_success 'combine:... with URL encoding' '
 	git -C r3 rev-list --objects \
 		--filter=combine:tree%3a2+blob:%6Eon%65 HEAD >actual &&
 
 	expect_has HEAD "" &&
 	expect_has HEAD~1 "" &&
 	expect_has HEAD dir1 &&
 
@@ -459,21 +468,46 @@ test_expect_success 'combine:... with more than two sub-filters' '
 	test_line_count = 7 actual &&
 
 	# Try again, this time making sure the last sub-filter is only
 	# URL-decoded once.
 	cp pattern1 pattern1+renamed% &&
 	cp actual expect &&
 
 	git -C r3 rev-list --objects \
 		--filter=combine:tree:3+blob:limit=40+sparse:path=../pattern1%2brenamed%25 \
 		HEAD >actual &&
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	# Use the same composite filter again, but with a pattern file name that
+	# requires encoding multiple characters, and use implicit filter
+	# combining.
+	cp pattern1 "p;at%ter+n" &&
+	GIT_TRACE=$(pwd)/trace git -C r3 rev-list --objects \
+		--filter=tree:3 --filter=blob:limit=40 \
+		--filter=sparse:path="../p;at%ter+n" \
+		HEAD >actual &&
+
+	test_cmp expect actual &&
+	grep "Generated composite filter-spec: combine:tree:3+blob:limit=40+sparse:path=../p%3Bat%25ter%2B" \
+		trace &&
+
+	# Repeat the above test, but this time, the characters to encode are in
+	# the LHS of the combined filter.
+	cp pattern1 "^~pattern" &&
+	GIT_TRACE=$(pwd)/trace git -C r3 rev-list --objects \
+		--filter=sparse:path="../^~pattern" \
+		--filter=tree:3 --filter=blob:limit=40 \
+		HEAD >actual &&
+
+	test_cmp expect actual &&
+	grep "Generated composite filter-spec: combine:sparse:path=../%5E%7Epattern+tree:3+blob:limit=40" \
+		trace
 '
 
 # Test provisional omit collection logic with a repo that has objects appearing
 # at multiple depths - first deeper than the filter's threshold, then shallow.
 
 test_expect_success 'setup r4' '
 	git init r4 &&
 
 	echo foo > r4/foo &&
 	mkdir r4/subdir &&
diff --git a/transport.c b/transport.c
index f1fcd2c4b0..63100da143 100644
--- a/transport.c
+++ b/transport.c
@@ -217,21 +217,24 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_DEEPEN_RELATIVE)) {
 		opts->deepen_relative = !!value;
 		return 0;
 	} else if (!strcmp(name, TRANS_OPT_FROM_PROMISOR)) {
 		opts->from_promisor = !!value;
 		return 0;
 	} else if (!strcmp(name, TRANS_OPT_NO_DEPENDENTS)) {
 		opts->no_dependents = !!value;
 		return 0;
 	} else if (!strcmp(name, TRANS_OPT_LIST_OBJECTS_FILTER)) {
-		parse_list_objects_filter(&opts->filter_options, value);
+		parse_list_objects_filter(
+			&opts->filter_options,
+			value,
+			/*allow_implicit_combine=*/0);
 		return 0;
 	}
 	return 1;
 }
 
 static int connect_setup(struct transport *transport, int for_push)
 {
 	struct git_transport_data *data = transport->data;
 	int flags = transport->verbose > 0 ? CONNECT_VERBOSE : 0;
 
diff --git a/upload-pack.c b/upload-pack.c
index d2ea5eb20d..e3f2618600 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -877,21 +877,24 @@ static void receive_needs(struct packet_reader *reader, struct object_array *wan
 		if (process_deepen(reader->line, &depth))
 			continue;
 		if (process_deepen_since(reader->line, &deepen_since, &deepen_rev_list))
 			continue;
 		if (process_deepen_not(reader->line, &deepen_not, &deepen_rev_list))
 			continue;
 
 		if (skip_prefix(reader->line, "filter ", &arg)) {
 			if (!filter_capability_requested)
 				die("git upload-pack: filtering capability not negotiated");
-			parse_list_objects_filter(&filter_options, arg);
+			parse_list_objects_filter(
+				&filter_options,
+				arg,
+				/*allow_implicit_combine=*/0);
 			continue;
 		}
 
 		if (!skip_prefix(reader->line, "want ", &arg) ||
 		    parse_oid_hex(arg, &oid_buf, &features))
 			die("git upload-pack: protocol error, "
 			    "expected to get object ID, not '%s'", reader->line);
 
 		if (parse_feature_request(features, "deepen-relative"))
 			deepen_relative = 1;
@@ -1296,21 +1299,24 @@ static void process_args(struct packet_reader *request,
 			continue;
 		if (process_deepen_not(arg, &data->deepen_not,
 				       &data->deepen_rev_list))
 			continue;
 		if (!strcmp(arg, "deepen-relative")) {
 			data->deepen_relative = 1;
 			continue;
 		}
 
 		if (allow_filter && skip_prefix(arg, "filter ", &p)) {
-			parse_list_objects_filter(&filter_options, p);
+			parse_list_objects_filter(
+				&filter_options,
+				p,
+				/*allow_implicit_combine=*/0);
 			continue;
 		}
 
 		if ((git_env_bool("GIT_TEST_SIDEBAND_ALL", 0) ||
 		     allow_sideband_all) &&
 		    !strcmp(arg, "sideband-all")) {
 			data->writer.use_sideband = 1;
 			continue;
 		}
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 1/5] list-objects-filter: refactor into a context struct
  2019-05-22  0:21 ` [PATCH v1 1/5] list-objects-filter: refactor into a context struct Matthew DeVore
@ 2019-05-24  0:49   ` Emily Shaffer
  2019-05-28 18:48     ` Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Emily Shaffer @ 2019-05-24  0:49 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: jonathantanmy, jrn, git, dstolee, jeffhost, jrnieder, pclouds,
	matvore

On Tue, May 21, 2019 at 05:21:50PM -0700, Matthew DeVore wrote:
> The next patch will create and manage filters in a new way, which means
> that this bundle of data will have to be managed at a new callsite. Make
> this bundle of data more manageable by putting it in a struct and
> making it part of the list-objects-filter module's API.

This commit message might read more easily on its own if you define
"this bundle of data" at least once. Since there are things being moved
from both list-objects-filter.c (filter_blobs_none_data) and
list-objects-filter.h (list_objects_filter_result and filter_free_fn)
into the new struct in list-objects-filter.h, it's not immediately clear
to me from the diff what's going on.

[snip]
> -static void *filter_blobs_none__init(
> -	struct oidset *omitted,
> +static void filter_blobs_none__init(
>  	struct list_objects_filter_options *filter_options,
> -	filter_object_fn *filter_fn,
> -	filter_free_fn *filter_free_fn)
> +	struct filter_context *ctx)
>  {
> -	struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d));
> -	d->omits = omitted;
> -
> -	*filter_fn = filter_blobs_none;
> -	*filter_free_fn = free;
> -	return d;
> +	ctx->filter_fn = filter_blobs_none;

I think you want to set ctx->free_fn here too, right? It seems like
you're not setting ctx->omitted anymore because you'd be reading that
information in from ctx->omitted (so it's redundant).

>  }

[snip]
> -/*
> - * A filter for list-objects to omit large blobs.
> - * And to OPTIONALLY collect a list of the omitted OIDs.
> - */
> +/* A filter for list-objects to omit large blobs. */
>  struct filter_blobs_limit_data {
> -	struct oidset *omits;
>  	unsigned long max_bytes;

I suppose I don't have a good enough grasp of the architecture here to
follow why you want to move 'omits' but not 'max_bytes' into the new
struct. Maybe it will become clear as I look at the rest of your patches
:)

>  };

Most of this patch looks like a pretty straightforward conversion.
Thanks.

 - Emily

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/5] list-objects-filter-options: error is localizeable
  2019-05-22  0:21 ` [PATCH v1 2/5] list-objects-filter-options: error is localizeable Matthew DeVore
@ 2019-05-24  0:55   ` Emily Shaffer
  2019-05-28 23:01     ` Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Emily Shaffer @ 2019-05-24  0:55 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: jonathantanmy, jrn, git, dstolee, jeffhost, jrnieder, pclouds,
	matvore

On Tue, May 21, 2019 at 05:21:51PM -0700, Matthew DeVore wrote:
> The "invalid filter-spec" message is user-facing and not a BUG, so make
> it localizeable.

What does it look like? Is it human-readable in its current form? I ask
because (without having looked ahead) I think you're going to move it
from a BUG to a user-facing error in a later patchset (which I don't
think the commit message reflects well). If I'm wrong, and today it's a
possibly user-facing string, then I think this patch could stand on its
own outside of this patchset, which would probably save you some effort
keeping a simple change in limbo for a long time. But if I'm right, I
think the commit message could use some work.

 - Emily

> 
> Signed-off-by: Matthew DeVore <matvore@google.com>
> ---
>  list-objects-filter-options.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index c0036f7378..e46ea467bc 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -81,21 +81,21 @@ static int gently_parse_list_objects_filter(
>  		filter_options->choice = LOFC_SPARSE_PATH;
>  		filter_options->sparse_path_value = strdup(v0);
>  		return 0;
>  	}
>  	/*
>  	 * Please update _git_fetch() in git-completion.bash when you
>  	 * add new filters
>  	 */
>  
>  	if (errbuf)
> -		strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
> +		strbuf_addf(errbuf, _("invalid filter-spec '%s'"), arg);
>  
>  	memset(filter_options, 0, sizeof(*filter_options));
>  	return 1;
>  }
>  
>  int parse_list_objects_filter(struct list_objects_filter_options *filter_options,
>  			      const char *arg)
>  {
>  	struct strbuf buf = STRBUF_INIT;
>  	if (gently_parse_list_objects_filter(filter_options, arg, &buf))
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-22  0:21 ` [PATCH v1 3/5] list-objects-filter: implement composite filters Matthew DeVore
@ 2019-05-24 21:01   ` Jeff Hostetler
  2019-05-28 17:59     ` Junio C Hamano
  2019-06-01  0:11     ` Matthew DeVore
  2019-05-28 21:53   ` Emily Shaffer
  1 sibling, 2 replies; 41+ messages in thread
From: Jeff Hostetler @ 2019-05-24 21:01 UTC (permalink / raw)
  To: Matthew DeVore, jonathantanmy, jrn, git, dstolee, jeffhost,
	jrnieder, pclouds
  Cc: matvore



On 5/21/2019 8:21 PM, Matthew DeVore wrote:
> Allow combining filters such that only objects accepted by all filters
> are shown. The motivation for this is to allow getting directory
> listings without also fetching blobs. This can be done by combining
> blob:none with tree:<depth>. There are massive repositories that have
> larger-than-expected trees - even if you include only a single commit.
> 
> The current usage requires passing the filter to rev-list, or sending
> it over the wire, as:
> 
> 	combine:<FILTER1>+<FILTER2>

I must admit I'm not a fan of this syntax and the URL-encoding
that it requires.  I see that this was already discussed in the
RFC version [1] last week, but I'll repeat it here.

I like the repeated used of the "--filter=<f_k>" command line option.


In the RFC version, there was discussion [2] of the wire format
and the need to be backwards compatible with existing servers and
so use the "combine:" syntax so that we only have a single filter
line on the wire.  Would it be better to have compliant servers
advertise a "filters" (plural) capability in addition to the
existing "filter" (singular) capability?  Then the client would
know that it could send a series of filter lines using the existing
syntax.  Likewise, if the "filters" capability was omitted, the
client could error out without the extra round-trip.


[1] https://public-inbox.org/git/xmqqwoip3gp0.fsf@gitster-ct.c.googlers.com/
[2] 
https://public-inbox.org/git/1E174CAA-BD57-400B-A83B-4AABFAFBC04B@comcast.net/


[...]
>   standard input when --stdin is used). <depth>=1 will include only the
>   tree and blobs which are referenced directly by a commit reachable from
>   <commit> or an explicitly-given object. <depth>=2 is like <depth>=1
>   while also including trees and blobs one more level removed from an
>   explicitly-given commit or tree.
> ++
> +The form '--filter=combine:<filter1>+<filter2>+...<filterN>' combines
> +several filters.

We are allowing an unlimited number of filters in the composition.
In the code, the compose filter data has space for a LHS and RHS, so
I'm assuming we're mapping

     --filter=f1 --filter=f2 --filter=f3 --filter=f4
or  --filter=combine:f1+f2+f3+f4
into basically
     (compose f1 (compose f2 (compose (f3 f4)))

I wonder if it would be easier to understand if we just built an array
or linked list, but I'll read on.

>                    Only objects which are accepted by every filter are
> +included. Filters are joined by '{plus}' and individual filters are %-encoded
> +(i.e. URL-encoded). Besides the '{plus}' and '%' characters, the following
> +characters are reserved and also must be encoded:
> +`~!@#$^&*()[]{}\;",<>?`+&#39;&#96;+ as well as all characters with ASCII code
> +&lt;= `0x20`, which includes space and newline.
[...]


> diff --git a/list-objects-filter.c b/list-objects-filter.c
> index 8e8616b9b8..b97277a46f 100644
> --- a/list-objects-filter.c
> +++ b/list-objects-filter.c
> @@ -453,34 +453,148 @@ static void filter_sparse_path__init(
>   
>   	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
>   	d->array_frame[d->nr].defval = 0; /* default to include */
>   	d->array_frame[d->nr].child_prov_omit = 0;
>   
>   	ctx->filter_fn = filter_sparse;
>   	ctx->free_fn = filter_sparse_free;
>   	ctx->data = d;
>   }
>   
> +struct filter_combine_data {
> +	/* sub[0] corresponds to lhs, sub[1] to rhs. */
> +	struct {
> +		struct filter_context ctx;
> +		struct oidset seen;
> +		struct object_id skip_tree;
> +		unsigned is_skipping_tree : 1;
> +	} sub[2];
> +
> +	struct oidset rhs_omits;
> +};
> +
> +static void add_all(struct oidset *dest, struct oidset *src) {
> +	struct oidset_iter iter;
> +	struct object_id *src_oid;
> +
> +	oidset_iter_init(src, &iter);
> +	while ((src_oid = oidset_iter_next(&iter)) != NULL)
> +		oidset_insert(dest, src_oid);
> +}
> +
> +static void filter_combine_free(void *filter_data)
> +{
> +	struct filter_combine_data *d = filter_data;
> +	int i;
> +
> +	/* Anything omitted by rhs should be added to the overall omits set. */
> +	if (d->sub[0].ctx.omits)
> +		add_all(d->sub[0].ctx.omits, d->sub[1].ctx.omits);
> +
> +	for (i = 0; i < 2; i++) {
> +		list_objects_filter__release(&d->sub[i].ctx);
> +		oidset_clear(&d->sub[i].seen);
> +	}
> +	oidset_clear(&d->rhs_omits);
> +	free(d);
> +}
> +
> +static int should_delegate(enum list_objects_filter_situation filter_situation,
> +			   struct object *obj,
> +			   struct filter_combine_data *d,
> +			   int side)
> +{
> +	if (!d->sub[side].is_skipping_tree)
> +		return 1;
> +	if (filter_situation == LOFS_END_TREE &&
> +		oideq(&obj->oid, &d->sub[side].skip_tree)) {
> +		d->sub[side].is_skipping_tree = 0;
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +static enum list_objects_filter_result filter_combine(
> +	struct repository *r,
> +	enum list_objects_filter_situation filter_situation,
> +	struct object *obj,
> +	const char *pathname,
> +	const char *filename,
> +	struct filter_context *ctx)
> +{
> +	struct filter_combine_data *d = ctx->data;
> +	enum list_objects_filter_result result[2];
> +	enum list_objects_filter_result combined_result = LOFR_ZERO;
> +	int i;
> +
> +	for (i = 0; i < 2; i++) {
> +		if (oidset_contains(&d->sub[i].seen, &obj->oid) ||
> +			!should_delegate(filter_situation, obj, d, i)) {

Should we swap the order of the terms in the || so that we always
clear the d->sub[i].is_skipping_tree on LOFS_END_TREE ?


> +			result[i] = LOFR_ZERO;
> +			continue;
> +		}
> +
> +		result[i] = d->sub[i].ctx.filter_fn(
> +			r, filter_situation, obj, pathname, filename,
> +			&d->sub[i].ctx);
> +
> +		if (result[i] & LOFR_MARK_SEEN)
> +			oidset_insert(&d->sub[i].seen, &obj->oid);

So filter[i] has said it never wants to show this object (hard omit).
And the guard at the top of the loop will prevent future invocations
from checking it again if the object is revisited.

> +
> +		if (result[i] & LOFR_SKIP_TREE) {
> +			d->sub[i].is_skipping_tree = 1;
> +			d->sub[i].skip_tree = obj->oid;

So this marks the tree object at the top of the skip as far as
filter[i] is concerned.

> +		}
> +	}
> +
> +	if ((result[0] & LOFR_DO_SHOW) && (result[1] & LOFR_DO_SHOW))
> +		combined_result |= LOFR_DO_SHOW;
> +	if (d->sub[0].is_skipping_tree && d->sub[1].is_skipping_tree)
> +		combined_result |= LOFR_SKIP_TREE;

Something about the above bothers me, but I can't quite say what
it is.

Do we need to do:
     if ((result[0] & LOFR_MARK_SEEN) && (result[1] & LOFR_MARK_SEEN))
         combined_result |= LOFR_MARK_SEEN;


> +
> +	return combined_result;
> +}
[...]


I'm out of time now, will pick this up again next week.

Thanks
Jeff


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-24 21:01   ` Jeff Hostetler
@ 2019-05-28 17:59     ` Junio C Hamano
  2019-05-29 15:02       ` Matthew DeVore
  2019-06-01  0:11     ` Matthew DeVore
  1 sibling, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-05-28 17:59 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Matthew DeVore, jonathantanmy, jrn, git, dstolee, jeffhost,
	jrnieder, pclouds, matvore

Jeff Hostetler <git@jeffhostetler.com> writes:

> In the RFC version, there was discussion [2] of the wire format
> and the need to be backwards compatible with existing servers and
> so use the "combine:" syntax so that we only have a single filter
> line on the wire.  Would it be better to have compliant servers
> advertise a "filters" (plural) capability in addition to the
> existing "filter" (singular) capability?  Then the client would
> know that it could send a series of filter lines using the existing
> syntax.  Likewise, if the "filters" capability was omitted, the
> client could error out without the extra round-trip.

All good ideas.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 1/5] list-objects-filter: refactor into a context struct
  2019-05-24  0:49   ` Emily Shaffer
@ 2019-05-28 18:48     ` Matthew DeVore
  2019-05-28 22:40       ` [PATCH] list-objects-filter: merge filter data structs Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-05-28 18:48 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Matthew DeVore, jonathantanmy, jrn, git, dstolee, jeffhost,
	jrnieder, pclouds

On Thu, May 23, 2019 at 05:49:38PM -0700, Emily Shaffer wrote:
> This commit message might read more easily on its own if you define
> "this bundle of data" at least once. Since there are things being moved
> from both list-objects-filter.c (filter_blobs_none_data) and
> list-objects-filter.h (list_objects_filter_result and filter_free_fn)
> into the new struct in list-objects-filter.h, it's not immediately clear
> to me from the diff what's going on.
> 

I will take care to define the context struct in the commit message and the
code comments.

I was originally intending to put this in the context struct: data common to all
filter types used when executing a filter. But now I'm leaning toward just
using the struct as a grab-bag for execution data for *all* filter types. This
would be more similar to how the list_objects_filter_options struct works, and
it would save having to define a special free_fn for each (actually, most)
filter types. I'll play around with this idea and probably put it in the next
roll-up (comments welcome, though).

> > -static void *filter_blobs_none__init(
> > -	struct oidset *omitted,
> > +static void filter_blobs_none__init(
> >  	struct list_objects_filter_options *filter_options,
> > -	filter_object_fn *filter_fn,
> > -	filter_free_fn *filter_free_fn)
> > +	struct filter_context *ctx)
> >  {
> > -	struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d));
> > -	d->omits = omitted;
> > -
> > -	*filter_fn = filter_blobs_none;
> > -	*filter_free_fn = free;
> > -	return d;
> > +	ctx->filter_fn = filter_blobs_none;
> 
> I think you want to set ctx->free_fn here too, right? It seems like
> you're not setting ctx->omitted anymore because you'd be reading that
> information in from ctx->omitted (so it's redundant).
> 

Not necessary because the blobs:none filter type doesn't have filter-specific
data for it. blobs:none is unique in that regard.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-22  0:21 ` [PATCH v1 3/5] list-objects-filter: implement composite filters Matthew DeVore
  2019-05-24 21:01   ` Jeff Hostetler
@ 2019-05-28 21:53   ` Emily Shaffer
  2019-05-31 20:48     ` Matthew DeVore
  1 sibling, 1 reply; 41+ messages in thread
From: Emily Shaffer @ 2019-05-28 21:53 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: jonathantanmy, jrn, git, dstolee, jeffhost, jrnieder, pclouds,
	matvore

On Tue, May 21, 2019 at 05:21:52PM -0700, Matthew DeVore wrote:
> Allow combining filters such that only objects accepted by all filters
> are shown. The motivation for this is to allow getting directory
> listings without also fetching blobs. This can be done by combining
> blob:none with tree:<depth>. There are massive repositories that have
> larger-than-expected trees - even if you include only a single commit.
> 
> The current usage requires passing the filter to rev-list, or sending
> it over the wire, as:
> 
> 	combine:<FILTER1>+<FILTER2>
> 
> (i.e.: git rev-list --filter=combine:tree:2+blob:limit=32k). This is
> potentially awkward because individual filters must be URL-encoded if
> they contain + or %. This can potentially be improved by supporting a
> repeated flag syntax, e.g.:
> 
> 	$ git rev-list --filter=tree:2 --filter=blob:limit=32k
> 
> Such usage is currently an error, so giving it a meaning is backwards-
> compatible.
> 
> Signed-off-by: Matthew DeVore <matvore@google.com>
> ---
>  Documentation/rev-list-options.txt     |  12 ++
>  contrib/completion/git-completion.bash |   2 +-
>  list-objects-filter-options.c          | 161 ++++++++++++++++++++++++-
>  list-objects-filter-options.h          |  14 ++-
>  list-objects-filter.c                  | 114 +++++++++++++++++
>  t/t6112-rev-list-filters-objects.sh    | 159 +++++++++++++++++++++++-
>  6 files changed, 455 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index ddbc1de43f..4fb0c4fbb0 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -730,20 +730,32 @@ specification contained in <path>.
>  +
>  The form '--filter=tree:<depth>' omits all blobs and trees whose depth
>  from the root tree is >= <depth> (minimum depth if an object is located
>  at multiple depths in the commits traversed). <depth>=0 will not include
>  any trees or blobs unless included explicitly in the command-line (or
>  standard input when --stdin is used). <depth>=1 will include only the
>  tree and blobs which are referenced directly by a commit reachable from
>  <commit> or an explicitly-given object. <depth>=2 is like <depth>=1
>  while also including trees and blobs one more level removed from an
>  explicitly-given commit or tree.
> ++
> +The form '--filter=combine:<filter1>+<filter2>+...<filterN>' combines
> +several filters. Only objects which are accepted by every filter are
> +included. Filters are joined by '{plus}' and individual filters are %-encoded
> +(i.e. URL-encoded). Besides the '{plus}' and '%' characters, the following
> +characters are reserved and also must be encoded:
> +`~!@#$^&*()[]{}\;",<>?`+&#39;&#96;+ as well as all characters with ASCII code
> +&lt;= `0x20`, which includes space and newline.
> ++
> +Other arbitrary characters can also be encoded. For instance,
> +'combine:tree:3+blob:none' and 'combine:tree%3A2+blob%3Anone' are
> +equivalent.
>  
>  --no-filter::
>  	Turn off any previous `--filter=` argument.
>  
>  --filter-print-omitted::
>  	Only useful with `--filter=`; prints a list of the objects omitted
>  	by the filter.  Object IDs are prefixed with a ``~'' character.
>  
>  --missing=<missing-action>::
>  	A debug option to help with future "partial clone" development.
> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index 3eefbabdb1..0fd0a10d0c 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1529,21 +1529,21 @@ _git_difftool ()
>  __git_fetch_recurse_submodules="yes on-demand no"
>  
>  _git_fetch ()
>  {
>  	case "$cur" in
>  	--recurse-submodules=*)
>  		__gitcomp "$__git_fetch_recurse_submodules" "" "${cur##--recurse-submodules=}"
>  		return
>  		;;
>  	--filter=*)
> -		__gitcomp "blob:none blob:limit= sparse:oid= sparse:path=" "" "${cur##--filter=}"
> +		__gitcomp "blob:none blob:limit= sparse:oid= sparse:path= combine: tree:" "" "${cur##--filter=}"
>  		return
>  		;;
>  	--*)
>  		__gitcomp_builtin fetch
>  		return
>  		;;
>  	esac
>  	__git_complete_remote_or_refspec
>  }
>  
> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index e46ea467bc..d7a1516188 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -1,19 +1,24 @@
>  #include "cache.h"
>  #include "commit.h"
>  #include "config.h"
>  #include "revision.h"
>  #include "argv-array.h"
>  #include "list-objects.h"
>  #include "list-objects-filter.h"
>  #include "list-objects-filter-options.h"
>  
> +static int parse_combine_filter(
> +	struct list_objects_filter_options *filter_options,
> +	const char *arg,
> +	struct strbuf *errbuf);
> +
>  /*
>   * Parse value of the argument to the "filter" keyword.
>   * On the command line this looks like:
>   *       --filter=<arg>
>   * and in the pack protocol as:
>   *       "filter" SP <arg>
>   *
>   * The filter keyword will be used by many commands.
>   * See Documentation/rev-list-options.txt for allowed values for <arg>.
>   *
> @@ -31,22 +36,20 @@ static int gently_parse_list_objects_filter(
>  
>  	if (filter_options->choice) {
>  		if (errbuf) {
>  			strbuf_addstr(
>  				errbuf,
>  				_("multiple filter-specs cannot be combined"));
>  		}
>  		return 1;
>  	}
>  
> -	filter_options->filter_spec = strdup(arg);
> -
>  	if (!strcmp(arg, "blob:none")) {
>  		filter_options->choice = LOFC_BLOB_NONE;
>  		return 0;
>  
>  	} else if (skip_prefix(arg, "blob:limit=", &v0)) {
>  		if (git_parse_ulong(v0, &filter_options->blob_limit_value)) {
>  			filter_options->choice = LOFC_BLOB_LIMIT;
>  			return 0;
>  		}
>  
> @@ -74,37 +77,183 @@ static int gently_parse_list_objects_filter(
>  		if (!get_oid_with_context(the_repository, v0, GET_OID_BLOB,
>  					  &sparse_oid, &oc))
>  			filter_options->sparse_oid_value = oiddup(&sparse_oid);
>  		filter_options->choice = LOFC_SPARSE_OID;
>  		return 0;
>  
>  	} else if (skip_prefix(arg, "sparse:path=", &v0)) {
>  		filter_options->choice = LOFC_SPARSE_PATH;
>  		filter_options->sparse_path_value = strdup(v0);
>  		return 0;
> +
> +	} else if (skip_prefix(arg, "combine:", &v0)) {
> +		int sub_parse_res = parse_combine_filter(
> +			filter_options, v0, errbuf);
> +		if (sub_parse_res)
> +			return sub_parse_res;
> +		return 0;

Couldn't the three lines above be said more succinctly as "return
sub_parse_res;"?

> +
>  	}
>  	/*
>  	 * Please update _git_fetch() in git-completion.bash when you
>  	 * add new filters
>  	 */
>  
>  	if (errbuf)
>  		strbuf_addf(errbuf, _("invalid filter-spec '%s'"), arg);
>  
>  	memset(filter_options, 0, sizeof(*filter_options));
>  	return 1;
>  }
>  
> +static int digit_value(int c, struct strbuf *errbuf) {
> +	if (c >= '0' && c <= '9')
> +		return c - '0';
> +	if (c >= 'a' && c <= 'f')
> +		return c - 'a' + 10;
> +	if (c >= 'A' && c <= 'F')
> +		return c - 'A' + 10;

I'm sure there's something I'm missing here. But why are you manually
decoding hex instead of using strtol or sscanf or something?

> +
> +	if (!errbuf)
> +		return -1;
> +
> +	strbuf_addf(errbuf, _("error in filter-spec - "));
> +	if (c)
> +		strbuf_addf(
> +			errbuf,
> +			_("expect two hex digits after %%, but got: '%c'"),
> +			c);
> +	else
> +		strbuf_addf(
> +			errbuf,
> +			_("not enough hex digits after %%; expected two"));
> +
> +	return -1;
> +}
> +
> +static int url_decode(struct strbuf *s, struct strbuf *errbuf) {
> +	char *dest = s->buf;
> +	char *src = s->buf;
> +	size_t new_len;
> +
> +	while (*src) {
> +		int digit_value_0, digit_value_1;
> +
> +		if (src[0] != '%') {
> +			*dest++ = *src++;
> +			continue;
> +		}
> +		src++;
> +
> +		digit_value_0 = digit_value(*src++, errbuf);
> +		if (digit_value_0 < 0)
> +			return 1;
> +		digit_value_1 = digit_value(*src++, errbuf);
> +		if (digit_value_1 < 0)
> +			return 1;
> +		*dest++ = digit_value_0 * 16 + digit_value_1;
> +	}
> +	new_len = dest - s->buf;
> +	strbuf_remove(s, new_len, s->len - new_len);
> +
> +	return 0;
> +}
> +
> +static const char *RESERVED_NON_WS = "~`!@#$^&*()[]{}\\;'\",<>?";
> +
> +static int has_reserved_character(
> +	struct strbuf *sub_spec, struct strbuf *errbuf)
> +{
> +	const char *c = sub_spec->buf;
> +	while (*c) {
> +		if (*c <= ' ' || strchr(RESERVED_NON_WS, *c))
> +			goto found_reserved;
> +		c++;
> +	}
> +
> +	return 0;
> +
> +found_reserved:

What's the value of doing this in a goto instead of embedded in the
while loop?

> +	if (errbuf)
> +		strbuf_addf(errbuf,
> +			    "must escape char in sub-filter-spec: '%c'",
> +			    *c);
> +	return 1;
> +}
> +
> +static int parse_combine_filter(
> +	struct list_objects_filter_options *filter_options,
> +	const char *arg,
> +	struct strbuf *errbuf)
> +{
> +	struct strbuf **sub_specs = strbuf_split_str(arg, '+', 2);
> +	int result;
> +
> +	if (!sub_specs[0]) {
> +		if (errbuf)
> +			strbuf_addf(errbuf,
> +				    _("expected something after combine:"));
> +		result = 1;
> +		goto cleanup;
> +	}
> +
> +	result = has_reserved_character(sub_specs[0], errbuf);
> +	if (result)
> +		goto cleanup;
> +
> +	/*
> +	 * Only decode the first sub-filter, since the rest will be decoded on
> +	 * the recursive call.
> +	 */
> +	result = url_decode(sub_specs[0], errbuf);
> +	if (result)
> +		goto cleanup;
> +
> +	if (!sub_specs[1]) {
> +		/*
> +		 * There is only one sub-filter, so we don't need the
> +		 * combine: - just parse it as a non-composite filter.
> +		 */
> +		result = gently_parse_list_objects_filter(
> +			filter_options, sub_specs[0]->buf, errbuf);
> +		goto cleanup;
> +	}
> +
> +	/* Remove trailing "+" so we can parse it. */
> +	assert(sub_specs[0]->buf[sub_specs[0]->len - 1] == '+');
> +	strbuf_remove(sub_specs[0], sub_specs[0]->len - 1, 1);
> +
> +	filter_options->choice = LOFC_COMBINE;
> +	filter_options->lhs = xcalloc(1, sizeof(*filter_options->lhs));
> +	filter_options->rhs = xcalloc(1, sizeof(*filter_options->rhs));
> +
> +	result = gently_parse_list_objects_filter(filter_options->lhs,
> +						  sub_specs[0]->buf,
> +						  errbuf) ||
> +		parse_combine_filter(filter_options->rhs,
> +				      sub_specs[1]->buf,
> +				      errbuf);

I guess you're recursing to combine filter 2 onto filter 1 which has
been combined onto filter 0 here. But why not just use a list or array?

> +
> +cleanup:
> +	strbuf_list_free(sub_specs);
> +	if (result) {
> +		list_objects_filter_release(filter_options);
> +		memset(filter_options, 0, sizeof(*filter_options));
> +	}
> +	return result;
> +}
> +
>  int parse_list_objects_filter(struct list_objects_filter_options *filter_options,
>  			      const char *arg)
>  {
>  	struct strbuf buf = STRBUF_INIT;
> +	filter_options->filter_spec = strdup(arg);
>  	if (gently_parse_list_objects_filter(filter_options, arg, &buf))
>  		die("%s", buf.buf);
>  	return 0;
>  }
>  
>  int opt_parse_list_objects_filter(const struct option *opt,
>  				  const char *arg, int unset)
>  {
>  	struct list_objects_filter_options *filter_options = opt->value;
>  
> @@ -127,23 +276,29 @@ void expand_list_objects_filter_spec(
>  	else if (filter->choice == LOFC_TREE_DEPTH)
>  		strbuf_addf(expanded_spec, "tree:%lu",
>  			    filter->tree_exclude_depth);
>  	else
>  		strbuf_addstr(expanded_spec, filter->filter_spec);
>  }
>  
>  void list_objects_filter_release(
>  	struct list_objects_filter_options *filter_options)
>  {
> +	if (!filter_options)
> +		return;
>  	free(filter_options->filter_spec);
>  	free(filter_options->sparse_oid_value);
>  	free(filter_options->sparse_path_value);
> +	list_objects_filter_release(filter_options->lhs);
> +	free(filter_options->lhs);
> +	list_objects_filter_release(filter_options->rhs);
> +	free(filter_options->rhs);

Is there a reason that the free shouldn't be included in
list_objects_filter_release()? Maybe this is a common style guideline
I've missed, but it seems to me like I'd expect a magic memory cleanup
function to do it all, and not leave it to me to free.

>  	memset(filter_options, 0, sizeof(*filter_options));
>  }
>  
>  void partial_clone_register(
>  	const char *remote,
>  	const struct list_objects_filter_options *filter_options)
>  {
>  	/*
>  	 * Record the name of the partial clone remote in the
>  	 * config and in the global variable -- the latter is
> @@ -171,14 +326,16 @@ void partial_clone_register(
>  }
>  
>  void partial_clone_get_default_filter_spec(
>  	struct list_objects_filter_options *filter_options)
>  {
>  	/*
>  	 * Parse default value, but silently ignore it if it is invalid.
>  	 */
>  	if (!core_partial_clone_filter_default)
>  		return;
> +
> +	filter_options->filter_spec = strdup(core_partial_clone_filter_default);
>  	gently_parse_list_objects_filter(filter_options,
>  					 core_partial_clone_filter_default,
>  					 NULL);
>  }
> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> index e3adc78ebf..6c0f0ecd08 100644
> --- a/list-objects-filter-options.h
> +++ b/list-objects-filter-options.h
> @@ -7,20 +7,21 @@
>  /*
>   * The list of defined filters for list-objects.
>   */
>  enum list_objects_filter_choice {
>  	LOFC_DISABLED = 0,
>  	LOFC_BLOB_NONE,
>  	LOFC_BLOB_LIMIT,
>  	LOFC_TREE_DEPTH,
>  	LOFC_SPARSE_OID,
>  	LOFC_SPARSE_PATH,
> +	LOFC_COMBINE,
>  	LOFC__COUNT /* must be last */
>  };
>  
>  struct list_objects_filter_options {
>  	/*
>  	 * 'filter_spec' is the raw argument value given on the command line
>  	 * or protocol request.  (The part after the "--keyword=".)  For
>  	 * commands that launch filtering sub-processes, or for communication
>  	 * over the network, don't use this value; use the result of
>  	 * expand_list_objects_filter_spec() instead.
> @@ -32,28 +33,35 @@ struct list_objects_filter_options {
>  	 * the filtering algorithm to use.
>  	 */
>  	enum list_objects_filter_choice choice;
>  
>  	/*
>  	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
>  	 */
>  	unsigned int no_filter : 1;
>  
>  	/*
> -	 * Parsed values (fields) from within the filter-spec.  These are
> -	 * choice-specific; not all values will be defined for any given
> -	 * choice.
> +	 * BEGIN choice-specific parsed values from within the filter-spec. Only
> +	 * some values will be defined for any given choice.
>  	 */
> +
>  	struct object_id *sparse_oid_value;
>  	char *sparse_path_value;
>  	unsigned long blob_limit_value;
>  	unsigned long tree_exclude_depth;
> +
> +	/* LOFC_COMBINE values */
> +	struct list_objects_filter_options *lhs, *rhs;
> +
> +	/*
> +	 * END choice-specific parsed values.
> +	 */
>  };
>  
>  /* Normalized command line arguments */
>  #define CL_ARG__FILTER "filter"
>  
>  int parse_list_objects_filter(
>  	struct list_objects_filter_options *filter_options,
>  	const char *arg);
>  
>  int opt_parse_list_objects_filter(const struct option *opt,
> diff --git a/list-objects-filter.c b/list-objects-filter.c
> index 8e8616b9b8..b97277a46f 100644
> --- a/list-objects-filter.c
> +++ b/list-objects-filter.c
> @@ -453,34 +453,148 @@ static void filter_sparse_path__init(
>  
>  	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
>  	d->array_frame[d->nr].defval = 0; /* default to include */
>  	d->array_frame[d->nr].child_prov_omit = 0;
>  
>  	ctx->filter_fn = filter_sparse;
>  	ctx->free_fn = filter_sparse_free;
>  	ctx->data = d;
>  }
>  
> +struct filter_combine_data {
> +	/* sub[0] corresponds to lhs, sub[1] to rhs. */

Jeff H had a comment about this too, but this seems unwieldy for >2
filters. (I also personally don't like using set index to incidate
lhs/rhs.) Why not an array of multiple `struct sub`? There's a macro
utility to generate types and helpers for an array of arbitrary struct
that may suit...

> +	struct {
> +		struct filter_context ctx;
> +		struct oidset seen;
> +		struct object_id skip_tree;
> +		unsigned is_skipping_tree : 1;
> +	} sub[2];
> +
> +	struct oidset rhs_omits;
> +};
> +
> +static void add_all(struct oidset *dest, struct oidset *src) {
> +	struct oidset_iter iter;
> +	struct object_id *src_oid;
> +
> +	oidset_iter_init(src, &iter);
> +	while ((src_oid = oidset_iter_next(&iter)) != NULL)
> +		oidset_insert(dest, src_oid);
> +}
> +
> +static void filter_combine_free(void *filter_data)
> +{
> +	struct filter_combine_data *d = filter_data;
> +	int i;
> +
> +	/* Anything omitted by rhs should be added to the overall omits set. */
> +	if (d->sub[0].ctx.omits)
> +		add_all(d->sub[0].ctx.omits, d->sub[1].ctx.omits);
> +
> +	for (i = 0; i < 2; i++) {
> +		list_objects_filter__release(&d->sub[i].ctx);
> +		oidset_clear(&d->sub[i].seen);
> +	}
> +	oidset_clear(&d->rhs_omits);
> +	free(d);
> +}
> +
> +static int should_delegate(enum list_objects_filter_situation filter_situation,
> +			   struct object *obj,
> +			   struct filter_combine_data *d,
> +			   int side)
> +{
> +	if (!d->sub[side].is_skipping_tree)
> +		return 1;
> +	if (filter_situation == LOFS_END_TREE &&
> +		oideq(&obj->oid, &d->sub[side].skip_tree)) {
> +		d->sub[side].is_skipping_tree = 0;
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +static enum list_objects_filter_result filter_combine(
> +	struct repository *r,
> +	enum list_objects_filter_situation filter_situation,
> +	struct object *obj,
> +	const char *pathname,
> +	const char *filename,
> +	struct filter_context *ctx)
> +{
> +	struct filter_combine_data *d = ctx->data;
> +	enum list_objects_filter_result result[2];
> +	enum list_objects_filter_result combined_result = LOFR_ZERO;
> +	int i;
> +
> +	for (i = 0; i < 2; i++) {

I suppose your lhs and rhs are in sub[0] and sub[1] in part for the sake
of this loop. But I think it would be easier to understand what is going
on if you were to perform the loop contents in a helper function (as the
name of the function would provide some more documentation).

> +		if (oidset_contains(&d->sub[i].seen, &obj->oid) ||
> +			!should_delegate(filter_situation, obj, d, i)) {
> +			result[i] = LOFR_ZERO;
> +			continue;
> +		}
> +
> +		result[i] = d->sub[i].ctx.filter_fn(
> +			r, filter_situation, obj, pathname, filename,
> +			&d->sub[i].ctx);
> +
> +		if (result[i] & LOFR_MARK_SEEN)
> +			oidset_insert(&d->sub[i].seen, &obj->oid);
> +
> +		if (result[i] & LOFR_SKIP_TREE) {
> +			d->sub[i].is_skipping_tree = 1;
> +			d->sub[i].skip_tree = obj->oid;
> +		}
> +	}
> +
> +	if ((result[0] & LOFR_DO_SHOW) && (result[1] & LOFR_DO_SHOW))
> +		combined_result |= LOFR_DO_SHOW;
> +	if (d->sub[0].is_skipping_tree && d->sub[1].is_skipping_tree)
> +		combined_result |= LOFR_SKIP_TREE;
> +
> +	return combined_result;
> +}

I see that you tested that >2 filters works okay. But by doing it the
way you have it seems like you're setting up to need recursion all over
the place to check against all the filters. I suppose I don't see the
benefit of doing all this recursively, as compared to doing it
iteratively.

 - Emily

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] list-objects-filter: merge filter data structs
  2019-05-28 18:48     ` Matthew DeVore
@ 2019-05-28 22:40       ` Matthew DeVore
  2019-05-29 19:48         ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-05-28 22:40 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Matthew DeVore, jonathantanmy, jrn, git, dstolee, jeffhost,
	jrnieder, pclouds

[This is what I had in mind re:merging the different filter type structs. This
is essentially a clean-up which makes the following patches a little simpler,
even more simpler than the previous version of the patchset.]

Before this patch, there is a lot of boilerplate code associated with
invoking a filter and maintaining its running state. This makes creating
a composite filter more cumbersome than it needs to be.

Simplify the filter execution data logic and structs by putting all
execution data for all filter types in a single struct. This results in
a tiny overhead for each filter instance, and in exchange, invoking
filters is not only easier but the list-objects-filter public API is
simpler and more opaque.

Helped-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter-options.h |  19 +++
 list-objects-filter.c         | 227 +++++++++++++---------------------
 list-objects-filter.h         |  29 ++---
 list-objects.c                |  44 +++----
 4 files changed, 139 insertions(+), 180 deletions(-)

diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index e3adc78ebf..ac320c3312 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -2,25 +2,44 @@
 #define LIST_OBJECTS_FILTER_OPTIONS_H
 
 #include "parse-options.h"
 #include "strbuf.h"
 
 /*
  * The list of defined filters for list-objects.
  */
 enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
+
+	/* A filter for list-objects to omit ALL blobs from the traversal. */
 	LOFC_BLOB_NONE,
+
+	/* A filter for list-objects to omit large blobs. */
 	LOFC_BLOB_LIMIT,
+
+	/*
+	 * A filter for list-objects to omit ALL trees and blobs from the
+	 * traversal.
+	 */
 	LOFC_TREE_DEPTH,
+
+	/*
+	 * Filters driven by a sparse-checkout specification to only include
+	 * blobs that a sparse checkout would populate.
+	 *
+	 * The sparse-checkout spec can be loaded from a blob with the given
+	 * OID or from a local pathname.  We allow an OID because the repo may
+	 * be bare or we may be doing the filtering on the server.
+	 */
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
+
 	LOFC__COUNT /* must be last */
 };
 
 struct list_objects_filter_options {
 	/*
 	 * 'filter_spec' is the raw argument value given on the command line
 	 * or protocol request.  (The part after the "--keyword=".)  For
 	 * commands that launch filtering sub-processes, or for communication
 	 * over the network, don't use this value; use the result of
 	 * expand_list_objects_filter_spec() instead.
diff --git a/list-objects-filter.c b/list-objects-filter.c
index ee449de3f7..334b0d959b 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -19,38 +19,67 @@
  * FILTER_SHOWN_BUT_REVISIT -- we set this bit on tree objects
  * that have been shown, but should be revisited if they appear
  * in the traversal (until we mark it SEEN).  This is a way to
  * let us silently de-dup calls to show() in the caller.  This
  * is subtly different from the "revision.h:SHOWN" and the
  * "sha1-name.c:ONELINE_SEEN" bits.  And also different from
  * the non-de-dup usage in pack-bitmap.c
  */
 #define FILTER_SHOWN_BUT_REVISIT (1<<21)
 
-/*
- * A filter for list-objects to omit ALL blobs from the traversal.
- * And to OPTIONALLY collect a list of the omitted OIDs.
- */
-struct filter_blobs_none_data {
+struct filter_data {
+	/* Used by all filter types. */
 	struct oidset *omits;
+
+	enum list_objects_filter_result (*filter_object_fn)(
+		struct repository *r,
+		enum list_objects_filter_situation filter_situation,
+		struct object *obj,
+		const char *pathname,
+		const char *filename,
+		struct filter_data *filter_data);
+
+	/* BEGIN tree:<depth> filter data */
+
+	/*
+	 * Maps trees to the minimum depth at which they were seen. It is not
+	 * necessary to re-traverse a tree at deeper or equal depths than it has
+	 * already been traversed.
+	 *
+	 * We can't use LOFR_MARK_SEEN for tree objects since this will prevent
+	 * it from being traversed at shallower depths.
+	 */
+	struct oidmap seen_at_depth;
+
+	unsigned long exclude_depth;
+	unsigned long current_depth;
+
+	/* BEGIN blobs:limit=<limit> filter data */
+
+	unsigned long max_bytes;
+
+	/* BEGIN sparse:... filter data */
+
+	struct exclude_list el;
+
+	size_t nr, alloc;
+	struct frame *array_frame;
 };
 
 static enum list_objects_filter_result filter_blobs_none(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_data *filter_data)
 {
-	struct filter_blobs_none_data *filter_data = filter_data_;
-
 	switch (filter_situation) {
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
 		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
 	case LOFS_END_TREE:
@@ -60,84 +89,55 @@ static enum list_objects_filter_result filter_blobs_none(
 	case LOFS_BLOB:
 		assert(obj->type == OBJ_BLOB);
 		assert((obj->flags & SEEN) == 0);
 
 		if (filter_data->omits)
 			oidset_insert(filter_data->omits, &obj->oid);
 		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
 	}
 }
 
-static void *filter_blobs_none__init(
-	struct oidset *omitted,
+static void filter_blobs_none__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_data *d)
 {
-	struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
-
-	*filter_fn = filter_blobs_none;
-	*filter_free_fn = free;
-	return d;
+	d->filter_object_fn = filter_blobs_none;
 }
 
-/*
- * A filter for list-objects to omit ALL trees and blobs from the traversal.
- * Can OPTIONALLY collect a list of the omitted OIDs.
- */
-struct filter_trees_depth_data {
-	struct oidset *omits;
-
-	/*
-	 * Maps trees to the minimum depth at which they were seen. It is not
-	 * necessary to re-traverse a tree at deeper or equal depths than it has
-	 * already been traversed.
-	 *
-	 * We can't use LOFR_MARK_SEEN for tree objects since this will prevent
-	 * it from being traversed at shallower depths.
-	 */
-	struct oidmap seen_at_depth;
-
-	unsigned long exclude_depth;
-	unsigned long current_depth;
-};
-
 struct seen_map_entry {
 	struct oidmap_entry base;
 	size_t depth;
 };
 
 /* Returns 1 if the oid was in the omits set before it was invoked. */
 static int filter_trees_update_omits(
 	struct object *obj,
-	struct filter_trees_depth_data *filter_data,
+	struct filter_data *filter_data,
 	int include_it)
 {
 	if (!filter_data->omits)
 		return 0;
 
 	if (include_it)
 		return oidset_remove(filter_data->omits, &obj->oid);
 	else
 		return oidset_insert(filter_data->omits, &obj->oid);
 }
 
 static enum list_objects_filter_result filter_trees_depth(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_data *filter_data)
 {
-	struct filter_trees_depth_data *filter_data = filter_data_;
 	struct seen_map_entry *seen_info;
 	int include_it = filter_data->current_depth <
 		filter_data->exclude_depth;
 	int filter_res;
 	int already_seen;
 
 	/*
 	 * Note that we do not use _MARK_SEEN in order to allow re-traversal in
 	 * case we encounter a tree or blob again at a shallower depth.
 	 */
@@ -186,63 +186,38 @@ static enum list_objects_filter_result filter_trees_depth(
 				filter_res = LOFR_ZERO;
 			else
 				filter_res = LOFR_SKIP_TREE;
 		}
 
 		filter_data->current_depth++;
 		return filter_res;
 	}
 }
 
-static void filter_trees_free(void *filter_data) {
-	struct filter_trees_depth_data *d = filter_data;
-	if (!d)
-		return;
-	oidmap_free(&d->seen_at_depth, 1);
-	free(d);
-}
-
-static void *filter_trees_depth__init(
-	struct oidset *omitted,
+static void filter_trees_depth__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_data *d)
 {
-	struct filter_trees_depth_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	oidmap_init(&d->seen_at_depth, 0);
 	d->exclude_depth = filter_options->tree_exclude_depth;
 	d->current_depth = 0;
-
-	*filter_fn = filter_trees_depth;
-	*filter_free_fn = filter_trees_free;
-	return d;
+	d->filter_object_fn = filter_trees_depth;
 }
 
-/*
- * A filter for list-objects to omit large blobs.
- * And to OPTIONALLY collect a list of the omitted OIDs.
- */
-struct filter_blobs_limit_data {
-	struct oidset *omits;
-	unsigned long max_bytes;
-};
-
 static enum list_objects_filter_result filter_blobs_limit(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_data *filter_data)
 {
-	struct filter_blobs_limit_data *filter_data = filter_data_;
 	unsigned long object_length;
 	enum object_type t;
 
 	switch (filter_situation) {
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -274,81 +249,57 @@ static enum list_objects_filter_result filter_blobs_limit(
 			oidset_insert(filter_data->omits, &obj->oid);
 		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
 	}
 
 include_it:
 	if (filter_data->omits)
 		oidset_remove(filter_data->omits, &obj->oid);
 	return LOFR_MARK_SEEN | LOFR_DO_SHOW;
 }
 
-static void *filter_blobs_limit__init(
-	struct oidset *omitted,
+static void filter_blobs_limit__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_data *d)
 {
-	struct filter_blobs_limit_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	d->max_bytes = filter_options->blob_limit_value;
-
-	*filter_fn = filter_blobs_limit;
-	*filter_free_fn = free;
-	return d;
+	d->filter_object_fn = filter_blobs_limit;
 }
 
-/*
- * A filter driven by a sparse-checkout specification to only
- * include blobs that a sparse checkout would populate.
- *
- * The sparse-checkout spec can be loaded from a blob with the
- * given OID or from a local pathname.  We allow an OID because
- * the repo may be bare or we may be doing the filtering on the
- * server.
- */
+/* For use with sparse checkout filters. */
 struct frame {
 	/*
 	 * defval is the usual default include/exclude value that
 	 * should be inherited as we recurse into directories based
 	 * upon pattern matching of the directory itself or of a
 	 * containing directory.
 	 */
 	int defval;
 
 	/*
 	 * 1 if the directory (recursively) contains any provisionally
 	 * omitted objects.
 	 *
 	 * 0 if everything (recursively) contained in this directory
 	 * has been explicitly included (SHOWN) in the result and
 	 * the directory may be short-cut later in the traversal.
 	 */
 	unsigned child_prov_omit : 1;
 };
 
-struct filter_sparse_data {
-	struct oidset *omits;
-	struct exclude_list el;
-
-	size_t nr, alloc;
-	struct frame *array_frame;
-};
-
 static enum list_objects_filter_result filter_sparse(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_data *filter_data)
 {
-	struct filter_sparse_data *filter_data = filter_data_;
 	int val, dtype;
 	struct frame *frame;
 
 	switch (filter_situation) {
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		dtype = DT_DIR;
@@ -442,100 +393,96 @@ static enum list_objects_filter_result filter_sparse(
 		/*
 		 * Remember that at least 1 blob in this tree was
 		 * provisionally omitted.  This prevents us from short
 		 * cutting the tree in future iterations.
 		 */
 		frame->child_prov_omit = 1;
 		return LOFR_ZERO;
 	}
 }
 
-
-static void filter_sparse_free(void *filter_data)
-{
-	struct filter_sparse_data *d = filter_data;
-	/* TODO free contents of 'd' */
-	free(d);
-}
-
-static void *filter_sparse_oid__init(
-	struct oidset *omitted,
+static void filter_sparse_oid__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_data *d)
 {
-	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	if (add_excludes_from_blob_to_list(filter_options->sparse_oid_value,
 					   NULL, 0, &d->el) < 0)
 		die("could not load filter specification");
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
 	d->array_frame[d->nr].defval = 0; /* default to include */
 	d->array_frame[d->nr].child_prov_omit = 0;
 
-	*filter_fn = filter_sparse;
-	*filter_free_fn = filter_sparse_free;
-	return d;
+	d->filter_object_fn = filter_sparse;
 }
 
-static void *filter_sparse_path__init(
-	struct oidset *omitted,
+static void filter_sparse_path__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_data *d)
 {
-	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	if (add_excludes_from_file_to_list(filter_options->sparse_path_value,
 					   NULL, 0, &d->el, NULL) < 0)
 		die("could not load filter specification");
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
 	d->array_frame[d->nr].defval = 0; /* default to include */
 	d->array_frame[d->nr].child_prov_omit = 0;
 
-	*filter_fn = filter_sparse;
-	*filter_free_fn = filter_sparse_free;
-	return d;
+	d->filter_object_fn = filter_sparse;
 }
 
-typedef void *(*filter_init_fn)(
-	struct oidset *omitted,
+typedef void (*filter_init_fn)(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn);
+	struct filter_data *filter_data);
 
 /*
  * Must match "enum list_objects_filter_choice".
  */
 static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
 	filter_trees_depth__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
 
-void *list_objects_filter__init(
+struct filter_data *list_objects_filter__init(
 	struct oidset *omitted,
-	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct list_objects_filter_options *filter_options)
 {
 	filter_init_fn init_fn;
+	struct filter_data *filter_data;
 
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
 	if (filter_options->choice >= LOFC__COUNT)
 		BUG("invalid list-objects filter choice: %d",
 		    filter_options->choice);
 
 	init_fn = s_filters[filter_options->choice];
-	if (init_fn)
-		return init_fn(omitted, filter_options,
-			       filter_fn, filter_free_fn);
-	*filter_fn = NULL;
-	*filter_free_fn = NULL;
-	return NULL;
+	if (!init_fn)
+		return NULL;
+
+	filter_data = xcalloc(1, sizeof(*filter_data));
+	filter_data->omits = omitted;
+	init_fn(filter_options, filter_data);
+
+	return filter_data;
+}
+
+enum list_objects_filter_result list_objects_filter__filter_object(
+	struct repository *r,
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	struct filter_data *filter_data)
+{
+	return filter_data->filter_object_fn(
+		r, filter_situation, obj, pathname, filename, filter_data);
+}
+
+void list_objects_filter__free(struct filter_data *d) {
+	oidmap_free(&d->seen_at_depth, 1);
+	free(d);
 }
diff --git a/list-objects-filter.h b/list-objects-filter.h
index 1d45a4ad57..2bc59147a7 100644
--- a/list-objects-filter.h
+++ b/list-objects-filter.h
@@ -53,37 +53,34 @@ enum list_objects_filter_result {
 	LOFR_DO_SHOW   = 1<<1,
 	LOFR_SKIP_TREE = 1<<2,
 };
 
 enum list_objects_filter_situation {
 	LOFS_BEGIN_TREE,
 	LOFS_END_TREE,
 	LOFS_BLOB
 };
 
-typedef enum list_objects_filter_result (*filter_object_fn)(
-	struct repository *r,
-	enum list_objects_filter_situation filter_situation,
-	struct object *obj,
-	const char *pathname,
-	const char *filename,
-	void *filter_data);
-
-typedef void (*filter_free_fn)(void *filter_data);
+struct filter_data;
 
 /*
  * Constructor for the set of defined list-objects filters.
  * Returns a generic "void *filter_data".
  *
  * The returned "filter_fn" will be used by traverse_commit_list()
  * to filter the results.
- *
- * The returned "filter_free_fn" is a destructor for the
- * filter_data.
  */
-void *list_objects_filter__init(
+struct filter_data *list_objects_filter__init(
 	struct oidset *omitted,
-	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn);
+	struct list_objects_filter_options *filter_options);
+
+enum list_objects_filter_result list_objects_filter__filter_object(
+	struct repository *r,
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	struct filter_data *filter_data);
+
+void list_objects_filter__free(struct filter_data *filter_data);
 
 #endif /* LIST_OBJECTS_FILTER_H */
diff --git a/list-objects.c b/list-objects.c
index b5651ddd5b..9d5cd1b1e5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -11,22 +11,21 @@
 #include "list-objects-filter-options.h"
 #include "packfile.h"
 #include "object-store.h"
 #include "trace.h"
 
 struct traversal_context {
 	struct rev_info *revs;
 	show_object_fn show_object;
 	show_commit_fn show_commit;
 	void *show_data;
-	filter_object_fn filter_fn;
-	void *filter_data;
+	struct filter_data *filter_data;
 };
 
 static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
 			 struct strbuf *path,
 			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
@@ -47,25 +46,26 @@ static void process_blob(struct traversal_context *ctx,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
 	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
-		r = ctx->filter_fn(ctx->revs->repo,
-				   LOFS_BLOB, obj,
-				   path->buf, &path->buf[pathlen],
-				   ctx->filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_data)
+		r = list_objects_filter__filter_object(
+			ctx->revs->repo,
+			LOFS_BLOB, obj,
+			path->buf, &path->buf[pathlen],
+			ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
 		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
 /*
  * Processing a gitlink entry currently does nothing, since
  * we do not recurse into the subproject.
@@ -179,42 +179,44 @@ static void process_tree(struct traversal_context *ctx,
 		 */
 		if (revs->exclude_promisor_objects &&
 		    is_promisor_object(&obj->oid))
 			return;
 
 		if (!revs->do_not_die_on_missing_tree)
 			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
-	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
-		r = ctx->filter_fn(ctx->revs->repo,
-				   LOFS_BEGIN_TREE, obj,
-				   base->buf, &base->buf[baselen],
-				   ctx->filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_data)
+		r = list_objects_filter__filter_object(
+			ctx->revs->repo,
+			LOFS_BEGIN_TREE, obj,
+			base->buf, &base->buf[baselen],
+			ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
 		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
 	if (r & LOFR_SKIP_TREE)
 		trace_printf("Skipping contents of tree %s...\n", base->buf);
 	else if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
-		r = ctx->filter_fn(ctx->revs->repo,
-				   LOFS_END_TREE, obj,
-				   base->buf, &base->buf[baselen],
-				   ctx->filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_data) {
+		r = list_objects_filter__filter_object(
+			ctx->revs->repo,
+			LOFS_END_TREE, obj,
+			base->buf, &base->buf[baselen],
+			ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
 			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
 	free_tree_buffer(tree);
 }
 
@@ -395,38 +397,32 @@ static void do_traverse(struct traversal_context *ctx)
 void traverse_commit_list(struct rev_info *revs,
 			  show_commit_fn show_commit,
 			  show_object_fn show_object,
 			  void *show_data)
 {
 	struct traversal_context ctx;
 	ctx.revs = revs;
 	ctx.show_commit = show_commit;
 	ctx.show_object = show_object;
 	ctx.show_data = show_data;
-	ctx.filter_fn = NULL;
 	ctx.filter_data = NULL;
 	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
 	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
 	void *show_data,
 	struct oidset *omitted)
 {
 	struct traversal_context ctx;
-	filter_free_fn filter_free_fn = NULL;
 
 	ctx.revs = revs;
 	ctx.show_object = show_object;
 	ctx.show_commit = show_commit;
 	ctx.show_data = show_data;
-	ctx.filter_fn = NULL;
 
-	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
-						    &ctx.filter_fn, &filter_free_fn);
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options);
 	do_traverse(&ctx);
-	if (ctx.filter_data && filter_free_fn)
-		filter_free_fn(ctx.filter_data);
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/5] list-objects-filter-options: error is localizeable
  2019-05-24  0:55   ` Emily Shaffer
@ 2019-05-28 23:01     ` Matthew DeVore
  0 siblings, 0 replies; 41+ messages in thread
From: Matthew DeVore @ 2019-05-28 23:01 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Matthew DeVore, jonathantanmy, jrn, git, dstolee, jeffhost,
	jrnieder, pclouds

On Thu, May 23, 2019 at 05:55:12PM -0700, Emily Shaffer wrote:
> What does it look like? Is it human-readable in its current form? I ask
> because (without having looked ahead) I think you're going to move it

You can make the error appear in this way:

$ git rev-list --filter=blob:nonse --objects HEAD
fatal: invalid filter-spec 'blob:nonse'

I included it in this patchset since I'm changing code in the vicinity and it's
possible the patchset may mutate to a point where these lines get moved to a
separate function or their indentation changes, meaning the maintainer has to
handle merge conflicts. But that may be more caution than necessary.

I don't feel so certain about what's going to happen either way, nor do I mind
having the patch stick around in this set, so maybe I'll just keep things as
they are for now.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-28 17:59     ` Junio C Hamano
@ 2019-05-29 15:02       ` Matthew DeVore
  2019-05-29 21:29         ` Jeff Hostetler
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-05-29 15:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff Hostetler, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Tue, May 28, 2019 at 10:59:31AM -0700, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
> > In the RFC version, there was discussion [2] of the wire format
> > and the need to be backwards compatible with existing servers and
> > so use the "combine:" syntax so that we only have a single filter
> > line on the wire.  Would it be better to have compliant servers
> > advertise a "filters" (plural) capability in addition to the

This is a good idea and I hadn't considered it. It does seem to make the
repeated filter lines a safer bet than I though.

> > existing "filter" (singular) capability?  Then the client would
> > know that it could send a series of filter lines using the existing
> > syntax.  Likewise, if the "filters" capability was omitted, the
> > client could error out without the extra round-trip.
> 
> All good ideas.

After hacking the code halfway together to make the above idea work, and
learning quite a lot in the process, I saw set_git_option in transport.c and
realized that all existing transport options are assumed to be ? (0 or 1) rather
than * (0 or more). So "filter" would be the first transport option that is
repeated.

Even though multiple reviewers have weighed in supporting repeated filter lines,
I'm still conflicted about it. It seems the drawback to the + syntax is the
requirement for encoding the individual filters, but this encoding is no longer
required since the sparse:path=... filter no longer has to be supported. And the
URL encoding, if it is ever reintroduced, is just boilerplate and is unlikely to
change later or cause a significant maintainance burden.

The essence of the repeated filter line is that we need additional high-level
machinery just for the sake of making the lower-level machinery... marginally
simpler, hopefully? And if we ever need to add new filter combinations (like OR
or XOR rather than AND) this repeated filter line thing will be a legacy
annoyance (users will wonder why does repeated "filter" mean AND rather than
one of the other supported combination methods?). Repeating filter lines seems
like a leaky abstraction to me.

I would be helped if someone re-iterated why the repeated filter lines are a
good idea in light of the fact that URL escaping is no longer required to make
it work.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] list-objects-filter: merge filter data structs
  2019-05-28 22:40       ` [PATCH] list-objects-filter: merge filter data structs Matthew DeVore
@ 2019-05-29 19:48         ` Junio C Hamano
  2019-05-29 20:57           ` Jeff Hostetler
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-05-29 19:48 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

Matthew DeVore <matvore@comcast.net> writes:

> Simplify the filter execution data logic and structs by putting all
> execution data for all filter types in a single struct. This results in
> a tiny overhead for each filter instance, and in exchange, invoking
> filters is not only easier but the list-objects-filter public API is
> simpler and more opaque.

Hmmm...

> +struct filter_data {
> +	/* Used by all filter types. */
>  	struct oidset *omits;
> +
> +	enum list_objects_filter_result (*filter_object_fn)(
> +		struct repository *r,
> +		enum list_objects_filter_situation filter_situation,
> +		struct object *obj,
> +		const char *pathname,
> +		const char *filename,
> +		struct filter_data *filter_data);
> +
> +	/* BEGIN tree:<depth> filter data */
> +
> +	/*
> +	 * Maps trees to the minimum depth at which they were seen. It is not
> +	 * necessary to re-traverse a tree at deeper or equal depths than it has
> +	 * already been traversed.
> +	 *
> +	 * We can't use LOFR_MARK_SEEN for tree objects since this will prevent
> +	 * it from being traversed at shallower depths.
> +	 */
> +	struct oidmap seen_at_depth;
> +
> +	unsigned long exclude_depth;
> +	unsigned long current_depth;
> +
> +	/* BEGIN blobs:limit=<limit> filter data */
> +
> +	unsigned long max_bytes;
> +
> +	/* BEGIN sparse:... filter data */
> +
> +	struct exclude_list el;
> +
> +	size_t nr, alloc;
> +	struct frame *array_frame;
>  };

I am hoping that I am not misreading the intention but you do not
plan to use the above so that you can say "apply 'tree:depth=4' and
'blobs:limit=1G' at the same time" by filling the fields in a single
struct, do you?  For combined filter, you'll still have multiple
instances of filter_data struct, strung together in a list that says
"all of these must be satisfied" or something like that, right?

And if that is the case, I am not sure why the above "struct with
all these fields" is a good idea.  If these three (and probably we
will have more as the system evolves) sets of fields in this outer
struct for different filters were enclosed in a union, that would be
a different story, though.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] list-objects-filter: merge filter data structs
  2019-05-29 19:48         ` Junio C Hamano
@ 2019-05-29 20:57           ` Jeff Hostetler
  2019-05-29 23:10             ` Matthew DeVore
                               ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Jeff Hostetler @ 2019-05-29 20:57 UTC (permalink / raw)
  To: Junio C Hamano, Matthew DeVore
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds



On 5/29/2019 3:48 PM, Junio C Hamano wrote:
> Matthew DeVore <matvore@comcast.net> writes:
> 
>> Simplify the filter execution data logic and structs by putting all
>> execution data for all filter types in a single struct. This results in
>> a tiny overhead for each filter instance, and in exchange, invoking
>> filters is not only easier but the list-objects-filter public API is
>> simpler and more opaque.
> 
> Hmmm...
> 
>> +struct filter_data {
>> +	/* Used by all filter types. */
>>   	struct oidset *omits;
>> +
>> +	enum list_objects_filter_result (*filter_object_fn)(
>> +		struct repository *r,
>> +		enum list_objects_filter_situation filter_situation,
>> +		struct object *obj,
>> +		const char *pathname,
>> +		const char *filename,
>> +		struct filter_data *filter_data);
>> +
>> +	/* BEGIN tree:<depth> filter data */
>> +
>> +	/*
>> +	 * Maps trees to the minimum depth at which they were seen. It is not
>> +	 * necessary to re-traverse a tree at deeper or equal depths than it has
>> +	 * already been traversed.
>> +	 *
>> +	 * We can't use LOFR_MARK_SEEN for tree objects since this will prevent
>> +	 * it from being traversed at shallower depths.
>> +	 */
>> +	struct oidmap seen_at_depth;
>> +
>> +	unsigned long exclude_depth;
>> +	unsigned long current_depth;
>> +
>> +	/* BEGIN blobs:limit=<limit> filter data */
>> +
>> +	unsigned long max_bytes;
>> +
>> +	/* BEGIN sparse:... filter data */
>> +
>> +	struct exclude_list el;
>> +
>> +	size_t nr, alloc;
>> +	struct frame *array_frame;
>>   };
> 
> I am hoping that I am not misreading the intention but you do not
> plan to use the above so that you can say "apply 'tree:depth=4' and
> 'blobs:limit=1G' at the same time" by filling the fields in a single
> struct, do you?  For combined filter, you'll still have multiple
> instances of filter_data struct, strung together in a list that says
> "all of these must be satisfied" or something like that, right?
> 
> And if that is the case, I am not sure why the above "struct with
> all these fields" is a good idea.  If these three (and probably we
> will have more as the system evolves) sets of fields in this outer
> struct for different filters were enclosed in a union, that would be
> a different story, though.
> 

I'm not sure I like the combined structure as proposed.
But let's think about it.

I think part of problem with my original version was putting the
filter_fn and filter_free_fn in the traversal_context rather than
inside the filter_*_data structure.

I did a simple combined structure for the list_objects_filter_options
and kind of regretted it because it wasn't obvious which data fields
were defined or undefined in each filter constructor.  But it was
convenient when parsing the command line.

I think having a combined structure with a union enclosing a structure
for the data fields in each filter type would be worth considering.
That way you have a somewhat self-documenting sub-structure for each
filter type that indicates which fields are defined.

I'd also suggest keeping the "oidset omits" inside each of the
sub-structures, but that's just me.


BTW, I don't see a free_fn.  That may collapse out with your proposal
but I wanted to ask.

Thanks
Jeff


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-29 15:02       ` Matthew DeVore
@ 2019-05-29 21:29         ` Jeff Hostetler
  2019-05-29 23:27           ` Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff Hostetler @ 2019-05-29 21:29 UTC (permalink / raw)
  To: Matthew DeVore, Junio C Hamano
  Cc: Matthew DeVore, jonathantanmy, jrn, git, dstolee, jeffhost,
	jrnieder, pclouds



On 5/29/2019 11:02 AM, Matthew DeVore wrote:
> On Tue, May 28, 2019 at 10:59:31AM -0700, Junio C Hamano wrote:
>> Jeff Hostetler <git@jeffhostetler.com> writes:
>>
>>> In the RFC version, there was discussion [2] of the wire format
>>> and the need to be backwards compatible with existing servers and
>>> so use the "combine:" syntax so that we only have a single filter
>>> line on the wire.  Would it be better to have compliant servers
>>> advertise a "filters" (plural) capability in addition to the
> 
> This is a good idea and I hadn't considered it. It does seem to make the
> repeated filter lines a safer bet than I though.
> 
>>> existing "filter" (singular) capability?  Then the client would
>>> know that it could send a series of filter lines using the existing
>>> syntax.  Likewise, if the "filters" capability was omitted, the
>>> client could error out without the extra round-trip.
>>
>> All good ideas.
> 
> After hacking the code halfway together to make the above idea work, and
> learning quite a lot in the process, I saw set_git_option in transport.c and
> realized that all existing transport options are assumed to be ? (0 or 1) rather
> than * (0 or more). So "filter" would be the first transport option that is
> repeated.
> 
> Even though multiple reviewers have weighed in supporting repeated filter lines,
> I'm still conflicted about it. It seems the drawback to the + syntax is the
> requirement for encoding the individual filters, but this encoding is no longer
> required since the sparse:path=... filter no longer has to be supported. And the
> URL encoding, if it is ever reintroduced, is just boilerplate and is unlikely to
> change later or cause a significant maintainance burden.

Was sparse:path filter the only reason for needing all the URL encoding?
The sparse:oid form allows values <ref>:<path> and these (or at least
the <path> portion) may contain special characters.  So don't we need to
URL encode this form too?


> 
> The essence of the repeated filter line is that we need additional high-level
> machinery just for the sake of making the lower-level machinery... marginally
> simpler, hopefully? And if we ever need to add new filter combinations (like OR
> or XOR rather than AND) this repeated filter line thing will be a legacy
> annoyance (users will wonder why does repeated "filter" mean AND rather than
> one of the other supported combination methods?). Repeating filter lines seems
> like a leaky abstraction to me.
> 
> I would be helped if someone re-iterated why the repeated filter lines are a
> good idea in light of the fact that URL escaping is no longer required to make
> it work.
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] list-objects-filter: merge filter data structs
  2019-05-29 20:57           ` Jeff Hostetler
@ 2019-05-29 23:10             ` Matthew DeVore
  2019-05-30  1:56             ` [RFC PATCH v2] " Matthew DeVore
  2019-05-30 19:05             ` [PATCH] " Matthew DeVore
  2 siblings, 0 replies; 41+ messages in thread
From: Matthew DeVore @ 2019-05-29 23:10 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Emily Shaffer, Matthew DeVore, jonathantanmy, jrn,
	git, dstolee, jeffhost, jrnieder, pclouds

> > I am hoping that I am not misreading the intention but you do not
> > plan to use the above so that you can say "apply 'tree:depth=4' and
> > 'blobs:limit=1G' at the same time" by filling the fields in a single
> > struct, do you?  For combined filter, you'll still have multiple
> > instances of filter_data struct, strung together in a list that says
> > "all of these must be satisfied" or something like that, right?

It is the latter and not the former. I don't want to populate multiple filters
in a single struct. My idea was basically that when the fields got too numerous
we could use unions or add NULLable pointers for the bigger filter data structs,
so the data would be properly "optional".

On Wed, May 29, 2019 at 04:57:23PM -0400, Jeff Hostetler wrote:
> I'm not sure I like the combined structure as proposed.
> But let's think about it.
> 
> I think part of problem with my original version was putting the
> filter_fn and filter_free_fn in the traversal_context rather than
> inside the filter_*_data structure.

Agreed. This cleanup I'm proposing is basically something I was itching to do in
the process of bundling up the filter_fn and filter_free_fn pointers in a single
pointer, which makes the LOFC_COMBINE-particular filter data more concise.

I can still bundle up the pointers into a single pointer and make this cleanup
less aggressive.

> I did a simple combined structure for the list_objects_filter_options
> and kind of regretted it because it wasn't obvious which data fields
> were defined or undefined in each filter constructor.  But it was
> convenient when parsing the command line.
> 
> I think having a combined structure with a union enclosing a structure
> for the data fields in each filter type would be worth considering.
> That way you have a somewhat self-documenting sub-structure for each
> filter type that indicates which fields are defined.

I'm OK with the union approach. The drawback is that the __free function now
needs a switch block to choose the correct union, but the union is also good for
the self-documenting aspect you mention.

> 
> I'd also suggest keeping the "oidset omits" inside each of the
> sub-structures, but that's just me.
> 
> 
> BTW, I don't see a free_fn.  That may collapse out with your proposal
> but I wanted to ask.

See the list_objects_filter__free function. It's trivial, but it inherits the
leakiness of the sparse filters' free function it subsumes.

Thank you for considering the patch.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-29 21:29         ` Jeff Hostetler
@ 2019-05-29 23:27           ` Matthew DeVore
  2019-05-30 14:01             ` Jeff Hostetler
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-05-29 23:27 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Wed, May 29, 2019 at 05:29:14PM -0400, Jeff Hostetler wrote:
> Was sparse:path filter the only reason for needing all the URL encoding?
> The sparse:oid form allows values <ref>:<path> and these (or at least
> the <path> portion) may contain special characters.  So don't we need to
> URL encode this form too?

Oh, I missed this. I was only thinking an oid was allowed after "sparse:". So as
I suspected I was overlooking something obvious.

Now I just want to understand the objection to URL encoding a little better. I
haven't worked with in a project that requires a lot of boilerplate before, so I
may be asking obvious things again. If so, sorry in advance.

So the objections, as I interpret them so far, are that:

 a the URL encoding/decoding complicates the code base
 b explaining the URL encoding, while it allows for future expansion, requires
   some verbose documentation in git-rev-list that is potentially distracting or
   confusing
 c there may be a better way to allow for future expansion that does not require
   URL encoding
 d the URL encoding is unpleasant to use (note that my patchset makes it
   optional for the user to use and it is only mandatory in sending it over the
   wire)

I think these are reasonable and I'm willing to stop digging my heels in :) Does
the above sum everything up?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [RFC PATCH v2] list-objects-filter: merge filter data structs
  2019-05-29 20:57           ` Jeff Hostetler
  2019-05-29 23:10             ` Matthew DeVore
@ 2019-05-30  1:56             ` Matthew DeVore
  2019-05-30 16:12               ` Junio C Hamano
  2019-05-30 19:05             ` [PATCH] " Matthew DeVore
  2 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-05-30  1:56 UTC (permalink / raw)
  To: jeffhost, gitster
  Cc: git, matvore, emilyshaffer, jonathantanmy, jrn, dstolee, jrnieder,
	pclouds

Address two issues regarding filter types:

 a) Creating a new filter type requires more boilerplate than necessary.
 b) Using the list-filter-objects API requires keeping track of the
    filter function as well as the cleanup (free) function.

Address (a) by putting all execution data for all filter types in a
single struct. Also put data that is not particular to any one type in
this struct (omits, filter_fn). The different fields needed for each
filter type are bundled in anonymous structs and those structs are
unioned together, so that the total size of filter_data does not grow
unmanageably as the number of filter types increases.

This results in a tiny overhead for each filter instance, and in
exchange, less boilerplate is needed for creating a new filter.

Address (b) by including the filter and free functions in the new mega
structure. This makes the list-objects-filter public API simpler and
more opaque.

Also rename the individual "free data used by filter_data" functions
to __clear rather than __free, and stop "free"ing the filter_data struct
in them. This approach allows us to avoid casting filter_data in each
__clear function since we are not passing void * anymore. The "free"
operation is handled by the trampoline function for all filter types,
so simple filters that don't do heap allocation can avoid implementing
a __clear function.

Helped-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Jeff Hostetler <jeffhost@microsoft.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
---
 list-objects-filter-options.h |  19 +++
 list-objects-filter.c         | 237 +++++++++++++++-------------------
 list-objects-filter.h         |  29 ++---
 list-objects.c                |  45 +++----
 4 files changed, 155 insertions(+), 175 deletions(-)

diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index e3adc78ebf..ac320c3312 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -2,25 +2,44 @@
 #define LIST_OBJECTS_FILTER_OPTIONS_H
 
 #include "parse-options.h"
 #include "strbuf.h"
 
 /*
  * The list of defined filters for list-objects.
  */
 enum list_objects_filter_choice {
 	LOFC_DISABLED = 0,
+
+	/* A filter for list-objects to omit ALL blobs from the traversal. */
 	LOFC_BLOB_NONE,
+
+	/* A filter for list-objects to omit large blobs. */
 	LOFC_BLOB_LIMIT,
+
+	/*
+	 * A filter for list-objects to omit ALL trees and blobs from the
+	 * traversal.
+	 */
 	LOFC_TREE_DEPTH,
+
+	/*
+	 * Filters driven by a sparse-checkout specification to only include
+	 * blobs that a sparse checkout would populate.
+	 *
+	 * The sparse-checkout spec can be loaded from a blob with the given
+	 * OID or from a local pathname.  We allow an OID because the repo may
+	 * be bare or we may be doing the filtering on the server.
+	 */
 	LOFC_SPARSE_OID,
 	LOFC_SPARSE_PATH,
+
 	LOFC__COUNT /* must be last */
 };
 
 struct list_objects_filter_options {
 	/*
 	 * 'filter_spec' is the raw argument value given on the command line
 	 * or protocol request.  (The part after the "--keyword=".)  For
 	 * commands that launch filtering sub-processes, or for communication
 	 * over the network, don't use this value; use the result of
 	 * expand_list_objects_filter_spec() instead.
diff --git a/list-objects-filter.c b/list-objects-filter.c
index ee449de3f7..d35bd5c904 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -19,38 +19,73 @@
  * FILTER_SHOWN_BUT_REVISIT -- we set this bit on tree objects
  * that have been shown, but should be revisited if they appear
  * in the traversal (until we mark it SEEN).  This is a way to
  * let us silently de-dup calls to show() in the caller.  This
  * is subtly different from the "revision.h:SHOWN" and the
  * "sha1-name.c:ONELINE_SEEN" bits.  And also different from
  * the non-de-dup usage in pack-bitmap.c
  */
 #define FILTER_SHOWN_BUT_REVISIT (1<<21)
 
-/*
- * A filter for list-objects to omit ALL blobs from the traversal.
- * And to OPTIONALLY collect a list of the omitted OIDs.
- */
-struct filter_blobs_none_data {
+struct filter_data {
+	/* Used by all filter types. */
 	struct oidset *omits;
+
+	enum list_objects_filter_result (*filter_object_fn)(
+		struct repository *r,
+		enum list_objects_filter_situation filter_situation,
+		struct object *obj,
+		const char *pathname,
+		const char *filename,
+		struct filter_data *filter_data);
+
+	void (*filter_clear_fn)(struct filter_data *filter_data);
+
+	union {
+		struct {
+			/*
+			 * Maps trees to the minimum depth at which they were
+			 * seen. It is not necessary to re-traverse a tree at
+			 * deeper or equal depths than it has already been
+			 * traversed.
+			 *
+			 * We can't use LOFR_MARK_SEEN for tree objects since
+			 * this will prevent it from being traversed at
+			 * shallower depths.
+			 */
+			struct oidmap seen_at_depth;
+
+			unsigned long exclude_depth;
+			unsigned long current_depth;
+		};
+
+		struct {
+			unsigned long max_bytes;
+		};
+
+		struct {
+			struct exclude_list el;
+
+			size_t nr, alloc;
+			struct frame *array_frame;
+		};
+	};
 };
 
 static enum list_objects_filter_result filter_blobs_none(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_data *filter_data)
 {
-	struct filter_blobs_none_data *filter_data = filter_data_;
-
 	switch (filter_situation) {
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
 		return LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
 	case LOFS_END_TREE:
@@ -60,84 +95,55 @@ static enum list_objects_filter_result filter_blobs_none(
 	case LOFS_BLOB:
 		assert(obj->type == OBJ_BLOB);
 		assert((obj->flags & SEEN) == 0);
 
 		if (filter_data->omits)
 			oidset_insert(filter_data->omits, &obj->oid);
 		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
 	}
 }
 
-static void *filter_blobs_none__init(
-	struct oidset *omitted,
+static void filter_blobs_none__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_data *d)
 {
-	struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
-
-	*filter_fn = filter_blobs_none;
-	*filter_free_fn = free;
-	return d;
+	d->filter_object_fn = filter_blobs_none;
 }
 
-/*
- * A filter for list-objects to omit ALL trees and blobs from the traversal.
- * Can OPTIONALLY collect a list of the omitted OIDs.
- */
-struct filter_trees_depth_data {
-	struct oidset *omits;
-
-	/*
-	 * Maps trees to the minimum depth at which they were seen. It is not
-	 * necessary to re-traverse a tree at deeper or equal depths than it has
-	 * already been traversed.
-	 *
-	 * We can't use LOFR_MARK_SEEN for tree objects since this will prevent
-	 * it from being traversed at shallower depths.
-	 */
-	struct oidmap seen_at_depth;
-
-	unsigned long exclude_depth;
-	unsigned long current_depth;
-};
-
 struct seen_map_entry {
 	struct oidmap_entry base;
 	size_t depth;
 };
 
 /* Returns 1 if the oid was in the omits set before it was invoked. */
 static int filter_trees_update_omits(
 	struct object *obj,
-	struct filter_trees_depth_data *filter_data,
+	struct filter_data *filter_data,
 	int include_it)
 {
 	if (!filter_data->omits)
 		return 0;
 
 	if (include_it)
 		return oidset_remove(filter_data->omits, &obj->oid);
 	else
 		return oidset_insert(filter_data->omits, &obj->oid);
 }
 
 static enum list_objects_filter_result filter_trees_depth(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_data *filter_data)
 {
-	struct filter_trees_depth_data *filter_data = filter_data_;
 	struct seen_map_entry *seen_info;
 	int include_it = filter_data->current_depth <
 		filter_data->exclude_depth;
 	int filter_res;
 	int already_seen;
 
 	/*
 	 * Note that we do not use _MARK_SEEN in order to allow re-traversal in
 	 * case we encounter a tree or blob again at a shallower depth.
 	 */
@@ -186,63 +192,44 @@ static enum list_objects_filter_result filter_trees_depth(
 				filter_res = LOFR_ZERO;
 			else
 				filter_res = LOFR_SKIP_TREE;
 		}
 
 		filter_data->current_depth++;
 		return filter_res;
 	}
 }
 
-static void filter_trees_free(void *filter_data) {
-	struct filter_trees_depth_data *d = filter_data;
-	if (!d)
-		return;
-	oidmap_free(&d->seen_at_depth, 1);
-	free(d);
+static void filter_trees_clear(struct filter_data *filter_data)
+{
+	oidmap_free(&filter_data->seen_at_depth, 1);
 }
 
-static void *filter_trees_depth__init(
-	struct oidset *omitted,
+static void filter_trees_depth__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_data *d)
 {
-	struct filter_trees_depth_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	oidmap_init(&d->seen_at_depth, 0);
 	d->exclude_depth = filter_options->tree_exclude_depth;
 	d->current_depth = 0;
-
-	*filter_fn = filter_trees_depth;
-	*filter_free_fn = filter_trees_free;
-	return d;
+	d->filter_object_fn = filter_trees_depth;
+	d->filter_clear_fn = filter_trees_clear;
 }
 
-/*
- * A filter for list-objects to omit large blobs.
- * And to OPTIONALLY collect a list of the omitted OIDs.
- */
-struct filter_blobs_limit_data {
-	struct oidset *omits;
-	unsigned long max_bytes;
-};
-
 static enum list_objects_filter_result filter_blobs_limit(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_data *filter_data)
 {
-	struct filter_blobs_limit_data *filter_data = filter_data_;
 	unsigned long object_length;
 	enum object_type t;
 
 	switch (filter_situation) {
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		/* always include all tree objects */
@@ -274,81 +261,57 @@ static enum list_objects_filter_result filter_blobs_limit(
 			oidset_insert(filter_data->omits, &obj->oid);
 		return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
 	}
 
 include_it:
 	if (filter_data->omits)
 		oidset_remove(filter_data->omits, &obj->oid);
 	return LOFR_MARK_SEEN | LOFR_DO_SHOW;
 }
 
-static void *filter_blobs_limit__init(
-	struct oidset *omitted,
+static void filter_blobs_limit__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_data *d)
 {
-	struct filter_blobs_limit_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	d->max_bytes = filter_options->blob_limit_value;
-
-	*filter_fn = filter_blobs_limit;
-	*filter_free_fn = free;
-	return d;
+	d->filter_object_fn = filter_blobs_limit;
 }
 
-/*
- * A filter driven by a sparse-checkout specification to only
- * include blobs that a sparse checkout would populate.
- *
- * The sparse-checkout spec can be loaded from a blob with the
- * given OID or from a local pathname.  We allow an OID because
- * the repo may be bare or we may be doing the filtering on the
- * server.
- */
+/* For use with sparse checkout filters. */
 struct frame {
 	/*
 	 * defval is the usual default include/exclude value that
 	 * should be inherited as we recurse into directories based
 	 * upon pattern matching of the directory itself or of a
 	 * containing directory.
 	 */
 	int defval;
 
 	/*
 	 * 1 if the directory (recursively) contains any provisionally
 	 * omitted objects.
 	 *
 	 * 0 if everything (recursively) contained in this directory
 	 * has been explicitly included (SHOWN) in the result and
 	 * the directory may be short-cut later in the traversal.
 	 */
 	unsigned child_prov_omit : 1;
 };
 
-struct filter_sparse_data {
-	struct oidset *omits;
-	struct exclude_list el;
-
-	size_t nr, alloc;
-	struct frame *array_frame;
-};
-
 static enum list_objects_filter_result filter_sparse(
 	struct repository *r,
 	enum list_objects_filter_situation filter_situation,
 	struct object *obj,
 	const char *pathname,
 	const char *filename,
-	void *filter_data_)
+	struct filter_data *filter_data)
 {
-	struct filter_sparse_data *filter_data = filter_data_;
 	int val, dtype;
 	struct frame *frame;
 
 	switch (filter_situation) {
 	default:
 		BUG("unknown filter_situation: %d", filter_situation);
 
 	case LOFS_BEGIN_TREE:
 		assert(obj->type == OBJ_TREE);
 		dtype = DT_DIR;
@@ -442,100 +405,104 @@ static enum list_objects_filter_result filter_sparse(
 		/*
 		 * Remember that at least 1 blob in this tree was
 		 * provisionally omitted.  This prevents us from short
 		 * cutting the tree in future iterations.
 		 */
 		frame->child_prov_omit = 1;
 		return LOFR_ZERO;
 	}
 }
 
-
-static void filter_sparse_free(void *filter_data)
+static void filter_sparse_clear(struct filter_data *filter_data)
 {
-	struct filter_sparse_data *d = filter_data;
-	/* TODO free contents of 'd' */
-	free(d);
+	FREE_AND_NULL(filter_data->array_frame);
 }
 
-static void *filter_sparse_oid__init(
-	struct oidset *omitted,
+static void filter_sparse_oid__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_data *d)
 {
-	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	if (add_excludes_from_blob_to_list(filter_options->sparse_oid_value,
 					   NULL, 0, &d->el) < 0)
 		die("could not load filter specification");
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
 	d->array_frame[d->nr].defval = 0; /* default to include */
 	d->array_frame[d->nr].child_prov_omit = 0;
 
-	*filter_fn = filter_sparse;
-	*filter_free_fn = filter_sparse_free;
-	return d;
+	d->filter_object_fn = filter_sparse;
+	d->filter_clear_fn = filter_sparse_clear;
 }
 
-static void *filter_sparse_path__init(
-	struct oidset *omitted,
+static void filter_sparse_path__init(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct filter_data *d)
 {
-	struct filter_sparse_data *d = xcalloc(1, sizeof(*d));
-	d->omits = omitted;
 	if (add_excludes_from_file_to_list(filter_options->sparse_path_value,
 					   NULL, 0, &d->el, NULL) < 0)
 		die("could not load filter specification");
 
 	ALLOC_GROW(d->array_frame, d->nr + 1, d->alloc);
 	d->array_frame[d->nr].defval = 0; /* default to include */
 	d->array_frame[d->nr].child_prov_omit = 0;
 
-	*filter_fn = filter_sparse;
-	*filter_free_fn = filter_sparse_free;
-	return d;
+	d->filter_object_fn = filter_sparse;
+	d->filter_clear_fn = filter_sparse_clear;
 }
 
-typedef void *(*filter_init_fn)(
-	struct oidset *omitted,
+typedef void (*filter_init_fn)(
 	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn);
+	struct filter_data *filter_data);
 
 /*
  * Must match "enum list_objects_filter_choice".
  */
 static filter_init_fn s_filters[] = {
 	NULL,
 	filter_blobs_none__init,
 	filter_blobs_limit__init,
 	filter_trees_depth__init,
 	filter_sparse_oid__init,
 	filter_sparse_path__init,
 };
 
-void *list_objects_filter__init(
+struct filter_data *list_objects_filter__init(
 	struct oidset *omitted,
-	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn)
+	struct list_objects_filter_options *filter_options)
 {
 	filter_init_fn init_fn;
+	struct filter_data *filter_data;
 
 	assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
 	if (filter_options->choice >= LOFC__COUNT)
 		BUG("invalid list-objects filter choice: %d",
 		    filter_options->choice);
 
 	init_fn = s_filters[filter_options->choice];
-	if (init_fn)
-		return init_fn(omitted, filter_options,
-			       filter_fn, filter_free_fn);
-	*filter_fn = NULL;
-	*filter_free_fn = NULL;
-	return NULL;
+	if (!init_fn)
+		return NULL;
+
+	filter_data = xcalloc(1, sizeof(*filter_data));
+	filter_data->omits = omitted;
+	init_fn(filter_options, filter_data);
+
+	return filter_data;
+}
+
+enum list_objects_filter_result list_objects_filter__filter_object(
+	struct repository *r,
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	struct filter_data *filter_data)
+{
+	return filter_data->filter_object_fn(
+		r, filter_situation, obj, pathname, filename, filter_data);
+}
+
+void list_objects_filter__free(struct filter_data *d) {
+	if (d->filter_clear_fn)
+		d->filter_clear_fn(d);
+	FREE_AND_NULL(d);
 }
diff --git a/list-objects-filter.h b/list-objects-filter.h
index 1d45a4ad57..2bc59147a7 100644
--- a/list-objects-filter.h
+++ b/list-objects-filter.h
@@ -53,37 +53,34 @@ enum list_objects_filter_result {
 	LOFR_DO_SHOW   = 1<<1,
 	LOFR_SKIP_TREE = 1<<2,
 };
 
 enum list_objects_filter_situation {
 	LOFS_BEGIN_TREE,
 	LOFS_END_TREE,
 	LOFS_BLOB
 };
 
-typedef enum list_objects_filter_result (*filter_object_fn)(
-	struct repository *r,
-	enum list_objects_filter_situation filter_situation,
-	struct object *obj,
-	const char *pathname,
-	const char *filename,
-	void *filter_data);
-
-typedef void (*filter_free_fn)(void *filter_data);
+struct filter_data;
 
 /*
  * Constructor for the set of defined list-objects filters.
  * Returns a generic "void *filter_data".
  *
  * The returned "filter_fn" will be used by traverse_commit_list()
  * to filter the results.
- *
- * The returned "filter_free_fn" is a destructor for the
- * filter_data.
  */
-void *list_objects_filter__init(
+struct filter_data *list_objects_filter__init(
 	struct oidset *omitted,
-	struct list_objects_filter_options *filter_options,
-	filter_object_fn *filter_fn,
-	filter_free_fn *filter_free_fn);
+	struct list_objects_filter_options *filter_options);
+
+enum list_objects_filter_result list_objects_filter__filter_object(
+	struct repository *r,
+	enum list_objects_filter_situation filter_situation,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	struct filter_data *filter_data);
+
+void list_objects_filter__free(struct filter_data *filter_data);
 
 #endif /* LIST_OBJECTS_FILTER_H */
diff --git a/list-objects.c b/list-objects.c
index b5651ddd5b..a5502a226d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -11,22 +11,21 @@
 #include "list-objects-filter-options.h"
 #include "packfile.h"
 #include "object-store.h"
 #include "trace.h"
 
 struct traversal_context {
 	struct rev_info *revs;
 	show_object_fn show_object;
 	show_commit_fn show_commit;
 	void *show_data;
-	filter_object_fn filter_fn;
-	void *filter_data;
+	struct filter_data *filter_data;
 };
 
 static void process_blob(struct traversal_context *ctx,
 			 struct blob *blob,
 			 struct strbuf *path,
 			 const char *name)
 {
 	struct object *obj = &blob->object;
 	size_t pathlen;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
@@ -47,25 +46,26 @@ static void process_blob(struct traversal_context *ctx,
 	 * may cause the actual filter to report an incomplete list
 	 * of missing objects.
 	 */
 	if (ctx->revs->exclude_promisor_objects &&
 	    !has_object_file(&obj->oid) &&
 	    is_promisor_object(&obj->oid))
 		return;
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
-		r = ctx->filter_fn(ctx->revs->repo,
-				   LOFS_BLOB, obj,
-				   path->buf, &path->buf[pathlen],
-				   ctx->filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_data)
+		r = list_objects_filter__filter_object(
+			ctx->revs->repo,
+			LOFS_BLOB, obj,
+			path->buf, &path->buf[pathlen],
+			ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
 		ctx->show_object(obj, path->buf, ctx->show_data);
 	strbuf_setlen(path, pathlen);
 }
 
 /*
  * Processing a gitlink entry currently does nothing, since
  * we do not recurse into the subproject.
@@ -179,42 +179,44 @@ static void process_tree(struct traversal_context *ctx,
 		 */
 		if (revs->exclude_promisor_objects &&
 		    is_promisor_object(&obj->oid))
 			return;
 
 		if (!revs->do_not_die_on_missing_tree)
 			die("bad tree object %s", oid_to_hex(&obj->oid));
 	}
 
 	strbuf_addstr(base, name);
-	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
-		r = ctx->filter_fn(ctx->revs->repo,
-				   LOFS_BEGIN_TREE, obj,
-				   base->buf, &base->buf[baselen],
-				   ctx->filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_data)
+		r = list_objects_filter__filter_object(
+			ctx->revs->repo,
+			LOFS_BEGIN_TREE, obj,
+			base->buf, &base->buf[baselen],
+			ctx->filter_data);
 	if (r & LOFR_MARK_SEEN)
 		obj->flags |= SEEN;
 	if (r & LOFR_DO_SHOW)
 		ctx->show_object(obj, base->buf, ctx->show_data);
 	if (base->len)
 		strbuf_addch(base, '/');
 
 	if (r & LOFR_SKIP_TREE)
 		trace_printf("Skipping contents of tree %s...\n", base->buf);
 	else if (!failed_parse)
 		process_tree_contents(ctx, tree, base);
 
-	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
-		r = ctx->filter_fn(ctx->revs->repo,
-				   LOFS_END_TREE, obj,
-				   base->buf, &base->buf[baselen],
-				   ctx->filter_data);
+	if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_data) {
+		r = list_objects_filter__filter_object(
+			ctx->revs->repo,
+			LOFS_END_TREE, obj,
+			base->buf, &base->buf[baselen],
+			ctx->filter_data);
 		if (r & LOFR_MARK_SEEN)
 			obj->flags |= SEEN;
 		if (r & LOFR_DO_SHOW)
 			ctx->show_object(obj, base->buf, ctx->show_data);
 	}
 
 	strbuf_setlen(base, baselen);
 	free_tree_buffer(tree);
 }
 
@@ -395,38 +397,33 @@ static void do_traverse(struct traversal_context *ctx)
 void traverse_commit_list(struct rev_info *revs,
 			  show_commit_fn show_commit,
 			  show_object_fn show_object,
 			  void *show_data)
 {
 	struct traversal_context ctx;
 	ctx.revs = revs;
 	ctx.show_commit = show_commit;
 	ctx.show_object = show_object;
 	ctx.show_data = show_data;
-	ctx.filter_fn = NULL;
 	ctx.filter_data = NULL;
 	do_traverse(&ctx);
 }
 
 void traverse_commit_list_filtered(
 	struct list_objects_filter_options *filter_options,
 	struct rev_info *revs,
 	show_commit_fn show_commit,
 	show_object_fn show_object,
 	void *show_data,
 	struct oidset *omitted)
 {
 	struct traversal_context ctx;
-	filter_free_fn filter_free_fn = NULL;
 
 	ctx.revs = revs;
 	ctx.show_object = show_object;
 	ctx.show_commit = show_commit;
 	ctx.show_data = show_data;
-	ctx.filter_fn = NULL;
 
-	ctx.filter_data = list_objects_filter__init(omitted, filter_options,
-						    &ctx.filter_fn, &filter_free_fn);
+	ctx.filter_data = list_objects_filter__init(omitted, filter_options);
 	do_traverse(&ctx);
-	if (ctx.filter_data && filter_free_fn)
-		filter_free_fn(ctx.filter_data);
+	list_objects_filter__free(ctx.filter_data);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-29 23:27           ` Matthew DeVore
@ 2019-05-30 14:01             ` Jeff Hostetler
  2019-05-31 20:53               ` Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff Hostetler @ 2019-05-30 14:01 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Junio C Hamano, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds



On 5/29/2019 7:27 PM, Matthew DeVore wrote:
> On Wed, May 29, 2019 at 05:29:14PM -0400, Jeff Hostetler wrote:
>> Was sparse:path filter the only reason for needing all the URL encoding?
>> The sparse:oid form allows values <ref>:<path> and these (or at least
>> the <path> portion) may contain special characters.  So don't we need to
>> URL encode this form too?
> 
> Oh, I missed this. I was only thinking an oid was allowed after "sparse:". So as
> I suspected I was overlooking something obvious.
> 
> Now I just want to understand the objection to URL encoding a little better. I
> haven't worked with in a project that requires a lot of boilerplate before, so I
> may be asking obvious things again. If so, sorry in advance.
> 
> So the objections, as I interpret them so far, are that:
> 
>   a the URL encoding/decoding complicates the code base
>   b explaining the URL encoding, while it allows for future expansion, requires
>     some verbose documentation in git-rev-list that is potentially distracting or
>     confusing
>   c there may be a better way to allow for future expansion that does not require
>     URL encoding
>   d the URL encoding is unpleasant to use (note that my patchset makes it
>     optional for the user to use and it is only mandatory in sending it over the
>     wire)
> 
> I think these are reasonable and I'm willing to stop digging my heels in :) Does
> the above sum everything up?
> 

My primary concern was how awkward it would be to use the URL
encoding syntax on the command line, but as you say, that can be
avoided by using the multiple --filter args.

And to be honest, the wire format is hidden from user view, so it
doesn't really matter there.  So either approach is fine.  I was
hoping that the "filters (plural)" approach would let us avoid URL
encoding, but that comes with its own baggage as you suggested.
And besides, URL encoding is well-understood.

And I don't want to prematurely complicate this with ANDs ORs and
XORs as you mention in another thread.

So don't let me stop this effort.


BTW, I don't think I've seen this mentioned anywhere and I don't
remember if this got into the code or not.  But we discussed having
a repo-local config setting to remember the filter-spec used by the
partial clone that would be inherited by a subsequent (partial) fetch.
Or would be set by the first partial fetch following a normal clone.
Having a single composite filter spec would help with this.

Jeff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2] list-objects-filter: merge filter data structs
  2019-05-30  1:56             ` [RFC PATCH v2] " Matthew DeVore
@ 2019-05-30 16:12               ` Junio C Hamano
  2019-05-30 18:29                 ` Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-05-30 16:12 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: jeffhost, git, emilyshaffer, jonathantanmy, jrn, dstolee,
	jrnieder, pclouds

Matthew DeVore <matvore@google.com> writes:

> +struct filter_data {
> +	/* Used by all filter types. */
>  	struct oidset *omits;
> +
> +	enum list_objects_filter_result (*filter_object_fn)(
> +		struct repository *r,
> +		enum list_objects_filter_situation filter_situation,
> +		struct object *obj,
> +		const char *pathname,
> +		const char *filename,
> +		struct filter_data *filter_data);
> +
> +	void (*filter_clear_fn)(struct filter_data *filter_data);
> +
> +	union {
> +		struct {
> +			/*
> +			 * Maps trees to the minimum depth at which they were
> +			 * seen. It is not necessary to re-traverse a tree at
> +			 * deeper or equal depths than it has already been
> +			 * traversed.
> +			 *
> +			 * We can't use LOFR_MARK_SEEN for tree objects since
> +			 * this will prevent it from being traversed at
> +			 * shallower depths.
> +			 */
> +			struct oidmap seen_at_depth;
> +
> +			unsigned long exclude_depth;
> +			unsigned long current_depth;
> +		};

Name this, and the ohter two union members, and the union itself as
one member inside the outer struct; some compilers would be unhappy
with the GCC extension that allows you to refer to the member in
this struct as ((struct filter_data *)p)->seen_at_depth, no?

> +		struct {
> +			unsigned long max_bytes;
> +		};
> +
> +		struct {
> +			struct exclude_list el;
> +
> +			size_t nr, alloc;
> +			struct frame *array_frame;
> +		};
> +	};
>  };


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH v2] list-objects-filter: merge filter data structs
  2019-05-30 16:12               ` Junio C Hamano
@ 2019-05-30 18:29                 ` Matthew DeVore
  0 siblings, 0 replies; 41+ messages in thread
From: Matthew DeVore @ 2019-05-30 18:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthew DeVore, jeffhost, git, emilyshaffer, jonathantanmy, jrn,
	dstolee, jrnieder, pclouds

On Thu, May 30, 2019 at 09:12:06AM -0700, Junio C Hamano wrote:
> > +	union {
> > +		struct {
> 
> Name this, and the ohter two union members, and the union itself as
> one member inside the outer struct; some compilers would be unhappy
> with the GCC extension that allows you to refer to the member in
> this struct as ((struct filter_data *)p)->seen_at_depth, no?
> 

Anonymous unions and structs apparently require C11, which I guess is not
something we want to require.

If I have to name the union and struct, a lot of the conciseness of this
refactor is lost. All accesses of filter data either have to be qualified with
something like filter_data->type_specific.tree.seen_at_depth. If we want to
avoid that messy construct, we are back to assigning an alias pointer of the
correct type as we are doing before this patch anyway.

It's possible there is a clean and concise way to do C-style OO that is better
than what is already here, but this seems to be not worth our time at this
point. For the time being, I'll just change this patch to simplify the API (so
everything is emcompassed in an opaque `struct filter *`) and keep the
implementation more or less as-is.

Thank you for taking a look.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] list-objects-filter: merge filter data structs
  2019-05-29 20:57           ` Jeff Hostetler
  2019-05-29 23:10             ` Matthew DeVore
  2019-05-30  1:56             ` [RFC PATCH v2] " Matthew DeVore
@ 2019-05-30 19:05             ` Matthew DeVore
  2 siblings, 0 replies; 41+ messages in thread
From: Matthew DeVore @ 2019-05-30 19:05 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Emily Shaffer, Matthew DeVore, jonathantanmy, jrn,
	git, dstolee, jeffhost, jrnieder, pclouds

On Wed, May 29, 2019 at 04:57:23PM -0400, Jeff Hostetler wrote:
> I'd also suggest keeping the "oidset omits" inside each of the
> sub-structures, but that's just me.

I just reminded myself that the omits need to be outside of the specialized
sub-structures because the combine filter logic needs to access them in a
type-agnostic manner in order to merge them once the traversal is over.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-28 21:53   ` Emily Shaffer
@ 2019-05-31 20:48     ` Matthew DeVore
  2019-05-31 21:10       ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-05-31 20:48 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Matthew DeVore, jonathantanmy, jrn, git, dstolee, jeffhost,
	jrnieder, pclouds

On Tue, May 28, 2019 at 02:53:59PM -0700, Emily Shaffer wrote:
> > +	} else if (skip_prefix(arg, "combine:", &v0)) {
> > +		int sub_parse_res = parse_combine_filter(
> > +			filter_options, v0, errbuf);
> > +		if (sub_parse_res)
> > +			return sub_parse_res;
> > +		return 0;
> 
> Couldn't the three lines above be said more succinctly as "return
> sub_parse_res;"?

Oh yes, that's much better. Don't even need the sub_parse_res variable.

> > +static int digit_value(int c, struct strbuf *errbuf) {
> > +	if (c >= '0' && c <= '9')
> > +		return c - '0';
> > +	if (c >= 'a' && c <= 'f')
> > +		return c - 'a' + 10;
> > +	if (c >= 'A' && c <= 'F')
> > +		return c - 'A' + 10;
> 
> I'm sure there's something I'm missing here. But why are you manually
> decoding hex instead of using strtol or sscanf or something?
> 

I'll have to give this a try. Thank you for the suggestion.

> > +static int has_reserved_character(
> > +	struct strbuf *sub_spec, struct strbuf *errbuf)
> > +{
> > +	const char *c = sub_spec->buf;
> > +	while (*c) {
> > +		if (*c <= ' ' || strchr(RESERVED_NON_WS, *c))
> > +			goto found_reserved;
> > +		c++;
> > +	}
> > +
> > +	return 0;
> > +
> > +found_reserved:
> 
> What's the value of doing this in a goto instead of embedded in the
> while loop?
> 

That's to reduce indentation. Note that if I "inlined" the goto logic in the
while loop, I'd get at least 5 tabs of indentation, and the error message would
be split across a couple lines.

> > +
> > +	result = gently_parse_list_objects_filter(filter_options->lhs,
> > +						  sub_specs[0]->buf,
> > +						  errbuf) ||
> > +		parse_combine_filter(filter_options->rhs,
> > +				      sub_specs[1]->buf,
> > +				      errbuf);
> 
> I guess you're recursing to combine filter 2 onto filter 1 which has
> been combined onto filter 0 here. But why not just use a list or array?
> 

I switched this to use an array at your and Jeff's proddings, and it's much
better now. Thanks! It will be in the next roll-up.

> >  
> >  void list_objects_filter_release(
> >  	struct list_objects_filter_options *filter_options)
> >  {
> > +	if (!filter_options)
> > +		return;
> >  	free(filter_options->filter_spec);
> >  	free(filter_options->sparse_oid_value);
> >  	free(filter_options->sparse_path_value);
> > +	list_objects_filter_release(filter_options->lhs);
> > +	free(filter_options->lhs);
> > +	list_objects_filter_release(filter_options->rhs);
> > +	free(filter_options->rhs);
> 
> Is there a reason that the free shouldn't be included in
> list_objects_filter_release()? Maybe this is a common style guideline
> I've missed, but it seems to me like I'd expect a magic memory cleanup
> function to do it all, and not leave it to me to free.
> 

Because there are a couple times the list_objects_filter_options struct is
allocated on the stack or inline in some other struct. This is similar to how
strbuf and other such utility structs are used.

> Jeff H had a comment about this too, but this seems unwieldy for >2
> filters. (I also personally don't like using set index to incidate
> lhs/rhs.) Why not an array of multiple `struct sub`? There's a macro
> utility to generate types and helpers for an array of arbitrary struct
> that may suit...
> 

This code is now cleaner that it's using an array.

> > +static enum list_objects_filter_result filter_combine(
> > +	struct repository *r,
> > +	enum list_objects_filter_situation filter_situation,
> > +	struct object *obj,
> > +	const char *pathname,
> > +	const char *filename,
> > +	struct filter_context *ctx)
> > +{
> > +	struct filter_combine_data *d = ctx->data;
> > +	enum list_objects_filter_result result[2];
> > +	enum list_objects_filter_result combined_result = LOFR_ZERO;
> > +	int i;
> > +
> > +	for (i = 0; i < 2; i++) {
> 
> I suppose your lhs and rhs are in sub[0] and sub[1] in part for the sake
> of this loop. But I think it would be easier to understand what is going
> on if you were to perform the loop contents in a helper function (as the
> name of the function would provide some more documentation).
> 

Agreed, this is how it will be done in the next roll-up.

> I see that you tested that >2 filters works okay. But by doing it the
> way you have it seems like you're setting up to need recursion all over
> the place to check against all the filters. I suppose I don't see the
> benefit of doing all this recursively, as compared to doing it
> iteratively.

Somehow, the recursive appraoch made more sense to me when I was first writing
the code. But using an array is nicer.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-30 14:01             ` Jeff Hostetler
@ 2019-05-31 20:53               ` Matthew DeVore
  2019-06-03 21:04                 ` Jeff Hostetler
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-05-31 20:53 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Thu, May 30, 2019 at 10:01:47AM -0400, Jeff Hostetler wrote:
> BTW, I don't think I've seen this mentioned anywhere and I don't
> remember if this got into the code or not.  But we discussed having
> a repo-local config setting to remember the filter-spec used by the
> partial clone that would be inherited by a subsequent (partial) fetch.
> Or would be set by the first partial fetch following a normal clone.
> Having a single composite filter spec would help with this.

Isn't that what the partial_clone_get_default_filter_spec function is for? I
forgot about that. Perhaps with Emily's suggestion to use parsing functions in
the C library and the other cleanups I've applied since the first roll-up, using
the URL encoding will seem nicer. Let me try that...

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-31 20:48     ` Matthew DeVore
@ 2019-05-31 21:10       ` Jeff King
  2019-06-01  0:12         ` Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-05-31 21:10 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Fri, May 31, 2019 at 01:48:21PM -0700, Matthew DeVore wrote:

> > > +static int digit_value(int c, struct strbuf *errbuf) {
> > > +	if (c >= '0' && c <= '9')
> > > +		return c - '0';
> > > +	if (c >= 'a' && c <= 'f')
> > > +		return c - 'a' + 10;
> > > +	if (c >= 'A' && c <= 'F')
> > > +		return c - 'A' + 10;
> > 
> > I'm sure there's something I'm missing here. But why are you manually
> > decoding hex instead of using strtol or sscanf or something?
> > 
> 
> I'll have to give this a try. Thank you for the suggestion.

Try our hex_to_bytes() helper (or if you really want to go low-level,
your conditionals can be replaced by lookups in the hexval table).

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-24 21:01   ` Jeff Hostetler
  2019-05-28 17:59     ` Junio C Hamano
@ 2019-06-01  0:11     ` Matthew DeVore
  1 sibling, 0 replies; 41+ messages in thread
From: Matthew DeVore @ 2019-06-01  0:11 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Matthew DeVore, jonathantanmy, jrn, git, dstolee, jeffhost,
	jrnieder, pclouds

On Fri, May 24, 2019 at 05:01:15PM -0400, Jeff Hostetler wrote:
> We are allowing an unlimited number of filters in the composition.
> In the code, the compose filter data has space for a LHS and RHS, so
> I'm assuming we're mapping
> 
>     --filter=f1 --filter=f2 --filter=f3 --filter=f4
> or  --filter=combine:f1+f2+f3+f4
> into basically
>     (compose f1 (compose f2 (compose (f3 f4)))
> 
> I wonder if it would be easier to understand if we just built an array
> or linked list, but I'll read on.

As I mentioned in earlier messages, I have changed this to use an array. It's
nicer now.

(nit: the filters were left-associative rather than right-associative)

> Should we swap the order of the terms in the || so that we always
> clear the d->sub[i].is_skipping_tree on LOFS_END_TREE ?
> 

Done, and added a comment:

	/*
	 * Check should_delegate before oidset_contains so that
	 * is_skipping_tree gets unset even when the object is marked as seen.
	 * As of this writing, no filter uses LOFR_MARK_SEEN on trees that also
	 * uses LOFR_SKIP_TREE, so the ordering is only theoretically
	 * important. Be cautious if you change the order of the below checks
	 * and more filters have been added!
	 */

> 
> > +			result[i] = LOFR_ZERO;
> > +			continue;
> > +		}
> > +
> > +		result[i] = d->sub[i].ctx.filter_fn(
> > +			r, filter_situation, obj, pathname, filename,
> > +			&d->sub[i].ctx);
> > +
> > +		if (result[i] & LOFR_MARK_SEEN)
> > +			oidset_insert(&d->sub[i].seen, &obj->oid);
> 
> So filter[i] has said it never wants to show this object (hard omit).
> And the guard at the top of the loop will prevent future invocations
> from checking it again if the object is revisited.
> 

Yes.

> > +
> > +		if (result[i] & LOFR_SKIP_TREE) {
> > +			d->sub[i].is_skipping_tree = 1;
> > +			d->sub[i].skip_tree = obj->oid;
> 
> So this marks the tree object at the top of the skip as far as
> filter[i] is concerned.
> 

Yes.

> > +		}
> > +	}
> > +
> > +	if ((result[0] & LOFR_DO_SHOW) && (result[1] & LOFR_DO_SHOW))
> > +		combined_result |= LOFR_DO_SHOW;
> > +	if (d->sub[0].is_skipping_tree && d->sub[1].is_skipping_tree)
> > +		combined_result |= LOFR_SKIP_TREE;
> 
> Something about the above bothers me, but I can't quite say what
> it is.
> 

It looks nicer now that it's array-based. Let me know what you think after I
send the next roll-up.

> Do we need to do:
>     if ((result[0] & LOFR_MARK_SEEN) && (result[1] & LOFR_MARK_SEEN))
>         combined_result |= LOFR_MARK_SEEN;

This should be a O(1) sort of optimization, since if we don't set it, the top
filter will still be called, but won't delegate to any sub-filters. It doesn't
complicate the code much, so it seems worth it to add. Done.

> I'm out of time now, will pick this up again next week.

Thank you for taking a look and for your patience so far.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-31 21:10       ` Jeff King
@ 2019-06-01  0:12         ` Matthew DeVore
  2019-06-03 12:34           ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-06-01  0:12 UTC (permalink / raw)
  To: Jeff King
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Fri, May 31, 2019 at 05:10:42PM -0400, Jeff King wrote:
> On Fri, May 31, 2019 at 01:48:21PM -0700, Matthew DeVore wrote:
> 
> > > > +static int digit_value(int c, struct strbuf *errbuf) {
> > > > +	if (c >= '0' && c <= '9')
> > > > +		return c - '0';
> > > > +	if (c >= 'a' && c <= 'f')
> > > > +		return c - 'a' + 10;
> > > > +	if (c >= 'A' && c <= 'F')
> > > > +		return c - 'A' + 10;
> > > 
> > > I'm sure there's something I'm missing here. But why are you manually
> > > decoding hex instead of using strtol or sscanf or something?
> > > 
> > 
> > I'll have to give this a try. Thank you for the suggestion.
> 
> Try our hex_to_bytes() helper (or if you really want to go low-level,
> your conditionals can be replaced by lookups in the hexval table).
> 
> -Peff

Using hex_to_bytes worked out quite nicely, thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-06-01  0:12         ` Matthew DeVore
@ 2019-06-03 12:34           ` Jeff King
  2019-06-03 22:22             ` Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-06-03 12:34 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Fri, May 31, 2019 at 05:12:31PM -0700, Matthew DeVore wrote:

> On Fri, May 31, 2019 at 05:10:42PM -0400, Jeff King wrote:
> > On Fri, May 31, 2019 at 01:48:21PM -0700, Matthew DeVore wrote:
> > 
> > > > > +static int digit_value(int c, struct strbuf *errbuf) {
> > > > > +	if (c >= '0' && c <= '9')
> > > > > +		return c - '0';
> > > > > +	if (c >= 'a' && c <= 'f')
> > > > > +		return c - 'a' + 10;
> > > > > +	if (c >= 'A' && c <= 'F')
> > > > > +		return c - 'A' + 10;
> > > > 
> > > > I'm sure there's something I'm missing here. But why are you manually
> > > > decoding hex instead of using strtol or sscanf or something?
> > > > 
> > > 
> > > I'll have to give this a try. Thank you for the suggestion.
> > 
> > Try our hex_to_bytes() helper (or if you really want to go low-level,
> > your conditionals can be replaced by lookups in the hexval table).
> 
> Using hex_to_bytes worked out quite nicely, thanks!

Great. We might want to stop there, but it's possible could reuse even
more code. I didn't look closely before, but it seems this code is
decoding a URL. We already have a url_decode() routine in url.c. Could
it be reused?

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-05-31 20:53               ` Matthew DeVore
@ 2019-06-03 21:04                 ` Jeff Hostetler
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff Hostetler @ 2019-06-03 21:04 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Junio C Hamano, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds



On 5/31/2019 4:53 PM, Matthew DeVore wrote:
> On Thu, May 30, 2019 at 10:01:47AM -0400, Jeff Hostetler wrote:
>> BTW, I don't think I've seen this mentioned anywhere and I don't
>> remember if this got into the code or not.  But we discussed having
>> a repo-local config setting to remember the filter-spec used by the
>> partial clone that would be inherited by a subsequent (partial) fetch.
>> Or would be set by the first partial fetch following a normal clone.
>> Having a single composite filter spec would help with this.
> 
> Isn't that what the partial_clone_get_default_filter_spec function is for? I
> forgot about that. Perhaps with Emily's suggestion to use parsing functions in
> the C library and the other cleanups I've applied since the first roll-up, using
> the URL encoding will seem nicer. Let me try that...
> 

Yes, thanks.  That's what I was thinking about.

Jeff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-06-03 12:34           ` Jeff King
@ 2019-06-03 22:22             ` Matthew DeVore
  2019-06-04 16:13               ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-06-03 22:22 UTC (permalink / raw)
  To: Jeff King
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Mon, Jun 03, 2019 at 08:34:35AM -0400, Jeff King wrote:
> Great. We might want to stop there, but it's possible could reuse even
> more code. I didn't look closely before, but it seems this code is
> decoding a URL. We already have a url_decode() routine in url.c. Could
> it be reused?

Very nice. Here is an interdiff and the changes will be included in v3 of my
patchset:

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index ed02c88eb6..0f135602a7 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -1,19 +1,20 @@
 #include "cache.h"
 #include "commit.h"
 #include "config.h"
 #include "revision.h"
 #include "argv-array.h"
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
 #include "trace.h"
+#include "url.h"
 
 static int parse_combine_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg,
 	struct strbuf *errbuf);
 
 /*
  * Parse value of the argument to the "filter" keyword.
  * On the command line this looks like:
  *       --filter=<arg>
@@ -84,54 +85,20 @@ static int gently_parse_list_objects_filter(
 	 * Please update _git_fetch() in git-completion.bash when you
 	 * add new filters
 	 */
 
 	strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
 
 	memset(filter_options, 0, sizeof(*filter_options));
 	return 1;
 }
 
-static int url_decode(struct strbuf *s, struct strbuf *errbuf)
-{
-	char *dest = s->buf;
-	char *src = s->buf;
-	size_t new_len;
-
-	while (*src) {
-		if (src[0] != '%') {
-			*dest++ = *src++;
-			continue;
-		}
-
-		if (hex_to_bytes((unsigned char *)dest, src + 1, 1)) {
-			strbuf_addstr(errbuf,
-				      "error in filter-spec - "
-				      "invalid hex sequence after %");
-			return 1;
-		}
-
-		if (!*dest) {
-			strbuf_addstr(errbuf,
-				      "error in filter-spec - unexpected %00");
-			return 1;
-		}
-
-		src += 3;
-		dest++;
-	}
-	new_len = dest - s->buf;
-	strbuf_remove(s, new_len, s->len - new_len);
-
-	return 0;
-}
-
 static const char *RESERVED_NON_WS = "~`!@#$^&*()[]{}\\;'\",<>?";
 
 static int has_reserved_character(
 	struct strbuf *sub_spec, struct strbuf *errbuf)
 {
 	const char *c = sub_spec->buf;
 	while (*c) {
 		if (*c <= ' ' || strchr(RESERVED_NON_WS, *c)) {
 			strbuf_addf(errbuf,
 				    "must escape char in sub-filter-spec: '%c'",
@@ -147,56 +114,57 @@ static int has_reserved_character(
 static int parse_combine_subfilter(
 	struct list_objects_filter_options *filter_options,
 	struct strbuf *subspec,
 	struct strbuf *errbuf)
 {
 	size_t new_index = filter_options->sub_nr;
 
 	ALLOC_GROW_BY(filter_options->sub, filter_options->sub_nr, 1,
 		      filter_options->sub_alloc);
 
-	return has_reserved_character(subspec, errbuf) ||
-		url_decode(subspec, errbuf) ||
-		gently_parse_list_objects_filter(
-			&filter_options->sub[new_index], subspec->buf, errbuf);
+	decoded = url_percent_decode(subspec->buf);
+
+	result = gently_parse_list_objects_filter(
+		&filter_options->sub[new_index], decoded, errbuf);
+
+	free(decoded);
+	return result;
 }
 
 static int parse_combine_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg,
 	struct strbuf *errbuf)
 {
 	struct strbuf **subspecs = strbuf_split_str(arg, '+', 0);
 	size_t sub;
-	int result;
+	int result = 0;
 
 	if (!subspecs[0]) {
 		strbuf_addf(errbuf,
 			    _("expected something after combine:"));
 		result = 1;
 		goto cleanup;
 	}
 
-	for (sub = 0; subspecs[sub]; sub++) {
+	for (sub = 0; subspecs[sub] && !result; sub++) {
 		if (subspecs[sub + 1]) {
 			/*
 			 * This is not the last subspec. Remove trailing "+" so
 			 * we can parse it.
 			 */
 			size_t last = subspecs[sub]->len - 1;
 			assert(subspecs[sub]->buf[last] == '+');
 			strbuf_remove(subspecs[sub], last, 1);
 		}
 		result = parse_combine_subfilter(
 			filter_options, subspecs[sub], errbuf);
-		if (result)
-			goto cleanup;
 	}
 
 	filter_options->choice = LOFC_COMBINE;
 
 cleanup:
 	strbuf_list_free(subspecs);
 	if (result) {
 		list_objects_filter_release(filter_options);
 		memset(filter_options, 0, sizeof(*filter_options));
 	}
diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh
index 7fb5e50cde..e1bf3ed038 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -405,32 +405,20 @@ test_expect_success 'combine:... while URL-encoding things that should not be' '
 
 test_expect_success 'combine: with nothing after the :' '
 	expect_invalid_filter_spec combine: "expected something after combine:"
 '
 
 test_expect_success 'parse error in first sub-filter in combine:' '
 	expect_invalid_filter_spec combine:tree:asdf+blob:none \
 		"expected .tree:<depth>."
 '
 
-test_expect_success 'combine:... with invalid URL-encoded sequences' '
-	# Not enough hex chars
-	expect_invalid_filter_spec combine:tree:2+blob:non%a \
-		"error in filter-spec - invalid hex sequence after %" &&
-	# Non-hex digit after %
-	expect_invalid_filter_spec combine:tree:2+blob%G5none \
-		"error in filter-spec - invalid hex sequence after %" &&
-	# Null byte encoded by %
-	expect_invalid_filter_spec combine:tree:2+blob%00none \
-		"error in filter-spec - unexpected %00"
-'
-
 test_expect_success 'combine:... with non-encoded reserved chars' '
 	expect_invalid_filter_spec combine:tree:2+sparse:@xyz \
 		"must escape char in sub-filter-spec: .@." &&
 	expect_invalid_filter_spec combine:tree:2+sparse:\` \
 		"must escape char in sub-filter-spec: .\`." &&
 	expect_invalid_filter_spec combine:tree:2+sparse:~abc \
 		"must escape char in sub-filter-spec: .\~."
 '
 
 test_expect_success 'validate err msg for "combine:<valid-filter>+"' '
diff --git a/url.c b/url.c
index 25576c390b..bdede647bc 100644
--- a/url.c
+++ b/url.c
@@ -79,20 +79,26 @@ char *url_decode_mem(const char *url, int len)
 
 	/* Skip protocol part if present */
 	if (colon && url < colon) {
 		strbuf_add(&out, url, colon - url);
 		len -= colon - url;
 		url = colon;
 	}
 	return url_decode_internal(&url, len, NULL, &out, 0);
 }
 
+char *url_percent_decode(const char *encoded)
+{
+	struct strbuf out = STRBUF_INIT;
+	return url_decode_internal(&encoded, strlen(encoded), NULL, &out, 0);
+}
+
 char *url_decode_parameter_name(const char **query)
 {
 	struct strbuf out = STRBUF_INIT;
 	return url_decode_internal(query, -1, "&=", &out, 1);
 }
 
 char *url_decode_parameter_value(const char **query)
 {
 	struct strbuf out = STRBUF_INIT;
 	return url_decode_internal(query, -1, "&", &out, 1);
diff --git a/url.h b/url.h
index 00b7d58c33..2a27c34277 100644
--- a/url.h
+++ b/url.h
@@ -1,16 +1,24 @@
 #ifndef URL_H
 #define URL_H
 
 struct strbuf;
 
 int is_url(const char *url);
 int is_urlschemechar(int first_flag, int ch);
 char *url_decode(const char *url);
 char *url_decode_mem(const char *url, int len);
+
+/*
+ * Similar to the url_decode_{,mem} methods above, but doesn't assume there
+ * is a scheme followed by a : at the start of the string. Instead, %-sequences
+ * before any : are also parsed.
+ */
+char *url_percent_decode(const char *encoded);
+
 char *url_decode_parameter_name(const char **query);
 char *url_decode_parameter_value(const char **query);
 
 void end_url_with_slash(struct strbuf *buf, const char *url);
 void str_end_url_with_slash(const char *url, char **dest);
 
 #endif /* URL_H */

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-06-03 22:22             ` Matthew DeVore
@ 2019-06-04 16:13               ` Jeff King
  2019-06-04 17:19                 ` Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-06-04 16:13 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Mon, Jun 03, 2019 at 03:22:47PM -0700, Matthew DeVore wrote:

> On Mon, Jun 03, 2019 at 08:34:35AM -0400, Jeff King wrote:
> > Great. We might want to stop there, but it's possible could reuse even
> > more code. I didn't look closely before, but it seems this code is
> > decoding a URL. We already have a url_decode() routine in url.c. Could
> > it be reused?
> 
> Very nice. Here is an interdiff and the changes will be included in v3 of my
> patchset:

Nice to see a reduction in duplication (and I see you found some
problems in the existing code elsewhere; thanks for cleaning that up).

> -	return has_reserved_character(subspec, errbuf) ||
> -		url_decode(subspec, errbuf) ||
> -		gently_parse_list_objects_filter(
> -			&filter_options->sub[new_index], subspec->buf, errbuf);
> +	decoded = url_percent_decode(subspec->buf);

I think you can get rid of has_reserved_character() now, too.

The reserved character list is still used on the encoding side. But I
think you could switch to strbuf_add_urlencode() there?

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-06-04 16:13               ` Jeff King
@ 2019-06-04 17:19                 ` Matthew DeVore
  2019-06-04 18:51                   ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-06-04 17:19 UTC (permalink / raw)
  To: Jeff King
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Tue, Jun 04, 2019 at 12:13:32PM -0400, Jeff King wrote:
> > -	return has_reserved_character(subspec, errbuf) ||
> > -		url_decode(subspec, errbuf) ||
> > -		gently_parse_list_objects_filter(
> > -			&filter_options->sub[new_index], subspec->buf, errbuf);
> > +	decoded = url_percent_decode(subspec->buf);
> 
> I think you can get rid of has_reserved_character() now, too.

The purpose of has_reserved_character is to allow for future
extensibility if someone decides to implement a more sophisticated DSL
and give meaning to these characters. That may be a long-shot, but it
seems worth it.

> The reserved character list is still used on the encoding side. But I
> think you could switch to strbuf_add_urlencode() there?

strbuf_addstr_urlencode will either escape or not escape all rfc3986
reserved characters, and that set includes both : and +. The former
should not require escaping since it's a common character in filter
specs, and I would like the hand-encoded combine specs to be relatively
easy to type and read. The + must be escaped since it is used as part of
the combine:... syntax to delimit sub filters. So
strbuf_addstr_url_encode would have to be more customizable to make it
work for this context. I'd like to add a parameterizable should_escape
predicate (iow function pointer) which strbuf_addstr_urlencode accepts.
I actually think this will be more readable than the current strbuf API.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-06-04 17:19                 ` Matthew DeVore
@ 2019-06-04 18:51                   ` Jeff King
  2019-06-04 22:59                     ` Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-06-04 18:51 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Tue, Jun 04, 2019 at 10:19:52AM -0700, Matthew DeVore wrote:

> On Tue, Jun 04, 2019 at 12:13:32PM -0400, Jeff King wrote:
> > > -	return has_reserved_character(subspec, errbuf) ||
> > > -		url_decode(subspec, errbuf) ||
> > > -		gently_parse_list_objects_filter(
> > > -			&filter_options->sub[new_index], subspec->buf, errbuf);
> > > +	decoded = url_percent_decode(subspec->buf);
> > 
> > I think you can get rid of has_reserved_character() now, too.
> 
> The purpose of has_reserved_character is to allow for future
> extensibility if someone decides to implement a more sophisticated DSL
> and give meaning to these characters. That may be a long-shot, but it
> seems worth it.

I think you'll find that -Wunused-function complains, though, if nobody
is calling it. I wasn't sure if what you showed in the interdiff was
meant to be final (I had to add a few other variable declarations to
make it compile, too).

> > The reserved character list is still used on the encoding side. But I
> > think you could switch to strbuf_add_urlencode() there?
> 
> strbuf_addstr_urlencode will either escape or not escape all rfc3986
> reserved characters, and that set includes both : and +. The former
> should not require escaping since it's a common character in filter
> specs, and I would like the hand-encoded combine specs to be relatively
> easy to type and read. The + must be escaped since it is used as part of
> the combine:... syntax to delimit sub filters. So
> strbuf_addstr_url_encode would have to be more customizable to make it
> work for this context. I'd like to add a parameterizable should_escape
> predicate (iow function pointer) which strbuf_addstr_urlencode accepts.
> I actually think this will be more readable than the current strbuf API.

That makes some sense, and I agree that readability is a good goal. Do
we not need to be escaping colons in other URLs? Or are the strings
you're generating not true by-the-book URLs? I'm just wondering if we
could take this opportunity to improve the URLs we output elsewhere,
too.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-06-04 18:51                   ` Jeff King
@ 2019-06-04 22:59                     ` Matthew DeVore
  2019-06-04 23:14                       ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-06-04 22:59 UTC (permalink / raw)
  To: Jeff King
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Tue, Jun 04, 2019 at 02:51:08PM -0400, Jeff King wrote:
> > The purpose of has_reserved_character is to allow for future
> > extensibility if someone decides to implement a more sophisticated DSL
> > and give meaning to these characters. That may be a long-shot, but it
> > seems worth it.
> 
> I think you'll find that -Wunused-function complains, though, if nobody
> is calling it. I wasn't sure if what you showed in the interdiff was
> meant to be final (I had to add a few other variable declarations to
> make it compile, too).

Sorry, my last interdiff was a mess because I made a mistake during git rebase
-i. It was missing a call to has_reserved_char. Below is another diff that
fixes the problems:

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 0f135602a7..6b206dc58b 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -110,28 +110,31 @@ static int has_reserved_character(
 
 	return 0;
 }
 
 static int parse_combine_subfilter(
 	struct list_objects_filter_options *filter_options,
 	struct strbuf *subspec,
 	struct strbuf *errbuf)
 {
 	size_t new_index = filter_options->sub_nr;
+	char *decoded;
+	int result;
 
 	ALLOC_GROW_BY(filter_options->sub, filter_options->sub_nr, 1,
 		      filter_options->sub_alloc);
 
 	decoded = url_percent_decode(subspec->buf);
 
-	result = gently_parse_list_objects_filter(
-		&filter_options->sub[new_index], decoded, errbuf);
+	result = has_reserved_character(subspec, errbuf) ||
+		gently_parse_list_objects_filter(
+			&filter_options->sub[new_index], decoded, errbuf);
 
 	free(decoded);
 	return result;
 }
 
 static int parse_combine_filter(
 	struct list_objects_filter_options *filter_options,
 	const char *arg,
 	struct strbuf *errbuf)
 {

> > strbuf_addstr_urlencode will either escape or not escape all rfc3986
> > reserved characters, and that set includes both : and +. The former
> > should not require escaping since it's a common character in filter
> > specs, and I would like the hand-encoded combine specs to be relatively
> > easy to type and read. The + must be escaped since it is used as part of
> > the combine:... syntax to delimit sub filters. So
> > strbuf_addstr_url_encode would have to be more customizable to make it
> > work for this context. I'd like to add a parameterizable should_escape
> > predicate (iow function pointer) which strbuf_addstr_urlencode accepts.
> > I actually think this will be more readable than the current strbuf API.
> 
> That makes some sense, and I agree that readability is a good goal. Do
> we not need to be escaping colons in other URLs? Or are the strings
> you're generating not true by-the-book URLs? I'm just wondering if we
> could take this opportunity to improve the URLs we output elsewhere,
> too.

The strings I'm generating are not URLs. Also, in http.c, we have to use : to
delimit a username and password:

	strbuf_addstr_urlencode(&s, proxy_auth.username, 1);
	strbuf_addch(&s, ':');
	strbuf_addstr_urlencode(&s, proxy_auth.password, 1);

I think this is dictated by libcurl and is not flexible.

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-06-04 22:59                     ` Matthew DeVore
@ 2019-06-04 23:14                       ` Jeff King
  2019-06-04 23:49                         ` Matthew DeVore
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-06-04 23:14 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Tue, Jun 04, 2019 at 03:59:21PM -0700, Matthew DeVore wrote:

> > I think you'll find that -Wunused-function complains, though, if nobody
> > is calling it. I wasn't sure if what you showed in the interdiff was
> > meant to be final (I had to add a few other variable declarations to
> > make it compile, too).
> 
> Sorry, my last interdiff was a mess because I made a mistake during git rebase
> -i. It was missing a call to has_reserved_char. Below is another diff that
> fixes the problems:

Ah, OK, that makes sense then (and keeping the function is obviously the
right thing to do).

> > That makes some sense, and I agree that readability is a good goal. Do
> > we not need to be escaping colons in other URLs? Or are the strings
> > you're generating not true by-the-book URLs? I'm just wondering if we
> > could take this opportunity to improve the URLs we output elsewhere,
> > too.
> 
> The strings I'm generating are not URLs. Also, in http.c, we have to use : to
> delimit a username and password:
> 
> 	strbuf_addstr_urlencode(&s, proxy_auth.username, 1);
> 	strbuf_addch(&s, ':');
> 	strbuf_addstr_urlencode(&s, proxy_auth.password, 1);
> 
> I think this is dictated by libcurl and is not flexible.

Right, that has to be a real colon because it's syntactically
significant (but a colon in the username _must_ be encoded). That strbuf
function doesn't really understand whole URLs, and it's up to the caller
to assemble the parts.

Anyway, we've veered off of your patch series enough. Yeah, it sounds
like using strbuf's url-encoding is not quite what you want.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-06-04 23:14                       ` Jeff King
@ 2019-06-04 23:49                         ` Matthew DeVore
  2019-06-09 12:36                           ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Matthew DeVore @ 2019-06-04 23:49 UTC (permalink / raw)
  To: Jeff King
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Tue, Jun 04, 2019 at 07:14:18PM -0400, Jeff King wrote:
> Right, that has to be a real colon because it's syntactically
> significant (but a colon in the username _must_ be encoded). That strbuf
> function doesn't really understand whole URLs, and it's up to the caller
> to assemble the parts.
> 
> Anyway, we've veered off of your patch series enough. Yeah, it sounds
> like using strbuf's url-encoding is not quite what you want.

I tried to do it anyway :) I think this makes the strbuf API a bit easier to
reason about, and strbuf.h is a bit more self-documenting. WDYT?

diff --git a/credential-store.c b/credential-store.c
index ac295420dd..c010497cb2 100644
--- a/credential-store.c
+++ b/credential-store.c
@@ -65,29 +65,30 @@ static void rewrite_credential_file(const char *fn, struct credential *c,
 	parse_credential_file(fn, c, NULL, print_line);
 	if (commit_lock_file(&credential_lock) < 0)
 		die_errno("unable to write credential store");
 }
 
 static void store_credential_file(const char *fn, struct credential *c)
 {
 	struct strbuf buf = STRBUF_INIT;
 
 	strbuf_addf(&buf, "%s://", c->protocol);
-	strbuf_addstr_urlencode(&buf, c->username, 1);
+	strbuf_addstr_urlencode(&buf, c->username, is_rfc3986_unreserved);
 	strbuf_addch(&buf, ':');
-	strbuf_addstr_urlencode(&buf, c->password, 1);
+	strbuf_addstr_urlencode(&buf, c->password, is_rfc3986_unreserved);
 	strbuf_addch(&buf, '@');
 	if (c->host)
-		strbuf_addstr_urlencode(&buf, c->host, 1);
+		strbuf_addstr_urlencode(&buf, c->host, is_rfc3986_unreserved);
 	if (c->path) {
 		strbuf_addch(&buf, '/');
-		strbuf_addstr_urlencode(&buf, c->path, 0);
+		strbuf_addstr_urlencode(&buf, c->path,
+					is_rfc3986_reserved_or_unreserved);
 	}
 
 	rewrite_credential_file(fn, c, &buf);
 	strbuf_release(&buf);
 }
 
 static void store_credential(const struct string_list *fns, struct credential *c)
 {
 	struct string_list_item *fn;
 
diff --git a/http.c b/http.c
index 27aa0a3192..938b9e55af 100644
--- a/http.c
+++ b/http.c
@@ -506,23 +506,25 @@ static void var_override(const char **var, char *value)
 static void set_proxyauth_name_password(CURL *result)
 {
 #if LIBCURL_VERSION_NUM >= 0x071301
 		curl_easy_setopt(result, CURLOPT_PROXYUSERNAME,
 			proxy_auth.username);
 		curl_easy_setopt(result, CURLOPT_PROXYPASSWORD,
 			proxy_auth.password);
 #else
 		struct strbuf s = STRBUF_INIT;
 
-		strbuf_addstr_urlencode(&s, proxy_auth.username, 1);
+		strbuf_addstr_urlencode(&s, proxy_auth.username,
+					is_rfc3986_unreserved);
 		strbuf_addch(&s, ':');
-		strbuf_addstr_urlencode(&s, proxy_auth.password, 1);
+		strbuf_addstr_urlencode(&s, proxy_auth.password,
+					is_rfc3986_unreserved);
 		curl_proxyuserpwd = strbuf_detach(&s, NULL);
 		curl_easy_setopt(result, CURLOPT_PROXYUSERPWD, curl_proxyuserpwd);
 #endif
 }
 
 static void init_curl_proxy_auth(CURL *result)
 {
 	if (proxy_auth.username) {
 		if (!proxy_auth.password)
 			credential_fill(&proxy_auth);
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 6b206dc58b..9a5677c2c8 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -167,30 +167,25 @@ static int parse_combine_filter(
 
 cleanup:
 	strbuf_list_free(subspecs);
 	if (result) {
 		list_objects_filter_release(filter_options);
 		memset(filter_options, 0, sizeof(*filter_options));
 	}
 	return result;
 }
 
-static void add_url_encoded(struct strbuf *dest, const char *s)
+static int allow_unencoded(char ch)
 {
-	while (*s) {
-		if (*s <= ' ' || strchr(RESERVED_NON_WS, *s) ||
-			*s == '%' || *s == '+')
-			strbuf_addf(dest, "%%%02X", (int)*s);
-		else
-			strbuf_addf(dest, "%c", *s);
-		s++;
-	}
+	if (ch <= ' ' || ch == '%' || ch == '+')
+		return 0;
+	return !strchr(RESERVED_NON_WS, ch);
 }
 
 /*
  * Changes filter_options into an equivalent LOFC_COMBINE filter options
  * instance. Does not do anything if filter_options is already LOFC_COMBINE.
  */
 static void transform_to_combine_type(
 	struct list_objects_filter_options *filter_options)
 {
 	assert(filter_options->choice);
@@ -202,22 +197,23 @@ static void transform_to_combine_type(
 			xcalloc(initial_sub_alloc, sizeof(*sub_array));
 		sub_array[0] = *filter_options;
 		memset(filter_options, 0, sizeof(*filter_options));
 		filter_options->sub = sub_array;
 		filter_options->sub_alloc = initial_sub_alloc;
 	}
 	filter_options->sub_nr = 1;
 	filter_options->choice = LOFC_COMBINE;
 	strbuf_init(&filter_options->filter_spec, 0);
 	strbuf_addstr(&filter_options->filter_spec, "combine:");
-	add_url_encoded(&filter_options->filter_spec,
-			filter_options->sub[0].filter_spec.buf);
+	strbuf_addstr_urlencode(&filter_options->filter_spec,
+				filter_options->sub[0].filter_spec.buf,
+				allow_unencoded);
 	/*
 	 * We don't need the filter_spec strings for subfilter specs, only the
 	 * top level.
 	 */
 	strbuf_release(&filter_options->sub[0].filter_spec);
 }
 
 void list_objects_filter_die_if_populated(
 	struct list_objects_filter_options *filter_options)
 {
@@ -239,21 +235,22 @@ void parse_list_objects_filter(
 		parse_error = gently_parse_list_objects_filter(
 			filter_options, arg, &errbuf);
 	} else {
 		/*
 		 * Make filter_options an LOFC_COMBINE spec so we can trivially
 		 * add subspecs to it.
 		 */
 		transform_to_combine_type(filter_options);
 
 		strbuf_addstr(&filter_options->filter_spec, "+");
-		add_url_encoded(&filter_options->filter_spec, arg);
+		strbuf_addstr_urlencode(&filter_options->filter_spec, arg,
+					allow_unencoded);
 		trace_printf("Generated composite filter-spec: %s\n",
 			     filter_options->filter_spec.buf);
 		ALLOC_GROW_BY(filter_options->sub, filter_options->sub_nr, 1,
 			      filter_options->sub_alloc);
 
 		parse_error = gently_parse_list_objects_filter(
 			&filter_options->sub[filter_options->sub_nr - 1], arg,
 			&errbuf);
 	}
 	if (parse_error)
diff --git a/strbuf.c b/strbuf.c
index 0e18b259ce..60ab5144f2 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -767,55 +767,56 @@ void strbuf_addstr_xml_quoted(struct strbuf *buf, const char *s)
 		case '&':
 			strbuf_addstr(buf, "&amp;");
 			break;
 		case 0:
 			return;
 		}
 		s++;
 	}
 }
 
-static int is_rfc3986_reserved(char ch)
+int is_rfc3986_reserved_or_unreserved(char ch)
 {
+	if (is_rfc3986_unreserved(ch))
+		return 1;
 	switch (ch) {
 		case '!': case '*': case '\'': case '(': case ')': case ';':
 		case ':': case '@': case '&': case '=': case '+': case '$':
 		case ',': case '/': case '?': case '#': case '[': case ']':
 			return 1;
 	}
 	return 0;
 }
 
-static int is_rfc3986_unreserved(char ch)
+int is_rfc3986_unreserved(char ch)
 {
 	return isalnum(ch) ||
 		ch == '-' || ch == '_' || ch == '.' || ch == '~';
 }
 
 static void strbuf_add_urlencode(struct strbuf *sb, const char *s, size_t len,
-				 int reserved)
+				 char_predicate allow_unencoded_fn)
 {
 	strbuf_grow(sb, len);
 	while (len--) {
 		char ch = *s++;
-		if (is_rfc3986_unreserved(ch) ||
-		    (!reserved && is_rfc3986_reserved(ch)))
+		if (allow_unencoded_fn(ch))
 			strbuf_addch(sb, ch);
 		else
 			strbuf_addf(sb, "%%%02x", (unsigned char)ch);
 	}
 }
 
 void strbuf_addstr_urlencode(struct strbuf *sb, const char *s,
-			     int reserved)
+			     char_predicate allow_unencoded_fn)
 {
-	strbuf_add_urlencode(sb, s, strlen(s), reserved);
+	strbuf_add_urlencode(sb, s, strlen(s), allow_unencoded_fn);
 }
 
 void strbuf_humanise_bytes(struct strbuf *buf, off_t bytes)
 {
 	if (bytes > 1 << 30) {
 		strbuf_addf(buf, "%u.%2.2u GiB",
 			    (unsigned)(bytes >> 30),
 			    (unsigned)(bytes & ((1 << 30) - 1)) / 10737419);
 	} else if (bytes > 1 << 20) {
 		unsigned x = bytes + 5243;  /* for rounding */
diff --git a/strbuf.h b/strbuf.h
index c8d98dfb95..346d722492 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -659,22 +659,27 @@ void strbuf_branchname(struct strbuf *sb, const char *name,
 		       unsigned allowed);
 
 /*
  * Like strbuf_branchname() above, but confirm that the result is
  * syntactically valid to be used as a local branch name in refs/heads/.
  *
  * The return value is "0" if the result is valid, and "-1" otherwise.
  */
 int strbuf_check_branch_ref(struct strbuf *sb, const char *name);
 
+typedef int (*char_predicate)(char ch);
+
+int is_rfc3986_unreserved(char ch);
+int is_rfc3986_reserved_or_unreserved(char ch);
+
 void strbuf_addstr_urlencode(struct strbuf *sb, const char *name,
-			     int reserved);
+			     char_predicate allow_unencoded_fn);
 
 __attribute__((format (printf,1,2)))
 int printf_ln(const char *fmt, ...);
 __attribute__((format (printf,2,3)))
 int fprintf_ln(FILE *fp, const char *fmt, ...);
 
 char *xstrdup_tolower(const char *);
 char *xstrdup_toupper(const char *);
 
 /**

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 0/5] Filter combination
  2019-05-22  0:21 [PATCH v1 0/5] Filter combination Matthew DeVore
                   ` (4 preceding siblings ...)
  2019-05-22  0:21 ` [PATCH v1 5/5] list-objects-filter-options: allow mult. --filter Matthew DeVore
@ 2019-06-06 22:44 ` Matthew DeVore
  5 siblings, 0 replies; 41+ messages in thread
From: Matthew DeVore @ 2019-06-06 22:44 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: jonathantanmy, jrn, git, dstolee, jeffhost, jrnieder, pclouds

If you are looking for the latest version of this patchset, it is here:

https://public-inbox.org/git/20190601003603.90794-1-matvore@google.com/

But also has these interdiffs:

https://public-inbox.org/git/20190604234951.GB43275@comcast.net/#t
https://public-inbox.org/git/20190603222247.GG4641@comcast.net/
https://public-inbox.org/git/20190604225921.GA43275@comcast.net/

Sorry for not making v2 a response in this thread. Later versions will be
posted as responses to v2 (the first link).

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
  2019-06-04 23:49                         ` Matthew DeVore
@ 2019-06-09 12:36                           ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-06-09 12:36 UTC (permalink / raw)
  To: Matthew DeVore
  Cc: Emily Shaffer, Matthew DeVore, jonathantanmy, jrn, git, dstolee,
	jeffhost, jrnieder, pclouds

On Tue, Jun 04, 2019 at 04:49:51PM -0700, Matthew DeVore wrote:

> I tried to do it anyway :) I think this makes the strbuf API a bit easier to
> reason about, and strbuf.h is a bit more self-documenting. WDYT?
>
> [...]
>
> +typedef int (*char_predicate)(char ch);
> +
> +int is_rfc3986_unreserved(char ch);
> +int is_rfc3986_reserved_or_unreserved(char ch);
> +
>  void strbuf_addstr_urlencode(struct strbuf *sb, const char *name,
> -			     int reserved);
> +			     char_predicate allow_unencoded_fn);

Yeah, that seems reasonable. I worry slightly about adding function-call
overhead to something that's processing a string character-by-character,
but these strings tend to be short and infrequent.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2019-06-09 12:36 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-22  0:21 [PATCH v1 0/5] Filter combination Matthew DeVore
2019-05-22  0:21 ` [PATCH v1 1/5] list-objects-filter: refactor into a context struct Matthew DeVore
2019-05-24  0:49   ` Emily Shaffer
2019-05-28 18:48     ` Matthew DeVore
2019-05-28 22:40       ` [PATCH] list-objects-filter: merge filter data structs Matthew DeVore
2019-05-29 19:48         ` Junio C Hamano
2019-05-29 20:57           ` Jeff Hostetler
2019-05-29 23:10             ` Matthew DeVore
2019-05-30  1:56             ` [RFC PATCH v2] " Matthew DeVore
2019-05-30 16:12               ` Junio C Hamano
2019-05-30 18:29                 ` Matthew DeVore
2019-05-30 19:05             ` [PATCH] " Matthew DeVore
2019-05-22  0:21 ` [PATCH v1 2/5] list-objects-filter-options: error is localizeable Matthew DeVore
2019-05-24  0:55   ` Emily Shaffer
2019-05-28 23:01     ` Matthew DeVore
2019-05-22  0:21 ` [PATCH v1 3/5] list-objects-filter: implement composite filters Matthew DeVore
2019-05-24 21:01   ` Jeff Hostetler
2019-05-28 17:59     ` Junio C Hamano
2019-05-29 15:02       ` Matthew DeVore
2019-05-29 21:29         ` Jeff Hostetler
2019-05-29 23:27           ` Matthew DeVore
2019-05-30 14:01             ` Jeff Hostetler
2019-05-31 20:53               ` Matthew DeVore
2019-06-03 21:04                 ` Jeff Hostetler
2019-06-01  0:11     ` Matthew DeVore
2019-05-28 21:53   ` Emily Shaffer
2019-05-31 20:48     ` Matthew DeVore
2019-05-31 21:10       ` Jeff King
2019-06-01  0:12         ` Matthew DeVore
2019-06-03 12:34           ` Jeff King
2019-06-03 22:22             ` Matthew DeVore
2019-06-04 16:13               ` Jeff King
2019-06-04 17:19                 ` Matthew DeVore
2019-06-04 18:51                   ` Jeff King
2019-06-04 22:59                     ` Matthew DeVore
2019-06-04 23:14                       ` Jeff King
2019-06-04 23:49                         ` Matthew DeVore
2019-06-09 12:36                           ` Jeff King
2019-05-22  0:21 ` [PATCH v1 4/5] list-objects-filter-options: move error check up Matthew DeVore
2019-05-22  0:21 ` [PATCH v1 5/5] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-06 22:44 ` [PATCH v1 0/5] Filter combination Matthew DeVore

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).