git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: matvore@google.com
Cc: git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH v4 04/10] list-objects-filter: implement composite filters
Date: Fri, 21 Jun 2019 17:26:26 -0700	[thread overview]
Message-ID: <20190622002626.245441-1-jonathantanmy@google.com> (raw)
In-Reply-To: <47a2680875e6f68fbf1f2e5a5a2630d263cdf426.1560558910.git.matvore@google.com>

> Allow combining filters such that only objects accepted by all filters
> are shown. The motivation for this is to allow getting directory
> listings without also fetching blobs. This can be done by combining
> blob:none with tree:<depth>. There are massive repositories that have
> larger-than-expected trees - even if you include only a single commit.

First of all, patches 2 and 3 are straightforward and LGTM. On to patch
4...

[snip]

> The current usage requires passing the filter to rev-list in the
> following form:
> 
> 	--filter=<FILTER1> --filter=<FILTER2> ...
> 
> Such usage is currently an error, so giving it a meaning is backwards-
> compatible.
> 
> The URL-encoding scheme is being introduced before the repeated flag
> logic, and the user-facing documentation for URL-encoding is being
> withheld until the repeated flag feature is implemented. The
> URL-encoding is in general not meant to be used directly by the user,
> and it is better to describe the URL-encoding feature in terms of the
> repeated flag.

As of this commit, we don't support such arguments passed to rev-list in
this way, so I would write these paragraphs as:

  A combined filter supports any number of subfilters, and is written in
  the following form:

    combine:<filter 1>+<filter 2>+<filter 3>

  Certain non-alphanumeric characters in each filter must be
  URL-encoded.

  For now, combined filters must be specified in this form. In a
  subsequent commit, rev-list will support multiple --filter arguments
  which will have the same effect as specifying one filter argument
  starting with "combine:".

> Helped-by: Emily Shaffer <emilyshaffer@google.com>
> Helped-by: Jeff Hostetler <git@jeffhostetler.com>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Matthew DeVore <matvore@google.com>
> ---
>  list-objects-filter-options.c       | 106 ++++++++++++++++++-
>  list-objects-filter-options.h       |  17 ++-
>  list-objects-filter.c               | 159 ++++++++++++++++++++++++++++
>  t/t6112-rev-list-filters-objects.sh | 151 +++++++++++++++++++++++++-
>  url.c                               |   6 ++
>  url.h                               |   8 ++
>  6 files changed, 441 insertions(+), 6 deletions(-)
> 
> @@ -28,22 +34,20 @@ static int gently_parse_list_objects_filter(
>  	struct strbuf *errbuf)
>  {
>  	const char *v0;
>  
>  	if (filter_options->choice) {
>  		strbuf_addstr(
>  			errbuf, _("multiple filter-specs cannot be combined"));
>  		return 1;
>  	}
>  
> -	filter_options->filter_spec = strdup(arg);
> -

This line has been removed from gently_parse_list_objects_filter()
because this function gains another caller that does not need it.
To compensate, this line has been added to both its existing callers.

> @@ -31,27 +32,37 @@ struct list_objects_filter_options {
>  	 * the filtering algorithm to use.
>  	 */
>  	enum list_objects_filter_choice choice;
>  
>  	/*
>  	 * Choice is LOFC_DISABLED because "--no-filter" was requested.
>  	 */
>  	unsigned int no_filter : 1;
>  
>  	/*
> -	 * Parsed values (fields) from within the filter-spec.  These are
> -	 * choice-specific; not all values will be defined for any given
> -	 * choice.
> +	 * BEGIN choice-specific parsed values from within the filter-spec. Only
> +	 * some values will be defined for any given choice.
>  	 */
> +
>  	struct object_id *sparse_oid_value;
>  	unsigned long blob_limit_value;
>  	unsigned long tree_exclude_depth;
> +
> +	/* LOFC_COMBINE values */
> +
> +	/* This array contains all the subfilters which this filter combines. */
> +	size_t sub_nr, sub_alloc;
> +	struct list_objects_filter_options *sub;
> +
> +	/*
> +	 * END choice-specific parsed values.
> +	 */
>  };

I still think it's cleaner to just have a "left subfilter" and "right
subfilter", but I don't feel strongly about it. In any case, this is an
internal detail and can always be changed in the future.

> +	/*
> +	 * Optional. If this function is supplied and the filter needs to
> +	 * collect omits, then this function is called once before free_fn is
> +	 * called.
> +	 */
> +	void (*finalize_omits_fn)(struct oidset *omits, void *filter_data);

This is needed because a combined filter's omits actually lie in the
subfilters. Resolving it this way means that callers must call
list_objects_filter__free() before using the omits set. Can you add
documentation to __init() (which is the first function to take in the
omits set) and __free() describing this?

(As stated in the test below, we cannot just share one omits set amongst
all the subfilters - see filter_trees_update_omits and the call site
that relies on its return value.)

Here comes the tricky part...

> +static int should_delegate(enum list_objects_filter_situation filter_situation,
> +			   struct object *obj,
> +			   struct subfilter *sub)
> +{
> +	if (!sub->is_skipping_tree)
> +		return 1;
> +	if (filter_situation == LOFS_END_TREE &&
> +		oideq(&obj->oid, &sub->skip_tree)) {
> +		sub->is_skipping_tree = 0;
> +		return 1;
> +	}
> +	return 0;
> +}

Optional: I think this should be called "test_and_set_skip_tree" or
something like that, made to return the inverse of its current return
value, and documented:

  Returns the value of sub->is_skipping_tree at the moment of
  invocation. If iteration is at the LOFS_END_TREE of the tree currently
  being skipped, first clears sub->is_skipping_tree before returning.

> +static enum list_objects_filter_result process_subfilter(
> +	struct repository *r,
> +	enum list_objects_filter_situation filter_situation,
> +	struct object *obj,
> +	const char *pathname,
> +	const char *filename,
> +	struct subfilter *sub)
> +{
> +	enum list_objects_filter_result result;
> +
> +	/*
> +	 * Check should_delegate before oidset_contains so that
> +	 * is_skipping_tree gets unset even when the object is marked as seen.
> +	 * As of this writing, no filter uses LOFR_MARK_SEEN on trees that also
> +	 * uses LOFR_SKIP_TREE, so the ordering is only theoretically
> +	 * important. Be cautious if you change the order of the below checks
> +	 * and more filters have been added!
> +	 */
> +	if (!should_delegate(filter_situation, obj, sub))
> +		return LOFR_ZERO;
> +	if (oidset_contains(&sub->seen, &obj->oid))
> +		return LOFR_ZERO;
> +
> +	result = list_objects_filter__filter_object(
> +		r, filter_situation, obj, pathname, filename, sub->filter);
> +
> +	if (result & LOFR_MARK_SEEN)
> +		oidset_insert(&sub->seen, &obj->oid);
> +
> +	if (result & LOFR_SKIP_TREE) {
> +		sub->is_skipping_tree = 1;
> +		sub->skip_tree = obj->oid;
> +	}
> +
> +	return result;
> +}

Looks good.

> +static enum list_objects_filter_result filter_combine(
> +	struct repository *r,
> +	enum list_objects_filter_situation filter_situation,
> +	struct object *obj,
> +	const char *pathname,
> +	const char *filename,
> +	struct oidset *omits,
> +	void *filter_data)
> +{
> +	struct combine_filter_data *d = filter_data;
> +	enum list_objects_filter_result combined_result =
> +		LOFR_DO_SHOW | LOFR_MARK_SEEN | LOFR_SKIP_TREE;
> +	size_t sub;
> +
> +	for (sub = 0; sub < d->nr; sub++) {
> +		enum list_objects_filter_result sub_result = process_subfilter(
> +			r, filter_situation, obj, pathname, filename,
> +			&d->sub[sub]);
> +		if (!(sub_result & LOFR_DO_SHOW))
> +			combined_result &= ~LOFR_DO_SHOW;
> +		if (!(sub_result & LOFR_MARK_SEEN))
> +			combined_result &= ~LOFR_MARK_SEEN;
> +		if (!d->sub[sub].is_skipping_tree)
> +			combined_result &= ~LOFR_SKIP_TREE;
> +	}
> +
> +	return combined_result;
> +}

And also looks good. Might be confusing for tree skipping to be
communicated through is_skipping_tree instead of the return value, but
is_skipping_tree needs to be set anyway for other reasons, so that's
convenient.

  parent reply	other threads:[~2019-06-22  0:26 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-01  0:35 [PATCH v2 0/9] Filter combination Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 1/9] list-objects-filter: make API easier to use Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 2/9] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 3/9] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 4/9] list-objects-filter: implement composite filters Matthew DeVore
2019-06-03 21:51   ` Jeff Hostetler
2019-06-06 22:32     ` Matthew DeVore
2019-06-07 17:58       ` Jeff Hostetler
2019-06-01  0:35 ` [PATCH v2 5/9] list-objects-filter-options: move error check up Matthew DeVore
2019-06-01  0:36 ` [PATCH v2 6/9] list-objects-filter-options: make filter_spec a strbuf Matthew DeVore
2019-06-10 20:13   ` Junio C Hamano
2019-06-11  0:34     ` Matthew DeVore
2019-06-11 17:33       ` Junio C Hamano
2019-06-11 18:44         ` Matthew DeVore
2019-06-11 21:34           ` Matthew DeVore
2019-06-11 21:48           ` Junio C Hamano
2019-06-12  0:37             ` Matthew DeVore
2019-06-12 14:55               ` Matthew DeVore
2019-06-01  0:36 ` [PATCH v2 7/9] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-01  0:36 ` [PATCH v2 8/9] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-03 22:07   ` Jacob Keller
2019-06-03 22:39     ` Matthew DeVore
2019-06-04  3:16       ` Jacob Keller
2019-06-01  0:36 ` [PATCH v2 9/9] list-objects-filter-options: make parser void Matthew DeVore
2019-06-03 21:35 ` [PATCH v2 0/9] Filter combination Jeff Hostetler
2019-06-13 21:51 ` [PATCH v3 00/10] " Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 01/10] list-objects-filter: make API easier to use Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 02/10] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 03/10] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 04/10] list-objects-filter: implement composite filters Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 05/10] list-objects-filter-options: move error check up Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 06/10] list-objects-filter-options: make filter_spec a string_list Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 07/10] strbuf: give URL-encoding API a char predicate fn Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 08/10] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 09/10] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 10/10] list-objects-filter-options: make parser void Matthew DeVore
2019-06-14 19:50   ` [PATCH v3 00/10] Filter combination Junio C Hamano
2019-06-15  0:40 ` [PATCH v4 " Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 01/10] list-objects-filter: make API easier to use Matthew DeVore
2019-06-21 22:58     ` Jonathan Tan
2019-06-27  0:46       ` Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 02/10] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 03/10] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 04/10] list-objects-filter: implement composite filters Matthew DeVore
2019-06-18  8:42     ` Johannes Schindelin
2019-06-18 20:22       ` Matthew DeVore
2019-06-21 18:17         ` Johannes Schindelin
2019-06-22  0:26     ` Jonathan Tan [this message]
2019-06-27 21:12       ` Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 05/10] list-objects-filter-options: move error check up Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 06/10] list-objects-filter-options: make filter_spec a string_list Matthew DeVore
2019-06-22  0:37     ` Jonathan Tan
2019-06-27 21:17       ` Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 07/10] strbuf: give URL-encoding API a char predicate fn Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 08/10] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 09/10] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 10/10] list-objects-filter-options: make parser void Matthew DeVore
2019-06-22  0:46     ` Jonathan Tan
2019-06-27 21:24       ` Matthew DeVore
2019-06-27 22:27         ` Matthew DeVore
2019-06-18  1:25   ` [PATCH v4 00/10] Filter combination Junio C Hamano
2019-06-27 22:54 ` [PATCH v5 " Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 01/10] list-objects-filter: encapsulate filter components Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 02/10] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 03/10] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 04/10] list-objects-filter: implement composite filters Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 05/10] list-objects-filter-options: move error check up Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 06/10] list-objects-filter-options: make filter_spec a string_list Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 07/10] strbuf: give URL-encoding API a char predicate fn Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 08/10] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 09/10] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 10/10] list-objects-filter-options: make parser void Matthew DeVore
2019-06-28 16:05   ` [PATCH v5 00/10] Filter combination Junio C Hamano
2019-06-28 17:16   ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190622002626.245441-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=matvore@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).