git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matthew DeVore <matvore@comcast.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: matvore@google.com, git@vger.kernel.org
Subject: Re: [PATCH v4 04/10] list-objects-filter: implement composite filters
Date: Thu, 27 Jun 2019 14:12:47 -0700	[thread overview]
Message-ID: <20190627211247.GB54617@comcast.net> (raw)
In-Reply-To: <20190622002626.245441-1-jonathantanmy@google.com>

On Fri, Jun 21, 2019 at 05:26:26PM -0700, Jonathan Tan wrote:
> > Allow combining filters such that only objects accepted by all filters
> > are shown. The motivation for this is to allow getting directory
> > listings without also fetching blobs. This can be done by combining
> > blob:none with tree:<depth>. There are massive repositories that have
> > larger-than-expected trees - even if you include only a single commit.
> 
> First of all, patches 2 and 3 are straightforward and LGTM. On to patch
> 4...
> 
> [snip]
> 
> > The current usage requires passing the filter to rev-list in the
> > following form:
> > 
> > 	--filter=<FILTER1> --filter=<FILTER2> ...
> > 
> > Such usage is currently an error, so giving it a meaning is backwards-
> > compatible.
> > 
> > The URL-encoding scheme is being introduced before the repeated flag
> > logic, and the user-facing documentation for URL-encoding is being
> > withheld until the repeated flag feature is implemented. The
> > URL-encoding is in general not meant to be used directly by the user,
> > and it is better to describe the URL-encoding feature in terms of the
> > repeated flag.
> 
> As of this commit, we don't support such arguments passed to rev-list in
> this way, so I would write these paragraphs as:
> 
>   A combined filter supports any number of subfilters, and is written in
>   the following form:
> 
>     combine:<filter 1>+<filter 2>+<filter 3>
> 
>   Certain non-alphanumeric characters in each filter must be
>   URL-encoded.
> 
>   For now, combined filters must be specified in this form. In a
>   subsequent commit, rev-list will support multiple --filter arguments
>   which will have the same effect as specifying one filter argument
>   starting with "combine:".

Done, but I've amended your last paragraph to include the excuse about the
missing documentation:

    For now, combined filters must be specified in this form. In a
    subsequent commit, rev-list will support multiple --filter arguments
    which will have the same effect as specifying one filter argument
    starting with "combine:". The documentation will be updated in that
    commit, as the URL-encoding scheme is in general not meant to be used
    directly by the user, and it is better to describe the URL-encoding
    feature in terms of the repeated flag.

> > +
> > +	/* LOFC_COMBINE values */
> > +
> > +	/* This array contains all the subfilters which this filter combines. */
> > +	size_t sub_nr, sub_alloc;
> > +	struct list_objects_filter_options *sub;
> > +
> > +	/*
> > +	 * END choice-specific parsed values.
> > +	 */
> >  };
> 
> I still think it's cleaner to just have a "left subfilter" and "right
> subfilter", but I don't feel strongly about it. In any case, this is an
> internal detail and can always be changed in the future.
> 

Interesting. I think there are two reasonable ways of thinking about it:

a. we are parsing a nested left-associative binary expression, and it's
   helpful if the internal representation matches that semantic.  (Maybe
   this is what you're thinking?)

b. linked-lists are somewhat aberrational and should only be used in a
   very narrow class of code. Arrays are more universal and therefore more
   readable.

I'll keep it as-is for now since I don't see a great reason to revert it to the
previous style. Thank you for pointing this out - I *did* suspect there was a
big subjective factor to people's preference for the array form.

> > +	/*
> > +	 * Optional. If this function is supplied and the filter needs to
> > +	 * collect omits, then this function is called once before free_fn is
> > +	 * called.
> > +	 */
> > +	void (*finalize_omits_fn)(struct oidset *omits, void *filter_data);
> 
> This is needed because a combined filter's omits actually lie in the
> subfilters. Resolving it this way means that callers must call
> list_objects_filter__free() before using the omits set. Can you add
> documentation to __init() (which is the first function to take in the
> omits set) and __free() describing this?
> 
> (As stated in the test below, we cannot just share one omits set amongst
> all the subfilters - see filter_trees_update_omits and the call site
> that relies on its return value.)
> 

I documented __init as follows:

/*
 * Constructor for the set of defined list-objects filters.
 * The `omitted` set is optional. It is populated with objects that the
 * filter excludes. This set should not be considered finalized until
 * after list_objects_filter__free is called on the returned `struct
 * filter *`.
 */

And __free:

/*
 * Destroys `filter` and finalizes the `omitted` set, if present. Does
 * nothing if `filter` is null.
 */

And finalize_omits_fn (which has the internal specifics):

	/*
	 * Optional. If this function is supplied and the filter needs
	 * to collect omits, then this function is called once before
	 * free_fn is called.
	 *
	 * This is required because the following two conditions hold:
	 *
	 *   a. A tree filter can add and remove objects as an object
	 *      graph is traversed.
	 *   b. A combine filter's omit set is the union of all its
	 *      subfilters, which may include tree: filters.
	 *
	 * As such, the omits sets must be separate sets, and can only
	 * be unioned after the traversal is completed.
	 */

> Here comes the tricky part...
> 
> > +static int should_delegate(enum list_objects_filter_situation filter_situation,
> > +			   struct object *obj,
> > +			   struct subfilter *sub)
> > +{
> > +	if (!sub->is_skipping_tree)
> > +		return 1;
> > +	if (filter_situation == LOFS_END_TREE &&
> > +		oideq(&obj->oid, &sub->skip_tree)) {
> > +		sub->is_skipping_tree = 0;
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> 
> Optional: I think this should be called "test_and_set_skip_tree" or
> something like that, made to return the inverse of its current return
> value, and documented:
> 
>   Returns the value of sub->is_skipping_tree at the moment of
>   invocation. If iteration is at the LOFS_END_TREE of the tree currently
>   being skipped, first clears sub->is_skipping_tree before returning.
> 

That is not totally accurate, since in the second if block
(LOFS_END_TREE) we would return 0 even though is_skipping_tree starts
out as 1.

Since it's not equal to is_skipping_tree before invocation, I don't
think there is a better name to give it ATM.

So in the interest of removing "trickiness" I've inlined the function
(since there is no real way to express what it's doing in a function
name anyway) and I think it's more readable. This is the new callsite:

	/*
	 * Check and update is_skipping_tree before oidset_contains so
	 * that is_skipping_tree gets unset even when the object is
	 * marked as seen.  As of this writing, no filter uses
	 * LOFR_MARK_SEEN on trees that also uses LOFR_SKIP_TREE, so the
	 * ordering is only theoretically important. Be cautious if you
	 * change the order of the below checks and more filters have
	 * been added!
	 */
	if (sub->is_skipping_tree) {
		if (filter_situation == LOFS_END_TREE &&
		    oideq(&obj->oid, &sub->skip_tree))
			sub->is_skipping_tree = 0;
		else
			return LOFR_ZERO;
	}

Hopefully that's agreeable for everyone.

  reply	other threads:[~2019-06-27 21:13 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-01  0:35 [PATCH v2 0/9] Filter combination Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 1/9] list-objects-filter: make API easier to use Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 2/9] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 3/9] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-01  0:35 ` [PATCH v2 4/9] list-objects-filter: implement composite filters Matthew DeVore
2019-06-03 21:51   ` Jeff Hostetler
2019-06-06 22:32     ` Matthew DeVore
2019-06-07 17:58       ` Jeff Hostetler
2019-06-01  0:35 ` [PATCH v2 5/9] list-objects-filter-options: move error check up Matthew DeVore
2019-06-01  0:36 ` [PATCH v2 6/9] list-objects-filter-options: make filter_spec a strbuf Matthew DeVore
2019-06-10 20:13   ` Junio C Hamano
2019-06-11  0:34     ` Matthew DeVore
2019-06-11 17:33       ` Junio C Hamano
2019-06-11 18:44         ` Matthew DeVore
2019-06-11 21:34           ` Matthew DeVore
2019-06-11 21:48           ` Junio C Hamano
2019-06-12  0:37             ` Matthew DeVore
2019-06-12 14:55               ` Matthew DeVore
2019-06-01  0:36 ` [PATCH v2 7/9] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-01  0:36 ` [PATCH v2 8/9] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-03 22:07   ` Jacob Keller
2019-06-03 22:39     ` Matthew DeVore
2019-06-04  3:16       ` Jacob Keller
2019-06-01  0:36 ` [PATCH v2 9/9] list-objects-filter-options: make parser void Matthew DeVore
2019-06-03 21:35 ` [PATCH v2 0/9] Filter combination Jeff Hostetler
2019-06-13 21:51 ` [PATCH v3 00/10] " Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 01/10] list-objects-filter: make API easier to use Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 02/10] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 03/10] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 04/10] list-objects-filter: implement composite filters Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 05/10] list-objects-filter-options: move error check up Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 06/10] list-objects-filter-options: make filter_spec a string_list Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 07/10] strbuf: give URL-encoding API a char predicate fn Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 08/10] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 09/10] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-13 21:51   ` [PATCH v3 10/10] list-objects-filter-options: make parser void Matthew DeVore
2019-06-14 19:50   ` [PATCH v3 00/10] Filter combination Junio C Hamano
2019-06-15  0:40 ` [PATCH v4 " Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 01/10] list-objects-filter: make API easier to use Matthew DeVore
2019-06-21 22:58     ` Jonathan Tan
2019-06-27  0:46       ` Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 02/10] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 03/10] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 04/10] list-objects-filter: implement composite filters Matthew DeVore
2019-06-18  8:42     ` Johannes Schindelin
2019-06-18 20:22       ` Matthew DeVore
2019-06-21 18:17         ` Johannes Schindelin
2019-06-22  0:26     ` Jonathan Tan
2019-06-27 21:12       ` Matthew DeVore [this message]
2019-06-15  0:40   ` [PATCH v4 05/10] list-objects-filter-options: move error check up Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 06/10] list-objects-filter-options: make filter_spec a string_list Matthew DeVore
2019-06-22  0:37     ` Jonathan Tan
2019-06-27 21:17       ` Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 07/10] strbuf: give URL-encoding API a char predicate fn Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 08/10] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 09/10] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-15  0:40   ` [PATCH v4 10/10] list-objects-filter-options: make parser void Matthew DeVore
2019-06-22  0:46     ` Jonathan Tan
2019-06-27 21:24       ` Matthew DeVore
2019-06-27 22:27         ` Matthew DeVore
2019-06-18  1:25   ` [PATCH v4 00/10] Filter combination Junio C Hamano
2019-06-27 22:54 ` [PATCH v5 " Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 01/10] list-objects-filter: encapsulate filter components Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 02/10] list-objects-filter: put omits set in filter struct Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 03/10] list-objects-filter-options: always supply *errbuf Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 04/10] list-objects-filter: implement composite filters Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 05/10] list-objects-filter-options: move error check up Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 06/10] list-objects-filter-options: make filter_spec a string_list Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 07/10] strbuf: give URL-encoding API a char predicate fn Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 08/10] list-objects-filter-options: allow mult. --filter Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 09/10] list-objects-filter-options: clean up use of ALLOC_GROW Matthew DeVore
2019-06-27 22:54   ` [PATCH v5 10/10] list-objects-filter-options: make parser void Matthew DeVore
2019-06-28 16:05   ` [PATCH v5 00/10] Filter combination Junio C Hamano
2019-06-28 17:16   ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190627211247.GB54617@comcast.net \
    --to=matvore@comcast.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=matvore@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).