git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Matthew DeVore <matvore@google.com>
Cc: git@vger.kernel.org, sbeller@google.com, git@jeffhostetler.com,
	jeffhost@microsoft.com, peff@peff.net, stefanbeller@gmail.com,
	jonathantanmy@google.com, pclouds@gmail.com
Subject: Re: [RFC PATCH 3/3] list-objects-filter: teach tree:# how to handle >0
Date: Mon, 15 Oct 2018 11:31:30 +0900	[thread overview]
Message-ID: <xmqqlg706iv1.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <e340f743e6749e65525e1342dc47faaa6540f04b.1539298957.git.matvore@google.com> (Matthew DeVore's message of "Thu, 11 Oct 2018 16:09:02 -0700")

Matthew DeVore <matvore@google.com> writes:

> The long-term goal at the end of this is to allow a partial clone to
> eagerly fetch an entire directory of files by fetching a tree and
> specifying <depth>=1. This, for instance, would make a build operation
> fast and convenient

This would reduce round-trip, as you do not have to fetch the tree
to see what its contents are before issuing another set of fetches
for them.  Those who are building virtual filesystem that let you
mount a specific tree object, perhaps via fuse, may find it useful,
too, even though I suspect that may not be your primary focus.

> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index c2c1c40e6..c78985c41 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -734,8 +734,12 @@ specification contained in <path>.
>  +
>  The form '--filter=tree:<depth>' omits all blobs and trees whose depth
>  from the root tree is >= <depth> (minimum depth if an object is located
> -at multiple depths in the commits traversed). Currently, only <depth>=0
> -is supported, which omits all blobs and trees.
> +at multiple depths in the commits traversed). <depth>=0 will not include
> +any trees or blobs unless included explicitly in <object>. <depth>=1

Here, <object> refers to the objects directly requested on the
command line (or --stdin)?  Triggering this question from me is a
sign that this description may want to say a bit more to avoid the
same question from the real readers.  Perhaps replace "included
explicitly in <object>" with "explicitly requested by listing them
on the command line or feeding them with `--stdin`", or something
like that?

> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index e8da2e858..9dc61d6e6 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -50,12 +50,12 @@ static int gently_parse_list_objects_filter(
>  		}
>  
>  	} else if (skip_prefix(arg, "tree:", &v0)) {
> -		unsigned long depth;
> -		if (!git_parse_ulong(v0, &depth) || depth != 0) {
> +		if (!git_parse_ulong(v0,
> +				     &filter_options->tree_depth_limit_value)) {
>  			if (errbuf) {
>  				strbuf_addstr(
>  					errbuf,
> -					_("only 'tree:0' is supported"));
> +					_("expected 'tree:<int>'"));

We do not accept "tree:-1", even though "-1" is an int.  Is it too
obvious to worry about?  I do not think we want to say tree:<uint>
even if we do want to make it clear we reject "tree:-1"

I am wondering if "expected 'tree:<depth>'" would work better.

> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> index af64e5c66..c1ae70cd8 100644
> --- a/list-objects-filter-options.h
> +++ b/list-objects-filter-options.h
> @@ -44,6 +44,7 @@ struct list_objects_filter_options {
>  	struct object_id *sparse_oid_value;
>  	char *sparse_path_value;
>  	unsigned long blob_limit_value;
> +	unsigned long tree_depth_limit_value;
>  };

This change gets it right by adding "depth" in the field name and it
is not a comment on this patch, but someday not in too distant
future we should rename the "blob_limit_value" to make it clear that
it is filtering by number of bytes, as other filtering criteria on
blobs that can be expressed in ulong are quite possible.

> -static enum list_objects_filter_result filter_trees_none(
> +static enum list_objects_filter_result filter_trees_depth(
>  	enum list_objects_filter_situation filter_situation,
>  	struct object *obj,
>  	const char *pathname,
>  	const char *filename,
>  	void *filter_data_)
>  {
> -	struct filter_trees_none_data *filter_data = filter_data_;
> +	struct filter_trees_depth_data *filter_data = filter_data_;
> +
> +	int too_deep = filter_data->current_depth >= filter_data->max_depth;

Does max mean "maximum allowed", or "this and anything larger are
rejected".  The latter sound wrong, but I offhand do not know if
your current_depth counts from 0 or 1, so there may not be
off-by-one.

 - dir.c::within_depth() that is used by pathspec matching that in turn
   is used by "grep --max-depth=1" does "if (depth > max_depth)", which
   sounds more in line with the usual convention, I would think.

 - pack-objects has max_delta_cache_size, which also is used as
   "maximum allowed", not "this is already too big".  So is its
   max_depth.

There may be other examples.  One existing violator I noticed was
the "reject blobs that is this size or larger" in this file; it is
called "max_bytes", but it is apparently not "maximum allowed",
which we probably would want to fix.

> +	/*
> +	 * Note that we do not use _MARK_SEEN in order to allow re-traversal in
> +	 * case we encounter a tree or blob again at a shallower depth.
> +	 */

Hmph.  Earlier tree:0 support never even read the actual tree, so
this was not a problem.  We wouldn't have found a tree in deeper
places first and then at a shallower depth, as we wouldn't have seen
any tree in any depth deeper than the surface anyway.

Now we need to worry about a tree that originally gets seen in a
deeper depth (that is still below the allowed maximum) to reappear
at a shallower place, so a subtree within it that used to be too
deep may now be within the allowed maximum depth.

Step 1 of these three patches made sure trees are not even opened
under "tree:0", so it was not just optimizing/shrinking the output
of rev-list but also optimizing the traversal.  When we are
collecting omits, however, this one now returns _ZERO which means we
still traverse into the tree, even under "tree:0"?  I must be
reading the code incorrectly (in general, when we are seeing a tree
object that itself is at the maximum allowed depth, trees found by
reading its contents will never become eligible for output, even if
they are found at a shallower depth than their other copies were
previously found, I would think).

>  	switch (filter_situation) {
>  	default:
>  		BUG("unknown filter_situation: %d", filter_situation);
>  
> -	case LOFS_BEGIN_TREE:
>  	case LOFS_BLOB:
> +		if (!too_deep) goto include_it;

Style: on two lines, like you did below for the next if() statement.

> +
> +		if (filter_data->omits)
> +			oidset_insert(filter_data->omits, &obj->oid);
> +
> +		return LOFR_ZERO;
> +
> +	case LOFS_BEGIN_TREE:
> +		filter_data->current_depth++;
> +
> +		if (!too_deep) goto include_it;
> +
>  		if (filter_data->omits) {
>  			oidset_insert(filter_data->omits, &obj->oid);
> -			/* _MARK_SEEN but not _DO_SHOW (hard omit) */
> -			return LOFR_MARK_SEEN;
> +			return LOFR_ZERO;
>  		}
>  		else
>  			/*
>  			 * Not collecting omits so no need to to traverse tree.
>  			 */
> -			return LOFR_SKIP_TREE | LOFR_MARK_SEEN;
> +			return LOFR_SKIP_TREE;
>  
>  	case LOFS_END_TREE:
>  		assert(obj->type == OBJ_TREE);
> +		filter_data->current_depth--;
>  		return LOFR_ZERO;
>  
>  	}
> +
> +include_it:
> +	if (filter_data->omits)
> +		oidset_remove(filter_data->omits, &obj->oid);
> +	return LOFR_MARK_SEEN | LOFR_DO_SHOW;
>  }

  reply	other threads:[~2018-10-15  2:31 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-11 23:08 [RFC PATCH 0/3] support for filtering trees and blobs based on depth Matthew DeVore
2018-10-11 23:09 ` [RFC PATCH 1/3] list-objects: support for skipping tree traversal Matthew DeVore
2018-10-14 23:15   ` Junio C Hamano
2018-10-18  0:25     ` Matthew DeVore
2018-10-11 23:09 ` [RFC PATCH 2/3] Documentation/git-rev-list: s/<commit>/<object>/ Matthew DeVore
2018-10-14 23:25   ` Junio C Hamano
2018-10-20  0:03     ` Matthew DeVore
2018-10-22  2:21       ` Junio C Hamano
2018-12-07 22:55         ` Matthew DeVore
2018-12-08  6:24           ` Junio C Hamano
2018-10-11 23:09 ` [RFC PATCH 3/3] list-objects-filter: teach tree:# how to handle >0 Matthew DeVore
2018-10-15  2:31   ` Junio C Hamano [this message]
2018-11-08  0:47     ` Matthew DeVore
2018-12-10 23:15     ` Matthew DeVore

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqlg706iv1.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=jeffhost@microsoft.com \
    --cc=jonathantanmy@google.com \
    --cc=matvore@google.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    --cc=stefanbeller@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).