git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: git@vger.kernel.org, peff@peff.net, jonathantanmy@google.com,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH 02/13] list-objects-filter-map: extend oidmap to collect omitted objects
Date: Thu, 26 Oct 2017 13:12:53 +0900
Message-ID: <xmqqefpqfsxm.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <2f7ad5dc-821e-3fd3-bb7c-205ea5016457@jeffhostetler.com> (Jeff Hostetler's message of "Wed, 25 Oct 2017 15:22:04 -0400")

Jeff Hostetler <git@jeffhostetler.com> writes:

> Sorry, I meant a later commit in this patch series.  It is used by
> commits 4, 5, 6, and 10 to actually do the filtering and collect a
> list of omitted or missing objects.

I know you meant "later commits in the series" ;-).  

It does not change the fact that readers of 02/13 haven't seen them
yet when they try to understand patch 02/13, whether the changes
that drove the design of this step come later in the same series or
are not yet posted at all.

> I think of a "set" as a member? or not-member? class.
> I think of a "map" as a member? or not-member? class but where each
> member also has a value.  Sometimes map lookups just want to know
> membership and sometimes the lookup wants the value.
>
> Granted, having the key and value data stuffed into the same entry
> (from hashmap's point of view, rather than a key having a pointer
> to a value) does kind of blur the line, but I was thinking about
> a map here.  (And I was building on oidmap which builds on hashmap,
> so it seemed appropriate.)

My question was mostly about "if this is a map, then a caller that
queries the map with an oid does so because it wants to know the
data associated with the oid; if this is just a set, it is mostly
interested in the membership" and "I cannot quite tell which was
meant without the caller".

It seems that some callers do care about the "path" name from your
response above, so calling this "map" sounds more appropriate.

The answer "it can be used to speed up the 'is this path excluded?'
check" is a bit worrisome, though.  A blob can appear at more than
one path, and unless all of its appearances are in excluded paths,
omitting the blob from the repository would lead to an aborted
"rev-list --objects" run.  This "map" can record at most one path
per object, so we need to wait until we see the optimization code
to judge how effectively this data helps the optimization, and to
comment on the code ;-)

>>> +	len = ((pathname && *pathname) ? strlen(pathname) : 0);
>>> +	size = (offsetof(struct list_objects_filter_map_entry, pathname) + len + 1);
>>> +	e = xcalloc(1, size);
>>> +
>>> +	oidcpy(&e->entry.oid, oid);
>>> +	e->type = type;
>>> +	if (pathname && *pathname)
>>> +		strcpy(e->pathname, pathname);
>>> +
>>> +	oidmap_put(map, e);
>>> +	return 0;
>>> +}
>>
>> The return value from the function needs to be documented in the
>> header to help callers.  It is not apparent why "we did already have
>> one" and "we now newly added" is interesting to the callers, for
>> example.  An obvious alternative implementation of this function
>> would return the pointer to an entry that records the object id
>> (i.e. either the one that was already there, or the one we created
>> because we saw this object for the first time), so that the caller
>> can do something interesting to it---again, because the reason why
>> we want this "filter map" is not explained at this stage, it is hard
>> to tell what that "something interesting" would be.
>
> good point.  thanks.

I am more confused by the response ;-) But as we established that
this is a map (not a set that borrows the implementation of map),
where the data recorded in 'e' is quite useful to the caller, it
probably makes sense to make 'e' available to the caller?  It is
still unclear if the caller finds "it is the first time I saw the
object you gave me" vs "I've seen that object before already"
useful.
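
To make the "return the entry" alternative concrete, here is a rough
standalone sketch.  It does not use the real oidmap API; a linked
list stands in for the hash table, and every name in it (toy_oid,
toy_entry, toy_map, toy_map_insert, is_new) is made up for
illustration.  Whether the caller needs the "was it new?" bit at all
is exactly the open question, so it is an optional out-parameter
here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for illustration; none of these names exist in git. */
struct toy_oid {
	unsigned char hash[20];
};

struct toy_entry {
	struct toy_entry *next;
	struct toy_oid oid;
	int type;
	char pathname[]; /* flexible array member, NUL-terminated */
};

struct toy_map {
	struct toy_entry *head; /* a linked list stands in for a hash table */
};

/*
 * Return the entry that records oid: either the one that was already
 * there, or a newly created one.  If is_new is not NULL, *is_new is
 * set to 1 when the entry was created by this call and 0 otherwise.
 * Returns NULL only when allocation fails.
 */
struct toy_entry *toy_map_insert(struct toy_map *map,
				 const struct toy_oid *oid,
				 int type, const char *pathname,
				 int *is_new)
{
	struct toy_entry *e;
	size_t len = pathname ? strlen(pathname) : 0;

	for (e = map->head; e; e = e->next) {
		if (!memcmp(e->oid.hash, oid->hash, sizeof(oid->hash))) {
			if (is_new)
				*is_new = 0;
			return e; /* seen before; keep the recorded path */
		}
	}

	/* calloc() zero-fills, so pathname is always NUL-terminated */
	e = calloc(1, offsetof(struct toy_entry, pathname) + len + 1);
	if (!e)
		return NULL;
	e->oid = *oid;
	e->type = type;
	if (len)
		memcpy(e->pathname, pathname, len);
	e->next = map->head;
	map->head = e;
	if (is_new)
		*is_new = 1;
	return e;
}
```

Note that a second insert with the same oid hands back the first
entry untouched, so the path that ends up recorded is simply the one
that happened to be seen first.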

>>> +	for (k = 0; k < nr; k++)
>>> +		cb(k, nr, array[k], cb_data);
>>
>> Also it is not clear if you wanted to expose the type of the
>> entry to the callback function.
>
> The thought was that we would sort the OIDs so that things
> like rev-list could print the omitted/missing objects in OID
> order.  Not critical that we do it here, but I thought it would
> help callers.

I can foresee that some callers would want the result sorted, while
others would not.  I was primarily wondering why "my_cmp" is not a
parameter that can be NULL (in which case we do not sort at all).
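
Something along these lines is what I have in mind.  This is only a
generic sketch, not the signature from the patch: toy_for_each,
toy_cmp_fn, toy_each_fn and the two int helpers are hypothetical
names, and the array holds plain ints just to keep the example
standalone.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names for illustration; not the patch's API. */
typedef int (*toy_cmp_fn)(const void *a, const void *b);
typedef void (*toy_each_fn)(const void *item, void *cb_data);

/*
 * Call fn on each of the nr items in array.  When cmp is not NULL
 * the array is sorted in place first; when cmp is NULL the items
 * are visited in whatever order they are already in.
 */
void toy_for_each(void *array, size_t nr, size_t item_size,
		  toy_cmp_fn cmp, toy_each_fn fn, void *cb_data)
{
	size_t k;

	if (cmp && nr > 1)
		qsort(array, nr, item_size, cmp);
	for (k = 0; k < nr; k++)
		fn((const char *)array + k * item_size, cb_data);
}

/* Example comparator: ascending order of int. */
int toy_cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;
	return (x > y) - (x < y);
}

/* Example callback: remember the items in the order they were seen. */
struct toy_collect {
	int seen[8];
	size_t nr;
};

void toy_collect_int(const void *item, void *cb_data)
{
	struct toy_collect *c = cb_data;

	if (c->nr < sizeof(c->seen) / sizeof(c->seen[0]))
		c->seen[c->nr++] = *(const int *)item;
}
```

A caller that wants OID order passes its comparison function; one
that does not care passes NULL and pays nothing for the sort.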

>> An obvious alternative
>>
>> 	fn(&array[k].entry.oid, cb_data);
>>
>> would allow you to keep the type of map-entry private to the map,
>> and also the callback does not need to know about k or nr.
>> ...
> I included the {k, nr} so that the callback could dump header/trailer
> information when reporting the results or pre-allocate an array.
> I'll look at refactoring this -- I never quite liked how it turned
> out anyway -- especially with the oidmap simplifications.

And as we established that this is a map, where the data associated
with each oid is interesting to the caller, we do not want to hide
the type of array[] element by passing only &array[k].entry.oid, I
guess?

Thanks.

