git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Stefan Beller <sbeller@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it
Date: Wed, 12 Sep 2018 11:18:02 -0700
Message-ID: <xmqqr2hylgv9.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <20180911234951.14129-5-sbeller@google.com>

Stefan Beller <sbeller@google.com> writes:

> We can string_list_insert() to maintain sorted-ness of the
> list as we find new items, or we can string_list_append() to
> build an unsorted list and sort it at the end just once.
>
> To pick which one is more appropriate, we notice the fact
> that we discover new items more or less in the already
> sorted order.  That makes "append then sort" more
> appropriate.

Sorry, but I still do not get the math you are implying in the
second paragraph.  Are you saying that append-then-sort is efficient
when items being appended is already sorted?  That depends on the
sorting algorithm used, so the logic is incomplete unless you say
"given that we use X for sorting,...", I think.

Do we really discover new items in sorted order, by the way?  In a
single diff invocation made inside collect_changed_submodules() for
one commit in the superproject's history, we will grab changed paths
in the pathname order (i.e. sorted); if the superproject's tip commit
touches the submodules at paths A and Z, we will discover these two
paths in sorted order.

But because we are walking the superproject's history to collect all
paths that have been affected in that function, and repeatedly
calling diff as we discover commit in the superproject's history, I
am not sure how well the resulting set of paths would be sorted.

The tip commit in superproject's history may have modified the
submodule at path X, the parent of that commit may have touched the
submodule at path M, and its parent may have touched the submodule
at path A.  Don't we end up grabbing these paths in that discoverd
order, i.e. X, M and A?

I still think changing it from "insert as we find an item, keeping
the list sorted" to "append all and then sort before we start
looking things up from the result" makes sense, but I do not think
the "we find things in sorted order" is either true, or it would
affect the choice between the two.  A justification to choose the
latter I can think of that makes sense is that we don't have to pay
cost to keep the list sorted while building it because we do not do
any look-up while building the list.

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index d29dfa3d1f5..c6eff7699f3 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1256,7 +1256,7 @@ static int get_next_submodule(struct child_process *cp,
>  		case RECURSE_SUBMODULES_DEFAULT:
>  		case RECURSE_SUBMODULES_ON_DEMAND:
>  			if (!submodule ||
> -			    !unsorted_string_list_lookup(
> +			    !string_list_lookup(
>  					&changed_submodule_names,
>  					submodule->name))
>  				continue;
> @@ -1350,6 +1350,7 @@ int fetch_populated_submodules(struct repository *r,
>  	/* default value, "--submodule-prefix" and its value are added later */
>  
>  	calculate_changed_submodule_paths();
> +	string_list_sort(&changed_submodule_names);
>  	run_processes_parallel(max_parallel_jobs,
>  			       get_next_submodule,
>  			       fetch_start_failure,

  reply index

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-11 23:49 [PATCH 0/9] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-11 23:49 ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
2018-09-12  2:24   ` Ramsay Jones
2018-09-12 17:52   ` Junio C Hamano
2018-09-11 23:49 ` [PATCH 2/9] sha1-array: provide oid_array_filter Stefan Beller
2018-09-12 18:02   ` Junio C Hamano
2018-09-11 23:49 ` [PATCH 3/9] submodule.c: fix indentation Stefan Beller
2018-09-12 18:02   ` Junio C Hamano
2018-09-11 23:49 ` [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-09-12 18:18   ` Junio C Hamano [this message]
2018-09-12 19:06     ` Stefan Beller
2018-09-11 23:49 ` [PATCH 5/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-09-11 23:49 ` [PATCH 6/9] submodule.c: do not copy around submodule list Stefan Beller
2018-09-12  2:25   ` Ramsay Jones
2018-09-11 23:49 ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-09-12 18:36   ` Junio C Hamano
2018-09-13 19:29     ` Stefan Beller
2018-09-11 23:49 ` [PATCH 8/9] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
2018-09-12 19:03   ` Junio C Hamano
2018-09-11 23:49 ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
2018-09-12 19:20   ` Junio C Hamano
2018-09-17 21:35 ` [PATCHv2 0/9] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-17 21:35   ` [PATCH 1/9] string-list: add string_list_{pop, last} functions Stefan Beller
2018-09-21 22:08     ` [PATCH] " Stefan Beller
2018-09-17 21:35   ` [PATCH 2/9] sha1-array: provide oid_array_filter Stefan Beller
2018-09-17 21:42     ` Stefan Beller
2018-09-17 21:35   ` [PATCH 3/9] submodule.c: fix indentation Stefan Beller
2018-09-17 21:35   ` [PATCH 4/9] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-09-17 21:35   ` [PATCH 5/9] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-09-17 21:35   ` [PATCH 6/9] submodule.c: do not copy around submodule list Stefan Beller
2018-09-17 21:35   ` [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-09-17 21:35   ` [PATCH 8/9] fetch: retry fetching submodules if needed objects were not fetched Stefan Beller
2018-09-17 21:35   ` [PATCH 9/9] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqr2hylgv9.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox