git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git <git@vger.kernel.org>
Subject: Re: [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it
Date: Tue, 11 Sep 2018 11:31:55 -0700
Message-ID: <CAGZ79kZ4UtcojkDGjAbxORV6+dfSNmQYOx_Loe=B=-Z3JrQ5dg@mail.gmail.com> (raw)
In-Reply-To: <xmqqbm9a1p1p.fsf@gitster-ct.c.googlers.com>

On Thu, Sep 6, 2018 at 11:03 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > Instead of sorting it after we created an unsorted list, we could insert
> > correctly into the list.
>
> It is unclear what problem you are solving, especially with
> subjunctive "could" there.  We are creating an unsorted list and
> then sorting it and you see it as a problem because it is just as
> easy and efficient to do the insertion sort while building up the
> list?  (don't react and answer without reading all the way to the
> end; I think I know what is going on).
>
> > As the unsorted append is in order of cache entry
> > names, this is already sorted if names were equal to paths for submodules.
>
> That may be a statement of a fact, but it is unclear how that fact
> relates to what the code is doing before this patch, or what the
> code updated by this patch is doing.
>
> > As submodule names are often the same as their path, the input is sorted
> > pretty well already, so let's just do the sort afterwards.
>
> It is unclear what (performance?) trade-off this senttence is trying
> to make.  It sounds as if it is claiming this:
>
>         We can string_list_insert() to maintain sorted-ness of the
>         list as we find new items, or we can string_list_append() to
>         build an unsorted list and sort it at the end just once.
>
>         To pick which one is more appropriate, we notice the fact
>         that we discover new items more or less in the already
>         sorted order.  That makes "append then sort" more
>         appropriate.
>
> But is that reasoning sensible?
>
> I'd imagine that append-then-sort would always be more efficient
> than insert-at-the-right-place-as-we-go, and the reason why we
> sometimes would need to do the latter is when we need to look up
> elements in the middle of building the list (e.g. we may want to
> de-dup, which requires us to look up a name from the ones we
> collected so far).

If we come across a (mostly) sorted list, then the insert-at-the-right-place
we'd come across the best case of insertion sort which is O(n), which
sounds better than append-then-sort as our sorting is a merge sort,
which has O(n log n) even in its best case (and needs to copy stuff
into a temp buffer and back).

By having the submodules named after its path, I strongly suspect
we have a mostly sorted list in nearly all cases except some really
interesting corner cases out there.

> And in this application, calculate_changed_submodule_paths()
> discover paths by calling collect_changed_submodules() which finds a
> mapping <submodule name, oid of commits> into a list sorted by
> submodule name, and then goes through that list and builds a list of
> submodules paths (which could be different from submodule names) by
> appending.  Only after this list is fully built, get_next_submodule()
> gets called, so making the latter use string_list_lookup() that assumes
> a sorted list is safe if we built the list by append-then-sort (iow,
> sortedness while building the list does not matter).
>
> Having analysed all that, I find it somewhat iffy that _append() is
> used there in the calculate_changed_submodule_paths() function.

Note that this is fixed in the later patch
"submodule.c: do not copy around submodule list"

>  It
> would cause the resulting changed_submodule_names list to hold the
> same name twice (or more),

This would be possible if there is a submodule at path A and another
submodule (at a different path) named "A", as we'll try hard to collect
names, but are also okay with path as we want to keep supporting the
historical use case of submodules.

> but I do not know if that would pose a
> problem to the consumer of the list.  Using "accumulate then sort
> before calling look-up" would not change it as string_list_sort()
> would not dedup, so I do not think this patch would introduce a new
> problem, though.

Yes, that is true, so we'd want to extend the message above to
mention the potential duplicates.

Thanks,
Stefan

  reply index

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-04 23:01 ` [PATCH 01/11] string_list: print_string_list to use trace_printf Stefan Beller
2018-09-06 16:52   ` Junio C Hamano
2018-09-06 16:56     ` Jeff King
2018-09-06 22:16       ` Stefan Beller
2018-09-07  0:04         ` Jeff King
2018-09-07  9:53           ` Junio C Hamano
2018-09-07 17:21             ` Stefan Beller
2018-09-10 21:58               ` [PATCH 1/2] trace: add trace_print_string_list_key Stefan Beller
2018-09-10 21:58                 ` [PATCH 2/2] string-list: remove unused function print_string_list Stefan Beller
2018-09-10 22:32                 ` [PATCH 1/2] trace: add trace_print_string_list_key Junio C Hamano
2018-09-10 22:38                   ` Stefan Beller
2018-09-11  0:52                     ` Junio C Hamano
2018-09-11  3:08                       ` Stefan Beller
2018-09-11 21:03                         ` Jeff King
2018-09-11 18:48                       ` [PATCH] string-list: remove unused function print_string_list Stefan Beller
2018-09-11 19:27                         ` Junio C Hamano
2018-09-11 19:30                           ` Junio C Hamano
2018-09-11 19:47                             ` Stefan Beller
2018-09-11 20:53                               ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 02/11] string-list.h: add string_list_{pop, last} functions Stefan Beller
2018-09-06 17:10   ` Junio C Hamano
2018-09-06 22:29     ` Stefan Beller
2018-09-04 23:01 ` [PATCH 03/11] sha1-array: provide oid_array_filter Stefan Beller
2018-09-06 17:20   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 04/11] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
2018-09-06 17:31   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 05/11] submodule.c: fix indentation Stefan Beller
2018-09-06 17:34   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-09-06 18:03   ` Junio C Hamano
2018-09-11 18:31     ` Stefan Beller [this message]
2018-09-04 23:01 ` [PATCH 07/11] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-09-06 18:04   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 08/11] submodule.c: do not copy around submodule list Stefan Beller
2018-09-06 18:20   ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 09/11] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-09-04 23:01 ` [PATCH 10/11] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
2018-09-04 23:01 ` [PATCH 11/11] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kZ4UtcojkDGjAbxORV6+dfSNmQYOx_Loe=B=-Z3JrQ5dg@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox