From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git <git@vger.kernel.org>
Subject: Re: [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it
Date: Tue, 11 Sep 2018 11:31:55 -0700 [thread overview]
Message-ID: <CAGZ79kZ4UtcojkDGjAbxORV6+dfSNmQYOx_Loe=B=-Z3JrQ5dg@mail.gmail.com> (raw)
In-Reply-To: <xmqqbm9a1p1p.fsf@gitster-ct.c.googlers.com>
On Thu, Sep 6, 2018 at 11:03 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Stefan Beller <sbeller@google.com> writes:
>
> > Instead of sorting it after we created an unsorted list, we could insert
> > correctly into the list.
>
> It is unclear what problem you are solving, especially with
> subjunctive "could" there. We are creating an unsorted list and
> then sorting it and you see it as a problem because it is just as
> easy and efficient to do the insertion sort while building up the
> list? (don't react and answer without reading all the way to the
> end; I think I know what is going on).
>
> > As the unsorted append is in order of cache entry
> > names, this is already sorted if names were equal to paths for submodules.
>
> That may be a statement of a fact, but it is unclear how that fact
> relates to what the code is doing before this patch, or what the
> code updated by this patch is doing.
>
> > As submodule names are often the same as their path, the input is sorted
> > pretty well already, so let's just do the sort afterwards.
>
> It is unclear what (performance?) trade-off this senttence is trying
> to make. It sounds as if it is claiming this:
>
> We can string_list_insert() to maintain sorted-ness of the
> list as we find new items, or we can string_list_append() to
> build an unsorted list and sort it at the end just once.
>
> To pick which one is more appropriate, we notice the fact
> that we discover new items more or less in the already
> sorted order. That makes "append then sort" more
> appropriate.
>
> But is that reasoning sensible?
>
> I'd imagine that append-then-sort would always be more efficient
> than insert-at-the-right-place-as-we-go, and the reason why we
> sometimes would need to do the latter is when we need to look up
> elements in the middle of building the list (e.g. we may want to
> de-dup, which requires us to look up a name from the ones we
> collected so far).
If we come across a (mostly) sorted list, then the insert-at-the-right-place
we'd come across the best case of insertion sort which is O(n), which
sounds better than append-then-sort as our sorting is a merge sort,
which has O(n log n) even in its best case (and needs to copy stuff
into a temp buffer and back).
By having the submodules named after its path, I strongly suspect
we have a mostly sorted list in nearly all cases except some really
interesting corner cases out there.
> And in this application, calculate_changed_submodule_paths()
> discover paths by calling collect_changed_submodules() which finds a
> mapping <submodule name, oid of commits> into a list sorted by
> submodule name, and then goes through that list and builds a list of
> submodules paths (which could be different from submodule names) by
> appending. Only after this list is fully built, get_next_submodule()
> gets called, so making the latter use string_list_lookup() that assumes
> a sorted list is safe if we built the list by append-then-sort (iow,
> sortedness while building the list does not matter).
>
> Having analysed all that, I find it somewhat iffy that _append() is
> used there in the calculate_changed_submodule_paths() function.
Note that this is fixed in the later patch
"submodule.c: do not copy around submodule list"
> It
> would cause the resulting changed_submodule_names list to hold the
> same name twice (or more),
This would be possible if there is a submodule at path A and another
submodule (at a different path) named "A", as we'll try hard to collect
names, but are also okay with path as we want to keep supporting the
historical use case of submodules.
> but I do not know if that would pose a
> problem to the consumer of the list. Using "accumulate then sort
> before calling look-up" would not change it as string_list_sort()
> would not dedup, so I do not think this patch would introduce a new
> problem, though.
Yes, that is true, so we'd want to extend the message above to
mention the potential duplicates.
Thanks,
Stefan
next prev parent reply other threads:[~2018-09-11 18:32 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-04 23:01 [PATCH 00/11] fetch: make sure submodule oids are fetched Stefan Beller
2018-09-04 23:01 ` [PATCH 01/11] string_list: print_string_list to use trace_printf Stefan Beller
2018-09-06 16:52 ` Junio C Hamano
2018-09-06 16:56 ` Jeff King
2018-09-06 22:16 ` Stefan Beller
2018-09-07 0:04 ` Jeff King
2018-09-07 9:53 ` Junio C Hamano
2018-09-07 17:21 ` Stefan Beller
2018-09-10 21:58 ` [PATCH 1/2] trace: add trace_print_string_list_key Stefan Beller
2018-09-10 21:58 ` [PATCH 2/2] string-list: remove unused function print_string_list Stefan Beller
2018-09-10 22:32 ` [PATCH 1/2] trace: add trace_print_string_list_key Junio C Hamano
2018-09-10 22:38 ` Stefan Beller
2018-09-11 0:52 ` Junio C Hamano
2018-09-11 3:08 ` Stefan Beller
2018-09-11 21:03 ` Jeff King
2018-09-11 18:48 ` [PATCH] string-list: remove unused function print_string_list Stefan Beller
2018-09-11 19:27 ` Junio C Hamano
2018-09-11 19:30 ` Junio C Hamano
2018-09-11 19:47 ` Stefan Beller
2018-09-11 20:53 ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 02/11] string-list.h: add string_list_{pop, last} functions Stefan Beller
2018-09-06 17:10 ` Junio C Hamano
2018-09-06 22:29 ` Stefan Beller
2018-09-04 23:01 ` [PATCH 03/11] sha1-array: provide oid_array_filter Stefan Beller
2018-09-06 17:20 ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 04/11] submodule.c: convert submodule_move_head new argument to object id Stefan Beller
2018-09-06 17:31 ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 05/11] submodule.c: fix indentation Stefan Beller
2018-09-06 17:34 ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 06/11] submodule.c: sort changed_submodule_names before searching it Stefan Beller
2018-09-06 18:03 ` Junio C Hamano
2018-09-11 18:31 ` Stefan Beller [this message]
2018-09-04 23:01 ` [PATCH 07/11] submodule: move global changed_submodule_names into fetch submodule struct Stefan Beller
2018-09-06 18:04 ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 08/11] submodule.c: do not copy around submodule list Stefan Beller
2018-09-06 18:20 ` Junio C Hamano
2018-09-04 23:01 ` [PATCH 09/11] submodule: fetch in submodules git directory instead of in worktree Stefan Beller
2018-09-04 23:01 ` [PATCH 10/11] fetch: retry fetching submodules if sha1 were not fetched Stefan Beller
2018-09-04 23:01 ` [PATCH 11/11] builtin/fetch: check for submodule updates for non branch fetches Stefan Beller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGZ79kZ4UtcojkDGjAbxORV6+dfSNmQYOx_Loe=B=-Z3JrQ5dg@mail.gmail.com' \
--to=sbeller@google.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).