git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Carlos Martín Nieto" <cmn@elego.de>,
	"Michael Schubert" <mschub@elegosoft.com>,
	"Johan Herland" <johan@herland.net>, "Jeff King" <peff@peff.net>,
	"Marc Branchaud" <marcnarc@xiplink.com>,
	"Nicolas Pitre" <nico@fluxnic.net>,
	"John Szakmeister" <john@szakmeister.net>,
	"Michael Haggerty" <mhagger@alum.mit.edu>
Subject: [PATCH 06/15] ref_remove_duplicates(): avoid redundant bisection
Date: Wed, 23 Oct 2013 17:50:39 +0200	[thread overview]
Message-ID: <1382543448-2586-7-git-send-email-mhagger@alum.mit.edu> (raw)
In-Reply-To: <1382543448-2586-1-git-send-email-mhagger@alum.mit.edu>

The old code called string_list_lookup(), and if that failed called
string_list_insert(), thus doing the bisection search through the
string list twice in the latter code path.

Instead, just call string_list_insert() right away.  If an entry for
that peer reference name already existed, then its util pointer is
always non-NULL.

Of course this doesn't change the fact that the repeated
string_list_insert() calls make the function scale like O(N^2) if the
input reference list is not already approximately sorted.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

The O(N^2) algorithm mentioned above essentially boils down to

    for each reference:
        insert reference into string_list (sorted)

Because the insertion requires later array entries to be shifted to
the right, the algorithm scales like O(N^2) if the entries are not
already approximately in order.

I can think of three easy ways to fix this:

* Use a hash map instead of a sorted string_list to record which
  entries have already been seen.

* If the order of the result doesn't matter and it doesn't matter
  *which* of the duplicates is retained, then insert all of the names
  into the string_list unsorted, then sort them all in one go, then
  iterate through looking for duplicates and stringing the survivors
  together in a new linked list.

* If the order is important, or it is important that the *first* of
  duplicates be retained, then iterate through the linked list twice.
  The first time, just add the entries to the string_list.  Then sort
  the string_list using a stable sort (e.g., git_qsort()).  Then
  iterate through the linked list a second time as in the current
  code.

See also my comments to commit 10.

However, I believe that the O(N^2) worst case is unlikely to bite in
most cases.  This function is called from
builtin/fetch.c:get_ref_map(), and the input list is the concatenation
of several sub-lists.  Most sublists come from resolving refspecs, and
the last comes from find_non_local_tags().  I believe that each of the
sublists is sorted.  So the O(N^2) worst-case could only occur if more
than one of the sublists is large and at least two of the large
sublists are not in the correct order relative to each other.

I tried to concoct such a test case by creating a repository with
about 20k branches and 20k tags, instrumenting
ref_remove_duplicates(), and then fetching with something like

    git fetch --prune origin refs/heads/*:refs/remotes/origin/* \
                             refs/tags/*:refs/tags/*

Indeed, the running time of ref_remove_duplicates() became significant
when the order of the refspecs was reversed, but it was still less
than a second out of a much longer command invocation.

So I didn't bother fixing the O(N^2) algorithm.  If somebody reports
real-world slowness here, we can always come back to it.

 remote.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/remote.c b/remote.c
index e9fedfa..a44e897 100644
--- a/remote.c
+++ b/remote.c
@@ -750,13 +750,15 @@ void ref_remove_duplicates(struct ref *ref_map)
 	struct string_list refs = STRING_LIST_INIT_NODUP;
 	struct string_list_item *item = NULL;
 	struct ref *prev = NULL, *next = NULL;
+
 	for (; ref_map; prev = ref_map, ref_map = next) {
 		next = ref_map->next;
 		if (!ref_map->peer_ref)
 			continue;
 
-		item = string_list_lookup(&refs, ref_map->peer_ref->name);
-		if (item) {
+		item = string_list_insert(&refs, ref_map->peer_ref->name);
+		if (item->util) {
+			/* Entry already existed */
 			if (strcmp(((struct ref *)item->util)->name,
 				   ref_map->name))
 				die("%s tracks both %s and %s",
@@ -767,11 +769,9 @@ void ref_remove_duplicates(struct ref *ref_map)
 			free(ref_map->peer_ref);
 			free(ref_map);
 			ref_map = prev; /* skip this; we freed it */
-			continue;
+		} else {
+			item->util = ref_map;
 		}
-
-		item = string_list_insert(&refs, ref_map->peer_ref->name);
-		item->util = ref_map;
 	}
 	string_list_clear(&refs, 0);
 }
-- 
1.8.4

  parent reply	other threads:[~2013-10-23 15:52 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-13  2:54 Local tag killer Michael Haggerty
2013-09-13  4:03 ` Junio C Hamano
2013-09-20 22:51   ` Junio C Hamano
2013-09-21  6:42     ` Michael Haggerty
2013-09-21 12:28       ` John Szakmeister
2013-09-24  7:51       ` Jeff King
2013-09-24 13:22         ` Marc Branchaud
2013-09-25  8:22           ` Jeff King
2013-09-25 22:54         ` Nicolas Pitre
2013-09-28 12:20           ` Michael Haggerty
2013-09-28 21:42             ` Johan Herland
2013-09-29  4:29               ` Michael Haggerty
2013-09-29  9:30                 ` Johan Herland
2013-09-30 15:24                 ` Marc Branchaud
2013-09-30 15:52                   ` Nicolas Pitre
2013-09-30 19:16                     ` Marc Branchaud
2013-09-30 20:08                       ` Nicolas Pitre
2013-09-30 21:14                         ` Marc Branchaud
2013-09-30 22:44                           ` Nicolas Pitre
2013-09-30 23:18                             ` Jeff King
2013-10-01  3:04                             ` Marc Branchaud
2013-10-01  3:28                               ` Nicolas Pitre
2013-10-01 12:45                                 ` Marc Branchaud
2013-10-23 15:50 ` [PATCH 00/15] Change semantics of "fetch --tags" Michael Haggerty
2013-10-23 15:50   ` [PATCH 01/15] t5510: use the correct tag name in test Michael Haggerty
2013-10-23 15:50   ` [PATCH 02/15] t5510: prepare test refs more straightforwardly Michael Haggerty
2013-10-23 18:36     ` Junio C Hamano
2013-10-24  6:49       ` Michael Haggerty
2013-10-24 19:50         ` Junio C Hamano
2013-10-23 15:50   ` [PATCH 03/15] t5510: check that "git fetch --prune --tags" does not prune branches Michael Haggerty
2013-10-23 15:50   ` [PATCH 04/15] api-remote.txt: correct section "struct refspect" Michael Haggerty
2013-10-23 18:43     ` Junio C Hamano
2013-10-24  7:06       ` Michael Haggerty
2013-10-23 15:50   ` [PATCH 05/15] get_ref_map(): rename local variables Michael Haggerty
2013-10-23 18:45     ` Junio C Hamano
2013-10-24  7:24       ` Michael Haggerty
2013-10-23 15:50   ` Michael Haggerty [this message]
2013-10-23 15:50   ` [PATCH 07/15] ref_remove_duplicates(): simplify function Michael Haggerty
2013-10-23 15:50   ` [PATCH 08/15] ref_remove_duplicates(): improve documentation comment Michael Haggerty
2013-10-23 18:47     ` Junio C Hamano
2013-10-23 15:50   ` [PATCH 09/15] builtin/fetch.c: reorder function definitions Michael Haggerty
2013-10-23 15:50   ` [PATCH 10/15] fetch --tags: fetch tags *in addition to* other stuff Michael Haggerty
2013-10-24 20:51     ` Junio C Hamano
2013-10-25 15:08       ` Michael Haggerty
2013-10-28 19:10         ` Junio C Hamano
2013-10-30  4:26           ` Michael Haggerty
2013-10-26  5:10       ` Michael Haggerty
2013-10-23 15:50   ` [PATCH 11/15] fetch --prune: prune only based on explicit refspecs Michael Haggerty
2013-10-24 21:11     ` Junio C Hamano
2013-10-26  6:49       ` Michael Haggerty
2013-10-28 15:08         ` Junio C Hamano
2013-10-23 15:50   ` [PATCH 12/15] query_refspecs(): move some constants out of the loop Michael Haggerty
2013-10-23 15:50   ` [PATCH 13/15] builtin/remote.c: reorder function definitions Michael Haggerty
2013-10-23 15:50   ` [PATCH 14/15] builtin/remote.c:update(): use struct argv_array Michael Haggerty
2013-10-23 15:50   ` [PATCH 15/15] fetch, remote: properly convey --no-prune options to subprocesses Michael Haggerty
2013-10-24 21:17     ` Junio C Hamano
2013-10-23 16:59   ` [PATCH 00/15] Change semantics of "fetch --tags" Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1382543448-2586-7-git-send-email-mhagger@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=cmn@elego.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johan@herland.net \
    --cc=john@szakmeister.net \
    --cc=marcnarc@xiplink.com \
    --cc=mschub@elegosoft.com \
    --cc=nico@fluxnic.net \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).