git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/2] Speedup fetch with large numbers of refs
@ 2009-10-25 21:28 Julian Phillips
  2009-10-25 21:28 ` [PATCH v2 1/2] remote: Make ref_remove_duplicates faster for " Julian Phillips
  2009-10-25 21:28 ` [PATCH v2 2/2] fetch: Speed up fetch of large numbers of refs Julian Phillips
  0 siblings, 2 replies; 4+ messages in thread
From: Julian Phillips @ 2009-10-25 21:28 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

This is the same as v1, except that this time the tests pass on Linux
as well as on MacOS.

Julian Phillips (2):
  remote: Make ref_remove_duplicates faster for large numbers of refs
  fetch: Speed up fetch of large numbers of refs

 builtin-fetch.c |   17 ++++++++++++++---
 remote.c        |   42 +++++++++++++++++++++++-------------------
 2 files changed, 37 insertions(+), 22 deletions(-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 1/2] remote: Make ref_remove_duplicates faster for large numbers of refs
  2009-10-25 21:28 [PATCH v2 0/2] Speedup fetch with large numbers of refs Julian Phillips
@ 2009-10-25 21:28 ` Julian Phillips
  2009-10-26 23:12   ` [PATCH v2 3/2] remote: fix poential ref_map list corruption in ref_remove_duplicates Julian Phillips
  2009-10-25 21:28 ` [PATCH v2 2/2] fetch: Speed up fetch of large numbers of refs Julian Phillips
  1 sibling, 1 reply; 4+ messages in thread
From: Julian Phillips @ 2009-10-25 21:28 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

The ref_remove_duplicates function was very slow at dealing with very
large numbers of refs.  This is because it was using a linear search
through all remaining refs to find any duplicates of the current ref.

Rewriting it to use a string list to keep track of which refs have
already been seen and removing duplicates when they are found is much
more efficient.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---
 remote.c |   42 +++++++++++++++++++++++-------------------
 1 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/remote.c b/remote.c
index 73d33f2..1380b20 100644
--- a/remote.c
+++ b/remote.c
@@ -6,6 +6,7 @@
 #include "revision.h"
 #include "dir.h"
 #include "tag.h"
+#include "string-list.h"
 
 static struct refspec s_tag_refspec = {
 	0,
@@ -734,29 +735,32 @@ int for_each_remote(each_remote_fn fn, void *priv)
 
 void ref_remove_duplicates(struct ref *ref_map)
 {
-	struct ref **posn;
-	struct ref *next;
-	for (; ref_map; ref_map = ref_map->next) {
+	struct string_list refs = { NULL, 0, 0, 0 };
+	struct string_list_item *item = NULL;
+	struct ref *prev = NULL, *next = NULL;
+	for (; ref_map; ref_map = next) {
+		next = ref_map->next;
 		if (!ref_map->peer_ref)
 			continue;
-		posn = &ref_map->next;
-		while (*posn) {
-			if ((*posn)->peer_ref &&
-			    !strcmp((*posn)->peer_ref->name,
-				    ref_map->peer_ref->name)) {
-				if (strcmp((*posn)->name, ref_map->name))
-					die("%s tracks both %s and %s",
-					    ref_map->peer_ref->name,
-					    (*posn)->name, ref_map->name);
-				next = (*posn)->next;
-				free((*posn)->peer_ref);
-				free(*posn);
-				*posn = next;
-			} else {
-				posn = &(*posn)->next;
-			}
+
+		item = string_list_lookup(ref_map->peer_ref->name, &refs);
+		if (item) {
+			if (strcmp(((struct ref *)item->util)->name,
+				   ref_map->name))
+				die("%s tracks both %s and %s",
+				    ref_map->peer_ref->name,
+				    ((struct ref *)item->util)->name,
+				    ref_map->name);
+			prev->next = ref_map->next;
+			free(ref_map->peer_ref);
+			free(ref_map);
 		}
+
+		item = string_list_insert(ref_map->peer_ref->name, &refs);
+		item->util = ref_map;
+		prev = ref_map;
 	}
+	string_list_clear(&refs, 0);
 }
 
 int remote_has_url(struct remote *remote, const char *url)
-- 
1.6.5.rc2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 2/2] fetch: Speed up fetch of large numbers of refs
  2009-10-25 21:28 [PATCH v2 0/2] Speedup fetch with large numbers of refs Julian Phillips
  2009-10-25 21:28 ` [PATCH v2 1/2] remote: Make ref_remove_duplicates faster for " Julian Phillips
@ 2009-10-25 21:28 ` Julian Phillips
  1 sibling, 0 replies; 4+ messages in thread
From: Julian Phillips @ 2009-10-25 21:28 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

When there are large numbers of refs, calling read_ref for each ref is
inefficent (and infact downright slow) - so instead use for_each_ref
to build up a string list of all the refs that we currently have,
which significantly improves the volume.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---
 builtin-fetch.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/builtin-fetch.c b/builtin-fetch.c
index acb08e4..0f53cbd 100644
--- a/builtin-fetch.c
+++ b/builtin-fetch.c
@@ -489,7 +489,8 @@ static int add_existing(const char *refname, const unsigned char *sha1,
 			int flag, void *cbdata)
 {
 	struct string_list *list = (struct string_list *)cbdata;
-	string_list_insert(refname, list);
+	struct string_list_item *item = string_list_insert(refname, list);
+	item->util = (void *)sha1;
 	return 0;
 }
 
@@ -606,9 +607,14 @@ static void check_not_current_branch(struct ref *ref_map)
 static int do_fetch(struct transport *transport,
 		    struct refspec *refs, int ref_count)
 {
+	struct string_list existing_refs = { NULL, 0, 0, 0 };
+	struct string_list_item *peer_item = NULL;
 	struct ref *ref_map;
 	struct ref *rm;
 	int autotags = (transport->remote->fetch_tags == 1);
+
+	for_each_ref(add_existing, &existing_refs);
+
 	if (transport->remote->fetch_tags == 2 && tags != TAGS_UNSET)
 		tags = TAGS_SET;
 	if (transport->remote->fetch_tags == -1)
@@ -631,8 +637,13 @@ static int do_fetch(struct transport *transport,
 		check_not_current_branch(ref_map);
 
 	for (rm = ref_map; rm; rm = rm->next) {
-		if (rm->peer_ref)
-			read_ref(rm->peer_ref->name, rm->peer_ref->old_sha1);
+		if (rm->peer_ref) {
+			peer_item = string_list_lookup(rm->peer_ref->name,
+						       &existing_refs);
+			if (peer_item)
+				hashcpy(rm->peer_ref->old_sha1,
+					peer_item->util);
+		}
 	}
 
 	if (tags == TAGS_DEFAULT && autotags)
-- 
1.6.5.rc2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 3/2] remote: fix poential ref_map list corruption in ref_remove_duplicates
  2009-10-25 21:28 ` [PATCH v2 1/2] remote: Make ref_remove_duplicates faster for " Julian Phillips
@ 2009-10-26 23:12   ` Julian Phillips
  0 siblings, 0 replies; 4+ messages in thread
From: Julian Phillips @ 2009-10-26 23:12 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, git

The prev pointer was not being updated when the peer_ref member
pointer was NULL, which means that that any items in the list with a
NULL peer_ref immediately preceeding a duplicate would be dropped
without being freed.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---

Having fixed the access after free bug, I realised that there was
still a problem.  This one didn't show up in the tests - due to the
rather specific circumstances required, but may occur in real use.

 remote.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/remote.c b/remote.c
index 1380b20..4f9f0cc 100644
--- a/remote.c
+++ b/remote.c
@@ -738,7 +738,7 @@ void ref_remove_duplicates(struct ref *ref_map)
 	struct string_list refs = { NULL, 0, 0, 0 };
 	struct string_list_item *item = NULL;
 	struct ref *prev = NULL, *next = NULL;
-	for (; ref_map; ref_map = next) {
+	for (; ref_map; prev = ref_map, ref_map = next) {
 		next = ref_map->next;
 		if (!ref_map->peer_ref)
 			continue;
@@ -758,7 +758,6 @@ void ref_remove_duplicates(struct ref *ref_map)
 
 		item = string_list_insert(ref_map->peer_ref->name, &refs);
 		item->util = ref_map;
-		prev = ref_map;
 	}
 	string_list_clear(&refs, 0);
 }
-- 
1.6.5.rc2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-10-26 23:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-25 21:28 [PATCH v2 0/2] Speedup fetch with large numbers of refs Julian Phillips
2009-10-25 21:28 ` [PATCH v2 1/2] remote: Make ref_remove_duplicates faster for " Julian Phillips
2009-10-26 23:12   ` [PATCH v2 3/2] remote: fix poential ref_map list corruption in ref_remove_duplicates Julian Phillips
2009-10-25 21:28 ` [PATCH v2 2/2] fetch: Speed up fetch of large numbers of refs Julian Phillips

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).