git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Eric Wong <e@80x24.org>
To: git@vger.kernel.org
Subject: [REJECT 4/3] http-walker: use hashmap to reduce list scan
Date: Mon, 11 Jul 2016 21:02:43 +0000	[thread overview]
Message-ID: <20160711210243.GA1604@whir> (raw)
In-Reply-To: <20160711205131.1291-1-e@80x24.org>

For the sake of documentation, I worked on this patch but I
don't there was a measurable improvement (hard to tell with
variable network conditions) and it increased memory usage to
around 380M.

I wanted to reduce the list scanning in fill_active_slot() by
deleting during iteration, but I'm not sure it helps since the
loop in that is nowhere near as bad as the prefetch() insertion
loop fixed in 3/3.

list_for_each in fetch_object() also tends hit the first object
in the list when iterating, so there's no improvement I see
with this patch.

-----8<------
Subject: [PATCH] http-walker: use hashmap to reduce list scan

We can reduce list walking in fill_active_slot by deleting items
as we walk through the object request queue of pending objects.

However, we still need to maintain a mapping of live objects
for fetch_object, so introduce the use of a hashmap to keep
track of all live object requests in O(1) average time.

Signed-off-by: Eric Wong <e@80x24.org>
---
 http-walker.c | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/http-walker.c b/http-walker.c
index 0b24255..8d27707 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -3,6 +3,7 @@
 #include "walker.h"
 #include "http.h"
 #include "list.h"
+#include "hashmap.h"
 
 struct alt_base {
 	char *base;
@@ -19,6 +20,7 @@ enum object_request_state {
 };
 
 struct object_request {
+	struct hashmap_entry ent;
 	struct walker *walker;
 	unsigned char sha1[20];
 	struct alt_base *repo;
@@ -43,6 +45,7 @@ struct walker_data {
 };
 
 static LIST_HEAD(object_queue_head);
+static struct hashmap *object_requests;
 
 static void fetch_alternates(struct walker *walker, const char *base);
 
@@ -114,7 +117,11 @@ static void release_object_request(struct object_request *obj_req)
 	if (obj_req->req !=NULL && obj_req->req->localfile != -1)
 		error("fd leakage in release: %d", obj_req->req->localfile);
 
-	list_del(&obj_req->node);
+	/* XXX seems unnecessary with list_del in fill_active_slot */
+	if (!list_empty(&obj_req->node))
+		list_del(&obj_req->node);
+
+	hashmap_remove(object_requests, obj_req, obj_req->sha1);
 	free(obj_req);
 }
 
@@ -127,6 +134,8 @@ static int fill_active_slot(struct walker *walker)
 	list_for_each_safe(pos, tmp, head) {
 		obj_req = list_entry(pos, struct object_request, node);
 		if (obj_req->state == WAITING) {
+			/* _init so future list_del is idempotent */
+			list_del_init(pos);
 			if (has_sha1_file(obj_req->sha1))
 				obj_req->state = COMPLETE;
 			else {
@@ -145,6 +154,8 @@ static void prefetch(struct walker *walker, unsigned char *sha1)
 	struct walker_data *data = walker->data;
 
 	newreq = xmalloc(sizeof(*newreq));
+	hashmap_entry_init(&newreq->ent, sha1hash(sha1));
+	hashmap_add(object_requests, &newreq->ent);
 	newreq->walker = walker;
 	hashcpy(newreq->sha1, sha1);
 	newreq->repo = data->alt;
@@ -435,15 +446,12 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
 {
 	char *hex = sha1_to_hex(sha1);
 	int ret = 0;
-	struct object_request *obj_req = NULL;
+	struct object_request *obj_req;
+	struct hashmap_entry key;
 	struct http_object_request *req;
-	struct list_head *pos, *head = &object_queue_head;
 
-	list_for_each(pos, head) {
-		obj_req = list_entry(pos, struct object_request, node);
-		if (!hashcmp(obj_req->sha1, sha1))
-			break;
-	}
+	hashmap_entry_init(&key, sha1hash(sha1));
+	obj_req = hashmap_get(object_requests, &key, sha1);
 	if (obj_req == NULL)
 		return error("Couldn't find request for %s in the queue", hex);
 
@@ -553,6 +561,12 @@ static void cleanup(struct walker *walker)
 	}
 }
 
+static int obj_req_cmp(const struct object_request *e1,
+		const struct object_request *e2, const unsigned char *sha1)
+{
+	return hashcmp(e1->sha1, sha1 ? sha1 : e2->sha1);
+}
+
 struct walker *get_http_walker(const char *url)
 {
 	char *s;
@@ -580,5 +594,10 @@ struct walker *get_http_walker(const char *url)
 	add_fill_function(walker, (int (*)(void *)) fill_active_slot);
 #endif
 
+	if (!object_requests) {
+		object_requests = xmalloc(sizeof(*object_requests));
+		hashmap_init(object_requests, (hashmap_cmp_fn)obj_req_cmp, 0);
+	}
+
 	return walker;
 }
-- 
EW

  parent reply	other threads:[~2016-07-11 21:02 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-11 20:51 [RFC 0/3] dumb HTTP transport speedups Eric Wong
2016-07-11 20:51 ` [PATCH 1/3] http-walker: remove unused parameter from fetch_object Eric Wong
2016-07-11 20:51 ` [PATCH 2/3] http: avoid disconnecting on 404s for loose objects Eric Wong
2016-07-11 20:51 ` [PATCH 3/3] http-walker: reduce O(n) ops with doubly-linked list Eric Wong
2016-07-11 21:02 ` Eric Wong [this message]
2016-07-24 10:11 ` [RFC 0/3] dumb HTTP transport speedups Jakub Narębski
2016-07-24 11:20   ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160711210243.GA1604@whir \
    --to=e@80x24.org \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).