From: Eric Wong <e@80x24.org>
To: git@vger.kernel.org
Subject: [REJECT 4/3] http-walker: use hashmap to reduce list scan
Date: Mon, 11 Jul 2016 21:02:43 +0000 [thread overview]
Message-ID: <20160711210243.GA1604@whir> (raw)
In-Reply-To: <20160711205131.1291-1-e@80x24.org>
For the sake of documentation, I worked on this patch but I
don't there was a measurable improvement (hard to tell with
variable network conditions) and it increased memory usage to
around 380M.
I wanted to reduce the list scanning in fill_active_slot() by
deleting during iteration, but I'm not sure it helps since the
loop in that is nowhere near as bad as the prefetch() insertion
loop fixed in 3/3.
list_for_each in fetch_object() also tends hit the first object
in the list when iterating, so there's no improvement I see
with this patch.
-----8<------
Subject: [PATCH] http-walker: use hashmap to reduce list scan
We can reduce list walking in fill_active_slot by deleting items
as we walk through the object request queue of pending objects.
However, we still need to maintain a mapping of live objects
for fetch_object, so introduce the use of a hashmap to keep
track of all live object requests in O(1) average time.
Signed-off-by: Eric Wong <e@80x24.org>
---
http-walker.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)
diff --git a/http-walker.c b/http-walker.c
index 0b24255..8d27707 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -3,6 +3,7 @@
#include "walker.h"
#include "http.h"
#include "list.h"
+#include "hashmap.h"
struct alt_base {
char *base;
@@ -19,6 +20,7 @@ enum object_request_state {
};
struct object_request {
+ struct hashmap_entry ent;
struct walker *walker;
unsigned char sha1[20];
struct alt_base *repo;
@@ -43,6 +45,7 @@ struct walker_data {
};
static LIST_HEAD(object_queue_head);
+static struct hashmap *object_requests;
static void fetch_alternates(struct walker *walker, const char *base);
@@ -114,7 +117,11 @@ static void release_object_request(struct object_request *obj_req)
if (obj_req->req !=NULL && obj_req->req->localfile != -1)
error("fd leakage in release: %d", obj_req->req->localfile);
- list_del(&obj_req->node);
+ /* XXX seems unnecessary with list_del in fill_active_slot */
+ if (!list_empty(&obj_req->node))
+ list_del(&obj_req->node);
+
+ hashmap_remove(object_requests, obj_req, obj_req->sha1);
free(obj_req);
}
@@ -127,6 +134,8 @@ static int fill_active_slot(struct walker *walker)
list_for_each_safe(pos, tmp, head) {
obj_req = list_entry(pos, struct object_request, node);
if (obj_req->state == WAITING) {
+ /* _init so future list_del is idempotent */
+ list_del_init(pos);
if (has_sha1_file(obj_req->sha1))
obj_req->state = COMPLETE;
else {
@@ -145,6 +154,8 @@ static void prefetch(struct walker *walker, unsigned char *sha1)
struct walker_data *data = walker->data;
newreq = xmalloc(sizeof(*newreq));
+ hashmap_entry_init(&newreq->ent, sha1hash(sha1));
+ hashmap_add(object_requests, &newreq->ent);
newreq->walker = walker;
hashcpy(newreq->sha1, sha1);
newreq->repo = data->alt;
@@ -435,15 +446,12 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
{
char *hex = sha1_to_hex(sha1);
int ret = 0;
- struct object_request *obj_req = NULL;
+ struct object_request *obj_req;
+ struct hashmap_entry key;
struct http_object_request *req;
- struct list_head *pos, *head = &object_queue_head;
- list_for_each(pos, head) {
- obj_req = list_entry(pos, struct object_request, node);
- if (!hashcmp(obj_req->sha1, sha1))
- break;
- }
+ hashmap_entry_init(&key, sha1hash(sha1));
+ obj_req = hashmap_get(object_requests, &key, sha1);
if (obj_req == NULL)
return error("Couldn't find request for %s in the queue", hex);
@@ -553,6 +561,12 @@ static void cleanup(struct walker *walker)
}
}
+static int obj_req_cmp(const struct object_request *e1,
+ const struct object_request *e2, const unsigned char *sha1)
+{
+ return hashcmp(e1->sha1, sha1 ? sha1 : e2->sha1);
+}
+
struct walker *get_http_walker(const char *url)
{
char *s;
@@ -580,5 +594,10 @@ struct walker *get_http_walker(const char *url)
add_fill_function(walker, (int (*)(void *)) fill_active_slot);
#endif
+ if (!object_requests) {
+ object_requests = xmalloc(sizeof(*object_requests));
+ hashmap_init(object_requests, (hashmap_cmp_fn)obj_req_cmp, 0);
+ }
+
return walker;
}
--
EW
next prev parent reply other threads:[~2016-07-11 21:02 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-11 20:51 [RFC 0/3] dumb HTTP transport speedups Eric Wong
2016-07-11 20:51 ` [PATCH 1/3] http-walker: remove unused parameter from fetch_object Eric Wong
2016-07-11 20:51 ` [PATCH 2/3] http: avoid disconnecting on 404s for loose objects Eric Wong
2016-07-11 20:51 ` [PATCH 3/3] http-walker: reduce O(n) ops with doubly-linked list Eric Wong
2016-07-11 21:02 ` Eric Wong [this message]
2016-07-24 10:11 ` [RFC 0/3] dumb HTTP transport speedups Jakub Narębski
2016-07-24 11:20 ` Eric Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160711210243.GA1604@whir \
--to=e@80x24.org \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).