git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC 0/3] dumb HTTP transport speedups
@ 2016-07-11 20:51 Eric Wong
  2016-07-11 20:51 ` [PATCH 1/3] http-walker: remove unused parameter from fetch_object Eric Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Eric Wong @ 2016-07-11 20:51 UTC (permalink / raw)
  To: git

TL;DR: dumb HTTP clone from a certain badly-packed repo goes from
~2 hours to ~30 min; memory usage drops from 2G to 360M


I hadn't packed the public repo at https://public-inbox.org/git
for a few weeks.  As the admin of a small server with limited
memory and CPU resources but fairly good bandwidth, I prefer
clients use dumb HTTP for initial clones.

Unfortunately, I noticed my dinky netbook runs out-of-memory
when using GIT_SMART_HTTP=0 to clone this giant repo; and a
machine with more memory still takes over two hours depending
on network conditions (and uses around 2GB RSS!).

Anyway, https://public-inbox.org/git is better packed now;
but I've kept https://80x24.org/git-i-forgot-to-pack available
with over 7K loose objects to illustrate the problem:

	(this is dumb HTTP-only)
	git clone --mirror https://80x24.org/git-i-forgot-to-pack

The primary problem is fixed by PATCH 3/3 in this series; I
can now clone the above in around 30 minutes, and it "only"
seems to use around 360M of memory.

I'll leave git-i-forgot-to-pack up for a few months to a year
so others can test and hammer away at it.

The following changes since commit 5c589a73de4394ad125a4effac227b3aec856fa1:

  Third batch of topics for 2.10 (2016-07-06 13:42:58 -0700)

are available in the git repository at:

  git://bogomips.org/git-svn.git dumb-speedups

for you to fetch changes up to b9d5aca4b8e6c9f7fb5ee4e0ce33bb42c4ea2992:

  http-walker: reduce O(n) ops with doubly-linked list (2016-07-11 20:25:51 +0000)

----------------------------------------------------------------
Eric Wong (3):
      http-walker: remove unused parameter from fetch_object
      http: avoid disconnecting on 404s for loose objects
      http-walker: reduce O(n) ops with doubly-linked list

 http-walker.c |  55 ++++++++++----------
 http.c        |  16 +++++-
 list.h        | 164 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 204 insertions(+), 31 deletions(-)
 create mode 100644 list.h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] http-walker: remove unused parameter from fetch_object
  2016-07-11 20:51 [RFC 0/3] dumb HTTP transport speedups Eric Wong
@ 2016-07-11 20:51 ` Eric Wong
  2016-07-11 20:51 ` [PATCH 2/3] http: avoid disconnecting on 404s for loose objects Eric Wong
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2016-07-11 20:51 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

This parameter has not been used since commit 1d389ab65dc6
("Add support for parallel HTTP transfers") back in 2005.

Signed-off-by: Eric Wong <e@80x24.org>
---
 http-walker.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/http-walker.c b/http-walker.c
index 2c721f0..9f28523 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -447,7 +447,7 @@ static void abort_object_request(struct object_request *obj_req)
 	release_object_request(obj_req);
 }
 
-static int fetch_object(struct walker *walker, struct alt_base *repo, unsigned char *sha1)
+static int fetch_object(struct walker *walker, unsigned char *sha1)
 {
 	char *hex = sha1_to_hex(sha1);
 	int ret = 0;
@@ -518,7 +518,7 @@ static int fetch(struct walker *walker, unsigned char *sha1)
 	struct walker_data *data = walker->data;
 	struct alt_base *altbase = data->alt;
 
-	if (!fetch_object(walker, altbase, sha1))
+	if (!fetch_object(walker, sha1))
 		return 0;
 	while (altbase) {
 		if (!http_fetch_pack(walker, altbase, sha1))
-- 
EW


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] http: avoid disconnecting on 404s for loose objects
  2016-07-11 20:51 [RFC 0/3] dumb HTTP transport speedups Eric Wong
  2016-07-11 20:51 ` [PATCH 1/3] http-walker: remove unused parameter from fetch_object Eric Wong
@ 2016-07-11 20:51 ` Eric Wong
  2016-07-11 20:51 ` [PATCH 3/3] http-walker: reduce O(n) ops with doubly-linked list Eric Wong
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2016-07-11 20:51 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

404s are common when fetching loose objects on static HTTP
servers, and reestablishing a connection for every single
404 adds additional latency.
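As a standalone illustration (not part of this patch), the new check
can be distilled into a pure helper.  The CURLE_* values below are
stand-in constants mirroring libcurl's enum, and starts_with() is a
local substitute for git's helper:

```c
#include <string.h>

/*
 * Hypothetical distillation of the check added to fetch_object():
 * with CURLOPT_FAILONERROR turned off, libcurl reports CURLE_OK
 * even on a 404, so the walker must map that case back to an
 * error itself instead of dropping the connection.
 */
enum {
	CURLE_OK_ = 0,			/* mirrors CURLE_OK */
	CURLE_HTTP_RETURNED_ERROR_ = 22	/* mirrors CURLE_HTTP_RETURNED_ERROR */
};

static int starts_with(const char *str, const char *prefix)
{
	return !strncmp(str, prefix, strlen(prefix));
}

static int classify_result(long http_code, int curl_result, const char *url)
{
	if (http_code == 404 && curl_result == CURLE_OK_ &&
	    (starts_with(url, "http://") || starts_with(url, "https://")))
		return CURLE_HTTP_RETURNED_ERROR_;
	return curl_result;
}
```

A 404 on a loose object then surfaces as an ordinary fetch failure
(allowing fallback to pack fetching) while the persistent connection
stays open for the next request.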

Signed-off-by: Eric Wong <e@80x24.org>
---
 http-walker.c |  9 +++++++++
 http.c        | 16 ++++++++++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/http-walker.c b/http-walker.c
index 9f28523..48f2df4 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -488,6 +488,15 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
 		req->localfile = -1;
 	}
 
+	/*
+	 * we turned off CURLOPT_FAILONERROR to avoid losing a
+	 * persistent connection and got CURLE_OK.
+	 */
+	if (req->http_code == 404 && req->curl_result == CURLE_OK &&
+			(starts_with(req->url, "http://") ||
+			 starts_with(req->url, "https://")))
+		req->curl_result = CURLE_HTTP_RETURNED_ERROR;
+
 	if (obj_req->state == ABORTED) {
 		ret = error("Request for %s aborted", hex);
 	} else if (req->curl_result != CURLE_OK &&
diff --git a/http.c b/http.c
index d8b2bec..e81dd13 100644
--- a/http.c
+++ b/http.c
@@ -1975,8 +1975,19 @@ static size_t fwrite_sha1_file(char *ptr, size_t eltsize, size_t nmemb,
 	unsigned char expn[4096];
 	size_t size = eltsize * nmemb;
 	int posn = 0;
-	struct http_object_request *freq =
-		(struct http_object_request *)data;
+	struct http_object_request *freq = data;
+	struct active_request_slot *slot = freq->slot;
+
+	if (slot) {
+		CURLcode c = curl_easy_getinfo(slot->curl, CURLINFO_HTTP_CODE,
+						&slot->http_code);
+		if (c != CURLE_OK)
+			die("BUG: curl_easy_getinfo for HTTP code failed: %s",
+				curl_easy_strerror(c));
+		if (slot->http_code >= 400)
+			return size;
+	}
+
 	do {
 		ssize_t retval = xwrite(freq->localfile,
 					(char *) ptr + posn, size - posn);
@@ -2097,6 +2108,7 @@ struct http_object_request *new_http_object_request(const char *base_url,
 	freq->slot = get_active_slot();
 
 	curl_easy_setopt(freq->slot->curl, CURLOPT_FILE, freq);
+	curl_easy_setopt(freq->slot->curl, CURLOPT_FAILONERROR, 0);
 	curl_easy_setopt(freq->slot->curl, CURLOPT_WRITEFUNCTION, fwrite_sha1_file);
 	curl_easy_setopt(freq->slot->curl, CURLOPT_ERRORBUFFER, freq->errorstr);
 	curl_easy_setopt(freq->slot->curl, CURLOPT_URL, freq->url);
-- 
EW


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] http-walker: reduce O(n) ops with doubly-linked list
  2016-07-11 20:51 [RFC 0/3] dumb HTTP transport speedups Eric Wong
  2016-07-11 20:51 ` [PATCH 1/3] http-walker: remove unused parameter from fetch_object Eric Wong
  2016-07-11 20:51 ` [PATCH 2/3] http: avoid disconnecting on 404s for loose objects Eric Wong
@ 2016-07-11 20:51 ` Eric Wong
  2016-07-11 21:02 ` [REJECT 4/3] http-walker: use hashmap to reduce list scan Eric Wong
  2016-07-24 10:11 ` [RFC 0/3] dumb HTTP transport speedups Jakub Narębski
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2016-07-11 20:51 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

Using a Linux-kernel-derived doubly-linked list
implementation from the Userspace RCU library allows us to
enqueue and delete items from the object request queue in
constant time.

This change reduces enqueue times in the prefetch() function,
where the object request queue could grow to several thousand
objects.

I left out the list_for_each_entry* family of macros from
list.h, which rely on the __typeof__ operator, as we support
platforms without it.  Thus, list_entry (aka "container_of")
needs to be called explicitly inside macro-wrapped for loops.
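For example (a minimal sketch, not code from this series: struct item
and sum_items() are hypothetical, and the macros are trimmed copies of
the ones in list.h), an explicit list_entry() call inside
list_for_each looks like:

```c
#include <stddef.h>

/* trimmed copies of the list.h definitions */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD(name) \
	struct list_head name = { &(name), &(name) }

static inline void list_add_tail(struct list_head *newp, struct list_head *head)
{
	head->prev->next = newp;
	newp->next = head;
	newp->prev = head->prev;
	head->prev = newp;
}

#define list_entry(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

/* hypothetical list member, embedding the node like object_request does */
struct item {
	int value;
	struct list_head node;
};

static int sum_items(struct list_head *head)
{
	struct list_head *pos;
	int sum = 0;

	list_for_each(pos, head) {
		/* explicit list_entry instead of list_for_each_entry */
		struct item *it = list_entry(pos, struct item, node);
		sum += it->value;
	}
	return sum;
}
```

With __typeof__ available, a list_for_each_entry macro would hide the
list_entry call; without it, the cast stays visible at each use site.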

The downside is that this costs us an additional pointer per
object request, but that is offset by reduced overhead on
queue operations, leading to improved performance and shorter
queue depths.

Signed-off-by: Eric Wong <e@80x24.org>
---
 http-walker.c |  42 ++++++---------
 list.h        | 164 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 179 insertions(+), 27 deletions(-)
 create mode 100644 list.h

diff --git a/http-walker.c b/http-walker.c
index 48f2df4..0b24255 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -2,6 +2,7 @@
 #include "commit.h"
 #include "walker.h"
 #include "http.h"
+#include "list.h"
 
 struct alt_base {
 	char *base;
@@ -23,7 +24,7 @@ struct object_request {
 	struct alt_base *repo;
 	enum object_request_state state;
 	struct http_object_request *req;
-	struct object_request *next;
+	struct list_head node;
 };
 
 struct alternates_request {
@@ -41,7 +42,7 @@ struct walker_data {
 	struct alt_base *alt;
 };
 
-static struct object_request *object_queue_head;
+static LIST_HEAD(object_queue_head);
 
 static void fetch_alternates(struct walker *walker, const char *base);
 
@@ -110,19 +111,10 @@ static void process_object_response(void *callback_data)
 
 static void release_object_request(struct object_request *obj_req)
 {
-	struct object_request *entry = object_queue_head;
-
 	if (obj_req->req !=NULL && obj_req->req->localfile != -1)
 		error("fd leakage in release: %d", obj_req->req->localfile);
-	if (obj_req == object_queue_head) {
-		object_queue_head = obj_req->next;
-	} else {
-		while (entry->next != NULL && entry->next != obj_req)
-			entry = entry->next;
-		if (entry->next == obj_req)
-			entry->next = entry->next->next;
-	}
 
+	list_del(&obj_req->node);
 	free(obj_req);
 }
 
@@ -130,8 +122,10 @@ static void release_object_request(struct object_request *obj_req)
 static int fill_active_slot(struct walker *walker)
 {
 	struct object_request *obj_req;
+	struct list_head *pos, *tmp, *head = &object_queue_head;
 
-	for (obj_req = object_queue_head; obj_req; obj_req = obj_req->next) {
+	list_for_each_safe(pos, tmp, head) {
+		obj_req = list_entry(pos, struct object_request, node);
 		if (obj_req->state == WAITING) {
 			if (has_sha1_file(obj_req->sha1))
 				obj_req->state = COMPLETE;
@@ -148,7 +142,6 @@ static int fill_active_slot(struct walker *walker)
 static void prefetch(struct walker *walker, unsigned char *sha1)
 {
 	struct object_request *newreq;
-	struct object_request *tail;
 	struct walker_data *data = walker->data;
 
 	newreq = xmalloc(sizeof(*newreq));
@@ -157,18 +150,9 @@ static void prefetch(struct walker *walker, unsigned char *sha1)
 	newreq->repo = data->alt;
 	newreq->state = WAITING;
 	newreq->req = NULL;
-	newreq->next = NULL;
 
 	http_is_verbose = walker->get_verbosely;
-
-	if (object_queue_head == NULL) {
-		object_queue_head = newreq;
-	} else {
-		tail = object_queue_head;
-		while (tail->next != NULL)
-			tail = tail->next;
-		tail->next = newreq;
-	}
+	list_add_tail(&newreq->node, &object_queue_head);
 
 #ifdef USE_CURL_MULTI
 	fill_active_slots();
@@ -451,11 +435,15 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
 {
 	char *hex = sha1_to_hex(sha1);
 	int ret = 0;
-	struct object_request *obj_req = object_queue_head;
+	struct object_request *obj_req = NULL;
 	struct http_object_request *req;
+	struct list_head *pos, *head = &object_queue_head;
 
-	while (obj_req != NULL && hashcmp(obj_req->sha1, sha1))
-		obj_req = obj_req->next;
+	list_for_each(pos, head) {
+		obj_req = list_entry(pos, struct object_request, node);
+		if (!hashcmp(obj_req->sha1, sha1))
+			break;
+	}
 	if (obj_req == NULL)
 		return error("Couldn't find request for %s in the queue", hex);
 
diff --git a/list.h b/list.h
new file mode 100644
index 0000000..f65edce
--- /dev/null
+++ b/list.h
@@ -0,0 +1,164 @@
+/*
+ * Copyright (C) 2002 Free Software Foundation, Inc.
+ * (originally part of the GNU C Library and Userspace RCU)
+ * Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
+ *
+ * Copyright (C) 2009 Pierre-Marc Fournier
+ * Conversion to RCU list.
+ * Copyright (C) 2010 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef LIST_H
+#define LIST_H	1
+
+/*
+ * The definitions of this file are adopted from those which can be
+ * found in the Linux kernel headers to enable people familiar with the
+ * latter find their way in these sources as well.
+ */
+
+/* Basic type for the double-link list. */
+struct list_head {
+	struct list_head *next, *prev;
+};
+
+/* Define a variable with the head and tail of the list. */
+#define LIST_HEAD(name) \
+	struct list_head name = { &(name), &(name) }
+
+/* Initialize a new list head. */
+#define INIT_LIST_HEAD(ptr) \
+	(ptr)->next = (ptr)->prev = (ptr)
+
+#define LIST_HEAD_INIT(name) { &(name), &(name) }
+
+/* Add new element at the head of the list. */
+static inline void list_add(struct list_head *newp, struct list_head *head)
+{
+	head->next->prev = newp;
+	newp->next = head->next;
+	newp->prev = head;
+	head->next = newp;
+}
+
+/* Add new element at the tail of the list. */
+static inline void list_add_tail(struct list_head *newp, struct list_head *head)
+{
+	head->prev->next = newp;
+	newp->next = head;
+	newp->prev = head->prev;
+	head->prev = newp;
+}
+
+/* Remove element from list. */
+static inline void __list_del(struct list_head *prev, struct list_head *next)
+{
+	next->prev = prev;
+	prev->next = next;
+}
+
+/* Remove element from list. */
+static inline void list_del(struct list_head *elem)
+{
+	__list_del(elem->prev, elem->next);
+}
+
+/* Remove element from list, initializing the element's list pointers. */
+static inline void list_del_init(struct list_head *elem)
+{
+	list_del(elem);
+	INIT_LIST_HEAD(elem);
+}
+
+/* Delete from list, add to another list as head. */
+static inline void list_move(struct list_head *elem, struct list_head *head)
+{
+	__list_del(elem->prev, elem->next);
+	list_add(elem, head);
+}
+
+/* Replace an old entry. */
+static inline void list_replace(struct list_head *old, struct list_head *newp)
+{
+	newp->next = old->next;
+	newp->prev = old->prev;
+	newp->prev->next = newp;
+	newp->next->prev = newp;
+}
+
+/* Join two lists. */
+static inline void list_splice(struct list_head *add, struct list_head *head)
+{
+	/* Do nothing if the list which gets added is empty. */
+	if (add != add->next) {
+		add->next->prev = head;
+		add->prev->next = head->next;
+		head->next->prev = add->prev;
+		head->next = add->next;
+	}
+}
+
+/* Get typed element from list at a given position. */
+#define list_entry(ptr, type, member) \
+	((type *) ((char *) (ptr) - offsetof(type, member)))
+
+/* Get first entry from a list. */
+#define list_first_entry(ptr, type, member) \
+	list_entry((ptr)->next, type, member)
+
+/* Iterate forward over the elements of the list. */
+#define list_for_each(pos, head) \
+	for (pos = (head)->next; pos != (head); pos = pos->next)
+
+/*
+ * Iterate forward over the elements list. The list elements can be
+ * removed from the list while doing this.
+ */
+#define list_for_each_safe(pos, p, head) \
+	for (pos = (head)->next, p = pos->next; \
+		pos != (head); \
+		pos = p, p = pos->next)
+
+/* Iterate backward over the elements of the list. */
+#define list_for_each_prev(pos, head) \
+	for (pos = (head)->prev; pos != (head); pos = pos->prev)
+
+/*
+ * Iterate backwards over the elements list. The list elements can be
+ * removed from the list while doing this.
+ */
+#define list_for_each_prev_safe(pos, p, head) \
+	for (pos = (head)->prev, p = pos->prev; \
+		pos != (head); \
+		pos = p, p = pos->prev)
+
+static inline int list_empty(struct list_head *head)
+{
+	return head == head->next;
+}
+
+static inline void list_replace_init(struct list_head *old,
+				     struct list_head *newp)
+{
+	struct list_head *head = old->next;
+
+	list_del(old);
+	list_add_tail(newp, head);
+	INIT_LIST_HEAD(old);
+}
+
+#endif /* LIST_H */
-- 
EW


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [REJECT 4/3] http-walker: use hashmap to reduce list scan
  2016-07-11 20:51 [RFC 0/3] dumb HTTP transport speedups Eric Wong
                   ` (2 preceding siblings ...)
  2016-07-11 20:51 ` [PATCH 3/3] http-walker: reduce O(n) ops with doubly-linked list Eric Wong
@ 2016-07-11 21:02 ` Eric Wong
  2016-07-24 10:11 ` [RFC 0/3] dumb HTTP transport speedups Jakub Narębski
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2016-07-11 21:02 UTC (permalink / raw)
  To: git

For the sake of documentation: I worked on this patch, but I
don't think there was a measurable improvement (hard to tell
with variable network conditions), and it increased memory
usage to around 380M.

I wanted to reduce the list scanning in fill_active_slot() by
deleting items during iteration, but I'm not sure it helps,
since that loop is nowhere near as bad as the prefetch()
insertion loop fixed in 3/3.

list_for_each in fetch_object() also tends to hit the first
object in the list when iterating, so I see no improvement
with this patch.

-----8<------
Subject: [PATCH] http-walker: use hashmap to reduce list scan

We can reduce list walking in fill_active_slot by deleting items
as we walk through the object request queue of pending objects.

However, we still need to maintain a mapping of live objects
for fetch_object, so introduce the use of a hashmap to keep
track of all live object requests in O(1) average time.
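A rough sketch of the idea (hypothetical standalone code, not git's
hashmap API): since SHA-1 output is already uniformly distributed, its
leading bytes can serve directly as the hash code (which is what git's
sha1hash() exploits), and chained buckets then give O(1) average
lookups:

```c
#include <string.h>

#define NBUCKETS 64	/* power of two; size chosen arbitrarily here */

/* hypothetical stand-in for object_request */
struct req {
	unsigned char sha1[20];
	struct req *next;	/* chain within a bucket */
};

/* like git's sha1hash(): reuse the already-uniform leading SHA-1
 * bytes as the hash code (assumes a 4-byte unsigned int) */
static unsigned int sha1hash(const unsigned char *sha1)
{
	unsigned int hash;
	memcpy(&hash, sha1, sizeof(hash));
	return hash;
}

static struct req *buckets[NBUCKETS];

static void insert_req(struct req *r)
{
	unsigned int b = sha1hash(r->sha1) & (NBUCKETS - 1);
	r->next = buckets[b];
	buckets[b] = r;
}

static struct req *lookup_req(const unsigned char *sha1)
{
	unsigned int b = sha1hash(sha1) & (NBUCKETS - 1);
	struct req *r;

	/* O(1) average: chains stay short when the hash is uniform */
	for (r = buckets[b]; r; r = r->next)
		if (!memcmp(r->sha1, sha1, 20))
			return r;
	return NULL;
}
```

In the patch itself this bookkeeping is handled by git's hashmap.[ch],
with obj_req_cmp() as the comparison callback.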

Signed-off-by: Eric Wong <e@80x24.org>
---
 http-walker.c | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/http-walker.c b/http-walker.c
index 0b24255..8d27707 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -3,6 +3,7 @@
 #include "walker.h"
 #include "http.h"
 #include "list.h"
+#include "hashmap.h"
 
 struct alt_base {
 	char *base;
@@ -19,6 +20,7 @@ enum object_request_state {
 };
 
 struct object_request {
+	struct hashmap_entry ent;
 	struct walker *walker;
 	unsigned char sha1[20];
 	struct alt_base *repo;
@@ -43,6 +45,7 @@ struct walker_data {
 };
 
 static LIST_HEAD(object_queue_head);
+static struct hashmap *object_requests;
 
 static void fetch_alternates(struct walker *walker, const char *base);
 
@@ -114,7 +117,11 @@ static void release_object_request(struct object_request *obj_req)
 	if (obj_req->req !=NULL && obj_req->req->localfile != -1)
 		error("fd leakage in release: %d", obj_req->req->localfile);
 
-	list_del(&obj_req->node);
+	/* XXX seems unnecessary with list_del in fill_active_slot */
+	if (!list_empty(&obj_req->node))
+		list_del(&obj_req->node);
+
+	hashmap_remove(object_requests, obj_req, obj_req->sha1);
 	free(obj_req);
 }
 
@@ -127,6 +134,8 @@ static int fill_active_slot(struct walker *walker)
 	list_for_each_safe(pos, tmp, head) {
 		obj_req = list_entry(pos, struct object_request, node);
 		if (obj_req->state == WAITING) {
+			/* _init so future list_del is idempotent */
+			list_del_init(pos);
 			if (has_sha1_file(obj_req->sha1))
 				obj_req->state = COMPLETE;
 			else {
@@ -145,6 +154,8 @@ static void prefetch(struct walker *walker, unsigned char *sha1)
 	struct walker_data *data = walker->data;
 
 	newreq = xmalloc(sizeof(*newreq));
+	hashmap_entry_init(&newreq->ent, sha1hash(sha1));
+	hashmap_add(object_requests, &newreq->ent);
 	newreq->walker = walker;
 	hashcpy(newreq->sha1, sha1);
 	newreq->repo = data->alt;
@@ -435,15 +446,12 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
 {
 	char *hex = sha1_to_hex(sha1);
 	int ret = 0;
-	struct object_request *obj_req = NULL;
+	struct object_request *obj_req;
+	struct hashmap_entry key;
 	struct http_object_request *req;
-	struct list_head *pos, *head = &object_queue_head;
 
-	list_for_each(pos, head) {
-		obj_req = list_entry(pos, struct object_request, node);
-		if (!hashcmp(obj_req->sha1, sha1))
-			break;
-	}
+	hashmap_entry_init(&key, sha1hash(sha1));
+	obj_req = hashmap_get(object_requests, &key, sha1);
 	if (obj_req == NULL)
 		return error("Couldn't find request for %s in the queue", hex);
 
@@ -553,6 +561,12 @@ static void cleanup(struct walker *walker)
 	}
 }
 
+static int obj_req_cmp(const struct object_request *e1,
+		const struct object_request *e2, const unsigned char *sha1)
+{
+	return hashcmp(e1->sha1, sha1 ? sha1 : e2->sha1);
+}
+
 struct walker *get_http_walker(const char *url)
 {
 	char *s;
@@ -580,5 +594,10 @@ struct walker *get_http_walker(const char *url)
 	add_fill_function(walker, (int (*)(void *)) fill_active_slot);
 #endif
 
+	if (!object_requests) {
+		object_requests = xmalloc(sizeof(*object_requests));
+		hashmap_init(object_requests, (hashmap_cmp_fn)obj_req_cmp, 0);
+	}
+
 	return walker;
 }
-- 
EW

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC 0/3] dumb HTTP transport speedups
  2016-07-11 20:51 [RFC 0/3] dumb HTTP transport speedups Eric Wong
                   ` (3 preceding siblings ...)
  2016-07-11 21:02 ` [REJECT 4/3] http-walker: use hashmap to reduce list scan Eric Wong
@ 2016-07-24 10:11 ` Jakub Narębski
  2016-07-24 11:20   ` Eric Wong
  4 siblings, 1 reply; 7+ messages in thread
From: Jakub Narębski @ 2016-07-24 10:11 UTC (permalink / raw)
  To: Eric Wong, git

On 2016-07-11 at 22:51, Eric Wong wrote:

> TL;DR: dumb HTTP clone from a certain badly-packed repo goes from
> ~2 hours to ~30 min; memory usage drops from 2G to 360M
> 
> 
> I hadn't packed the public repo at https://public-inbox.org/git
> for a few weeks.  As the admin of a small server with limited
> memory and CPU resources but fairly good bandwidth, I prefer
> clients use dumb HTTP for initial clones.

Hopefully the solution / workaround for the large initial clone
problem utilizing bundles (`git bundle`), which can be transferred
resumably, will get standardized and automated.

Do you use bitmap indices for speeding up fetches?

BTW, IMVHO the problem with dumb HTTP is the latency, not the
extra bandwidth needed...

Best,
-- 
Jakub Narębski


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 0/3] dumb HTTP transport speedups
  2016-07-24 10:11 ` [RFC 0/3] dumb HTTP transport speedups Jakub Narębski
@ 2016-07-24 11:20   ` Eric Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2016-07-24 11:20 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: git

Jakub Narębski <jnareb@gmail.com> wrote:
> On 2016-07-11 at 22:51, Eric Wong wrote:
> 
> > TL;DR: dumb HTTP clone from a certain badly-packed repo goes from
> > ~2 hours to ~30 min; memory usage drops from 2G to 360M
> > 
> > 
> > I hadn't packed the public repo at https://public-inbox.org/git
> > for a few weeks.  As the admin of a small server with limited
> > memory and CPU resources but fairly good bandwidth, I prefer
> > clients use dumb HTTP for initial clones.
> 
> Hopefully the solution / workaround for the large initial clone
> problem utilizing bundles (`git bundle`), which can be transferred
> resumably, will get standardized and automated.

I've been hoping to look at this more in coming weeks/months.
It would be nice if bundles and packs could be unified somehow
to avoid doubling storage on the server.

> Do you use bitmap indices for speeding up fetches?

Yes, but slow clients are still a problem, since big responses
keep memory-hungry processes running while trickling (or waste
disk space buffering the pack output up front).

Static packfiles/bundles are nice since all the clients can
share the same data on the server side as it's trickled out.

> BTW, IMVHO the problem with dumb HTTP is the latency, not the
> extra bandwidth needed...

I enabled persistent connections for 404s on loose objects for
this reason :)  We should probably be doing it across the board
on 404s, just haven't gotten around to it...

Increasing default parallelism should also help, but it might
hurt some servers which can't handle many connections...
Hard to imagine people using antiquated prefork servers for
slow clients in a post-Slowloris world, but maybe it happens?

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-07-24 11:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-11 20:51 [RFC 0/3] dumb HTTP transport speedups Eric Wong
2016-07-11 20:51 ` [PATCH 1/3] http-walker: remove unused parameter from fetch_object Eric Wong
2016-07-11 20:51 ` [PATCH 2/3] http: avoid disconnecting on 404s for loose objects Eric Wong
2016-07-11 20:51 ` [PATCH 3/3] http-walker: reduce O(n) ops with doubly-linked list Eric Wong
2016-07-11 21:02 ` [REJECT 4/3] http-walker: use hashmap to reduce list scan Eric Wong
2016-07-24 10:11 ` [RFC 0/3] dumb HTTP transport speedups Jakub Narębski
2016-07-24 11:20   ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).