git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Takuto Ikuta <tikuta@chromium.org>
To: git@vger.kernel.org, gitster@pobox.com
Cc: Takuto Ikuta <tikuta@chromium.org>
Subject: [PATCH v6] fetch-pack.c: use oidset to check existence of loose object
Date: Wed, 14 Mar 2018 15:05:11 +0900	[thread overview]
Message-ID: <20180314060511.257985-1-tikuta@chromium.org> (raw)
In-Reply-To: <20180310123445.79609-1-tikuta@chromium.org>

When fetching from a repository with large number of refs, because to
check existence of each refs in local repository to packed and loose
objects, 'git fetch' ends up doing a lot of lstat(2) to non-existing
loose form, which makes it slow.

Instead of making as many lstat(2) calls as the refs the remote side
advertised to see if these objects exist in the loose form, first
enumerate all the existing loose objects in hashmap beforehand and use
it to check existence of them if the number of refs is larger than the
number of loose objects.

With this patch, the number of lstat(2) calls in `git fetch` is reduced
from 411412 to 13794 for chromium repository, it has more than 480000
remote refs.

I took time stat of `git fetch` when fetch-pack happens for chromium
repository 3 times on linux with SSD.
* with this patch
8.105s
8.309s
7.640s
avg: 8.018s

* master
12.287s
11.175s
12.227s
avg: 11.896s

On my MacBook Air which has slower lstat(2).
* with this patch
14.501s

* master
1m16.027s

`git fetch` on slow disk will be improved largely.

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
---
 cache.h      |  2 ++
 fetch-pack.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
 sha1_file.c  |  3 +++
 3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index d06932ed0..6a72f54d7 100644
--- a/cache.h
+++ b/cache.h
@@ -1773,6 +1773,8 @@ struct object_info {
 #define OBJECT_INFO_SKIP_CACHED 4
 /* Do not retry packed storage after checking packed and loose storage */
 #define OBJECT_INFO_QUICK 8
+/* Do not check loose object */
+#define OBJECT_INFO_IGNORE_LOOSE 16
 extern int sha1_object_info_extended(const unsigned char *, struct object_info *, unsigned flags);
 
 /*
diff --git a/fetch-pack.c b/fetch-pack.c
index d97461296..92b9bb4d9 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -711,6 +711,28 @@ static void mark_alternate_complete(struct object *obj)
 	mark_complete(&obj->oid);
 }
 
+struct loose_object_iter {
+	struct oidset *loose_object_set;
+	struct ref *refs;
+};
+
+/*
+ *  If the number of refs is not larger than the number of loose objects,
+ *  this function stops inserting and returns false.
+ */
+static int add_loose_objects_to_set(const struct object_id *oid,
+				    const char *path,
+				    void *data)
+{
+	struct loose_object_iter *iter = data;
+	oidset_insert(iter->loose_object_set, oid);
+	if (iter->refs == NULL)
+		return 1;
+
+	iter->refs = iter->refs->next;
+	return 0;
+}
+
 static int everything_local(struct fetch_pack_args *args,
 			    struct ref **refs,
 			    struct ref **sought, int nr_sought)
@@ -719,16 +741,31 @@ static int everything_local(struct fetch_pack_args *args,
 	int retval;
 	int old_save_commit_buffer = save_commit_buffer;
 	timestamp_t cutoff = 0;
+	struct oidset loose_oid_set = OIDSET_INIT;
+	int use_oidset = 0;
+	struct loose_object_iter iter = {&loose_oid_set, *refs};
+
+	/* Enumerate all loose objects or know refs are not so many. */
+	use_oidset = !for_each_loose_object(add_loose_objects_to_set,
+					    &iter, 0);
 
 	save_commit_buffer = 0;
 
 	for (ref = *refs; ref; ref = ref->next) {
 		struct object *o;
+		unsigned int flags = OBJECT_INFO_QUICK;
 
-		if (!has_object_file_with_flags(&ref->old_oid,
-						OBJECT_INFO_QUICK))
-			continue;
+		if (use_oidset &&
+		    !oidset_contains(&loose_oid_set, &ref->old_oid)) {
+			/*
+			 * I know this does not exist in the loose form,
+			 * so check if it exists in a non-loose form.
+			 */
+			flags |= OBJECT_INFO_IGNORE_LOOSE;
+		}
 
+		if (!has_object_file_with_flags(&ref->old_oid, flags))
+			continue;
 		o = parse_object(&ref->old_oid);
 		if (!o)
 			continue;
@@ -744,6 +781,8 @@ static int everything_local(struct fetch_pack_args *args,
 		}
 	}
 
+	oidset_clear(&loose_oid_set);
+
 	if (!args->no_dependents) {
 		if (!args->deepen) {
 			for_each_ref(mark_complete_oid, NULL);
diff --git a/sha1_file.c b/sha1_file.c
index 1b94f39c4..c0a197947 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1262,6 +1262,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		if (find_pack_entry(real, &e))
 			break;
 
+		if (flags & OBJECT_INFO_IGNORE_LOOSE)
+			return -1;
+
 		/* Most likely it's a loose object. */
 		if (!sha1_loose_object_info(real, oi, flags))
 			return 0;
-- 
2.16.2.804.g6dcf76e118-goog


  parent reply	other threads:[~2018-03-14  6:06 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-08 12:06 [PATCH] fetch-pack.c: use oidset to check existence of loose object Takuto Ikuta
2018-03-08 17:19 ` René Scharfe
2018-03-09 13:42   ` Takuto Ikuta
2018-03-08 18:42 ` Junio C Hamano
2018-03-09 13:11   ` [PATCH v2 0/1] " Takuto Ikuta
2018-03-09 13:11     ` [PATCH v2 1/1] " Takuto Ikuta
2018-03-09 13:26       ` [PATCH v3] " Takuto Ikuta
2018-03-09 19:54         ` Junio C Hamano
2018-03-10 13:19           ` Takuto Ikuta
2018-03-13 17:53             ` Junio C Hamano
2018-03-14  6:26               ` Takuto Ikuta
2018-03-10 12:34         ` [PATCH v4] " Takuto Ikuta
2018-03-10 12:46           ` [PATCH v5] " Takuto Ikuta
2018-03-13 19:04             ` Junio C Hamano
2018-03-14  6:05           ` Takuto Ikuta [this message]
2018-03-14  6:32             ` [PATCH v7] " Takuto Ikuta
2018-03-09 14:12   ` [PATCH] " Takuto Ikuta
2018-03-09 18:00     ` Junio C Hamano
2018-03-09 19:41       ` Junio C Hamano
2018-03-13 15:30   ` [PATCH] sha1_file: restore OBJECT_INFO_QUICK functionality Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180314060511.257985-1-tikuta@chromium.org \
    --to=tikuta@chromium.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).