git@vger.kernel.org mailing list mirror (one of many)
From: Christian Couder <christian.couder@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Ben Peart <Ben.Peart@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Mike Hommey <mh@glandium.org>,
	Lars Schneider <larsxschneider@gmail.com>,
	Eric Wong <e@80x24.org>,
	Christian Couder <chriscool@tuxfamily.org>,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: [PATCH 11/40] sha1_file: support lazily fetching missing objects
Date: Wed,  3 Jan 2018 17:33:34 +0100
Message-ID: <20180103163403.11303-12-chriscool@tuxfamily.org>
In-Reply-To: <20180103163403.11303-1-chriscool@tuxfamily.org>

From: Jonathan Tan <jonathantanmy@google.com>

Teach sha1_file to fetch objects from the configured external ODB (for
example, a promisor remote set with odb.<name>.promisorRemote) whenever
an object is requested but missing.
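A minimal, self-contained sketch of the lookup, fetch and retry control flow may help here. It is not Git code: the in-memory "stores" and the helpers have_locally(), fetch_from_promisor() and object_info() are hypothetical stand-ins, and only the name fetch_if_missing mirrors the global added by this patch. The loop is loosely modelled on the while (1) loop in sha1_object_info_extended(): a missing object is fetched at most once, and the local lookup is then repeated so the caller sees the freshly fetched object.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the local object store and a promisor
 * remote; objects are identified by short names instead of hashes. */
#define MAX_OBJECTS 8

static const char *local_store[MAX_OBJECTS];
static int local_count;

static const char *remote_store[] = { "blob-a", "blob-b" };

int fetch_if_missing = 1;	/* mirrors the global added by the patch */
static int fetch_calls;		/* counts simulated round trips to the remote */

static bool have_locally(const char *name)
{
	for (int i = 0; i < local_count; i++)
		if (!strcmp(local_store[i], name))
			return true;
	return false;
}

/* Simulated fetch: copies the object into the local store.
 * Returns 0 on success, -1 if the remote does not have it either. */
static int fetch_from_promisor(const char *name)
{
	fetch_calls++;
	for (size_t i = 0; i < sizeof(remote_store) / sizeof(remote_store[0]); i++) {
		if (!strcmp(remote_store[i], name)) {
			assert(local_count < MAX_OBJECTS);
			local_store[local_count++] = remote_store[i];
			return 0;
		}
	}
	return -1;
}

/* Look the object up locally; if it is missing and fetching is allowed,
 * fetch it at most once and then retry the local lookup. */
static int object_info(const char *name)
{
	int already_retried = 0;

	while (1) {
		if (have_locally(name))
			return 0;
		if (fetch_if_missing && !already_retried) {
			if (fetch_from_promisor(name))
				return -1;
			already_retried = 1;
			continue;
		}
		return -1;
	}
}
```

With fetch_if_missing cleared, object_info() reports a missing object as absent without contacting the remote, which is the behaviour fsck and index-pack rely on.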

The fetching of objects can be suppressed by setting the global
variable fetch_if_missing to 0. This is done by fsck, index-pack and
fetch-pack, and temporarily by fetch_object() itself.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.
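Because the switch is a plain global, code that turns it off only temporarily has to save and restore it, as fetch_object() does in this patch. The sketch below wraps that pattern in a hypothetical helper, with_fetching_suppressed(), which is not part of Git. Clearing the flag presumably keeps object lookups made during the fetch itself from triggering nested lazy fetches, and restoring it leaves the caller's setting untouched.

```c
#include <assert.h>

int fetch_if_missing = 1;	/* default: lazily fetch missing objects */

/* Hypothetical helper: run fn(data) with lazy fetching disabled, then
 * restore whatever value the caller had, mirroring the save/restore
 * done around transport_fetch_refs() in fetch_object(). */
static int with_fetching_suppressed(int (*fn)(void *), void *data)
{
	int saved = fetch_if_missing;
	int ret;

	fetch_if_missing = 0;
	ret = fn(data);
	fetch_if_missing = saved;
	return ret;
}

/* Example callback: records the flag value it observed while running. */
static int observe_flag(void *data)
{
	int *seen = data;

	*seen = fetch_if_missing;
	return 42;
}
```

Commands such as fsck and index-pack do not need this pattern: they simply clear the flag at the top of their cmd_*() entry point, since they never want lazy fetches for the rest of the process.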

In order to determine the necessary code changes in sha1_file.c, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the caller
     needing to know how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
     about the loose/packed distinction and operate on both differently,
     and ensure that they can handle the concept of objects that are
     neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
     - this searches a single pack that is provided as an argument; the
       caller already knows (through other means) that the sought object
       is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
     it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
     object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
     not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/cat-file.c       |  3 +++
 builtin/fetch-pack.c     |  2 ++
 builtin/fsck.c           |  3 +++
 builtin/index-pack.c     |  6 ++++++
 cache.h                  |  8 ++++++++
 fetch-object.c           |  3 +++
 sha1_file.c              | 28 ++++++++++++++++++--------
 t/t0410-partial-clone.sh | 51 ++++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f5fa4fd75a..1e4edd81a0 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -13,6 +13,7 @@
 #include "tree-walk.h"
 #include "sha1-array.h"
 #include "packfile.h"
+#include "external-odb.h"
 
 struct batch_options {
 	int enabled;
@@ -475,6 +476,8 @@ static int batch_objects(struct batch_options *opt)
 
 		for_each_loose_object(batch_loose_object, &sa, 0);
 		for_each_packed_object(batch_packed_object, &sa, 0);
+		if (has_external_odb())
+			warning("This repository uses an external ODB. Some objects may not be loaded.");
 
 		cb.opt = opt;
 		cb.expand = &data;
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 02abe7211e..15eeed7b17 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	struct oid_array shallow = OID_ARRAY_INIT;
 	struct string_list deepen_not = STRING_LIST_INIT_DUP;
 
+	fetch_if_missing = 0;
+
 	packet_trace_identity("fetch-pack");
 
 	memset(&args, 0, sizeof(args));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index a6fa6d6482..7a8a679d4f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	int i;
 	struct alternate_object_database *alt;
 
+	/* fsck knows how to handle missing promisor objects */
+	fetch_if_missing = 0;
+
 	errors_found = 0;
 	check_replace_refs = 0;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 9dffaf20ae..54c921fa71 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	unsigned foreign_nr = 1;	/* zero is a "good" value, assume bad */
 	int report_end_of_input = 0;
 
+	/*
+	 * index-pack never needs to fetch missing objects, since it only
+	 * accesses the repo to do hash collision checks
+	 */
+	fetch_if_missing = 0;
+
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(index_pack_usage);
 
diff --git a/cache.h b/cache.h
index 078607ee91..3fabf998ce 100644
--- a/cache.h
+++ b/cache.h
@@ -1789,6 +1789,14 @@ struct object_info {
 #define OBJECT_INFO_QUICK 8
 extern int sha1_object_info_extended(const unsigned char *, struct object_info *, unsigned flags);
 
+/*
+ * Set this to 0 to prevent sha1_object_info_extended() from fetching missing
+ * objects. This only has an effect if an external ODB is configured.
+ *
+ * Its default value is 1.
+ */
+extern int fetch_if_missing;
+
 /* Dumb servers support */
 extern int update_server_info(int);
 
diff --git a/fetch-object.c b/fetch-object.c
index 23ec2bb0d0..8afadeda2b 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -14,7 +14,9 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
 	struct remote *remote;
 	struct transport *transport;
 	struct ref *ref;
+	int original_fetch_if_missing = fetch_if_missing;
 
+	fetch_if_missing = 0;
 	remote = remote_get(remote_name);
 	if (!remote->url[0])
 		die(_("Remote with no URL"));
@@ -25,4 +27,5 @@ void fetch_object(const char *remote_name, const unsigned char *sha1)
 	transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
 	transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
 	transport_fetch_refs(transport, ref);
+	fetch_if_missing = original_fetch_if_missing;
 }
diff --git a/sha1_file.c b/sha1_file.c
index cba6b2a537..261baf800f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1233,6 +1233,8 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	return (status < 0) ? status : 0;
 }
 
+int fetch_if_missing = 1;
+
 int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi, unsigned flags)
 {
 	static struct object_info blank_oi = OBJECT_INFO_INIT;
@@ -1241,6 +1243,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 	const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
 				    lookup_replace_object(sha1) :
 				    sha1;
+	int already_retried = 0;
 
 	if (is_null_sha1(real))
 		return -1;
@@ -1268,19 +1271,29 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		}
 	}
 
-	if (!find_pack_entry(real, &e)) {
+	while (1) {
+		if (find_pack_entry(real, &e))
+			break;
+
 		/* Most likely it's a loose object. */
 		if (!sha1_loose_object_info(real, oi, flags))
 			return 0;
 
 		/* Not a loose object; someone else may have just packed it. */
-		if (flags & OBJECT_INFO_QUICK) {
-			return -1;
-		} else {
-			reprepare_packed_git();
-			if (!find_pack_entry(real, &e))
-				return -1;
+		if (!(flags & OBJECT_INFO_QUICK))
+			reprepare_packed_git();
+		if (find_pack_entry(real, &e))
+			break;
+		/* Check if it is a missing object */
+		if (fetch_if_missing && has_external_odb() &&
+		    !already_retried) {
+			if (external_odb_get_direct(real))
+				return -1;
+			already_retried = 1;
+			continue;
 		}
+
+		return -1;
 	}
 
 	if (oi == &blank_oi)
@@ -1289,7 +1302,6 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		 * information below, so return early.
 		 */
 		return 0;
-
 	rtype = packed_object_info(e.p, e.offset, oi);
 	if (rtype < 0) {
 		mark_bad_packed_object(e.p, real);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a0f901fa1d..8b20d18603 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -138,4 +138,55 @@ test_expect_success 'missing CLI object, but promised, passes fsck' '
 	git -C repo fsck "$A"
 '
 
+test_expect_success 'fetching of missing objects' '
+	rm -rf repo &&
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" repo &&
+	HASH=$(git -C repo rev-parse foo) &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "origin" &&
+	git -C repo cat-file -p "$HASH" &&
+
+	# Ensure that the .promisor file is written, and check that its
+	# associated packfile contains the object
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
+	git verify-pack --verbose "$IDX" | grep "$HASH"
+'
+
+LIB_HTTPD_PORT=12345  # default port, 410, cannot be used as non-root
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'fetching of missing objects from an HTTP server' '
+	rm -rf repo &&
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" foo &&
+	git -C "$SERVER" repack -a -d --write-bitmap-index &&
+
+	git clone $HTTPD_URL/smart/server repo &&
+	HASH=$(git -C repo rev-parse foo) &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config odb.magic.promisorRemote "origin" &&
+	git -C repo cat-file -p "$HASH" &&
+
+	# Ensure that the .promisor file is written, and check that its
+	# associated packfile contains the object
+	ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+	test_line_count = 1 promisorlist &&
+	IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
+	git verify-pack --verbose "$IDX" | grep "$HASH"
+'
+
+stop_httpd
+
 test_done
-- 
2.16.0.rc0.16.g82191dbc6c.dirty



Thread overview: 51+ messages
2018-01-03 16:33 [PATCH 00/40] Promisor remotes and external ODB support Christian Couder
2018-01-03 16:33 ` [PATCH 01/40] Add initial external odb support Christian Couder
2018-01-04 19:59   ` Jeff Hostetler
2018-01-15 14:34     ` Christian Couder
2018-01-03 16:33 ` [PATCH 02/40] Add GIT_NO_EXTERNAL_ODB env variable Christian Couder
2018-01-03 16:33 ` [PATCH 03/40] external-odb: add has_external_odb() Christian Couder
2018-01-03 16:33 ` [PATCH 04/40] fsck: introduce promisor objects Christian Couder
2018-01-03 16:33 ` [PATCH 05/40] fsck: support refs pointing to " Christian Couder
2018-01-03 16:33 ` [PATCH 06/40] fsck: support referenced " Christian Couder
2018-01-03 16:33 ` [PATCH 07/40] fsck: support promisor objects as CLI argument Christian Couder
2018-01-03 16:33 ` [PATCH 08/40] index-pack: refactor writing of .keep files Christian Couder
2018-01-03 16:33 ` [PATCH 09/40] introduce fetch-object: fetch one promisor object Christian Couder
2018-01-03 16:33 ` [PATCH 10/40] external-odb: implement external_odb_get_direct Christian Couder
2018-01-04 17:44   ` Jeff Hostetler
2018-01-15 14:47     ` Christian Couder
2018-01-03 16:33 ` Christian Couder [this message]
2018-01-03 16:33 ` [PATCH 12/40] rev-list: support termination at promisor objects Christian Couder
2018-01-03 16:33 ` [PATCH 13/40] gc: do not repack promisor packfiles Christian Couder
2018-01-03 16:33 ` [PATCH 14/40] sha1_file: prepare for external odbs Christian Couder
2018-01-04 18:00   ` Jeff Hostetler
2018-01-16  7:23     ` Christian Couder
2018-01-03 16:33 ` [PATCH 15/40] external-odb: add script mode support Christian Couder
2018-01-04 19:55   ` Jeff Hostetler
2018-03-19 13:15     ` Christian Couder
2018-01-03 16:33 ` [PATCH 16/40] odb-helper: add 'enum odb_helper_type' Christian Couder
2018-01-03 16:33 ` [PATCH 17/40] odb-helper: add odb_helper_init() to send 'init' instruction Christian Couder
2018-01-03 16:33 ` [PATCH 18/40] t0400: add 'put_raw_obj' instruction to odb-helper script Christian Couder
2018-01-03 16:33 ` [PATCH 19/40] external odb: add 'put_raw_obj' support Christian Couder
2018-01-03 16:33 ` [PATCH 20/40] external-odb: accept only blobs for now Christian Couder
2018-01-03 16:33 ` [PATCH 21/40] t0400: add test for external odb write support Christian Couder
2018-01-03 16:33 ` [PATCH 22/40] Add t0410 to test external ODB transfer Christian Couder
2018-01-03 16:33 ` [PATCH 23/40] lib-httpd: pass config file to start_httpd() Christian Couder
2018-01-03 16:33 ` [PATCH 24/40] lib-httpd: add upload.sh Christian Couder
2018-01-03 16:33 ` [PATCH 25/40] lib-httpd: add list.sh Christian Couder
2018-01-03 16:33 ` [PATCH 26/40] lib-httpd: add apache-e-odb.conf Christian Couder
2018-01-03 16:33 ` [PATCH 27/40] odb-helper: add odb_helper_get_raw_object() Christian Couder
2018-01-03 16:33 ` [PATCH 28/40] pack-objects: don't pack objects in external odbs Christian Couder
2018-01-04 20:54   ` Jeff Hostetler
2018-03-19 13:27     ` Christian Couder
2018-01-03 16:33 ` [PATCH 29/40] Add t0420 to test transfer to HTTP external odb Christian Couder
2018-01-03 16:33 ` [PATCH 30/40] external-odb: add 'get_direct' support Christian Couder
2018-01-03 16:33 ` [PATCH 31/40] odb-helper: add 'script_mode' to 'struct odb_helper' Christian Couder
2018-01-03 16:33 ` [PATCH 32/40] odb-helper: add init_object_process() Christian Couder
2018-01-03 16:33 ` [PATCH 33/40] Add t0450 to test 'get_direct' mechanism Christian Couder
2018-01-03 16:33 ` [PATCH 34/40] Add t0460 to test passing git objects Christian Couder
2018-01-03 16:33 ` [PATCH 35/40] odb-helper: add put_object_process() Christian Couder
2018-01-03 16:33 ` [PATCH 36/40] Add t0470 to test passing raw objects Christian Couder
2018-01-03 16:34 ` [PATCH 37/40] odb-helper: add have_object_process() Christian Couder
2018-01-03 16:34 ` [PATCH 38/40] Add t0480 to test "have" capability and raw objects Christian Couder
2018-01-03 16:34 ` [PATCH 39/40] external-odb: use 'odb=magic' attribute to mark odb blobs Christian Couder
2018-01-03 16:34 ` [PATCH 40/40] Add Documentation/technical/external-odb.txt Christian Couder
