git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] check_everything_connected replacement
@ 2013-03-31 11:09 Nguyễn Thái Ngọc Duy
  2013-03-31 11:09 ` [PATCH 1/4] fetch-pack: save shallow file before fetching the pack Nguyễn Thái Ngọc Duy
                   ` (5 more replies)
  0 siblings, 6 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-03-31 11:09 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Nguyễn Thái Ngọc Duy

My investigation in lowering connectivity check cost in git-clone [1]
led me to try 'index-pack --strict' code. Without calling fsck_object(),
--strict seems to be cheaper than check_everything_connected() while
accomplishing the same thing (imo).

The first patch is a bug fix running git-clone --depth with
fetch.fsckObjects on. The second is "fix while i'm there". The third
introduces check_everything_connected alternative. The fourth makes
use of it.

The last use of check_everything_connected after this series is
fetch.c:quickfetch(), which I think is unnecessary. It can only catch
errors if we have incomplete object islands in repo, which could
happen before this series. After this series, incomplete object
islands can't enter the repository, at least via git transport. So
maybe we should remove that check_everything_connected too (maybe
after a few years, enough time for the laziest user to run fsck/repack
once).

[1] http://article.gmane.org/gmane.comp.version-control.git/219602

Nguyễn Thái Ngọc Duy (4):
  fetch-pack: save shallow file before fetching the pack
  index-pack: remove dead code (it should never happen)
  index-pack, unpack-objects: add --not-so-strict for connectivity check
  Use --not-so-strict on all pack transfer for connectivity check

 builtin/fetch.c                 |  6 ----
 builtin/index-pack.c            | 10 ++++--
 builtin/receive-pack.c          | 22 +++----------
 builtin/unpack-objects.c        |  9 ++++--
 fetch-pack.c                    | 72 +++++++++++++++++++++++------------------
 t/t5500-fetch-pack.sh           |  7 ++++
 t/t5504-fetch-receive-strict.sh |  2 +-
 7 files changed, 66 insertions(+), 62 deletions(-)

-- 
1.8.2.83.gc99314b

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/4] fetch-pack: save shallow file before fetching the pack
  2013-03-31 11:09 [PATCH 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
@ 2013-03-31 11:09 ` Nguyễn Thái Ngọc Duy
  2013-04-01 14:53   ` Junio C Hamano
  2013-03-31 11:09 ` [PATCH 2/4] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-03-31 11:09 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Nguyễn Thái Ngọc Duy

index-pack --strict looks up and follows parent commits. If shallow
information is not ready by the time index-pack is run, index-pack may
be lead to non-existent objects. Make fetch-pack save shallow file to
disk before invoking index-pack.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 fetch-pack.c          | 70 ++++++++++++++++++++++++++++-----------------------
 t/t5500-fetch-pack.sh |  7 ++++++
 2 files changed, 45 insertions(+), 32 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index cef8fde..1f9c5ba 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -779,11 +779,44 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
+static void flush_shallow_to_disk(struct stat *st)
+{
+	static struct lock_file lock;
+	struct cache_time mtime;
+	struct strbuf sb = STRBUF_INIT;
+	char *shallow = git_path("shallow");
+	int fd;
+
+	mtime.sec = st->st_mtime;
+	mtime.nsec = ST_MTIME_NSEC(*st);
+	if (stat(shallow, st)) {
+		if (mtime.sec)
+			die("shallow file was removed during fetch");
+	} else if (st->st_mtime != mtime.sec
+#ifdef USE_NSEC
+		   || ST_MTIME_NSEC(*st) != mtime.nsec
+#endif
+		   )
+		die("shallow file was changed during fetch");
+
+	fd = hold_lock_file_for_update(&lock, shallow,
+				       LOCK_DIE_ON_ERROR);
+	if (!write_shallow_commits(&sb, 0)
+	    || write_in_full(fd, sb.buf, sb.len) != sb.len) {
+		unlink_or_warn(shallow);
+		rollback_lock_file(&lock);
+	} else {
+		commit_lock_file(&lock);
+	}
+	strbuf_release(&sb);
+}
+
 static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 				 int fd[2],
 				 const struct ref *orig_ref,
 				 struct ref **sought, int nr_sought,
-				 char **pack_lockfile)
+				 char **pack_lockfile,
+				 struct stat *shallow_st)
 {
 	struct ref *ref = copy_ref_list(orig_ref);
 	unsigned char sha1[20];
@@ -858,6 +891,8 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 
 	if (args->stateless_rpc)
 		packet_flush(fd[1]);
+	if (args->depth > 0)
+		flush_shallow_to_disk(shallow_st);
 	if (get_pack(args, fd, pack_lockfile))
 		die("git fetch-pack: fetch failed.");
 
@@ -952,38 +987,9 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 		packet_flush(fd[1]);
 		die("no matching remote head");
 	}
-	ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought, pack_lockfile);
 
-	if (args->depth > 0) {
-		static struct lock_file lock;
-		struct cache_time mtime;
-		struct strbuf sb = STRBUF_INIT;
-		char *shallow = git_path("shallow");
-		int fd;
-
-		mtime.sec = st.st_mtime;
-		mtime.nsec = ST_MTIME_NSEC(st);
-		if (stat(shallow, &st)) {
-			if (mtime.sec)
-				die("shallow file was removed during fetch");
-		} else if (st.st_mtime != mtime.sec
-#ifdef USE_NSEC
-				|| ST_MTIME_NSEC(st) != mtime.nsec
-#endif
-			  )
-			die("shallow file was changed during fetch");
-
-		fd = hold_lock_file_for_update(&lock, shallow,
-					       LOCK_DIE_ON_ERROR);
-		if (!write_shallow_commits(&sb, 0)
-		 || write_in_full(fd, sb.buf, sb.len) != sb.len) {
-			unlink_or_warn(shallow);
-			rollback_lock_file(&lock);
-		} else {
-			commit_lock_file(&lock);
-		}
-		strbuf_release(&sb);
-	}
+	ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought,
+				pack_lockfile, &st);
 
 	reprepare_packed_git();
 	return ref_cpy;
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index d574085..557b073 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -135,6 +135,13 @@ test_expect_success 'clone shallow depth 1' '
 	test "`git --git-dir=shallow0/.git rev-list --count HEAD`" = 1
 '
 
+test_expect_success 'clone shallow depth 1 with fsck' '
+	git config --global fetch.fsckobjects true &&
+	git clone --no-single-branch --depth 1 "file://$(pwd)/." shallow0fsck &&
+	test "`git --git-dir=shallow0fsck/.git rev-list --count HEAD`" = 1 &&
+	git config --global --unset fetch.fsckobjects
+'
+
 test_expect_success 'clone shallow' '
 	git clone --no-single-branch --depth 2 "file://$(pwd)/." shallow
 '
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/4] index-pack: remove dead code (it should never happen)
  2013-03-31 11:09 [PATCH 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
  2013-03-31 11:09 ` [PATCH 1/4] fetch-pack: save shallow file before fetching the pack Nguyễn Thái Ngọc Duy
@ 2013-03-31 11:09 ` Nguyễn Thái Ngọc Duy
  2013-03-31 11:09 ` [PATCH 3/4] index-pack, unpack-objects: add --not-so-strict for connectivity check Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-03-31 11:09 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ef62124..fdac7c7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -735,8 +735,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			int eaten;
 			void *buf = (void *) data;
 
-			if (!buf)
-				buf = new_data = get_data_from_pack(obj_entry);
+			assert(data && "data can only be NULL for large blobs");
 
 			/*
 			 * we do not need to free the memory here, as the
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/4] index-pack, unpack-objects: add --not-so-strict for connectivity check
  2013-03-31 11:09 [PATCH 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
  2013-03-31 11:09 ` [PATCH 1/4] fetch-pack: save shallow file before fetching the pack Nguyễn Thái Ngọc Duy
  2013-03-31 11:09 ` [PATCH 2/4] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
@ 2013-03-31 11:09 ` Nguyễn Thái Ngọc Duy
  2013-03-31 11:09 ` [PATCH 4/4] Use --not-so-strict on all pack transfer " Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-03-31 11:09 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Nguyễn Thái Ngọc Duy

--not-so-strict only checks if all links from objects in the pack
point to real objects (either in current repo, or from the pack
itself). It's like check_everything_connected() except that:

 - it does not follow DAG in order
 - it can detect incomplete object islands
 - it seems to be faster than "rev-list --objects --all"

On my box, "rev-list --objects --all" takes 34 seconds. index-pack takes

215.25user 8.42system 1:32.31elapsed 242%CPU (0avgtext+0avgdata 1357328maxresident)k
0inputs+1421016outputs (0major+1222987minor)pagefaults 0swaps

And index-pack --not-so-strict takes

pack    96a4e3befa40bf38eddc2d7c99246a59af4ad55d
229.75user 11.31system 1:42.50elapsed 235%CPU (0avgtext+0avgdata 1876816maxresident)k
0inputs+1421016outputs (0major+1307989minor)pagefaults 0swaps

The overhead is about 10 seconds, just 1/3 of rev-list, which makes it
in a better position to replace check_everything_connected(). If this
holds true for general case, it could reduce fetch time by a little bit.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c     | 7 ++++++-
 builtin/unpack-objects.c | 9 +++++++--
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index fdac7c7..3cded32 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -77,6 +77,7 @@ static int nr_threads;
 
 static int from_stdin;
 static int strict;
+static int do_fsck_object;
 static int verbose;
 
 static struct progress *progress;
@@ -744,7 +745,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			obj = parse_object_buffer(sha1, type, size, buf, &eaten);
 			if (!obj)
 				die(_("invalid %s"), typename(type));
-			if (fsck_object(obj, 1, fsck_error_function))
+			if (do_fsck_object &&
+			    fsck_object(obj, 1, fsck_error_function))
 				die(_("Error in object"));
 			if (fsck_walk(obj, mark_link, NULL))
 				die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));
@@ -1491,6 +1493,9 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				fix_thin_pack = 1;
 			} else if (!strcmp(arg, "--strict")) {
 				strict = 1;
+				do_fsck_object = 1;
+			} else if (!strcmp(arg, "--not-so-strict")) {
+				strict = 1;
 			} else if (!strcmp(arg, "--verify")) {
 				verify = 1;
 			} else if (!strcmp(arg, "--verify-stat")) {
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 2217d7b..dd0518b 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -12,7 +12,7 @@
 #include "decorate.h"
 #include "fsck.h"
 
-static int dry_run, quiet, recover, has_errors, strict;
+static int dry_run, quiet, recover, has_errors, strict, do_fsck_object;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict] < pack-file";
 
 /* We always read in 4kB chunks. */
@@ -198,7 +198,7 @@ static int check_object(struct object *obj, int type, void *data)
 		return 0;
 	}
 
-	if (fsck_object(obj, 1, fsck_error_function))
+	if (do_fsck_object && fsck_object(obj, 1, fsck_error_function))
 		die("Error in object");
 	if (fsck_walk(obj, check_object, NULL))
 		die("Error on reachable objects of %s", sha1_to_hex(obj->sha1));
@@ -520,6 +520,11 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
 				continue;
 			}
 			if (!strcmp(arg, "--strict")) {
+				do_fsck_object = 1;
+				strict = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--not-so-strict")) {
 				strict = 1;
 				continue;
 			}
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/4] Use --not-so-strict on all pack transfer for connectivity check
  2013-03-31 11:09 [PATCH 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2013-03-31 11:09 ` [PATCH 3/4] index-pack, unpack-objects: add --not-so-strict for connectivity check Nguyễn Thái Ngọc Duy
@ 2013-03-31 11:09 ` Nguyễn Thái Ngọc Duy
  2013-04-01 14:48 ` [PATCH 0/4] check_everything_connected replacement Junio C Hamano
  2013-05-01 10:59 ` [PATCH v2 0/5] " Nguyễn Thái Ngọc Duy
  5 siblings, 0 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-03-31 11:09 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Nguyễn Thái Ngọc Duy

This replaces check_everything_connected() with --not-so-strict, which
accomplishes the same thing and is generally cheaper.

This also forces connectivity check on "git clone".

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/fetch.c                 |  6 ------
 builtin/receive-pack.c          | 22 ++++------------------
 fetch-pack.c                    |  2 ++
 t/t5504-fetch-receive-strict.sh |  2 +-
 4 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 4b6b1df..d9f970f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -400,12 +400,6 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	else
 		url = xstrdup("foreign");
 
-	rm = ref_map;
-	if (check_everything_connected(iterate_ref_map, 0, &rm)) {
-		rc = error(_("%s did not send all necessary objects\n"), url);
-		goto abort;
-	}
-
 	/*
 	 * The first pass writes objects to be merged and then the
 	 * second pass writes the rest, in order to allow using
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 62ba6e7..07abb14 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -663,19 +663,6 @@ static int command_singleton_iterator(void *cb_data, unsigned char sha1[20])
 	return 0;
 }
 
-static void set_connectivity_errors(struct command *commands)
-{
-	struct command *cmd;
-
-	for (cmd = commands; cmd; cmd = cmd->next) {
-		struct command *singleton = cmd;
-		if (!check_everything_connected(command_singleton_iterator,
-						0, &singleton))
-			continue;
-		cmd->error_string = "missing necessary objects";
-	}
-}
-
 static int iterate_receive_command_list(void *cb_data, unsigned char sha1[20])
 {
 	struct command **cmd_list = cb_data;
@@ -718,11 +705,6 @@ static void execute_commands(struct command *commands, const char *unpacker_erro
 		return;
 	}
 
-	cmd = commands;
-	if (check_everything_connected(iterate_receive_command_list,
-				       0, &cmd))
-		set_connectivity_errors(commands);
-
 	reject_updates_to_hidden(commands);
 
 	if (run_receive_hook(commands, "pre-receive", 0)) {
@@ -843,6 +825,8 @@ static const char *unpack(int err_fd)
 			unpacker[i++] = "-q";
 		if (fsck_objects)
 			unpacker[i++] = "--strict";
+		else
+			unpacker[i++] = "--not-so-strict";
 		unpacker[i++] = hdr_arg;
 		unpacker[i++] = NULL;
 		memset(&child, 0, sizeof(child));
@@ -868,6 +852,8 @@ static const char *unpack(int err_fd)
 		keeper[i++] = "--stdin";
 		if (fsck_objects)
 			keeper[i++] = "--strict";
+		else
+			keeper[i++] = "--not-so-strict";
 		keeper[i++] = "--fix-thin";
 		keeper[i++] = hdr_arg;
 		keeper[i++] = keep_arg;
diff --git a/fetch-pack.c b/fetch-pack.c
index 1f9c5ba..ae20ae5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -754,6 +754,8 @@ static int get_pack(struct fetch_pack_args *args,
 	    ? transfer_fsck_objects
 	    : 0)
 		*av++ = "--strict";
+	else
+		*av++ = "--not-so-strict";
 	*av++ = NULL;
 
 	cmd.in = demux.out;
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 69ee13c..14d2935 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -60,7 +60,7 @@ test_expect_success 'fetch with transfer.fsckobjects' '
 
 cat >exp <<EOF
 To dst
-!	refs/heads/master:refs/heads/test	[remote rejected] (missing necessary objects)
+!	refs/heads/master:refs/heads/test	[remote rejected] (unpacker error)
 EOF
 
 test_expect_success 'push without strict' '
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] check_everything_connected replacement
  2013-03-31 11:09 [PATCH 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2013-03-31 11:09 ` [PATCH 4/4] Use --not-so-strict on all pack transfer " Nguyễn Thái Ngọc Duy
@ 2013-04-01 14:48 ` Junio C Hamano
  2013-05-01 10:59 ` [PATCH v2 0/5] " Nguyễn Thái Ngọc Duy
  5 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2013-04-01 14:48 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> The last use of check_everything_connected after this series is
> fetch.c:quickfetch(), which I think is unnecessary. It can only catch
> errors if we have incomplete object islands in repo, which could
> happen before this series.

A typical source of object islands the quickfetch() was designed to
protect against is the HTTP walker.  I do not think your patch adds
any safety in that area (nor it should).

> After this series, incomplete object
> islands can't enter the repository, at least via git transport.

Exactly.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/4] fetch-pack: save shallow file before fetching the pack
  2013-03-31 11:09 ` [PATCH 1/4] fetch-pack: save shallow file before fetching the pack Nguyễn Thái Ngọc Duy
@ 2013-04-01 14:53   ` Junio C Hamano
  2013-04-05  2:11     ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2013-04-01 14:53 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> index-pack --strict looks up and follows parent commits. If shallow
> information is not ready by the time index-pack is run, index-pack may
> be lead to non-existent objects. Make fetch-pack save shallow file to
> disk before invoking index-pack.

I think the problem the patch is trying to solve _is_ real, but I
wonder if this is a correct approach to solve it.  What happens if
we die in the middle after writing the updated shallow file
prematurely?

Perhaps the index-pack (or any "Git" process in general) needs to
learn a way to use an alternate shallow file, instead of always
using a hard-coded git_path("shallow"), so that you can flush an
updated one to a to-be-the-next-shallow-file, run fetch-pack with
that alternate shallow file, and rename it to the final name at the
end after everything goes well, or something?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/4] fetch-pack: save shallow file before fetching the pack
  2013-04-01 14:53   ` Junio C Hamano
@ 2013-04-05  2:11     ` Duy Nguyen
  0 siblings, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2013-04-05  2:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Tue, Apr 2, 2013 at 1:53 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> index-pack --strict looks up and follows parent commits. If shallow
>> information is not ready by the time index-pack is run, index-pack may
>> be lead to non-existent objects. Make fetch-pack save shallow file to
>> disk before invoking index-pack.
>
> I think the problem the patch is trying to solve _is_ real, but I
> wonder if this is a correct approach to solve it.  What happens if
> we die in the middle after writing the updated shallow file
> prematurely?
>
> Perhaps the index-pack (or any "Git" process in general) needs to
> learn a way to use an alternate shallow file, instead of always
> using a hard-coded git_path("shallow"), so that you can flush an
> updated one to a to-be-the-next-shallow-file, run fetch-pack with
> that alternate shallow file, and rename it to the final name at the
> end after everything goes well, or something?

Yeah. I was focused on the clone case so it didn't matter but this
change applies to git-fetch as well. Will add new GIT_SHALLOW_FILE
environment variable.
--
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 0/5] check_everything_connected replacement
  2013-03-31 11:09 [PATCH 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2013-04-01 14:48 ` [PATCH 0/4] check_everything_connected replacement Junio C Hamano
@ 2013-05-01 10:59 ` Nguyễn Thái Ngọc Duy
  2013-05-01 10:59   ` [PATCH v2 1/5] clone: let the user know when check_everything_connected is run Nguyễn Thái Ngọc Duy
                     ` (5 more replies)
  5 siblings, 6 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-01 10:59 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

This series attempts to reduce the cost of check_everything_connected
on the clone case where index-pack is run. For example, cloning
file://../linux-2.6.git:
    
        before       after
real    4m23.664s    3m47.280s
user    4m55.613s    4m39.530s
sys     0m14.805s    0m17.728s

The first three patches are improvements or fixes and I think they're
good to go (if no bugs are found). The last two introduce
--not-so-strict as check_everything_connected replacement and use it
where applicable and are debatable (I guess).

The reroll fixes shallow file update issue Junio pointed out
previously. It also does not remove check_everything_connected unless
it knows that index-pack or unpack-objects are run with
--not-so-strict.

Nguyễn Thái Ngọc Duy (5):
  clone: let the user know when check_everything_connected is run
  fetch-pack: prepare updated shallow file before fetching the pack
  index-pack: remove dead code (it should never happen)
  index-pack, unpack-objects: add --not-so-strict for connectivity check
  Use --not-so-strict on all pack transfer for connectivity check

 Documentation/git-index-pack.txt     |  3 ++
 Documentation/git-unpack-objects.txt |  4 +++
 builtin/clone.c                      | 18 +++++++---
 builtin/fetch.c                      | 11 +++---
 builtin/index-pack.c                 | 10 ++++--
 builtin/receive-pack.c               | 51 +++------------------------
 builtin/unpack-objects.c             |  9 +++--
 commit.h                             |  1 +
 fetch-pack.c                         | 67 ++++++++++++++++++------------------
 fetch-pack.h                         |  3 +-
 shallow.c                            | 30 ++++++++++++++--
 t/t5500-fetch-pack.sh                |  7 ++++
 t/t5504-fetch-receive-strict.sh      |  2 +-
 transport.c                          |  2 ++
 transport.h                          |  1 +
 15 files changed, 120 insertions(+), 99 deletions(-)

-- 
1.8.2.83.gc99314b

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 1/5] clone: let the user know when check_everything_connected is run
  2013-05-01 10:59 ` [PATCH v2 0/5] " Nguyễn Thái Ngọc Duy
@ 2013-05-01 10:59   ` Nguyễn Thái Ngọc Duy
  2013-05-01 10:59   ` [PATCH v2 2/5] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-01 10:59 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

check_everything_connected could take a long time, especially in the
clone case where the whole DAG is traversed. The user deserves to know
what's going on.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 035ab64..dad4265 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -546,8 +546,12 @@ static void update_remote_refs(const struct ref *refs,
 {
 	const struct ref *rm = mapped_refs;
 
+	if (0 <= option_verbosity)
+		printf(_("Checking connectivity... "));
 	if (check_everything_connected(iterate_ref_map, 0, &rm))
 		die(_("remote did not send all necessary objects"));
+	if (0 <= option_verbosity)
+		printf(_("done\n"));
 
 	if (refs) {
 		write_remote_refs(mapped_refs);
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 2/5] fetch-pack: prepare updated shallow file before fetching the pack
  2013-05-01 10:59 ` [PATCH v2 0/5] " Nguyễn Thái Ngọc Duy
  2013-05-01 10:59   ` [PATCH v2 1/5] clone: let the user know when check_everything_connected is run Nguyễn Thái Ngọc Duy
@ 2013-05-01 10:59   ` Nguyễn Thái Ngọc Duy
  2013-05-01 20:27     ` Junio C Hamano
  2013-05-01 10:59   ` [PATCH v2 3/5] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-01 10:59 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

index-pack --strict looks up and follows parent commits. If shallow
information is not ready by the time index-pack is run, index-pack may
be lead to non-existent objects. Make fetch-pack save shallow file to
disk before invoking index-pack.

A new environment variable GIT_SHALLOW_FILE is used to inform
index-pack to use an alternate shallow file. This variable is for
internal use and thus not advertised in git.txt.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 commit.h              |  1 +
 fetch-pack.c          | 65 ++++++++++++++++++++++++---------------------------
 shallow.c             | 30 ++++++++++++++++++++++--
 t/t5500-fetch-pack.sh |  7 ++++++
 4 files changed, 67 insertions(+), 36 deletions(-)

diff --git a/commit.h b/commit.h
index 67bd509..0cef659 100644
--- a/commit.h
+++ b/commit.h
@@ -176,6 +176,7 @@ extern int for_each_commit_graft(each_commit_graft_fn, void *);
 extern int is_repository_shallow(void);
 extern struct commit_list *get_shallow_commits(struct object_array *heads,
 		int depth, int shallow_flag, int not_shallow_flag);
+extern void check_shallow_file_for_update(void);
 
 int is_descendant_of(struct commit *, struct commit_list *);
 int in_merge_bases(struct commit *, struct commit *);
diff --git a/fetch-pack.c b/fetch-pack.c
index f156dd4..17cfa88 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -20,6 +20,7 @@ static int no_done;
 static int fetch_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
 static int agent_supported;
+static struct lock_file shallow_lock;
 
 #define COMPLETE	(1U << 0)
 #define COMMON		(1U << 1)
@@ -779,6 +780,26 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
+static void setup_alternate_shallow(void)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int fd;
+
+	check_shallow_file_for_update();
+	fd = hold_lock_file_for_update(&shallow_lock, git_path("shallow"),
+				       LOCK_DIE_ON_ERROR);
+	if (write_shallow_commits(&sb, 0)) {
+		if (write_in_full(fd, sb.buf, sb.len) != sb.len)
+			die_errno("failed to write to %s", shallow_lock.filename);
+		if (setenv("GIT_SHALLOW_FILE", shallow_lock.filename, 1))
+			die_errno("failed to set GIT_SHALLOW_FILE");
+	} else {
+		if (setenv("GIT_SHALLOW_FILE", "", 1))
+			die_errno("failed to set GIT_SHALLOW_FILE");
+	}
+	strbuf_release(&sb);
+}
+
 static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 				 int fd[2],
 				 const struct ref *orig_ref,
@@ -858,6 +879,8 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 
 	if (args->stateless_rpc)
 		packet_flush(fd[1]);
+	if (args->depth > 0)
+		setup_alternate_shallow();
 	if (get_pack(args, fd, pack_lockfile))
 		die("git fetch-pack: fetch failed.");
 
@@ -936,15 +959,9 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 		       struct ref **sought, int nr_sought,
 		       char **pack_lockfile)
 {
-	struct stat st;
 	struct ref *ref_cpy;
 
 	fetch_pack_setup();
-	if (args->depth > 0) {
-		if (stat(git_path("shallow"), &st))
-			st.st_mtime = 0;
-	}
-
 	if (nr_sought)
 		nr_sought = remove_duplicates_in_refs(sought, nr_sought);
 
@@ -955,34 +972,14 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 	ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought, pack_lockfile);
 
 	if (args->depth > 0) {
-		static struct lock_file lock;
-		struct cache_time mtime;
-		struct strbuf sb = STRBUF_INIT;
-		char *shallow = git_path("shallow");
-		int fd;
-
-		mtime.sec = st.st_mtime;
-		mtime.nsec = ST_MTIME_NSEC(st);
-		if (stat(shallow, &st)) {
-			if (mtime.sec)
-				die("shallow file was removed during fetch");
-		} else if (st.st_mtime != mtime.sec
-#ifdef USE_NSEC
-				|| ST_MTIME_NSEC(st) != mtime.nsec
-#endif
-			  )
-			die("shallow file was changed during fetch");
-
-		fd = hold_lock_file_for_update(&lock, shallow,
-					       LOCK_DIE_ON_ERROR);
-		if (!write_shallow_commits(&sb, 0)
-		 || write_in_full(fd, sb.buf, sb.len) != sb.len) {
-			unlink_or_warn(shallow);
-			rollback_lock_file(&lock);
-		} else {
-			commit_lock_file(&lock);
-		}
-		strbuf_release(&sb);
+		struct stat st;
+		if (!fstat(shallow_lock.fd, &st) &&
+		    st.st_size == 0) {
+			unlink_or_warn(git_path("shallow"));
+			rollback_lock_file(&shallow_lock);
+		} else
+			commit_lock_file(&shallow_lock);
+		unsetenv("GIT_SHALLOW_FILE");
 	}
 
 	reprepare_packed_git();
diff --git a/shallow.c b/shallow.c
index 6be915f..9c03806 100644
--- a/shallow.c
+++ b/shallow.c
@@ -3,6 +3,7 @@
 #include "tag.h"
 
 static int is_shallow = -1;
+static struct stat shallow_stat;
 
 int register_shallow(const unsigned char *sha1)
 {
@@ -21,12 +22,15 @@ int is_repository_shallow(void)
 {
 	FILE *fp;
 	char buf[1024];
+	const char *path;
 
 	if (is_shallow >= 0)
 		return is_shallow;
 
-	fp = fopen(git_path("shallow"), "r");
-	if (!fp) {
+	path = getenv("GIT_SHALLOW_FILE");
+	if (!path)
+		path = git_path("shallow");
+	if (stat(path, &shallow_stat) || (fp = fopen(path, "r")) == NULL) {
 		is_shallow = 0;
 		return is_shallow;
 	}
@@ -108,3 +112,25 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
 
 	return result;
 }
+
+void check_shallow_file_for_update(void)
+{
+	struct stat st;
+
+	if (getenv("GIT_SHALLOW_FILE"))
+		die("GIT_SHALLOW_FILE should not be manually set");
+
+	if (!is_shallow)
+		return;
+	else if (is_shallow == -1)
+		die("BUG: shallow must be initialized by now");
+
+	if (stat(git_path("shallow"), &st))
+		die("shallow file was removed during fetch");
+	else if (st.st_mtime != shallow_stat.st_mtime
+#ifdef USE_NSEC
+		 || ST_MTIME_NSEC(st) != ST_MTIME_NSEC(shallow_stat)
+#endif
+		   )
+		die("shallow file was changed during fetch");
+}
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index d574085..557b073 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -135,6 +135,13 @@ test_expect_success 'clone shallow depth 1' '
 	test "`git --git-dir=shallow0/.git rev-list --count HEAD`" = 1
 '
 
+test_expect_success 'clone shallow depth 1 with fsck' '
+	git config --global fetch.fsckobjects true &&
+	git clone --no-single-branch --depth 1 "file://$(pwd)/." shallow0fsck &&
+	test "`git --git-dir=shallow0fsck/.git rev-list --count HEAD`" = 1 &&
+	git config --global --unset fetch.fsckobjects
+'
+
 test_expect_success 'clone shallow' '
 	git clone --no-single-branch --depth 2 "file://$(pwd)/." shallow
 '
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 3/5] index-pack: remove dead code (it should never happen)
  2013-05-01 10:59 ` [PATCH v2 0/5] " Nguyễn Thái Ngọc Duy
  2013-05-01 10:59   ` [PATCH v2 1/5] clone: let the user know when check_everything_connected is run Nguyễn Thái Ngọc Duy
  2013-05-01 10:59   ` [PATCH v2 2/5] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
@ 2013-05-01 10:59   ` Nguyễn Thái Ngọc Duy
  2013-05-01 10:59   ` [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check Nguyễn Thái Ngọc Duy
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-01 10:59 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 79dfe47..1fd56d9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -747,8 +747,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			int eaten;
 			void *buf = (void *) data;
 
-			if (!buf)
-				buf = new_data = get_data_from_pack(obj_entry);
+			assert(data && "data can only be NULL for large blobs");
 
 			/*
 			 * we do not need to free the memory here, as the
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check
  2013-05-01 10:59 ` [PATCH v2 0/5] " Nguyễn Thái Ngọc Duy
                     ` (2 preceding siblings ...)
  2013-05-01 10:59   ` [PATCH v2 3/5] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
@ 2013-05-01 10:59   ` Nguyễn Thái Ngọc Duy
  2013-05-01 23:35     ` Junio C Hamano
  2013-05-01 10:59   ` [PATCH v2 5/5] Use --not-so-strict on all pack transfer " Nguyễn Thái Ngọc Duy
  2013-05-03 12:35   ` [PATCH v3 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
  5 siblings, 1 reply; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-01 10:59 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

--not-so-strict only checks if all links from objects in the pack
point to real objects (either in current repo, or from the pack
itself). It's like check_everything_connected() except that:

 - it does not follow DAG in order
 - it can detect incomplete object islands
 - it seems to be faster than "rev-list --objects --all"

On my box, "rev-list --objects --all" takes 34 seconds. index-pack takes

215.25user 8.42system 1:32.31elapsed 242%CPU (0avgtext+0avgdata 1357328maxresident)k
0inputs+1421016outputs (0major+1222987minor)pagefaults 0swaps

And index-pack --not-so-strict takes

pack    96a4e3befa40bf38eddc2d7c99246a59af4ad55d
229.75user 11.31system 1:42.50elapsed 235%CPU (0avgtext+0avgdata 1876816maxresident)k
0inputs+1421016outputs (0major+1307989minor)pagefaults 0swaps

The overhead is about 10 seconds, just 1/3 of rev-list, which makes it
in a better position to replace check_everything_connected(). If this
holds true for general case, it could reduce fetch time by a little bit.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-index-pack.txt     | 3 +++
 Documentation/git-unpack-objects.txt | 4 ++++
 builtin/index-pack.c                 | 7 ++++++-
 builtin/unpack-objects.c             | 9 +++++++--
 4 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index bde8eec..51af7be 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -74,6 +74,9 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
+--not-so-strict::
+	Die if the pack contains broken links. For internal use only.
+
 --threads=<n>::
 	Specifies the number of threads to spawn when resolving
 	deltas. This requires that index-pack be compiled with
diff --git a/Documentation/git-unpack-objects.txt b/Documentation/git-unpack-objects.txt
index ff23494..14b6018 100644
--- a/Documentation/git-unpack-objects.txt
+++ b/Documentation/git-unpack-objects.txt
@@ -44,6 +44,10 @@ OPTIONS
 --strict::
 	Don't write objects with broken content or links.
 
+--not-so-strict::
+	Don't write objects with broken links. For internal
+	use only.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 1fd56d9..5d28a1b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -77,6 +77,7 @@ static int nr_threads;
 
 static int from_stdin;
 static int strict;
+static int do_fsck_object;
 static int verbose;
 static int show_stat;
 
@@ -756,7 +757,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			obj = parse_object_buffer(sha1, type, size, buf, &eaten);
 			if (!obj)
 				die(_("invalid %s"), typename(type));
-			if (fsck_object(obj, 1, fsck_error_function))
+			if (do_fsck_object &&
+			    fsck_object(obj, 1, fsck_error_function))
 				die(_("Error in object"));
 			if (fsck_walk(obj, mark_link, NULL))
 				die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));
@@ -1511,6 +1513,9 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				fix_thin_pack = 1;
 			} else if (!strcmp(arg, "--strict")) {
 				strict = 1;
+				do_fsck_object = 1;
+			} else if (!strcmp(arg, "--not-so-strict")) {
+				strict = 1;
 			} else if (!strcmp(arg, "--verify")) {
 				verify = 1;
 			} else if (!strcmp(arg, "--verify-stat")) {
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 2217d7b..dd0518b 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -12,7 +12,7 @@
 #include "decorate.h"
 #include "fsck.h"
 
-static int dry_run, quiet, recover, has_errors, strict;
+static int dry_run, quiet, recover, has_errors, strict, do_fsck_object;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict] < pack-file";
 
 /* We always read in 4kB chunks. */
@@ -198,7 +198,7 @@ static int check_object(struct object *obj, int type, void *data)
 		return 0;
 	}
 
-	if (fsck_object(obj, 1, fsck_error_function))
+	if (do_fsck_object && fsck_object(obj, 1, fsck_error_function))
 		die("Error in object");
 	if (fsck_walk(obj, check_object, NULL))
 		die("Error on reachable objects of %s", sha1_to_hex(obj->sha1));
@@ -520,6 +520,11 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
 				continue;
 			}
 			if (!strcmp(arg, "--strict")) {
+				do_fsck_object = 1;
+				strict = 1;
+				continue;
+			}
+			if (!strcmp(arg, "--not-so-strict")) {
 				strict = 1;
 				continue;
 			}
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 5/5] Use --not-so-strict on all pack transfer for connectivity check
  2013-05-01 10:59 ` [PATCH v2 0/5] " Nguyễn Thái Ngọc Duy
                     ` (3 preceding siblings ...)
  2013-05-01 10:59   ` [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check Nguyễn Thái Ngọc Duy
@ 2013-05-01 10:59   ` Nguyễn Thái Ngọc Duy
  2013-05-03 12:35   ` [PATCH v3 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
  5 siblings, 0 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-01 10:59 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

This replaces check_everything_connected() with --not-so-strict, which
accomplishes the same thing and is generally cheaper when index-pack
or unpack-objects are used. All other cases fall back to
check_everything_connected.

This could help reduce the impact of check_everything_connected() on
pack transport. For example, cloning file://../linux-2.6.git

        before       after
real    4m23.664s    3m47.280s
user    4m55.613s    4m39.530s
sys     0m14.805s    0m17.728s

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c                 | 22 ++++++++++--------
 builtin/fetch.c                 | 11 +++++----
 builtin/receive-pack.c          | 51 ++++-------------------------------------
 fetch-pack.c                    |  2 ++
 fetch-pack.h                    |  3 ++-
 t/t5504-fetch-receive-strict.sh |  2 +-
 transport.c                     |  2 ++
 transport.h                     |  1 +
 8 files changed, 32 insertions(+), 62 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index dad4265..1cd37c1 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -497,6 +497,8 @@ static void write_remote_refs(const struct ref *local_refs)
 	for (r = local_refs; r; r = r->next) {
 		if (!r->peer_ref)
 			continue;
+		if (!has_sha1_file(r->old_sha1))
+			die(_("remote did not send all necessary objects"));
 		add_packed_ref(r->peer_ref->name, r->old_sha1);
 	}
 
@@ -544,15 +546,6 @@ static void update_remote_refs(const struct ref *refs,
 			       const char *branch_top,
 			       const char *msg)
 {
-	const struct ref *rm = mapped_refs;
-
-	if (0 <= option_verbosity)
-		printf(_("Checking connectivity... "));
-	if (check_everything_connected(iterate_ref_map, 0, &rm))
-		die(_("remote did not send all necessary objects"));
-	if (0 <= option_verbosity)
-		printf(_("done\n"));
-
 	if (refs) {
 		write_remote_refs(mapped_refs);
 		if (option_single_branch)
@@ -953,6 +946,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	else if (refs && complete_refs_before_fetch)
 		transport_fetch_refs(transport, mapped_refs);
 
+	if (!transport->smart_options ||
+	    !transport->smart_options->connected) {
+		const struct ref *rm = mapped_refs;
+
+		if (0 <= option_verbosity)
+			printf(_("Checking connectivity... "));
+		if (check_everything_connected(iterate_ref_map, 0, &rm))
+			die(_("remote did not send all necessary objects"));
+		if (0 <= option_verbosity)
+			printf(_("done\n"));
+	}
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
 			   branch_top.buf, reflog_msg.buf);
 
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 4b6b1df..e013d3f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -380,7 +380,7 @@ static int iterate_ref_map(void *cb_data, unsigned char sha1[20])
 }
 
 static int store_updated_refs(const char *raw_url, const char *remote_name,
-		struct ref *ref_map)
+			      struct ref *ref_map, int need_check)
 {
 	FILE *fp;
 	struct commit *commit;
@@ -401,7 +401,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		url = xstrdup("foreign");
 
 	rm = ref_map;
-	if (check_everything_connected(iterate_ref_map, 0, &rm)) {
+	if (need_check &&
+	    check_everything_connected(iterate_ref_map, 0, &rm)) {
 		rc = error(_("%s did not send all necessary objects\n"), url);
 		goto abort;
 	}
@@ -538,8 +539,10 @@ static int fetch_refs(struct transport *transport, struct ref *ref_map)
 		ret = transport_fetch_refs(transport, ref_map);
 	if (!ret)
 		ret |= store_updated_refs(transport->url,
-				transport->remote->name,
-				ref_map);
+					  transport->remote->name,
+					  ref_map,
+					  !transport->smart_options ||
+					  !transport->smart_options->connected);
 	transport_unlock_pack(transport);
 	return ret;
 }
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index e3eb5fc..2157c1a 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -651,48 +651,6 @@ static void check_aliased_updates(struct command *commands)
 	string_list_clear(&ref_list, 0);
 }
 
-static int command_singleton_iterator(void *cb_data, unsigned char sha1[20])
-{
-	struct command **cmd_list = cb_data;
-	struct command *cmd = *cmd_list;
-
-	if (!cmd || is_null_sha1(cmd->new_sha1))
-		return -1; /* end of list */
-	*cmd_list = NULL; /* this returns only one */
-	hashcpy(sha1, cmd->new_sha1);
-	return 0;
-}
-
-static void set_connectivity_errors(struct command *commands)
-{
-	struct command *cmd;
-
-	for (cmd = commands; cmd; cmd = cmd->next) {
-		struct command *singleton = cmd;
-		if (!check_everything_connected(command_singleton_iterator,
-						0, &singleton))
-			continue;
-		cmd->error_string = "missing necessary objects";
-	}
-}
-
-static int iterate_receive_command_list(void *cb_data, unsigned char sha1[20])
-{
-	struct command **cmd_list = cb_data;
-	struct command *cmd = *cmd_list;
-
-	while (cmd) {
-		if (!is_null_sha1(cmd->new_sha1)) {
-			hashcpy(sha1, cmd->new_sha1);
-			*cmd_list = cmd->next;
-			return 0;
-		}
-		cmd = cmd->next;
-	}
-	*cmd_list = NULL;
-	return -1; /* end of list */
-}
-
 static void reject_updates_to_hidden(struct command *commands)
 {
 	struct command *cmd;
@@ -718,11 +676,6 @@ static void execute_commands(struct command *commands, const char *unpacker_erro
 		return;
 	}
 
-	cmd = commands;
-	if (check_everything_connected(iterate_receive_command_list,
-				       0, &cmd))
-		set_connectivity_errors(commands);
-
 	reject_updates_to_hidden(commands);
 
 	if (run_receive_hook(commands, "pre-receive", 0)) {
@@ -844,6 +797,8 @@ static const char *unpack(int err_fd)
 			unpacker[i++] = "-q";
 		if (fsck_objects)
 			unpacker[i++] = "--strict";
+		else
+			unpacker[i++] = "--not-so-strict";
 		unpacker[i++] = hdr_arg;
 		unpacker[i++] = NULL;
 		memset(&child, 0, sizeof(child));
@@ -869,6 +824,8 @@ static const char *unpack(int err_fd)
 		keeper[i++] = "--stdin";
 		if (fsck_objects)
 			keeper[i++] = "--strict";
+		else
+			keeper[i++] = "--not-so-strict";
 		keeper[i++] = "--fix-thin";
 		keeper[i++] = hdr_arg;
 		keeper[i++] = keep_arg;
diff --git a/fetch-pack.c b/fetch-pack.c
index 17cfa88..cdd9e2d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -755,6 +755,8 @@ static int get_pack(struct fetch_pack_args *args,
 	    ? transfer_fsck_objects
 	    : 0)
 		*av++ = "--strict";
+	else if (args->check_connectivity)
+		*av++ = "--not-so-strict";
 	*av++ = NULL;
 
 	cmd.in = demux.out;
diff --git a/fetch-pack.h b/fetch-pack.h
index dc5266c..824136d 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -16,7 +16,8 @@ struct fetch_pack_args {
 		verbose:1,
 		no_progress:1,
 		include_tag:1,
-		stateless_rpc:1;
+		stateless_rpc:1,
+		check_connectivity:1;
 };
 
 /*
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 69ee13c..14d2935 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -60,7 +60,7 @@ test_expect_success 'fetch with transfer.fsckobjects' '
 
 cat >exp <<EOF
 To dst
-!	refs/heads/master:refs/heads/test	[remote rejected] (missing necessary objects)
+!	refs/heads/master:refs/heads/test	[remote rejected] (unpacker error)
 EOF
 
 test_expect_success 'push without strict' '
diff --git a/transport.c b/transport.c
index ba5d8af..72e6fa3 100644
--- a/transport.c
+++ b/transport.c
@@ -534,6 +534,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.quiet = (transport->verbose < 0);
 	args.no_progress = !transport->progress;
 	args.depth = data->options.depth;
+	args.check_connectivity = 1;
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0, 0);
@@ -551,6 +552,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 		refs = NULL;
 	data->conn = NULL;
 	data->got_remote_heads = 0;
+	data->options.connected = 1;
 
 	free_refs(refs_tmp);
 
diff --git a/transport.h b/transport.h
index fcb1d25..e127224 100644
--- a/transport.h
+++ b/transport.h
@@ -8,6 +8,7 @@ struct git_transport_options {
 	unsigned thin : 1;
 	unsigned keep : 1;
 	unsigned followtags : 1;
+	unsigned connected : 1;
 	int depth;
 	const char *uploadpack;
 	const char *receivepack;
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/5] fetch-pack: prepare updated shallow file before fetching the pack
  2013-05-01 10:59   ` [PATCH v2 2/5] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
@ 2013-05-01 20:27     ` Junio C Hamano
  2013-05-02 10:04       ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2013-05-01 20:27 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> index-pack --strict looks up and follows parent commits. If shallow
> information is not ready by the time index-pack is run, index-pack may
> be lead to non-existent objects. Make fetch-pack save shallow file to
> disk before invoking index-pack.
>
> A new environment variable GIT_SHALLOW_FILE is used to inform
> index-pack to use an alternate shallow file. This variable is for
> internal use and thus not advertised in git.txt.

The idea to tell the index-pack to use a different shallow file is
sound, but is the environment variable the best way to go?  I am
mostly worried about it being an implicit "apply everywhere"
mechanism and while I do not particularly like the idea of doing
everything in the same process as the top-level while dealing with
submodules, I am reasonably sure people would want to "clone"
recursively inside the same top-level process in the future, and
this new environment variable adds one more thing to be cleansed
when crossing a repository boundary.

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  commit.h              |  1 +
>  fetch-pack.c          | 65 ++++++++++++++++++++++++---------------------------
>  shallow.c             | 30 ++++++++++++++++++++++--
>  t/t5500-fetch-pack.sh |  7 ++++++
>  4 files changed, 67 insertions(+), 36 deletions(-)
>
> diff --git a/commit.h b/commit.h
> index 67bd509..0cef659 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -176,6 +176,7 @@ extern int for_each_commit_graft(each_commit_graft_fn, void *);
>  extern int is_repository_shallow(void);
>  extern struct commit_list *get_shallow_commits(struct object_array *heads,
>  		int depth, int shallow_flag, int not_shallow_flag);
> +extern void check_shallow_file_for_update(void);
>  
>  int is_descendant_of(struct commit *, struct commit_list *);
>  int in_merge_bases(struct commit *, struct commit *);
> diff --git a/fetch-pack.c b/fetch-pack.c
> index f156dd4..17cfa88 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -20,6 +20,7 @@ static int no_done;
>  static int fetch_fsck_objects = -1;
>  static int transfer_fsck_objects = -1;
>  static int agent_supported;
> +static struct lock_file shallow_lock;
>  
>  #define COMPLETE	(1U << 0)
>  #define COMMON		(1U << 1)
> @@ -779,6 +780,26 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
>  	return strcmp(a->name, b->name);
>  }
>  
> +static void setup_alternate_shallow(void)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	int fd;
> +
> +	check_shallow_file_for_update();
> +	fd = hold_lock_file_for_update(&shallow_lock, git_path("shallow"),
> +				       LOCK_DIE_ON_ERROR);
> +	if (write_shallow_commits(&sb, 0)) {
> +		if (write_in_full(fd, sb.buf, sb.len) != sb.len)
> +			die_errno("failed to write to %s", shallow_lock.filename);
> +		if (setenv("GIT_SHALLOW_FILE", shallow_lock.filename, 1))
> +			die_errno("failed to set GIT_SHALLOW_FILE");
> +	} else {
> +		if (setenv("GIT_SHALLOW_FILE", "", 1))
> +			die_errno("failed to set GIT_SHALLOW_FILE");
> +	}
> +	strbuf_release(&sb);
> +}
> +
>  static struct ref *do_fetch_pack(struct fetch_pack_args *args,
>  				 int fd[2],
>  				 const struct ref *orig_ref,
> @@ -858,6 +879,8 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
>  
>  	if (args->stateless_rpc)
>  		packet_flush(fd[1]);
> +	if (args->depth > 0)
> +		setup_alternate_shallow();
>  	if (get_pack(args, fd, pack_lockfile))
>  		die("git fetch-pack: fetch failed.");
>  
> @@ -936,15 +959,9 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
>  		       struct ref **sought, int nr_sought,
>  		       char **pack_lockfile)
>  {
> -	struct stat st;
>  	struct ref *ref_cpy;
>  
>  	fetch_pack_setup();
> -	if (args->depth > 0) {
> -		if (stat(git_path("shallow"), &st))
> -			st.st_mtime = 0;
> -	}
> -
>  	if (nr_sought)
>  		nr_sought = remove_duplicates_in_refs(sought, nr_sought);
>  
> @@ -955,34 +972,14 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
>  	ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought, pack_lockfile);
>  
>  	if (args->depth > 0) {
> -		static struct lock_file lock;
> -		struct cache_time mtime;
> -		struct strbuf sb = STRBUF_INIT;
> -		char *shallow = git_path("shallow");
> -		int fd;
> -
> -		mtime.sec = st.st_mtime;
> -		mtime.nsec = ST_MTIME_NSEC(st);
> -		if (stat(shallow, &st)) {
> -			if (mtime.sec)
> -				die("shallow file was removed during fetch");
> -		} else if (st.st_mtime != mtime.sec
> -#ifdef USE_NSEC
> -				|| ST_MTIME_NSEC(st) != mtime.nsec
> -#endif
> -			  )
> -			die("shallow file was changed during fetch");
> -
> -		fd = hold_lock_file_for_update(&lock, shallow,
> -					       LOCK_DIE_ON_ERROR);
> -		if (!write_shallow_commits(&sb, 0)
> -		 || write_in_full(fd, sb.buf, sb.len) != sb.len) {
> -			unlink_or_warn(shallow);
> -			rollback_lock_file(&lock);
> -		} else {
> -			commit_lock_file(&lock);
> -		}
> -		strbuf_release(&sb);
> +		struct stat st;
> +		if (!fstat(shallow_lock.fd, &st) &&
> +		    st.st_size == 0) {
> +			unlink_or_warn(git_path("shallow"));
> +			rollback_lock_file(&shallow_lock);
> +		} else
> +			commit_lock_file(&shallow_lock);
> +		unsetenv("GIT_SHALLOW_FILE");
>  	}
>  
>  	reprepare_packed_git();
> diff --git a/shallow.c b/shallow.c
> index 6be915f..9c03806 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -3,6 +3,7 @@
>  #include "tag.h"
>  
>  static int is_shallow = -1;
> +static struct stat shallow_stat;
>  
>  int register_shallow(const unsigned char *sha1)
>  {
> @@ -21,12 +22,15 @@ int is_repository_shallow(void)
>  {
>  	FILE *fp;
>  	char buf[1024];
> +	const char *path;
>  
>  	if (is_shallow >= 0)
>  		return is_shallow;
>  
> -	fp = fopen(git_path("shallow"), "r");
> -	if (!fp) {
> +	path = getenv("GIT_SHALLOW_FILE");
> +	if (!path)
> +		path = git_path("shallow");
> +	if (stat(path, &shallow_stat) || (fp = fopen(path, "r")) == NULL) {
>  		is_shallow = 0;
>  		return is_shallow;
>  	}
> @@ -108,3 +112,25 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
>  
>  	return result;
>  }
> +
> +void check_shallow_file_for_update(void)
> +{
> +	struct stat st;
> +
> +	if (getenv("GIT_SHALLOW_FILE"))
> +		die("GIT_SHALLOW_FILE should not be manually set");
> +
> +	if (!is_shallow)
> +		return;
> +	else if (is_shallow == -1)
> +		die("BUG: shallow must be initialized by now");
> +
> +	if (stat(git_path("shallow"), &st))
> +		die("shallow file was removed during fetch");
> +	else if (st.st_mtime != shallow_stat.st_mtime
> +#ifdef USE_NSEC
> +		 || ST_MTIME_NSEC(st) != ST_MTIME_NSEC(shallow_stat)
> +#endif
> +		   )
> +		die("shallow file was changed during fetch");
> +}
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index d574085..557b073 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -135,6 +135,13 @@ test_expect_success 'clone shallow depth 1' '
>  	test "`git --git-dir=shallow0/.git rev-list --count HEAD`" = 1
>  '
>  
> +test_expect_success 'clone shallow depth 1 with fsck' '
> +	git config --global fetch.fsckobjects true &&
> +	git clone --no-single-branch --depth 1 "file://$(pwd)/." shallow0fsck &&
> +	test "`git --git-dir=shallow0fsck/.git rev-list --count HEAD`" = 1 &&
> +	git config --global --unset fetch.fsckobjects
> +'
> +
>  test_expect_success 'clone shallow' '
>  	git clone --no-single-branch --depth 2 "file://$(pwd)/." shallow
>  '

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check
  2013-05-01 10:59   ` [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check Nguyễn Thái Ngọc Duy
@ 2013-05-01 23:35     ` Junio C Hamano
  2013-05-02  9:53       ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2013-05-01 23:35 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> --not-so-strict only checks if all links from objects in the pack
> point to real objects (either in current repo, or from the pack
> itself). It's like check_everything_connected() except that:
>
>  - it does not follow DAG in order
>  - it can detect incomplete object islands

Could you clarify what this second point means?

"rev-list --objects --all --not $this $that" does not detect
"islands" but checking with the updated index-pack does?

>  - it seems to be faster than "rev-list --objects --all"

More important is that it makes sure that it is safe to update our
refs to the new value, just like the check this attempts to replace.
If that is not the case, the speed does not matter.

I am guessing that the code assumes that we are updating our refs to
objects that are in the pack that we are looking at, and I can see
how the new check in sha1_object() may detect an object that points
at another object that is missing.  But that assumption (which I
think is correct) is probably the most important thing to say in the
log message.

> +--not-so-strict::

Perhaps "--check-connectivity" is a better name than this?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check
  2013-05-01 23:35     ` Junio C Hamano
@ 2013-05-02  9:53       ` Duy Nguyen
  2013-05-02 16:27         ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2013-05-02  9:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Thu, May 2, 2013 at 6:35 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> --not-so-strict only checks if all links from objects in the pack
>> point to real objects (either in current repo, or from the pack
>> itself). It's like check_everything_connected() except that:
>>
>>  - it does not follow DAG in order
>>  - it can detect incomplete object islands
>
> Could you clarify what this second point means?
>
> "rev-list --objects --all --not $this $that" does not detect
> "islands" but checking with the updated index-pack does?

Object islands (in the new pack) by definition are not connected to
the main DAG and so invisible to/unreachable from rev-list. index-pack
examines all objects in the pack and checks links of each object. With
this approach, islands are no different than reachable objects.

>>  - it seems to be faster than "rev-list --objects --all"
>
> More important is that it makes sure that it is safe to update our
> refs to the new value, just like the check this attempts to replace.
> If that is not the case, the speed does not matter.
>
> I am guessing that the code assumes that we are updating our refs to
> objects that are in the pack that we are looking at, and I can see
> how the new check in sha1_object() may detect an object that points
> at another object that is missing.  But that assumption (which I
> think is correct) is probably the most important thing to say in the
> log message.

Yes, we need to make sure the new value of our refs are existing
objects. But it does not need to be in the new pack. After index-pack
is run, we're guaranteed that all objects in repo are connected and
any of them could be new ref. This is also why I add has_sha1_file()
in clone.c. I forget if I have checked for similar call in fetch.c and
receive-pack.c. Will look again (and update log message)

>> +--not-so-strict::
>
> Perhaps "--check-connectivity" is a better name than this?

Definitely.
--
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/5] fetch-pack: prepare updated shallow file before fetching the pack
  2013-05-01 20:27     ` Junio C Hamano
@ 2013-05-02 10:04       ` Duy Nguyen
  0 siblings, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2013-05-02 10:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Thu, May 2, 2013 at 3:27 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> index-pack --strict looks up and follows parent commits. If shallow
>> information is not ready by the time index-pack is run, index-pack may
>> be lead to non-existent objects. Make fetch-pack save shallow file to
>> disk before invoking index-pack.
>>
>> A new environment variable GIT_SHALLOW_FILE is used to inform
>> index-pack to use an alternate shallow file. This variable is for
>> internal use and thus not advertised in git.txt.
>
> The idea to tell the index-pack to use a different shallow file is
> sound, but is the environment variable the best way to go?  I am
> mostly worried about it being an implicit "apply everywhere"
> mechanism and while I do not particularly like the idea of doing
> everything in the same process as the top-level while dealing with
> submodules, I am reasonably sure people would want to "clone"
> recursively inside the same top-level process in the future, and
> this new environment variable adds one more thing to be cleansed
> when crossing a repository boundary.

First of all, fetch does check that GIT_SHALLOW_FILE must not be set
before it updates shallow file, so if fetch is run recursively, the
inner one will get caught. And GIT_SHALLOW_FILE is only set for a
period of time when pack is received. Unless somebody launches fetch
inside index-pack, we should be safe.

That said, passing this info via a --shallow-file is not hard to do
(so I will likely do it). But I wonder how submodule code handles the
case where the user set GIT_OBJECT_DIRECTORY or similar variables.
GIT_WORK_TREE for example could be set internally by git and
propagated down to submodule subprocesses..
--
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check
  2013-05-02  9:53       ` Duy Nguyen
@ 2013-05-02 16:27         ` Junio C Hamano
  2013-05-03  2:29           ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2013-05-02 16:27 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jeff King

Duy Nguyen <pclouds@gmail.com> writes:

> On Thu, May 2, 2013 at 6:35 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>>
>>> --not-so-strict only checks if all links from objects in the pack
>>> point to real objects (either in current repo, or from the pack
>>> itself). It's like check_everything_connected() except that:
>>>
>>>  - it does not follow DAG in order
>>>  - it can detect incomplete object islands
>>
>> Could you clarify what this second point means?
>>
>> "rev-list --objects --all --not $this $that" does not detect
>> "islands" but checking with the updated index-pack does?
>
> Object islands (in the new pack) by definition are not connected to
> the main DAG and so invisible to/unreachable from rev-list. index-pack
> examines all objects in the pack and checks links of each object. With
> this approach, islands are no different than reachable objects.

OK, so if you are fetching an updated tip of the main history, and a
new tip of a history that is disjoint. If we imagine that my public
repository just added the 'todo' branch and you are fecting them for
the first time. The history of 'todo' branch is an island that is
not connected anywhere from your refs namespace yet. In order to
ensure that updating the tip of fetched 'todo' is safe, you would
need to verify the island is free of dangling pointers and the only
thing you need to be sure is the tip of 'todo' is _in_ that island.

>> I am guessing that the code assumes that we are updating our refs to
>> objects that are in the pack that we are looking at, and I can see
>> how the new check in sha1_object() may detect an object that points
>> at another object that is missing.  But that assumption (which I
>> think is correct) is probably the most important thing to say in the
>> log message.
>
> Yes, we need to make sure the new value of our refs are existing
> objects. But it does not need to be in the new pack.

It is a bit more tricky than that.  A malicious (or simply buggy)
other side can send a subset of my 'todo' branch, which is an island
that is free of dangling pointers (think: 'rev-list --objects
todo~8').  Further imagine that you earlier attempted a fetch of the
same history from me over a commit walker and you happen to have
partial history near the tip of 'todo' but not connected to the
island.  sha1_object() will find it, but that does not say anything
useful.  The tip _must_ appear in the island for your check to yield
a usable result, no?

The existing "everything connected" was designed to protect against
that kind of breakage as well.

I might be reading your change incorrectly, but I am not sure how
the new code protects against such a breakage.

> After index-pack
> is run, we're guaranteed that all objects in repo are connected and
> any of them could be new ref. This is also why I add has_sha1_file()
> in clone.c.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check
  2013-05-02 16:27         ` Junio C Hamano
@ 2013-05-03  2:29           ` Duy Nguyen
  2013-05-03  6:33             ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2013-05-03  2:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Fri, May 3, 2013 at 2:27 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Object islands (in the new pack) by definition are not connected to
>> the main DAG and so invisible to/unreachable from rev-list. index-pack
>> examines all objects in the pack and checks links of each object. With
>> this approach, islands are no different than reachable objects.
>
> OK, so if you are fetching an updated tip of the main history, and a
> new tip of a history that is disjoint. If we imagine that my public
> repository just added the 'todo' branch and you are fecting them for
> the first time. The history of 'todo' branch is an island that is
> not connected anywhere from your refs namespace yet. In order to
> ensure that updating the tip of fetched 'todo' is safe, you would
> need to verify the island is free of dangling pointers and the only
> thing you need to be sure is the tip of 'todo' is _in_ that island.

Why tip must be in that island or any other islands? There is no way
we know which island (or the main DAG) should be connected to any
tips. It should be enough that the tip in question exists and do not
contain any dangling pointers.

>>> I am guessing that the code assumes that we are updating our refs to
>>> objects that are in the pack that we are looking at, and I can see
>>> how the new check in sha1_object() may detect an object that points
>>> at another object that is missing.  But that assumption (which I
>>> think is correct) is probably the most important thing to say in the
>>> log message.
>>
>> Yes, we need to make sure the new value of our refs are existing
>> objects. But it does not need to be in the new pack.
>
> It is a bit more tricky than that.  A malicious (or simply buggy)
> other side can send a subset of my 'todo' branch, which is an island
> that is free of dangling pointers (think: 'rev-list --objects
> todo~8').  Further imagine that you earlier attempted a fetch of the
> same history from me over a commit walker and you happen to have
> partial history near the tip of 'todo' but not connected to the
> island.  sha1_object() will find it, but that does not say anything
> useful.  The tip _must_ appear in the island for your check to yield
> a usable result, no?

What do you mean by "partial history"? Do we have dangling pointers
after doing that commit walker?

> The existing "everything connected" was designed to protect against
> that kind of breakage as well.
>
> I might be reading your change incorrectly, but I am not sure how
> the new code protects against such a breakage.
>
>> After index-pack
>> is run, we're guaranteed that all objects in repo are connected and
>> any of them could be new ref. This is also why I add has_sha1_file()
>> in clone.c.
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check
  2013-05-03  2:29           ` Duy Nguyen
@ 2013-05-03  6:33             ` Junio C Hamano
  2013-05-03  6:55               ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2013-05-03  6:33 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jeff King

Duy Nguyen <pclouds@gmail.com> writes:

> What do you mean by "partial history"? Do we have dangling pointers
> after doing that commit walker?

"^C" will leave the objects and it is safe because it will not
update refs.

But your code that does not verify the full connectivity from such
an object (that is outside the transferred pack) to the rest of the
history will then make the resulting repository broken, if you
update your ref to point at such an object, no?  Ading a single
has_sha1_file() only verifies that single object, not the history
behind it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check
  2013-05-03  6:33             ` Junio C Hamano
@ 2013-05-03  6:55               ` Junio C Hamano
  2013-05-03  7:09                 ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2013-05-03  6:55 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Duy Nguyen <pclouds@gmail.com> writes:
>
>> What do you mean by "partial history"? Do we have dangling pointers
>> after doing that commit walker?
>
> "^C" will leave the objects and it is safe because it will not
> update refs.
>
> But your code that does not verify the full connectivity from such
> an object (that is outside the transferred pack) to the rest of the
> history will then make the resulting repository broken, if you
> update your ref to point at such an object, no?  Ading a single
> has_sha1_file() only verifies that single object, not the history
> behind it.

Let's illustrate.  Imagine your project as a whole has this history:

---A---B---C---D---E

When you cloned, it only had up to A, so that is what you have.
Then you try http walker, which slurps E, wants to go to D and dig
down, but after fetching E, some trees and blobs in E, and fetching
D, but before completing D' trees and blobs, your ISP cuts you off.

You have these in your object store:

---A           D---E

but your ref is pointing at A, because we promise that we make sure
full connectivity before we update the ref, and even if you have
commits D and E, the associated trees, blobs, and commits behind
this subpart of the history are missing.

Now you try to fetch from another mirror over the pack transport.
You advertise that you have A (but you do not advertise E, because
your refs do not point at it), and you expect all objects that are
listed in "rev-list --objects A..E" to be gien to you.

Unfortunately, the mirror is broken, in that it only packs the
objects that appear in "rev-list --objects A..B", but still tells
you that it is sending objects to complete history leading to E.

Your object store would have objects like this after this transfer:

---A---B       D---E

You may still have commits D and E unpruned, but are missing C, and
trees and blobs that are needed in D or E.

You have to walk the history from E and list the necessary objects
yourself, running has_sha1_file() on all of them, to prove that you
have everything needed, and the only things you can trust are your
refs (in this case, A).

If you verify that all the objects in the transferred pack are
complete, and also verify that the object the transfer proposes to
update your ref to is _in_ that pack, then you can detect a mirror
that is broken and only sends objects for A..B, but that does not
actually solve the issue.  Imagine another broken mirror that
instead sends objects for E.  E, its trees and all its blobs may be
in the pack and the only outgoing link from that pack may be a
pointer out of E pointing at D.  You happen to _have_ it in your
object store, but that does not mean you can update your ref to E
(remember, you do not have trees and blobs to complete D, or the
history behind it).

The current check is implemented in the way we currently do it,
_not_ because we were stupid and did not realize the optimization
possibility (in other words, what your patch proposes is not new),
but because we rejected this approach because it was not safe.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check
  2013-05-03  6:55               ` Junio C Hamano
@ 2013-05-03  7:09                 ` Duy Nguyen
  2013-05-03  8:16                   ` Eric Sunshine
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2013-05-03  7:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Thu, May 02, 2013 at 11:55:57PM -0700, Junio C Hamano wrote:
> Let's illustrate.  Imagine your project as a whole has this history:

[snip]

OK I agree my approach is flawed. But if all these are met, we can be
sure the new refs are good, correct?

 - all refs point to an object in the pack
 - there are no dangling pointers in any object in the pack
 - all objects in the pack does not point to any objects outside the
   pack

We can check all these for the clone case with the patch below, even
if somehow we get some bad objects in the cloned repository before the
new pack arrives.

-- 8< --
Subject: [PATCH] clone: open a shortcut for connectivity check

In order to make sure the cloned repository is good, we run "rev-list
--objects --not --all $new_refs" on the repository. This is expensive
on large repositories. This patch attempts to mitigate the impact in
this special case.

In the "good" clone case, we only have one pack. If all of the
following are met, we can be sure that all objects reachable from the
new refs exist, which is the intention of running "rev-list ...":

 - all refs point to an object in the pack
 - there are no dangling pointers in any object in the pack
 - all objects in the pack does not point to objects outside the pack

The second and third checks can be done with the help of index-pack as
a slight modification of --strict check (which introduces a new
condition for the shortcut: pack transfer must be used and the number
of objects large enough to call index-pack). The first is checked in
check_everything_connected after we get an "ok" from index-pack.

"index-pack + new checks" is still faster than "index-pack + rev-list"
currently, which is the whole point of this patch. If any of the
conditions fails, we fall back to good old and expensive "rev-list
..". In that case it's even more expensive because we have to pay for
the new checks in index-pack.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-index-pack.txt |  3 +++
 builtin/clone.c                  | 11 ++++++++---
 builtin/index-pack.c             | 35 ++++++++++++++++++++++++++---------
 connected.c                      | 34 +++++++++++++++++++++++++++++++++-
 connected.h                      |  5 +++++
 fetch-pack.c                     | 11 ++++++++++-
 fetch-pack.h                     |  4 +++-
 transport.c                      |  4 ++++
 transport.h                      |  2 ++
 9 files changed, 94 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index bde8eec..7a4e055 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -74,6 +74,9 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
+--check-self-contained-and-connected::
+	Die if the pack contains broken links. For internal use only.
+
 --threads=<n>::
 	Specifies the number of threads to spawn when resolving
 	deltas. This requires that index-pack be compiled with
diff --git a/builtin/clone.c b/builtin/clone.c
index dad4265..427bec4 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -542,13 +542,15 @@ static void update_remote_refs(const struct ref *refs,
 			       const struct ref *mapped_refs,
 			       const struct ref *remote_head_points_at,
 			       const char *branch_top,
-			       const char *msg)
+			       const char *msg,
+			       struct transport *transport)
 {
 	const struct ref *rm = mapped_refs;
 
 	if (0 <= option_verbosity)
 		printf(_("Checking connectivity... "));
-	if (check_everything_connected(iterate_ref_map, 0, &rm))
+	if (check_everything_connected_with_transport(iterate_ref_map,
+						      0, &rm, transport))
 		die(_("remote did not send all necessary objects"));
 	if (0 <= option_verbosity)
 		printf(_("done\n"));
@@ -893,6 +895,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		if (option_upload_pack)
 			transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 					     option_upload_pack);
+
+		if (transport->smart_options)
+			transport->smart_options->check_self_contained_and_connected = 1;
 	}
 
 	refs = transport_get_remote_refs(transport);
@@ -954,7 +959,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_fetch_refs(transport, mapped_refs);
 
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
-			   branch_top.buf, reflog_msg.buf);
+			   branch_top.buf, reflog_msg.buf, transport);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 1fd56d9..6aa7bb6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -77,8 +77,10 @@ static int nr_threads;
 
 static int from_stdin;
 static int strict;
+static int do_fsck_object;
 static int verbose;
 static int show_stat;
+static int check_self_contained_and_connected;
 
 static struct progress *progress;
 
@@ -187,13 +189,13 @@ static int mark_link(struct object *obj, int type, void *data)
 
 /* The content of each linked object must have been checked
    or it must be already present in the object database */
-static void check_object(struct object *obj)
+static unsigned check_object(struct object *obj)
 {
 	if (!obj)
-		return;
+		return 0;
 
 	if (!(obj->flags & FLAG_LINK))
-		return;
+		return 0;
 
 	if (!(obj->flags & FLAG_CHECKED)) {
 		unsigned long size;
@@ -201,17 +203,20 @@ static void check_object(struct object *obj)
 		if (type != obj->type || type <= 0)
 			die(_("object of unexpected type"));
 		obj->flags |= FLAG_CHECKED;
-		return;
+		return 1;
 	}
+
+	return 0;
 }
 
-static void check_objects(void)
+static unsigned check_objects(void)
 {
-	unsigned i, max;
+	unsigned i, max, foreign_nr;
 
 	max = get_max_object_index();
 	for (i = 0; i < max; i++)
-		check_object(get_indexed_object(i));
+		foreign_nr += check_object(get_indexed_object(i));
+	return foreign_nr;
 }
 
 
@@ -756,7 +761,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			obj = parse_object_buffer(sha1, type, size, buf, &eaten);
 			if (!obj)
 				die(_("invalid %s"), typename(type));
-			if (fsck_object(obj, 1, fsck_error_function))
+			if (do_fsck_object &&
+			    fsck_object(obj, 1, fsck_error_function))
 				die(_("Error in object"));
 			if (fsck_walk(obj, mark_link, NULL))
 				die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));
@@ -1490,6 +1496,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_sha1[20];
+	unsigned foreign_nr = 1;	/* zero is a "good" value, assume bad */
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(index_pack_usage);
@@ -1511,6 +1518,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				fix_thin_pack = 1;
 			} else if (!strcmp(arg, "--strict")) {
 				strict = 1;
+				do_fsck_object = 1;
+			} else if (!strcmp(arg, "--check-self-contained-and-connected")) {
+				strict = 1;
+				check_self_contained_and_connected = 1;
 			} else if (!strcmp(arg, "--verify")) {
 				verify = 1;
 			} else if (!strcmp(arg, "--verify-stat")) {
@@ -1624,7 +1635,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	conclude_pack(fix_thin_pack, curr_pack, pack_sha1);
 	free(deltas);
 	if (strict)
-		check_objects();
+		foreign_nr = check_objects();
 
 	if (show_stat)
 		show_pack_info(stat_only);
@@ -1650,5 +1661,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (index_name == NULL)
 		free((void *) curr_index);
 
+	/*
+	 * Let the caller know this pack is not self contained
+	 */
+	if (check_self_contained_and_connected && foreign_nr)
+		return 1;
+
 	return 0;
 }
diff --git a/connected.c b/connected.c
index 1e89c1c..fae8d64 100644
--- a/connected.c
+++ b/connected.c
@@ -2,7 +2,12 @@
 #include "run-command.h"
 #include "sigchain.h"
 #include "connected.h"
+#include "transport.h"
 
+int check_everything_connected(sha1_iterate_fn fn, int quiet, void *cb_data)
+{
+	return check_everything_connected_with_transport(fn, quiet, cb_data, NULL);
+}
 /*
  * If we feed all the commits we want to verify to this command
  *
@@ -14,7 +19,10 @@
  *
  * Returns 0 if everything is connected, non-zero otherwise.
  */
-int check_everything_connected(sha1_iterate_fn fn, int quiet, void *cb_data)
+int check_everything_connected_with_transport(sha1_iterate_fn fn,
+					      int quiet,
+					      void *cb_data,
+					      struct transport *transport)
 {
 	struct child_process rev_list;
 	const char *argv[] = {"rev-list", "--objects",
@@ -22,10 +30,23 @@ int check_everything_connected(sha1_iterate_fn fn, int quiet, void *cb_data)
 	char commit[41];
 	unsigned char sha1[20];
 	int err = 0;
+	struct packed_git *new_pack = NULL;
 
 	if (fn(cb_data, sha1))
 		return err;
 
+	if (transport && transport->smart_options &&
+	    transport->smart_options->self_contained_and_connected &&
+	    transport->pack_lockfile &&
+	    !suffixcmp(transport->pack_lockfile, ".keep")) {
+		struct strbuf idx_file = STRBUF_INIT;
+		strbuf_addstr(&idx_file, transport->pack_lockfile);
+		strbuf_setlen(&idx_file, idx_file.len - 5); /* ".keep" */
+		strbuf_addstr(&idx_file, ".idx");
+		new_pack = add_packed_git(idx_file.buf, idx_file.len, 1);
+		strbuf_release(&idx_file);
+	}
+
 	if (quiet)
 		argv[5] = "--quiet";
 
@@ -42,6 +63,17 @@ int check_everything_connected(sha1_iterate_fn fn, int quiet, void *cb_data)
 
 	commit[40] = '\n';
 	do {
+		/*
+		 * If index-pack already checked that:
+		 * - there are no dangling pointers in the new pack
+		 * - the pack is self contained
+		 * Then if the updated ref is in the new pack, then we
+		 * are sure the ref is good and not sending it to
+		 * rev-list for verification.
+		 */
+		if (new_pack && find_pack_entry_one(sha1, new_pack))
+			continue;
+
 		memcpy(commit, sha1_to_hex(sha1), 40);
 		if (write_in_full(rev_list.in, commit, 41) < 0) {
 			if (errno != EPIPE && errno != EINVAL)
diff --git a/connected.h b/connected.h
index 7e4585a..0b060b7 100644
--- a/connected.h
+++ b/connected.h
@@ -1,6 +1,8 @@
 #ifndef CONNECTED_H
 #define CONNECTED_H
 
+struct transport;
+
 /*
  * Take callback data, and return next object name in the buffer.
  * When called after returning the name for the last object, return -1
@@ -16,5 +18,8 @@ typedef int (*sha1_iterate_fn)(void *, unsigned char [20]);
  * Return 0 if Ok, non zero otherwise (i.e. some missing objects)
  */
 extern int check_everything_connected(sha1_iterate_fn, int quiet, void *cb_data);
+extern int check_everything_connected_with_transport(sha1_iterate_fn, int quiet,
+						     void *cb_data,
+						     struct transport *transport);
 
 #endif /* CONNECTED_H */
diff --git a/fetch-pack.c b/fetch-pack.c
index 17cfa88..75c6f05 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -690,6 +690,7 @@ static int get_pack(struct fetch_pack_args *args,
 	const char **av;
 	int do_keep = args->keep_pack;
 	struct child_process cmd;
+	int ret;
 
 	memset(&demux, 0, sizeof(demux));
 	if (use_sideband) {
@@ -741,11 +742,14 @@ static int get_pack(struct fetch_pack_args *args,
 				strcpy(keep_arg + s, "localhost");
 			*av++ = keep_arg;
 		}
+		if (args->check_self_contained_and_connected)
+			*av++ = "--check-self-contained-and-connected";
 	}
 	else {
 		*av++ = "unpack-objects";
 		if (args->quiet || args->no_progress)
 			*av++ = "-q";
+		args->check_self_contained_and_connected = 0;
 	}
 	if (*hdr_arg)
 		*av++ = hdr_arg;
@@ -766,7 +770,12 @@ static int get_pack(struct fetch_pack_args *args,
 		close(cmd.out);
 	}
 
-	if (finish_command(&cmd))
+	ret = finish_command(&cmd);
+	if (!ret || (args->check_self_contained_and_connected && ret == 1))
+		args->self_contained_and_connected =
+			args->check_self_contained_and_connected &&
+			ret == 0;
+	else
 		die("%s failed", argv[0]);
 	if (use_sideband && finish_async(&demux))
 		die("error in sideband demultiplexer");
diff --git a/fetch-pack.h b/fetch-pack.h
index dc5266c..40f08ba 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -16,7 +16,9 @@ struct fetch_pack_args {
 		verbose:1,
 		no_progress:1,
 		include_tag:1,
-		stateless_rpc:1;
+		stateless_rpc:1,
+		check_self_contained_and_connected:1,
+		self_contained_and_connected:1;
 };
 
 /*
diff --git a/transport.c b/transport.c
index ba5d8af..359a671 100644
--- a/transport.c
+++ b/transport.c
@@ -534,6 +534,8 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.quiet = (transport->verbose < 0);
 	args.no_progress = !transport->progress;
 	args.depth = data->options.depth;
+	args.check_self_contained_and_connected =
+		data->options.check_self_contained_and_connected;
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0, 0);
@@ -551,6 +553,8 @@ static int fetch_refs_via_pack(struct transport *transport,
 		refs = NULL;
 	data->conn = NULL;
 	data->got_remote_heads = 0;
+	data->options.self_contained_and_connected =
+		args.self_contained_and_connected;
 
 	free_refs(refs_tmp);
 
diff --git a/transport.h b/transport.h
index fcb1d25..4edebc5 100644
--- a/transport.h
+++ b/transport.h
@@ -8,6 +8,8 @@ struct git_transport_options {
 	unsigned thin : 1;
 	unsigned keep : 1;
 	unsigned followtags : 1;
+	unsigned check_self_contained_and_connected : 1;
+	unsigned self_contained_and_connected : 1;
 	int depth;
 	const char *uploadpack;
 	const char *receivepack;
-- 
1.8.2.82.gc24b958
-- 8< --

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check
  2013-05-03  7:09                 ` Duy Nguyen
@ 2013-05-03  8:16                   ` Eric Sunshine
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Sunshine @ 2013-05-03  8:16 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Git Mailing List, Jeff King

On Fri, May 3, 2013 at 3:09 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> Subject: [PATCH] clone: open a shortcut for connectivity check
>
> In order to make sure the cloned repository is good, we run "rev-list
> --objects --not --all $new_refs" on the repository. This is expensive
> on large repositories. This patch attempts to mitigate the impact in
> this special case.
>
> In the "good" clone case, we only have one pack. If all of the
> following are met, we can be sure that all objects reachable from the
> new refs exist, which is the intention of running "rev-list ...":
>
>  - all refs point to an object in the pack
>  - there are no dangling pointers in any object in the pack
>  - all objects in the pack does not point to objects outside the pack

"no objects in the pack point to..."

>
> The second and third checks can be done with the help of index-pack as
> a slight modification of --strict check (which introduces a new
> condition for the shortcut: pack transfer must be used and the number
> of objects large enough to call index-pack). The first is checked in
> check_everything_connected after we get an "ok" from index-pack.
>
> "index-pack + new checks" is still faster than "index-pack + rev-list"
> currently, which is the whole point of this patch. If any of the
> conditions fails, we fall back to good old and expensive "rev-list
> ..". In that case it's even more expensive because we have to pay for
> the new checks in index-pack.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 0/4] check_everything_connected replacement
  2013-05-01 10:59 ` [PATCH v2 0/5] " Nguyễn Thái Ngọc Duy
                     ` (4 preceding siblings ...)
  2013-05-01 10:59   ` [PATCH v2 5/5] Use --not-so-strict on all pack transfer " Nguyễn Thái Ngọc Duy
@ 2013-05-03 12:35   ` Nguyễn Thái Ngọc Duy
  2013-05-03 12:35     ` [PATCH v3 1/4] clone: let the user know when check_everything_connected is run Nguyễn Thái Ngọc Duy
                       ` (3 more replies)
  5 siblings, 4 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-03 12:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

v3 is more like check_everything_connected's special case for clone
because check_everything_connected is not really replaced.
GIT_SHALLOW_FILE in 2/4 is now replaced by --shallow-file to avoid
unintended propagation to child processes.

Nguyễn Thái Ngọc Duy (4):
  clone: let the user know when check_everything_connected is run
  fetch-pack: prepare updated shallow file before fetching the pack
  index-pack: remove dead code (it should never happen)
  clone: open a shortcut for connectivity check

 Documentation/git-index-pack.txt |  3 ++
 builtin/clone.c                  | 15 ++++++--
 builtin/index-pack.c             | 38 +++++++++++++------
 commit.h                         |  2 +
 connected.c                      | 34 ++++++++++++++++-
 connected.h                      |  5 +++
 fetch-pack.c                     | 80 ++++++++++++++++++++++------------------
 fetch-pack.h                     |  4 +-
 git.c                            |  5 +++
 shallow.c                        | 45 +++++++++++++++++++++-
 t/t5500-fetch-pack.sh            |  7 ++++
 transport.c                      |  4 ++
 transport.h                      |  2 +
 13 files changed, 190 insertions(+), 54 deletions(-)

-- 
1.8.2.83.gc99314b

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 1/4] clone: let the user know when check_everything_connected is run
  2013-05-03 12:35   ` [PATCH v3 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
@ 2013-05-03 12:35     ` Nguyễn Thái Ngọc Duy
  2013-05-03 12:35     ` [PATCH v3 2/4] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-03 12:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

check_everything_connected could take a long time, especially in the
clone case where the whole DAG is traversed. The user deserves to know
what's going on.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/clone.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 035ab64..dad4265 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -546,8 +546,12 @@ static void update_remote_refs(const struct ref *refs,
 {
 	const struct ref *rm = mapped_refs;
 
+	if (0 <= option_verbosity)
+		printf(_("Checking connectivity... "));
 	if (check_everything_connected(iterate_ref_map, 0, &rm))
 		die(_("remote did not send all necessary objects"));
+	if (0 <= option_verbosity)
+		printf(_("done\n"));
 
 	if (refs) {
 		write_remote_refs(mapped_refs);
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 2/4] fetch-pack: prepare updated shallow file before fetching the pack
  2013-05-03 12:35   ` [PATCH v3 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
  2013-05-03 12:35     ` [PATCH v3 1/4] clone: let the user know when check_everything_connected is run Nguyễn Thái Ngọc Duy
@ 2013-05-03 12:35     ` Nguyễn Thái Ngọc Duy
  2013-05-03 12:37       ` Eric Sunshine
  2013-05-07 15:59       ` Junio C Hamano
  2013-05-03 12:35     ` [PATCH v3 3/4] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
  2013-05-03 12:35     ` [PATCH v3 4/4] clone: open a shortcut for connectivity check Nguyễn Thái Ngọc Duy
  3 siblings, 2 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-03 12:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

index-pack --strict looks up and follows parent commits. If shallow
information is not ready by the time index-pack is run, index-pack may
be lead to non-existent objects. Make fetch-pack save shallow file to
disk before invoking index-pack.

git learns new global option --shallow-file to pass on the alternate
shallow file path. Undocumented (and not even support --shallow-file=
syntax) because it's unlikely to be used again elsewhere.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 commit.h              |  2 ++
 fetch-pack.c          | 69 +++++++++++++++++++++++++--------------------------
 git.c                 |  5 ++++
 shallow.c             | 45 +++++++++++++++++++++++++++++++--
 t/t5500-fetch-pack.sh |  7 ++++++
 5 files changed, 91 insertions(+), 37 deletions(-)

diff --git a/commit.h b/commit.h
index 67bd509..6e9c7cd 100644
--- a/commit.h
+++ b/commit.h
@@ -176,6 +176,8 @@ extern int for_each_commit_graft(each_commit_graft_fn, void *);
 extern int is_repository_shallow(void);
 extern struct commit_list *get_shallow_commits(struct object_array *heads,
 		int depth, int shallow_flag, int not_shallow_flag);
+extern void check_shallow_file_for_update(void);
+extern void set_alternate_shallow_file(const char *path);
 
 int is_descendant_of(struct commit *, struct commit_list *);
 int in_merge_bases(struct commit *, struct commit *);
diff --git a/fetch-pack.c b/fetch-pack.c
index f156dd4..1ca4f6b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -20,6 +20,8 @@ static int no_done;
 static int fetch_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
 static int agent_supported;
+static struct lock_file shallow_lock;
+static const char *alternate_shallow_file;
 
 #define COMPLETE	(1U << 0)
 #define COMMON		(1U << 1)
@@ -683,7 +685,7 @@ static int get_pack(struct fetch_pack_args *args,
 		    int xd[2], char **pack_lockfile)
 {
 	struct async demux;
-	const char *argv[20];
+	const char *argv[22];
 	char keep_arg[256];
 	char hdr_arg[256];
 	const char **av;
@@ -724,6 +726,11 @@ static int get_pack(struct fetch_pack_args *args,
 			do_keep = 1;
 	}
 
+	if (alternate_shallow_file) {
+		*av++ = "--shallow-file";
+		*av++ = alternate_shallow_file;
+	}
+
 	if (do_keep) {
 		if (pack_lockfile)
 			cmd.out = -1;
@@ -779,6 +786,23 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
+static void setup_alternate_shallow(void)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int fd;
+
+	check_shallow_file_for_update();
+	fd = hold_lock_file_for_update(&shallow_lock, git_path("shallow"),
+				       LOCK_DIE_ON_ERROR);
+	if (write_shallow_commits(&sb, 0)) {
+		if (write_in_full(fd, sb.buf, sb.len) != sb.len)
+			die_errno("failed to write to %s", shallow_lock.filename);
+		alternate_shallow_file = shallow_lock.filename;
+	} else
+		alternate_shallow_file = "";
+	strbuf_release(&sb);
+}
+
 static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 				 int fd[2],
 				 const struct ref *orig_ref,
@@ -858,6 +882,8 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 
 	if (args->stateless_rpc)
 		packet_flush(fd[1]);
+	if (args->depth > 0)
+		setup_alternate_shallow();
 	if (get_pack(args, fd, pack_lockfile))
 		die("git fetch-pack: fetch failed.");
 
@@ -936,15 +962,9 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 		       struct ref **sought, int nr_sought,
 		       char **pack_lockfile)
 {
-	struct stat st;
 	struct ref *ref_cpy;
 
 	fetch_pack_setup();
-	if (args->depth > 0) {
-		if (stat(git_path("shallow"), &st))
-			st.st_mtime = 0;
-	}
-
 	if (nr_sought)
 		nr_sought = remove_duplicates_in_refs(sought, nr_sought);
 
@@ -955,34 +975,13 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 	ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought, pack_lockfile);
 
 	if (args->depth > 0) {
-		static struct lock_file lock;
-		struct cache_time mtime;
-		struct strbuf sb = STRBUF_INIT;
-		char *shallow = git_path("shallow");
-		int fd;
-
-		mtime.sec = st.st_mtime;
-		mtime.nsec = ST_MTIME_NSEC(st);
-		if (stat(shallow, &st)) {
-			if (mtime.sec)
-				die("shallow file was removed during fetch");
-		} else if (st.st_mtime != mtime.sec
-#ifdef USE_NSEC
-				|| ST_MTIME_NSEC(st) != mtime.nsec
-#endif
-			  )
-			die("shallow file was changed during fetch");
-
-		fd = hold_lock_file_for_update(&lock, shallow,
-					       LOCK_DIE_ON_ERROR);
-		if (!write_shallow_commits(&sb, 0)
-		 || write_in_full(fd, sb.buf, sb.len) != sb.len) {
-			unlink_or_warn(shallow);
-			rollback_lock_file(&lock);
-		} else {
-			commit_lock_file(&lock);
-		}
-		strbuf_release(&sb);
+		struct stat st;
+		if (!fstat(shallow_lock.fd, &st) &&
+		    st.st_size == 0) {
+			unlink_or_warn(git_path("shallow"));
+			rollback_lock_file(&shallow_lock);
+		} else
+			commit_lock_file(&shallow_lock);
 	}
 
 	reprepare_packed_git();
diff --git a/git.c b/git.c
index 1ada169..6450a38 100644
--- a/git.c
+++ b/git.c
@@ -4,6 +4,7 @@
 #include "help.h"
 #include "quote.h"
 #include "run-command.h"
+#include "commit.h"
 
 const char git_usage_string[] =
 	"git [--version] [--help] [-c name=value]\n"
@@ -146,6 +147,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
 			setenv(GIT_LITERAL_PATHSPECS_ENVIRONMENT, "0", 1);
 			if (envchanged)
 				*envchanged = 1;
+		} else if (!strcmp(cmd, "--shallow-file")) {
+			(*argv)++;
+			(*argc)--;
+			set_alternate_shallow_file((*argv)[0]);
 		} else {
 			fprintf(stderr, "Unknown option: %s\n", cmd);
 			usage(git_usage_string);
diff --git a/shallow.c b/shallow.c
index 6be915f..bdae988 100644
--- a/shallow.c
+++ b/shallow.c
@@ -3,6 +3,16 @@
 #include "tag.h"
 
 static int is_shallow = -1;
+static struct stat shallow_stat;
+static char *alternate_shallow_file;
+
+void set_alternate_shallow_file(const char *path)
+{
+	if (is_shallow != -1)
+		die("BUG: is_repository_shallow must not be called before set_alternate_shallow_file");
+	free(alternate_shallow_file);
+	alternate_shallow_file = path ? xstrdup(path) : NULL;
+}
 
 int register_shallow(const unsigned char *sha1)
 {
@@ -21,12 +31,21 @@ int is_repository_shallow(void)
 {
 	FILE *fp;
 	char buf[1024];
+	const char *path = alternate_shallow_file;
 
 	if (is_shallow >= 0)
 		return is_shallow;
 
-	fp = fopen(git_path("shallow"), "r");
-	if (!fp) {
+	if (!path)
+		path = git_path("shallow");
+	/*
+	 * fetch-pack set '--shallow-file ""' as an indicator that no
+	 * shallow file should be used. We could just open it and it
+	 * will likely fail. But let's do an explicit check instead.
+	 */
+	if (!*path ||
+	    stat(path, &shallow_stat) ||
+	    (fp = fopen(path, "r")) == NULL) {
 		is_shallow = 0;
 		return is_shallow;
 	}
@@ -108,3 +127,25 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
 
 	return result;
 }
+
+void check_shallow_file_for_update(void)
+{
+	struct stat st;
+
+	if (getenv("GIT_SHALLOW_FILE"))
+		die("GIT_SHALLOW_FILE should not be manually set");
+
+	if (!is_shallow)
+		return;
+	else if (is_shallow == -1)
+		die("BUG: shallow must be initialized by now");
+
+	if (stat(git_path("shallow"), &st))
+		die("shallow file was removed during fetch");
+	else if (st.st_mtime != shallow_stat.st_mtime
+#ifdef USE_NSEC
+		 || ST_MTIME_NSEC(st) != ST_MTIME_NSEC(shallow_stat)
+#endif
+		   )
+		die("shallow file was changed during fetch");
+}
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index d574085..557b073 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -135,6 +135,13 @@ test_expect_success 'clone shallow depth 1' '
 	test "`git --git-dir=shallow0/.git rev-list --count HEAD`" = 1
 '
 
+test_expect_success 'clone shallow depth 1 with fsck' '
+	git config --global fetch.fsckobjects true &&
+	git clone --no-single-branch --depth 1 "file://$(pwd)/." shallow0fsck &&
+	test "`git --git-dir=shallow0fsck/.git rev-list --count HEAD`" = 1 &&
+	git config --global --unset fetch.fsckobjects
+'
+
 test_expect_success 'clone shallow' '
 	git clone --no-single-branch --depth 2 "file://$(pwd)/." shallow
 '
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 3/4] index-pack: remove dead code (it should never happen)
  2013-05-03 12:35   ` [PATCH v3 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
  2013-05-03 12:35     ` [PATCH v3 1/4] clone: let the user know when check_everything_connected is run Nguyễn Thái Ngọc Duy
  2013-05-03 12:35     ` [PATCH v3 2/4] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
@ 2013-05-03 12:35     ` Nguyễn Thái Ngọc Duy
  2013-05-03 12:35     ` [PATCH v3 4/4] clone: open a shortcut for connectivity check Nguyễn Thái Ngọc Duy
  3 siblings, 0 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-03 12:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/index-pack.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 79dfe47..f52a04f 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -747,8 +747,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			int eaten;
 			void *buf = (void *) data;
 
-			if (!buf)
-				buf = new_data = get_data_from_pack(obj_entry);
+			assert(data && "data can only be NULL for large _blobs_");
 
 			/*
 			 * we do not need to free the memory here, as the
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 4/4] clone: open a shortcut for connectivity check
  2013-05-03 12:35   ` [PATCH v3 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
                       ` (2 preceding siblings ...)
  2013-05-03 12:35     ` [PATCH v3 3/4] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
@ 2013-05-03 12:35     ` Nguyễn Thái Ngọc Duy
  2013-05-03 12:41       ` Eric Sunshine
  2013-05-03 16:15       ` Junio C Hamano
  3 siblings, 2 replies; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-05-03 12:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

In order to make sure the cloned repository is good, we run "rev-list
--objects --not --all $new_refs" on the repository. This is expensive
on large repositories. This patch attempts to mitigate the impact in
this special case.

In the "good" clone case, we only have one pack. If all of the
following are met, we can be sure that all objects reachable from the
new refs exist, which is the intention of running "rev-list ...":

 - all refs point to an object in the pack
 - there are no dangling pointers in any object in the pack
 - no objects in the pack point to objects outside the pack

The second and third checks can be done with the help of index-pack as
a slight variation of --strict check (which introduces a new condition
for the shortcut: pack transfer must be used and the number of objects
large enough to call index-pack). The first is checked in
check_everything_connected after we get an "ok" from index-pack.

"index-pack + new checks" is still faster than the current "index-pack
+ rev-list", which is the whole point of this patch. If any of the
conditions fails, we fall back to the good old but expensive "rev-list
..". In that case it's even more expensive because we have to pay for
the new checks in index-pack. But that should only happen when the
other side is either buggy or malicious.

Cloning linux-2.6 over file://

        before         after
real    3m25.693s      2m53.050s
user    5m2.037s       4m42.396s
sys     0m13.750s      0m16.574s

A more realistic test with ssh:// over wireless

        before         after
real    11m26.629s     10m4.213s
user    5m43.196s      5m19.444s
sys     0m35.812s      0m37.630s

This shortcut is not applied to shallow clones, partly because shallow
clones should have no more objects than a usual fetch and the cost of
rev-list is acceptable, partly to avoid dealing with corner cases when
grafting is involved.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-index-pack.txt |  3 +++
 builtin/clone.c                  | 11 ++++++++---
 builtin/index-pack.c             | 35 ++++++++++++++++++++++++++---------
 connected.c                      | 34 +++++++++++++++++++++++++++++++++-
 connected.h                      |  5 +++++
 fetch-pack.c                     | 11 ++++++++++-
 fetch-pack.h                     |  4 +++-
 transport.c                      |  4 ++++
 transport.h                      |  2 ++
 9 files changed, 94 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index bde8eec..7a4e055 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -74,6 +74,9 @@ OPTIONS
 --strict::
 	Die, if the pack contains broken objects or links.
 
+--check-self-contained-and-connected::
+	Die if the pack contains broken links. For internal use only.
+
 --threads=<n>::
 	Specifies the number of threads to spawn when resolving
 	deltas. This requires that index-pack be compiled with
diff --git a/builtin/clone.c b/builtin/clone.c
index dad4265..069e81e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -542,13 +542,15 @@ static void update_remote_refs(const struct ref *refs,
 			       const struct ref *mapped_refs,
 			       const struct ref *remote_head_points_at,
 			       const char *branch_top,
-			       const char *msg)
+			       const char *msg,
+			       struct transport *transport)
 {
 	const struct ref *rm = mapped_refs;
 
 	if (0 <= option_verbosity)
 		printf(_("Checking connectivity... "));
-	if (check_everything_connected(iterate_ref_map, 0, &rm))
+	if (check_everything_connected_with_transport(iterate_ref_map,
+						      0, &rm, transport))
 		die(_("remote did not send all necessary objects"));
 	if (0 <= option_verbosity)
 		printf(_("done\n"));
@@ -893,6 +895,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		if (option_upload_pack)
 			transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 					     option_upload_pack);
+
+		if (transport->smart_options && !option_depth)
+			transport->smart_options->check_self_contained_and_connected = 1;
 	}
 
 	refs = transport_get_remote_refs(transport);
@@ -954,7 +959,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_fetch_refs(transport, mapped_refs);
 
 	update_remote_refs(refs, mapped_refs, remote_head_points_at,
-			   branch_top.buf, reflog_msg.buf);
+			   branch_top.buf, reflog_msg.buf, transport);
 
 	update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f52a04f..9c1cfac 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -77,8 +77,10 @@ static int nr_threads;
 
 static int from_stdin;
 static int strict;
+static int do_fsck_object;
 static int verbose;
 static int show_stat;
+static int check_self_contained_and_connected;
 
 static struct progress *progress;
 
@@ -187,13 +189,13 @@ static int mark_link(struct object *obj, int type, void *data)
 
 /* The content of each linked object must have been checked
    or it must be already present in the object database */
-static void check_object(struct object *obj)
+static unsigned check_object(struct object *obj)
 {
 	if (!obj)
-		return;
+		return 0;
 
 	if (!(obj->flags & FLAG_LINK))
-		return;
+		return 0;
 
 	if (!(obj->flags & FLAG_CHECKED)) {
 		unsigned long size;
@@ -201,17 +203,20 @@ static void check_object(struct object *obj)
 		if (type != obj->type || type <= 0)
 			die(_("object of unexpected type"));
 		obj->flags |= FLAG_CHECKED;
-		return;
+		return 1;
 	}
+
+	return 0;
 }
 
-static void check_objects(void)
+static unsigned check_objects(void)
 {
-	unsigned i, max;
+	unsigned i, max, foreign_nr = 0;
 
 	max = get_max_object_index();
 	for (i = 0; i < max; i++)
-		check_object(get_indexed_object(i));
+		foreign_nr += check_object(get_indexed_object(i));
+	return foreign_nr;
 }
 
 
@@ -756,7 +761,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			obj = parse_object_buffer(sha1, type, size, buf, &eaten);
 			if (!obj)
 				die(_("invalid %s"), typename(type));
-			if (fsck_object(obj, 1, fsck_error_function))
+			if (do_fsck_object &&
+			    fsck_object(obj, 1, fsck_error_function))
 				die(_("Error in object"));
 			if (fsck_walk(obj, mark_link, NULL))
 				die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));
@@ -1490,6 +1496,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_sha1[20];
+	unsigned foreign_nr = 1;	/* zero is a "good" value, assume bad */
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(index_pack_usage);
@@ -1511,6 +1518,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				fix_thin_pack = 1;
 			} else if (!strcmp(arg, "--strict")) {
 				strict = 1;
+				do_fsck_object = 1;
+			} else if (!strcmp(arg, "--check-self-contained-and-connected")) {
+				strict = 1;
+				check_self_contained_and_connected = 1;
 			} else if (!strcmp(arg, "--verify")) {
 				verify = 1;
 			} else if (!strcmp(arg, "--verify-stat")) {
@@ -1624,7 +1635,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	conclude_pack(fix_thin_pack, curr_pack, pack_sha1);
 	free(deltas);
 	if (strict)
-		check_objects();
+		foreign_nr = check_objects();
 
 	if (show_stat)
 		show_pack_info(stat_only);
@@ -1650,5 +1661,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (index_name == NULL)
 		free((void *) curr_index);
 
+	/*
+	 * Let the caller know this pack is not self contained
+	 */
+	if (check_self_contained_and_connected && foreign_nr)
+		return 1;
+
 	return 0;
 }
diff --git a/connected.c b/connected.c
index 1e89c1c..fae8d64 100644
--- a/connected.c
+++ b/connected.c
@@ -2,7 +2,12 @@
 #include "run-command.h"
 #include "sigchain.h"
 #include "connected.h"
+#include "transport.h"
 
+int check_everything_connected(sha1_iterate_fn fn, int quiet, void *cb_data)
+{
+	return check_everything_connected_with_transport(fn, quiet, cb_data, NULL);
+}
 /*
  * If we feed all the commits we want to verify to this command
  *
@@ -14,7 +19,10 @@
  *
  * Returns 0 if everything is connected, non-zero otherwise.
  */
-int check_everything_connected(sha1_iterate_fn fn, int quiet, void *cb_data)
+int check_everything_connected_with_transport(sha1_iterate_fn fn,
+					      int quiet,
+					      void *cb_data,
+					      struct transport *transport)
 {
 	struct child_process rev_list;
 	const char *argv[] = {"rev-list", "--objects",
@@ -22,10 +30,23 @@ int check_everything_connected(sha1_iterate_fn fn, int quiet, void *cb_data)
 	char commit[41];
 	unsigned char sha1[20];
 	int err = 0;
+	struct packed_git *new_pack = NULL;
 
 	if (fn(cb_data, sha1))
 		return err;
 
+	if (transport && transport->smart_options &&
+	    transport->smart_options->self_contained_and_connected &&
+	    transport->pack_lockfile &&
+	    !suffixcmp(transport->pack_lockfile, ".keep")) {
+		struct strbuf idx_file = STRBUF_INIT;
+		strbuf_addstr(&idx_file, transport->pack_lockfile);
+		strbuf_setlen(&idx_file, idx_file.len - 5); /* ".keep" */
+		strbuf_addstr(&idx_file, ".idx");
+		new_pack = add_packed_git(idx_file.buf, idx_file.len, 1);
+		strbuf_release(&idx_file);
+	}
+
 	if (quiet)
 		argv[5] = "--quiet";
 
@@ -42,6 +63,17 @@ int check_everything_connected(sha1_iterate_fn fn, int quiet, void *cb_data)
 
 	commit[40] = '\n';
 	do {
+		/*
+		 * If index-pack already checked that:
+		 * - there are no dangling pointers in the new pack
+		 * - the pack is self contained
+		 * Then if the updated ref is in the new pack, then we
+		 * are sure the ref is good and not sending it to
+		 * rev-list for verification.
+		 */
+		if (new_pack && find_pack_entry_one(sha1, new_pack))
+			continue;
+
 		memcpy(commit, sha1_to_hex(sha1), 40);
 		if (write_in_full(rev_list.in, commit, 41) < 0) {
 			if (errno != EPIPE && errno != EINVAL)
diff --git a/connected.h b/connected.h
index 7e4585a..0b060b7 100644
--- a/connected.h
+++ b/connected.h
@@ -1,6 +1,8 @@
 #ifndef CONNECTED_H
 #define CONNECTED_H
 
+struct transport;
+
 /*
  * Take callback data, and return next object name in the buffer.
  * When called after returning the name for the last object, return -1
@@ -16,5 +18,8 @@ typedef int (*sha1_iterate_fn)(void *, unsigned char [20]);
  * Return 0 if Ok, non zero otherwise (i.e. some missing objects)
  */
 extern int check_everything_connected(sha1_iterate_fn, int quiet, void *cb_data);
+extern int check_everything_connected_with_transport(sha1_iterate_fn, int quiet,
+						     void *cb_data,
+						     struct transport *transport);
 
 #endif /* CONNECTED_H */
diff --git a/fetch-pack.c b/fetch-pack.c
index 1ca4f6b..6b7e701 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -691,6 +691,7 @@ static int get_pack(struct fetch_pack_args *args,
 	const char **av;
 	int do_keep = args->keep_pack;
 	struct child_process cmd;
+	int ret;
 
 	memset(&demux, 0, sizeof(demux));
 	if (use_sideband) {
@@ -747,11 +748,14 @@ static int get_pack(struct fetch_pack_args *args,
 				strcpy(keep_arg + s, "localhost");
 			*av++ = keep_arg;
 		}
+		if (args->check_self_contained_and_connected)
+			*av++ = "--check-self-contained-and-connected";
 	}
 	else {
 		*av++ = "unpack-objects";
 		if (args->quiet || args->no_progress)
 			*av++ = "-q";
+		args->check_self_contained_and_connected = 0;
 	}
 	if (*hdr_arg)
 		*av++ = hdr_arg;
@@ -772,7 +776,12 @@ static int get_pack(struct fetch_pack_args *args,
 		close(cmd.out);
 	}
 
-	if (finish_command(&cmd))
+	ret = finish_command(&cmd);
+	if (!ret || (args->check_self_contained_and_connected && ret == 1))
+		args->self_contained_and_connected =
+			args->check_self_contained_and_connected &&
+			ret == 0;
+	else
 		die("%s failed", argv[0]);
 	if (use_sideband && finish_async(&demux))
 		die("error in sideband demultiplexer");
diff --git a/fetch-pack.h b/fetch-pack.h
index dc5266c..40f08ba 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -16,7 +16,9 @@ struct fetch_pack_args {
 		verbose:1,
 		no_progress:1,
 		include_tag:1,
-		stateless_rpc:1;
+		stateless_rpc:1,
+		check_self_contained_and_connected:1,
+		self_contained_and_connected:1;
 };
 
 /*
diff --git a/transport.c b/transport.c
index ba5d8af..359a671 100644
--- a/transport.c
+++ b/transport.c
@@ -534,6 +534,8 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.quiet = (transport->verbose < 0);
 	args.no_progress = !transport->progress;
 	args.depth = data->options.depth;
+	args.check_self_contained_and_connected =
+		data->options.check_self_contained_and_connected;
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0, 0);
@@ -551,6 +553,8 @@ static int fetch_refs_via_pack(struct transport *transport,
 		refs = NULL;
 	data->conn = NULL;
 	data->got_remote_heads = 0;
+	data->options.self_contained_and_connected =
+		args.self_contained_and_connected;
 
 	free_refs(refs_tmp);
 
diff --git a/transport.h b/transport.h
index fcb1d25..4edebc5 100644
--- a/transport.h
+++ b/transport.h
@@ -8,6 +8,8 @@ struct git_transport_options {
 	unsigned thin : 1;
 	unsigned keep : 1;
 	unsigned followtags : 1;
+	unsigned check_self_contained_and_connected : 1;
+	unsigned self_contained_and_connected : 1;
 	int depth;
 	const char *uploadpack;
 	const char *receivepack;
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/4] fetch-pack: prepare updated shallow file before fetching the pack
  2013-05-03 12:35     ` [PATCH v3 2/4] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
@ 2013-05-03 12:37       ` Eric Sunshine
  2013-05-07 15:59       ` Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Sunshine @ 2013-05-03 12:37 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Git List, Junio C Hamano, Jeff King

On Fri, May 3, 2013 at 8:35 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> index-pack --strict looks up and follows parent commits. If shallow
> information is not ready by the time index-pack is run, index-pack may
> be lead to non-existent objects. Make fetch-pack save shallow file to

s/lead/led/

> disk before invoking index-pack.
>
> git learns new global option --shallow-file to pass on the alternate
> shallow file path. Undocumented (and not even support --shallow-file=
> syntax) because it's unlikely to be used again elsewhere.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] clone: open a shortcut for connectivity check
  2013-05-03 12:35     ` [PATCH v3 4/4] clone: open a shortcut for connectivity check Nguyễn Thái Ngọc Duy
@ 2013-05-03 12:41       ` Eric Sunshine
  2013-05-03 16:15       ` Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Sunshine @ 2013-05-03 12:41 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Git List, Junio C Hamano, Jeff King

On Fri, May 3, 2013 at 8:35 AM, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> In order to make sure the cloned repository is good, we run "rev-list
> --objects --not --all $new_refs" on the repository. This is expensive
> on large repositories. This patch attempts to mitigate the impact in
> this special case.
>
> In the "good" clone case, we only have one pack. If all of the
> following are met, we can be sure that all objects reachable from the
> new refs exist, which is the intention of running "rev-list ...":
>
>  - all refs point to an object in the pack
>  - there are no dangling pointers in any object in the pack
>  - no objects in the pack point to objects outside the pack
>
> The second and third checks can be done with the help of index-pack as
> a slight variation of --strict check (which introduces a new condition
> for the shortcut: pack transfer must be used and the number of objects
> large enough to call index-pack). The first is checked in
> check_everything_connected after we get an "ok" from index-pack.
>
> "index-pack + new checks" is still faster than the current "index-pack
> + rev-list", which is the whole point of this patch. If any of the
> conditions fails, we fall back to the good old but expensive "rev-list

s/fails/fail/

> ..". In that case it's even more expensive because we have to pay for
> the new checks in index-pack. But that should only happen when the
> other side is either buggy or malicious.
>
> Cloning linux-2.6 over file://
>
>         before         after
> real    3m25.693s      2m53.050s
> user    5m2.037s       4m42.396s
> sys     0m13.750s      0m16.574s
>
> A more realistic test with ssh:// over wireless
>
>         before         after
> real    11m26.629s     10m4.213s
> user    5m43.196s      5m19.444s
> sys     0m35.812s      0m37.630s
>
> This shortcut is not applied to shallow clones, partly because shallow
> clones should have no more objects than a usual fetch and the cost of
> rev-list is acceptable, partly to avoid dealing with corner cases when
> grafting is involved.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] clone: open a shortcut for connectivity check
  2013-05-03 12:35     ` [PATCH v3 4/4] clone: open a shortcut for connectivity check Nguyễn Thái Ngọc Duy
  2013-05-03 12:41       ` Eric Sunshine
@ 2013-05-03 16:15       ` Junio C Hamano
  2013-05-04  1:10         ` Duy Nguyen
  1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2013-05-03 16:15 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> In order to make sure the cloned repository is good, we run "rev-list
> --objects --not --all $new_refs" on the repository. This is expensive
> on large repositories. This patch attempts to mitigate the impact in
> this special case.
>
> In the "good" clone case, we only have one pack.

If "On large repositories" is the focus, we need to take into
account the fact that pack.packSizeLimit can split and store the
incoming packstream to multiple packs, so "only have one pack" is
misleading.

I think you can still do the same trick even when we split the pack
as index-pack will keep track of the objects it saw in the same
incoming pack stream (but I am writing this from memory without
looking at the original code you are touching, so please double
check).

> If all of the
> following are met, we can be sure that all objects reachable from the
> new refs exist, which is the intention of running "rev-list ...":
>
>  - all refs point to an object in the pack
>  - there are no dangling pointers in any object in the pack
>  - no objects in the pack point to objects outside the pack
>
> The second and third checks can be done with the help of index-pack as
> a slight variation of --strict check (which introduces a new condition
> for the shortcut: pack transfer must be used and the number of objects
> large enough to call index-pack). The first is checked in
> check_everything_connected after we get an "ok" from index-pack.
>
> "index-pack + new checks" is still faster than the current "index-pack
> + rev-list", which is the whole point of this patch. If any of the

Does the same check apply if we end up on the unpack-objects
codepath?

> This shortcut is not applied to shallow clones, partly because shallow
> clones should have no more objects than a usual fetch and the cost of
> rev-list is acceptable, partly to avoid dealing with corner cases when
> grafting is involved.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] clone: open a shortcut for connectivity check
  2013-05-03 16:15       ` Junio C Hamano
@ 2013-05-04  1:10         ` Duy Nguyen
  0 siblings, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2013-05-04  1:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Fri, May 3, 2013 at 11:15 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> In order to make sure the cloned repository is good, we run "rev-list
>> --objects --not --all $new_refs" on the repository. This is expensive
>> on large repositories. This patch attempts to mitigate the impact in
>> this special case.
>>
>> In the "good" clone case, we only have one pack.
>
> If "On large repositories" is the focus, we need to take into
> account the fact that pack.packSizeLimit can split and store the
> incoming packstream to multiple packs, so "only have one pack" is
> misleading.

I only had a quick look. But I don't think index-pack respects
packSizeLimit. pack-objects does but only when --stdout is not used,
which is not the case for pack transfer.

> I think you can still do the same trick even when we split the pack
> as index-pack will keep track of the objects it saw in the same
> incoming pack stream (but I am writing this from memory without
> looking at the original code you are touching, so please double
> check).

Yeah. As long we have only one incoming stream, we can still do the
same verification.

>> "index-pack + new checks" is still faster than the current "index-pack
>> + rev-list", which is the whole point of this patch. If any of the
>
> Does the same check apply if we end up on the unpack-objects
> codepath?

No. unpack-objects does not do this and check_everything_connected
should invoke rev-list like before.
--
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/4] fetch-pack: prepare updated shallow file before fetching the pack
  2013-05-03 12:35     ` [PATCH v3 2/4] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
  2013-05-03 12:37       ` Eric Sunshine
@ 2013-05-07 15:59       ` Junio C Hamano
  2013-05-26  1:01         ` Duy Nguyen
  1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2013-05-07 15:59 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> index-pack --strict looks up and follows parent commits. If shallow
> information is not ready by the time index-pack is run, index-pack may
> be lead to non-existent objects. Make fetch-pack save shallow file to
> disk before invoking index-pack.
>
> git learns new global option --shallow-file to pass on the alternate
> shallow file path. Undocumented (and not even support --shallow-file=
> syntax) because it's unlikely to be used again elsewhere.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  commit.h              |  2 ++
>  fetch-pack.c          | 69 +++++++++++++++++++++++++--------------------------
>  git.c                 |  5 ++++
>  shallow.c             | 45 +++++++++++++++++++++++++++++++--
>  t/t5500-fetch-pack.sh |  7 ++++++
>  5 files changed, 91 insertions(+), 37 deletions(-)
>
> diff --git a/commit.h b/commit.h
> index 67bd509..6e9c7cd 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -176,6 +176,8 @@ extern int for_each_commit_graft(each_commit_graft_fn, void *);
>  extern int is_repository_shallow(void);
>  extern struct commit_list *get_shallow_commits(struct object_array *heads,
>  		int depth, int shallow_flag, int not_shallow_flag);
> +extern void check_shallow_file_for_update(void);
> +extern void set_alternate_shallow_file(const char *path);
>  
>  int is_descendant_of(struct commit *, struct commit_list *);
>  int in_merge_bases(struct commit *, struct commit *);
> diff --git a/fetch-pack.c b/fetch-pack.c
> index f156dd4..1ca4f6b 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -20,6 +20,8 @@ static int no_done;
>  static int fetch_fsck_objects = -1;
>  static int transfer_fsck_objects = -1;
>  static int agent_supported;
> +static struct lock_file shallow_lock;
> +static const char *alternate_shallow_file;
>  
>  #define COMPLETE	(1U << 0)
>  #define COMMON		(1U << 1)
> @@ -683,7 +685,7 @@ static int get_pack(struct fetch_pack_args *args,
>  		    int xd[2], char **pack_lockfile)
>  {
>  	struct async demux;
> -	const char *argv[20];
> +	const char *argv[22];
>  	char keep_arg[256];
>  	char hdr_arg[256];
>  	const char **av;
> @@ -724,6 +726,11 @@ static int get_pack(struct fetch_pack_args *args,
>  			do_keep = 1;
>  	}
>  
> +	if (alternate_shallow_file) {
> +		*av++ = "--shallow-file";
> +		*av++ = alternate_shallow_file;
> +	}
> +
>  	if (do_keep) {
>  		if (pack_lockfile)
>  			cmd.out = -1;
> @@ -779,6 +786,23 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
>  	return strcmp(a->name, b->name);
>  }
>  
> +static void setup_alternate_shallow(void)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	int fd;
> +
> +	check_shallow_file_for_update();
> +	fd = hold_lock_file_for_update(&shallow_lock, git_path("shallow"),
> +				       LOCK_DIE_ON_ERROR);
> +	if (write_shallow_commits(&sb, 0)) {
> +		if (write_in_full(fd, sb.buf, sb.len) != sb.len)
> +			die_errno("failed to write to %s", shallow_lock.filename);
> +		alternate_shallow_file = shallow_lock.filename;
> +	} else
> +		alternate_shallow_file = "";

This "empty string, not NULL" trick needs to be commented here to
match the other place that uses this as a cue to do things
differently.

> +	strbuf_release(&sb);
> +}
> +
>  static struct ref *do_fetch_pack(struct fetch_pack_args *args,
>  				 int fd[2],
>  				 const struct ref *orig_ref,
> @@ -858,6 +882,8 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
>  
>  	if (args->stateless_rpc)
>  		packet_flush(fd[1]);
> +	if (args->depth > 0)
> +		setup_alternate_shallow();
>  	if (get_pack(args, fd, pack_lockfile))
>  		die("git fetch-pack: fetch failed.");
>  
> @@ -936,15 +962,9 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
>  		       struct ref **sought, int nr_sought,
>  		       char **pack_lockfile)
>  {
>  	struct ref *ref_cpy;
>  
>  	fetch_pack_setup();
>  	if (nr_sought)
>  		nr_sought = remove_duplicates_in_refs(sought, nr_sought);
>  
> @@ -955,34 +975,13 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
>  	ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought, pack_lockfile);
>  
>  	if (args->depth > 0) {
> +		struct stat st;
> +		if (!fstat(shallow_lock.fd, &st) &&
> +		    st.st_size == 0) {
> +			unlink_or_warn(git_path("shallow"));

Are we unlinking the right file here?

> +			rollback_lock_file(&shallow_lock);
> +		} else
> +			commit_lock_file(&shallow_lock);
>  	}
>  
>  	reprepare_packed_git();
> diff --git a/git.c b/git.c
> index 1ada169..6450a38 100644
> --- a/git.c
> +++ b/git.c
> @@ -4,6 +4,7 @@
>  #include "help.h"
>  #include "quote.h"
>  #include "run-command.h"
> +#include "commit.h"
>  
>  const char git_usage_string[] =
>  	"git [--version] [--help] [-c name=value]\n"
> @@ -146,6 +147,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
>  			setenv(GIT_LITERAL_PATHSPECS_ENVIRONMENT, "0", 1);
>  			if (envchanged)
>  				*envchanged = 1;
> +		} else if (!strcmp(cmd, "--shallow-file")) {
> +			(*argv)++;
> +			(*argc)--;
> +			set_alternate_shallow_file((*argv)[0]);

Should this count as "envchanged"?

> diff --git a/shallow.c b/shallow.c
> index 6be915f..bdae988 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -3,6 +3,16 @@
>  #include "tag.h"
>  
>  static int is_shallow = -1;
> +static struct stat shallow_stat;
> +static char *alternate_shallow_file;
> +
> +void set_alternate_shallow_file(const char *path)
> +{
> +	if (is_shallow != -1)
> +		die("BUG: is_repository_shallow must not be called before set_alternate_shallow_file");
> +	free(alternate_shallow_file);
> +	alternate_shallow_file = path ? xstrdup(path) : NULL;
> +}
>  
>  int register_shallow(const unsigned char *sha1)
>  {
> @@ -21,12 +31,21 @@ int is_repository_shallow(void)
>  {
>  	FILE *fp;
>  	char buf[1024];
> +	const char *path = alternate_shallow_file;
>  
>  	if (is_shallow >= 0)
>  		return is_shallow;
>  
> -	fp = fopen(git_path("shallow"), "r");
> -	if (!fp) {
> +	if (!path)
> +		path = git_path("shallow");
> +	/*
> +	 * fetch-pack set '--shallow-file ""' as an indicator that no

s/set/&s/?

> +	 * shallow file should be used. We could just open it and it
> +	 * will likely fail. But let's do an explicit check instead.
> +	 */
> +	if (!*path ||
> +	    stat(path, &shallow_stat) ||
> +	    (fp = fopen(path, "r")) == NULL) {
>  		is_shallow = 0;
>  		return is_shallow;
>  	}
> @@ -108,3 +127,25 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
>  
>  	return result;
>  }
> +
> +void check_shallow_file_for_update(void)
> +{
> +	struct stat st;
> +
> +	if (getenv("GIT_SHALLOW_FILE"))
> +		die("GIT_SHALLOW_FILE should not be manually set");

Forgot to remove this?

> +	if (!is_shallow)
> +		return;
> +	else if (is_shallow == -1)
> +		die("BUG: shallow must be initialized by now");
> +
> +	if (stat(git_path("shallow"), &st))
> +		die("shallow file was removed during fetch");
> +	else if (st.st_mtime != shallow_stat.st_mtime
> +#ifdef USE_NSEC
> +		 || ST_MTIME_NSEC(st) != ST_MTIME_NSEC(shallow_stat)
> +#endif
> +		   )
> +		die("shallow file was changed during fetch");
> +}
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index d574085..557b073 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -135,6 +135,13 @@ test_expect_success 'clone shallow depth 1' '
>  	test "`git --git-dir=shallow0/.git rev-list --count HEAD`" = 1
>  '
>  
> +test_expect_success 'clone shallow depth 1 with fsck' '
> +	git config --global fetch.fsckobjects true &&
> +	git clone --no-single-branch --depth 1 "file://$(pwd)/." shallow0fsck &&
> +	test "`git --git-dir=shallow0fsck/.git rev-list --count HEAD`" = 1 &&
> +	git config --global --unset fetch.fsckobjects
> +'
> +
>  test_expect_success 'clone shallow' '
>  	git clone --no-single-branch --depth 2 "file://$(pwd)/." shallow
>  '

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 2/4] fetch-pack: prepare updated shallow file before fetching the pack
  2013-05-07 15:59       ` Junio C Hamano
@ 2013-05-26  1:01         ` Duy Nguyen
  0 siblings, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2013-05-26  1:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Tue, May 7, 2013 at 10:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>       if (args->depth > 0) {
>> +             struct stat st;
>> +             if (!fstat(shallow_lock.fd, &st) &&
>> +                 st.st_size == 0) {
>> +                     unlink_or_warn(git_path("shallow"));
>
> Are we unlinking the right file here?

Yes we are. when st.st_size is 0, the new shallow file is empty (i.e.
--unshallow), so we want to remove the old file. I should check
"!*alternate_shallow_file" here instead of based on st.st_size.
--
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2013-05-26  1:02 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-31 11:09 [PATCH 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
2013-03-31 11:09 ` [PATCH 1/4] fetch-pack: save shallow file before fetching the pack Nguyễn Thái Ngọc Duy
2013-04-01 14:53   ` Junio C Hamano
2013-04-05  2:11     ` Duy Nguyen
2013-03-31 11:09 ` [PATCH 2/4] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
2013-03-31 11:09 ` [PATCH 3/4] index-pack, unpack-objects: add --not-so-strict for connectivity check Nguyễn Thái Ngọc Duy
2013-03-31 11:09 ` [PATCH 4/4] Use --not-so-strict on all pack transfer " Nguyễn Thái Ngọc Duy
2013-04-01 14:48 ` [PATCH 0/4] check_everything_connected replacement Junio C Hamano
2013-05-01 10:59 ` [PATCH v2 0/5] " Nguyễn Thái Ngọc Duy
2013-05-01 10:59   ` [PATCH v2 1/5] clone: let the user know when check_everything_connected is run Nguyễn Thái Ngọc Duy
2013-05-01 10:59   ` [PATCH v2 2/5] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
2013-05-01 20:27     ` Junio C Hamano
2013-05-02 10:04       ` Duy Nguyen
2013-05-01 10:59   ` [PATCH v2 3/5] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
2013-05-01 10:59   ` [PATCH v2 4/5] index-pack, unpack-objects: add --not-so-strict for connectivity check Nguyễn Thái Ngọc Duy
2013-05-01 23:35     ` Junio C Hamano
2013-05-02  9:53       ` Duy Nguyen
2013-05-02 16:27         ` Junio C Hamano
2013-05-03  2:29           ` Duy Nguyen
2013-05-03  6:33             ` Junio C Hamano
2013-05-03  6:55               ` Junio C Hamano
2013-05-03  7:09                 ` Duy Nguyen
2013-05-03  8:16                   ` Eric Sunshine
2013-05-01 10:59   ` [PATCH v2 5/5] Use --not-so-strict on all pack transfer " Nguyễn Thái Ngọc Duy
2013-05-03 12:35   ` [PATCH v3 0/4] check_everything_connected replacement Nguyễn Thái Ngọc Duy
2013-05-03 12:35     ` [PATCH v3 1/4] clone: let the user know when check_everything_connected is run Nguyễn Thái Ngọc Duy
2013-05-03 12:35     ` [PATCH v3 2/4] fetch-pack: prepare updated shallow file before fetching the pack Nguyễn Thái Ngọc Duy
2013-05-03 12:37       ` Eric Sunshine
2013-05-07 15:59       ` Junio C Hamano
2013-05-26  1:01         ` Duy Nguyen
2013-05-03 12:35     ` [PATCH v3 3/4] index-pack: remove dead code (it should never happen) Nguyễn Thái Ngọc Duy
2013-05-03 12:35     ` [PATCH v3 4/4] clone: open a shortcut for connectivity check Nguyễn Thái Ngọc Duy
2013-05-03 12:41       ` Eric Sunshine
2013-05-03 16:15       ` Junio C Hamano
2013-05-04  1:10         ` Duy Nguyen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).