git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 00/10] RFC Partial Clone and Fetch
@ 2017-03-08 17:37 Jeff Hostetler
  2017-03-08 17:37 ` [PATCH 01/10] pack-objects: eat CR in addition to LF after fgets Jeff Hostetler
                   ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:37 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>


[RFC] Partial Clone and Fetch
=============================


This is a WIP RFC for a partial clone and fetch feature wherein the client
can request that the server omit various blobs from the packfile during
clone and fetch.  Clients can later request omitted blobs (either from a
modified upload-pack-like request to the server or via a completely
independent mechanism).

The purpose here is to reduce the size of packfile downloads and help
git scale to extremely large repos.

I use the term "partial" here to refer to a portion of one or more commits
and to avoid use of loaded terms like "sparse", "lazy", "narrow", and "skeleton".

The concept of a partial clone/fetch is independent of and can complement
the existing shallow-clone, refspec, and limited-ref filtering mechanisms
since these all filter at the DAG level whereas the work described here
works *within* the set of commits already chosen for download.


A. Requesting a Partial Clone/Fetch
===================================

Clone, fetch, and fetch-pack will accept one or more new "partial"
command line arguments as described below.  The fetch-pack/upload-pack
protocol will be extended to include these new arguments.  Upload-pack
and pack-objects will be updated accordingly.  Pack-objects will filter
out the unwanted blobs as it is building the packfile.  Rev-list and
index-pack will be updated to not complain when missing blobs are
detected in the received packfile.

[1] "--partial-by-size=<n>[kmg]"
Where <n> is a non-negative integer with an optional unit.

Request that only blobs smaller than this be included in the packfile.
The client might use this to implement an alternate LFS or ODB mechanism
for large blobs, such as suggested in:
    https://public-inbox.org/git/20161130210420.15982-1-chriscool@tuxfamily.org/

A special case of size zero would omit all blobs and is similar to the
commits-and-trees-only feature described in:
    https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/

[2] "--partial-special"
Request that special files, such as ".gitignore" and ".gitattributes",
be included.

[3] *TODO* "--partial-by-profile=<sparse-checkout-path>"
Where <sparse-checkout-path> is a version-controlled file in the repository
(either present in the requested commit or the default HEAD on the server).

    [I envision a ".gitsparse/<path>" hierarchy where teams can store
     common sparse-checkout profiles.  And then they can reference
     them from their private ".git/info/sparse-checkout" files.]

Pack-objects will use this file and the sparse-checkout rules to only
include blobs in the packfile that would be needed to do the corresponding
sparse-checkout (and let the client avoid having to demand-load their
entire enlistment).


When multiple "partial" options are given, they are treated as a simple OR
giving the union of the blobs selected.
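
To make the OR semantics concrete, here is a minimal standalone sketch of
the selection rule.  It is not the code from the series: the struct and
function names are hypothetical, and the 1024-based "k/m/g" units are an
assumption chosen to match how git parses size values elsewhere.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option bundle; the names are illustrative only. */
struct partial_opts {
	int have_size;            /* was --partial-by-size given? */
	unsigned long size_limit; /* include blobs strictly smaller than this */
	int special;              /* was --partial-special given? */
};

/* Parse "<n>[kmg]" with 1024-based units; 0 on success, -1 on error. */
int parse_magnitude(const char *s, unsigned long *out)
{
	char *end;
	unsigned long long v, factor = 1;

	if (!isdigit((unsigned char)*s))
		return -1;	/* rejects signs, whitespace and empty input */
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno)
		return -1;
	switch (tolower((unsigned char)*end)) {
	case 'k': factor = 1024ULL; end++; break;
	case 'm': factor = 1024ULL * 1024; end++; break;
	case 'g': factor = 1024ULL * 1024 * 1024; end++; break;
	default: break;
	}
	if (*end)
		return -1;	/* trailing garbage after the unit */
	if (v > ULONG_MAX / factor)
		return -1;	/* would overflow unsigned long */
	*out = (unsigned long)(v * factor);
	return 0;
}

/* A path is "special" if its last component is one of the .git* files. */
int is_special_path(const char *path)
{
	const char *base = strrchr(path, '/');

	base = base ? base + 1 : path;
	return !strcmp(base, ".gitignore") || !strcmp(base, ".gitattributes");
}

/* Returns 1 if the blob should be sent: the union of all given options. */
int include_blob(const struct partial_opts *o, const char *path,
		 unsigned long size)
{
	if (!o->have_size && !o->special)
		return 1;	/* no partial option: nothing is filtered */
	if (o->have_size && size < o->size_limit)
		return 1;
	if (o->special && is_special_path(path))
		return 1;
	return 0;
}
```

Note that a size limit of zero never matches (no blob is smaller than
zero bytes), which gives the omit-all-blobs special case for free.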

The patch series describes the changes to the fetch-pack/upload-pack
protocol:
    Documentation/technical/pack-protocol.txt
    Documentation/technical/protocol-capabilities.txt
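
As a rough illustration of what the new request lines look like on the
wire, the sketch below frames them as pkt-lines (four hex digits of total
length, including those four bytes, followed by the payload) and ends with
a flush-pkt.  The helper names are hypothetical, and a real request would
carry the "want" lines before these.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format one pkt-line into out.  Returns the number of bytes written
 * (excluding the terminating NUL), or -1 if it does not fit.
 */
int pkt_line(char *out, size_t outsz, const char *payload)
{
	size_t len = strlen(payload) + 4;

	if (len > 65520 || len + 1 > outsz)
		return -1;	/* 65520 is the maximum pkt-line length */
	snprintf(out, outsz, "%04x%s", (unsigned)len, payload);
	return (int)len;
}

/* Append the requested partial-* lines, then the flush-pkt ("0000"). */
int build_partial_request(char *out, size_t outsz,
			  const char *by_size, int special)
{
	char payload[64];
	size_t used = 0;
	int n;

	if (by_size) {
		n = snprintf(payload, sizeof(payload),
			     "partial-by-size %s", by_size);
		if (n < 0 || (size_t)n >= sizeof(payload))
			return -1;	/* value too long for this sketch */
		n = pkt_line(out + used, outsz - used, payload);
		if (n < 0)
			return -1;
		used += (size_t)n;
	}
	if (special) {
		n = pkt_line(out + used, outsz - used, "partial-special");
		if (n < 0)
			return -1;
		used += (size_t)n;
	}
	if (used + 5 > outsz)
		return -1;
	memcpy(out + used, "0000", 5);	/* flush-pkt plus NUL */
	return (int)(used + 4);
}
```

Under these assumptions, asking for "1m" plus the special files would
produce "0016partial-by-size 1m", then "0013partial-special", then "0000".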


B. Issues Backfilling Omitted Blobs
===================================

Ideally, if the client only does "--partial-by-profile" fetches, it
should not need to fetch individual missing blobs, but we have to allow
for it to handle the other commands and other unexpected issues.

There are 3 orthogonal concepts here:  when, how and where?


[1] When:
(1a) a pre-command or hook to identify needed blobs and pre-fetch them
before allowing the actual command to start;
(1b) a dry-run mode for the command to likewise pre-fetch them; or
(1c) "fault" them in as necessary in read_object() while the command is
running and without any pre-fetch (either synchronously or asynchronously
and with/without a helper process).

Ideas for (1c) are being addressed in the following threads:
    https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
    https://public-inbox.org/git/20170117184258.sd7h2hkv27w52gzt@sigill.intra.peff.net/
    https://public-inbox.org/git/20161130210420.15982-1-chriscool@tuxfamily.org/
so I won't consider them here.

Ideas (1a) and (1b) have the advantage that they try to obtain all
required blobs before allowing an operation to start, so there is
less opportunity to leave the user in a weird state.

The best solution may be a combination of (1a) and (1b) and may depend
on the individual command.  However, (1b) will further complicate the
source in the existing commands, so in some cases it may be simpler to
just take the ideas and implement stand-alone pre-commands.

For now, I'm going to limit this RFC to (1a).


[2] How:
(2a) augment the existing git protocols to include blob requests;
(2b) local external process (such as a database client or a local bulk
fetch daemon);

Ideas for (2b) are being addressed in the above threads, so I won't
consider them here.

So I'm going to limit this RFC to (2a).


[3] Where:
(3a) the same remote server used for the partial clone/fetch;
(3b) anywhere else, such as a proxy server or Azure or S3 blob store.

There's no reason that the client should be limited to going back to
the same server, but I'm not going to consider it here, so I'm going
to limit this RFC to (3a).



C. New Blob-Fetch Protocol (2a)
===============================

*TODO* A new pair of commands, such as fetch-blob-pack and upload-blob-pack,
will be created to let the client request a batch of blobs and receive a
packfile.  A protocol similar to the fetch-pack/upload-pack will be spoken
between them.  (This avoids complicating the existing protocol and the work
of enumerating the refs.)  Upload-blob-pack will use pack-objects to build
the packfile.

It is also more efficient than requesting a single blob at a time using
the existing fetch-pack/upload-pack mechanism (with the various allow
unreachable options).

*TODO* The new request protocol will be defined in the patch series.
It will include: a list of the desired blob SHAs.  Possibly also the commit
SHA, branch name, and pathname of each blob (or whatever is necessary to let
the server address the reachability concerns).  Possibly also the last
known SHA for each blob to allow for deltification in the packfile.


D. Pre-fetching Blobs (1a)
==========================

On the client side, one or more special commands will be created to assemble
the list of blobs needed for an operation and pass it to fetch-blob-pack.


Checkout Example:  After running a command like:
    'clone --partial-by-size=1m --no-checkout'

and before doing an actual checkout, we need a command to essentially do:
    (1) "ls-tree -r <tree-ish>",
    (2) filter that by the sparse-checkout currently in effect,
    (3) filter that for missing blobs,
    (4) and pass the resulting list to fetch-blob-pack.

Afterwards, checkout should complete without faulting.

A new "git ls-partial <treeish>" command has been created to do
steps 1 through 3 and print the resulting list of SHAs on stdout.
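
The filtering in steps 2 and 3 can be sketched as follows.  This is not
the ls-partial implementation: the entry struct is a hypothetical stand-in
for "ls-tree -r" output, and the sparse rules are simplified to plain path
prefixes rather than real sparse-checkout patterns.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one "ls-tree -r" entry. */
struct tree_entry {
	const char *path;
	const char *oid;	/* hex object id */
	int present;		/* 1 if the blob is in the local object store */
};

/* Simplified sparse rule: a path is wanted if it starts with any prefix. */
static int in_sparse(const char *path, const char **prefixes, size_t np)
{
	size_t i;

	for (i = 0; i < np; i++)
		if (!strncmp(path, prefixes[i], strlen(prefixes[i])))
			return 1;
	return 0;
}

/*
 * Collect the ids of blobs that the sparse rules want but that are
 * missing locally.  Returns how many ids were stored in out[].
 */
size_t list_missing(const struct tree_entry *entries, size_t ne,
		    const char **prefixes, size_t np,
		    const char **out, size_t max_out)
{
	size_t i, n = 0;

	for (i = 0; i < ne && n < max_out; i++) {
		if (!in_sparse(entries[i].path, prefixes, np))
			continue;	/* step 2: outside the sparse-checkout */
		if (entries[i].present)
			continue;	/* step 3: already have this blob */
		out[n++] = entries[i].oid;
	}
	return n;
}
```

The resulting list is what would be handed to fetch-blob-pack in step 4.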


E. Unresolved Thoughts
======================

*TODO* The server should optionally return (in a side-band?) a list 
of the blobs that it omitted from the packfile (and possibly the sizes
or sha1_object_info() data for them) during the fetch-pack/upload-pack
operation.  This would allow the client to distinguish between invalid
SHAs and missing ones.  Size information would allow the client to
maybe choose between various servers.

*TODO* The partial clone arguments should be recorded in ".git/info/"
so that subsequent fetch commands can inherit them and rev-list/index-pack
know to not complain by default.

*TODO* Update GC, like rev-list, to not complain when there are missing blobs.

*TODO* Extend ls-partial to include the "-m" and 3 tree-ish arguments
like read-tree, so we can pre-fetch for merges that may require file
merges (that may or may not be within our sparse-checkout).

*TODO* I also need to review the RFC that Mark Thomas submitted over
the weekend:
    https://public-inbox.org/git/20170304191901.9622-1-markbt%40efaref.net/t





Jeff Hostetler (10):
  pack-objects: eat CR in addition to LF after fgets.
  pack-objects: add --partial-by-size=n --partial-special
  pack-objects: test for --partial-by-size --partial-special
  upload-pack: add partial (sparse) fetch
  fetch-pack: add partial-by-size and partial-special
  rev-list: add --allow-partial option to relax connectivity checks
  index-pack: add --allow-partial option to relax blob existence checks
  fetch: add partial-by-size and partial-special arguments
  clone: add partial-by-size and partial-special arguments
  ls-partial: created command to list missing blobs

 Documentation/technical/pack-protocol.txt         |  14 ++
 Documentation/technical/protocol-capabilities.txt |   7 +
 Makefile                                          |   2 +
 builtin.h                                         |   1 +
 builtin/clone.c                                   |  26 ++
 builtin/fetch-pack.c                              |   9 +
 builtin/fetch.c                                   |  26 +-
 builtin/index-pack.c                              |  20 +-
 builtin/ls-partial.c                              | 110 +++++++++
 builtin/pack-objects.c                            |  64 ++++-
 builtin/rev-list.c                                |  22 +-
 connected.c                                       |   3 +
 connected.h                                       |   3 +
 fetch-pack.c                                      |  17 ++
 fetch-pack.h                                      |   2 +
 git.c                                             |   1 +
 partial-utils.c                                   | 279 ++++++++++++++++++++++
 partial-utils.h                                   |  93 ++++++++
 t/5316-pack-objects-partial.sh                    |  72 ++++++
 transport.c                                       |   8 +
 transport.h                                       |   8 +
 upload-pack.c                                     |  32 ++-
 22 files changed, 813 insertions(+), 6 deletions(-)
 create mode 100644 builtin/ls-partial.c
 create mode 100644 partial-utils.c
 create mode 100644 partial-utils.h
 create mode 100644 t/5316-pack-objects-partial.sh

-- 
2.7.4



* [PATCH 01/10] pack-objects: eat CR in addition to LF after fgets.
  2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
@ 2017-03-08 17:37 ` Jeff Hostetler
  2017-03-09  7:01   ` Jeff King
  2017-03-08 17:37 ` [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special Jeff Hostetler
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:37 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/pack-objects.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f294dcf..7e052bb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2764,6 +2764,8 @@ static void get_object_list(int ac, const char **av)
 		int len = strlen(line);
 		if (len && line[len - 1] == '\n')
 			line[--len] = 0;
+		if (len && line[len - 1] == '\r')
+			line[--len] = 0;
 		if (!len)
 			break;
 		if (*line == '-') {
-- 
2.7.4



* [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special
  2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
  2017-03-08 17:37 ` [PATCH 01/10] pack-objects: eat CR in addition to LF after fgets Jeff Hostetler
@ 2017-03-08 17:37 ` Jeff Hostetler
  2017-03-08 18:47   ` Junio C Hamano
  2017-03-09  7:31   ` Jeff King
  2017-03-08 17:37 ` [PATCH 03/10] pack-objects: test for --partial-by-size --partial-special Jeff Hostetler
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:37 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Teach pack-objects to omit blobs from the generated packfile.

When the --partial-by-size=n[kmg] argument is used, only blobs
smaller than the requested size are included.  When n is zero,
no blobs are included.

When the --partial-special argument is used, git special files,
such as ".gitattributes" and ".gitignore", are included.

When both are given, the union of the two is included.

This is intended to be used in a partial clone or fetch.
(This has also been called sparse- or lazy-clone.)

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/pack-objects.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7e052bb..2df2f49 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -77,6 +77,10 @@ static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+static signed long partial_by_size = -1;
+static int partial_special = 0;
+static struct trace_key trace_partial = TRACE_KEY_INIT(PARTIAL);
+
 /*
  * stats
  */
@@ -2532,6 +2536,54 @@ static void show_object(struct object *obj, const char *name, void *data)
 	obj->flags |= OBJECT_ADDED;
 }
 
+/*
+ * If ANY --partial-* option was given, we want to OMIT all
+ * blobs UNLESS they match one of our patterns.  We treat
+ * the options as OR's so that we get the resulting UNION.
+ */
+static void show_object_partial(struct object *obj, const char *name, void *data)
+{
+	unsigned long s = 0;
+
+	if (obj->type != OBJ_BLOB)
+		goto include_it;
+
+	/*
+	 * When (partial_by_size == 0), we want to OMIT all blobs.
+	 * When (partial_by_size >  0), we want blobs smaller than that.
+	 */
+	if (partial_by_size > 0) {
+		enum object_type t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		if (s < partial_by_size)
+			goto include_it;
+	}
+
+	/*
+	 * When (partial_special), we want the .git* special files.
+	 */
+	if (partial_special) {
+		if (strcmp(name, GITATTRIBUTES_FILE) == 0 ||
+			strcmp(name, ".gitignore") == 0)
+			goto include_it;
+		else {
+			const char *last_slash = strrchr(name, '/');
+			if (last_slash)
+				if (strcmp(last_slash+1, GITATTRIBUTES_FILE) == 0 ||
+					strcmp(last_slash+1, ".gitignore") == 0)
+					goto include_it;
+		}
+	}
+
+	trace_printf_key(
+		&trace_partial, "omitting blob '%s' %"PRIuMAX" '%s'\n",
+		oid_to_hex(&obj->oid), (uintmax_t)s, name);
+	return;
+
+include_it:
+	show_object(obj, name, data);
+}
+
 static void show_edge(struct commit *commit)
 {
 	add_preferred_base(commit->object.oid.hash);
@@ -2794,7 +2846,11 @@ static void get_object_list(int ac, const char **av)
 	if (prepare_revision_walk(&revs))
 		die("revision walk setup failed");
 	mark_edges_uninteresting(&revs, show_edge);
-	traverse_commit_list(&revs, show_commit, show_object, NULL);
+
+	if (partial_by_size >= 0 || partial_special)
+		traverse_commit_list(&revs, show_commit, show_object_partial, NULL);
+	else
+		traverse_commit_list(&revs, show_commit, show_object, NULL);
 
 	if (unpack_unreachable_expiration) {
 		revs.ignore_missing_links = 1;
@@ -2930,6 +2986,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("use a bitmap index if available to speed up counting objects")),
 		OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
 			 N_("write a bitmap index together with the pack index")),
+		OPT_MAGNITUDE(0, "partial-by-size", (unsigned long *)&partial_by_size,
+			 N_("only include blobs smaller than size in result")),
+		OPT_BOOL(0, "partial-special", &partial_special,
+			 N_("only include blobs for git special files")),
 		OPT_END(),
 	};
 
-- 
2.7.4



* [PATCH 03/10] pack-objects: test for --partial-by-size --partial-special
  2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
  2017-03-08 17:37 ` [PATCH 01/10] pack-objects: eat CR in addition to LF after fgets Jeff Hostetler
  2017-03-08 17:37 ` [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special Jeff Hostetler
@ 2017-03-08 17:37 ` Jeff Hostetler
  2017-03-09  7:35   ` Jeff King
  2017-03-09 18:11   ` Johannes Sixt
  2017-03-08 17:37 ` [PATCH 04/10] upload-pack: add partial (sparse) fetch Jeff Hostetler
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:37 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Some simple tests for pack-objects with the new --partial-by-size
and --partial-special options.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 t/5316-pack-objects-partial.sh | 72 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 t/5316-pack-objects-partial.sh

diff --git a/t/5316-pack-objects-partial.sh b/t/5316-pack-objects-partial.sh
new file mode 100644
index 0000000..352de34
--- /dev/null
+++ b/t/5316-pack-objects-partial.sh
@@ -0,0 +1,72 @@
+#!/bin/sh
+
+test_description='pack-object partial'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	perl -e "print \"a\" x 11;"      > a &&
+	perl -e "print \"a\" x 1100;"    > b &&
+	perl -e "print \"a\" x 1100000;" > c &&
+	echo "ignored"                   > .gitignore &&
+	git add a b c .gitignore &&
+	git commit -m test
+	'
+
+test_expect_success 'all blobs' '
+	git pack-objects --revs --thin --stdout >all.pack <<-EOF &&
+	master
+	
+	EOF
+	git index-pack all.pack &&
+	test 4 = $(git verify-pack -v all.pack | grep blob | wc -l)
+	'
+
+test_expect_success 'no blobs' '
+	git pack-objects --revs --thin --stdout --partial-by-size=0 >none.pack <<-EOF &&
+	master
+	
+	EOF
+	git index-pack none.pack &&
+	test 0 = $(git verify-pack -v none.pack | grep blob | wc -l)
+	'
+
+test_expect_success 'small blobs' '
+	git pack-objects --revs --thin --stdout --partial-by-size=1M >small.pack <<-EOF &&
+	master
+	
+	EOF
+	git index-pack small.pack &&
+	test 3 = $(git verify-pack -v small.pack | grep blob | wc -l)
+	'
+
+test_expect_success 'tiny blobs' '
+	git pack-objects --revs --thin --stdout --partial-by-size=100 >tiny.pack <<-EOF &&
+	master
+	
+	EOF
+	git index-pack tiny.pack &&
+	test 2 = $(git verify-pack -v tiny.pack | grep blob | wc -l)
+	'
+
+test_expect_success 'special' '
+	git pack-objects --revs --thin --stdout --partial-special >spec.pack <<-EOF &&
+	master
+	
+	EOF
+	git index-pack spec.pack &&
+	test 1 = $(git verify-pack -v spec.pack | grep blob | wc -l)
+	'
+
+test_expect_success 'union' '
+	git pack-objects --revs --thin --stdout --partial-by-size=0 --partial-special >union.pack <<-EOF &&
+	master
+	
+	EOF
+	git index-pack union.pack &&
+	test 1 = $(git verify-pack -v union.pack | grep blob | wc -l)
+	'
+
+test_done
+
+
-- 
2.7.4



* [PATCH 04/10] upload-pack: add partial (sparse) fetch
  2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
                   ` (2 preceding siblings ...)
  2017-03-08 17:37 ` [PATCH 03/10] pack-objects: test for --partial-by-size --partial-special Jeff Hostetler
@ 2017-03-08 17:37 ` Jeff Hostetler
  2017-03-09  7:48   ` Jeff King
  2017-03-08 17:38 ` [PATCH 05/10] fetch-pack: add partial-by-size and partial-special Jeff Hostetler
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:37 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Teach upload-pack to advertise the "partial" capability
in the fetch-pack/upload-pack protocol header and to pass
the value of partial-by-size and partial-special on to
pack-objects.

Update protocol documentation.

This might be used in conjunction with a partial (sparse) clone
or fetch to omit various blobs from the generated packfile.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/pack-protocol.txt         | 14 ++++++++++
 Documentation/technical/protocol-capabilities.txt |  7 +++++
 upload-pack.c                                     | 32 ++++++++++++++++++++++-
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index c59ac99..0032729 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
   upload-request    =  want-list
 		       *shallow-line
 		       *1depth-request
+		       *partial
 		       flush-pkt
 
   want-list         =  first-want
@@ -223,10 +224,15 @@ out of what the server said it could do with the first 'want' line.
 		       PKT-LINE("deepen-since" SP timestamp) /
 		       PKT-LINE("deepen-not" SP ref)
 
+  partial           =  PKT-LINE("partial-by-size" SP magnitude) /
+		       PKT-LINE("partial-special")
+
   first-want        =  PKT-LINE("want" SP obj-id SP capability-list)
   additional-want   =  PKT-LINE("want" SP obj-id)
 
   depth             =  1*DIGIT
+
+  magnitude         =  1*DIGIT [ "k" | "m" | "g" ]
 ----
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +255,14 @@ complete those commits. Commits whose parents are not received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request a partial packfile that omits
+various blobs.  The value of "partial-by-size" is a non-negative
+integer with optional units and requests blobs smaller than this
+value.  The "partial-special" command requests git-special files,
+such as ".gitignore".  Using both requests the union of the two.
+These requests are only valid if the server advertises the "partial"
+capability.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..9aa2123 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,10 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+partial
+-------
+
+If the upload-pack server advertises this capability, fetch-pack
+may send various "partial-*" commands to request a partial clone
+or fetch where the server omits certain blobs from the packfile.
diff --git a/upload-pack.c b/upload-pack.c
index 7597ba3..74f9dfa 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -63,6 +63,11 @@ static int advertise_refs;
 static int stateless_rpc;
 static const char *pack_objects_hook;
 
+static struct strbuf partial_by_size = STRBUF_INIT;
+static int client_requested_partial_capability;
+static int have_partial_by_size;
+static int have_partial_special;
+
 static void reset_timeout(void)
 {
 	alarm(timeout);
@@ -130,6 +135,10 @@ static void create_pack_file(void)
 		argv_array_push(&pack_objects.args, "--delta-base-offset");
 	if (use_include_tag)
 		argv_array_push(&pack_objects.args, "--include-tag");
+	if (have_partial_by_size)
+		argv_array_push(&pack_objects.args, partial_by_size.buf);
+	if (have_partial_special)
+		argv_array_push(&pack_objects.args, "--partial-special");
 
 	pack_objects.in = -1;
 	pack_objects.out = -1;
@@ -793,6 +802,23 @@ static void receive_needs(void)
 			deepen_rev_list = 1;
 			continue;
 		}
+		if (skip_prefix(line, "partial-by-size ", &arg)) {
+			unsigned long s;
+			if (!client_requested_partial_capability)
+				die("git upload-pack: 'partial-by-size' option requires 'partial' capability");
+			if (!git_parse_ulong(arg, &s))
+				die("git upload-pack: invalid partial-by-size value: %s", line);
+			strbuf_addstr(&partial_by_size, "--partial-by-size=");
+			strbuf_addstr(&partial_by_size, arg);
+			have_partial_by_size = 1;
+			continue;
+		}
+		if (skip_prefix(line, "partial-special", &arg)) {
+			if (!client_requested_partial_capability)
+				die("git upload-pack: 'partial-special' option requires 'partial' capability");
+			have_partial_special = 1;
+			continue;
+		}
 		if (!skip_prefix(line, "want ", &arg) ||
 		    get_sha1_hex(arg, sha1_buf))
 			die("git upload-pack: protocol error, "
@@ -820,6 +846,8 @@ static void receive_needs(void)
 			no_progress = 1;
 		if (parse_feature_request(features, "include-tag"))
 			use_include_tag = 1;
+		if (parse_feature_request(features, "partial"))
+			client_requested_partial_capability = 1;
 
 		o = parse_object(sha1_buf);
 		if (!o)
@@ -924,7 +952,9 @@ static int send_ref(const char *refname, const struct object_id *oid,
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow deepen-since deepen-not"
-		" deepen-relative no-progress include-tag multi_ack_detailed";
+		" deepen-relative no-progress include-tag multi_ack_detailed"
+		" partial"
+		;
 	const char *refname_nons = strip_namespace(refname);
 	struct object_id peeled;
 
-- 
2.7.4



* [PATCH 05/10] fetch-pack: add partial-by-size and partial-special
  2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
                   ` (3 preceding siblings ...)
  2017-03-08 17:37 ` [PATCH 04/10] upload-pack: add partial (sparse) fetch Jeff Hostetler
@ 2017-03-08 17:38 ` Jeff Hostetler
  2017-03-08 17:38 ` [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks Jeff Hostetler
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:38 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Teach fetch-pack to take --partial-by-size and --partial-special
arguments and pass them via the transport to upload-pack to
request that certain blobs be omitted from the resulting packfile.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fetch-pack.c |  9 +++++++++
 fetch-pack.c         | 17 +++++++++++++++++
 fetch-pack.h         |  2 ++
 transport.c          |  8 ++++++++
 transport.h          |  8 ++++++++
 5 files changed, 44 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index cfe9e44..324d7b2 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -8,6 +8,7 @@
 static const char fetch_pack_usage[] =
 "git fetch-pack [--all] [--stdin] [--quiet | -q] [--keep | -k] [--thin] "
 "[--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] "
+"[--partial-by-size=<n>] [--partial-special] "
 "[--no-progress] [--diag-url] [-v] [<host>:]<directory> [<refs>...]";
 
 static void add_sought_entry(struct ref ***sought, int *nr, int *alloc,
@@ -143,6 +144,14 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.update_shallow = 1;
 			continue;
 		}
+		if (skip_prefix(arg, "--partial-by-size=", &arg)) {
+			args.partial_by_size = xstrdup(arg);
+			continue;
+		}
+		if (!strcmp("--partial-special", arg)) {
+			args.partial_special = 1;
+			continue;
+		}
 		usage(fetch_pack_usage);
 	}
 	if (deepen_not.nr)
diff --git a/fetch-pack.c b/fetch-pack.c
index e0f5d5c..e355c38 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -372,6 +372,8 @@ static int find_common(struct fetch_pack_args *args,
 			if (prefer_ofs_delta)   strbuf_addstr(&c, " ofs-delta");
 			if (deepen_since_ok)    strbuf_addstr(&c, " deepen-since");
 			if (deepen_not_ok)      strbuf_addstr(&c, " deepen-not");
+			if (args->partial_by_size || args->partial_special)
+				strbuf_addstr(&c, " partial");
 			if (agent_supported)    strbuf_addf(&c, " agent=%s",
 							    git_user_agent_sanitized());
 			packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf);
@@ -402,6 +404,12 @@ static int find_common(struct fetch_pack_args *args,
 			packet_buf_write(&req_buf, "deepen-not %s", s->string);
 		}
 	}
+
+	if (args->partial_by_size)
+		packet_buf_write(&req_buf, "partial-by-size %s", args->partial_by_size);
+	if (args->partial_special)
+		packet_buf_write(&req_buf, "partial-special");
+
 	packet_buf_flush(&req_buf);
 	state_len = req_buf.len;
 
@@ -807,6 +815,10 @@ static int get_pack(struct fetch_pack_args *args,
 					"--keep=fetch-pack %"PRIuMAX " on %s",
 					(uintmax_t)getpid(), hostname);
 		}
+
+		if (args->partial_by_size || args->partial_special)
+			argv_array_push(&cmd.args, "--allow-partial");
+
 		if (args->check_self_contained_and_connected)
 			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
 	}
@@ -920,6 +932,11 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	else
 		prefer_ofs_delta = 0;
 
+	if (server_supports("partial"))
+		print_verbose(args, _("Server supports partial"));
+	else if (args->partial_by_size || args->partial_special)
+		die(_("Server does not support 'partial'"));
+
 	if ((agent_feature = server_feature_value("agent", &agent_len))) {
 		agent_supported = 1;
 		if (agent_len)
diff --git a/fetch-pack.h b/fetch-pack.h
index c912e3d..b8a26e0 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -12,6 +12,7 @@ struct fetch_pack_args {
 	int depth;
 	const char *deepen_since;
 	const struct string_list *deepen_not;
+	const char *partial_by_size;
 	unsigned deepen_relative:1;
 	unsigned quiet:1;
 	unsigned keep_pack:1;
@@ -29,6 +30,7 @@ struct fetch_pack_args {
 	unsigned cloning:1;
 	unsigned update_shallow:1;
 	unsigned deepen:1;
+	unsigned partial_special:1;
 };
 
 /*
diff --git a/transport.c b/transport.c
index 5828e06..45f35a4 100644
--- a/transport.c
+++ b/transport.c
@@ -160,6 +160,12 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_DEEPEN_RELATIVE)) {
 		opts->deepen_relative = !!value;
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_PARTIAL_BY_SIZE)) {
+		opts->partial_by_size = xstrdup(value);
+		return 0;
+	} else if (!strcmp(name, TRANS_OPT_PARTIAL_SPECIAL)) {
+		opts->partial_special = !!value;
+		return 0;
 	}
 	return 1;
 }
@@ -227,6 +233,8 @@ static int fetch_refs_via_pack(struct transport *transport,
 		data->options.check_self_contained_and_connected;
 	args.cloning = transport->cloning;
 	args.update_shallow = data->options.update_shallow;
+	args.partial_by_size = data->options.partial_by_size;
+	args.partial_special = data->options.partial_special;
 
 	if (!data->got_remote_heads) {
 		connect_setup(transport, 0);
diff --git a/transport.h b/transport.h
index bc55715..c3f2d52 100644
--- a/transport.h
+++ b/transport.h
@@ -15,12 +15,14 @@ struct git_transport_options {
 	unsigned self_contained_and_connected : 1;
 	unsigned update_shallow : 1;
 	unsigned deepen_relative : 1;
+	unsigned partial_special : 1;
 	int depth;
 	const char *deepen_since;
 	const struct string_list *deepen_not;
 	const char *uploadpack;
 	const char *receivepack;
 	struct push_cas_option *cas;
+	const char *partial_by_size;
 };
 
 enum transport_family {
@@ -210,6 +212,12 @@ void transport_check_allowed(const char *type);
 /* Send push certificates */
 #define TRANS_OPT_PUSH_CERT "pushcert"
 
+/* Partial fetch to only include small files */
+#define TRANS_OPT_PARTIAL_BY_SIZE "partial-by-size"
+
+/* Partial fetch to only include special files, like ".gitignore" */
+#define TRANS_OPT_PARTIAL_SPECIAL "partial-special"
+
 /**
  * Returns 0 if the option was used, non-zero otherwise. Prints a
  * message to stderr if the option is not used.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks
  2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
                   ` (4 preceding siblings ...)
  2017-03-08 17:38 ` [PATCH 05/10] fetch-pack: add partial-by-size and partial-special Jeff Hostetler
@ 2017-03-08 17:38 ` Jeff Hostetler
  2017-03-08 18:55   ` Junio C Hamano
  2017-03-08 17:38 ` [PATCH 07/10] index-pack: add --allow-partial option to relax blob existence checks Jeff Hostetler
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:38 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Teach rev-list to optionally not complain when there are missing
blobs.  This is for use following a partial clone or fetch when
the server omitted certain blobs.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/rev-list.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 0aa93d5..50c49ba 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -45,6 +45,7 @@ static const char rev_list_usage[] =
 "    --left-right\n"
 "    --count\n"
 "  special purpose:\n"
+"    --allow-partial\n"
 "    --bisect\n"
 "    --bisect-vars\n"
 "    --bisect-all"
@@ -53,6 +54,9 @@ static const char rev_list_usage[] =
 static struct progress *progress;
 static unsigned progress_counter;
 
+static int allow_partial;
+static struct trace_key trace_partial = TRACE_KEY_INIT(PARTIAL);
+
 static void finish_commit(struct commit *commit, void *data);
 static void show_commit(struct commit *commit, void *data)
 {
@@ -178,8 +182,16 @@ static void finish_commit(struct commit *commit, void *data)
 static void finish_object(struct object *obj, const char *name, void *cb_data)
 {
 	struct rev_list_info *info = cb_data;
-	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
+	if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+		if (allow_partial) {
+			/* Assume a previous partial clone/fetch omitted it. */
+			trace_printf_key(
+				&trace_partial, "omitted blob '%s' '%s'\n",
+				oid_to_hex(&obj->oid), name);
+			return;
+		}
 		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
 		parse_object(obj->oid.hash);
 }
@@ -329,6 +341,14 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			show_progress = arg;
 			continue;
 		}
+		if (!strcmp(arg, "--allow-partial")) {
+			allow_partial = 1;
+			continue;
+		}
+		if (!strcmp(arg, "--no-allow-partial")) {
+			allow_partial = 0;
+			continue;
+		}
 		usage(rev_list_usage);
 
 	}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/10] index-pack: add --allow-partial option to relax blob existence checks
  2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
                   ` (5 preceding siblings ...)
  2017-03-08 17:38 ` [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks Jeff Hostetler
@ 2017-03-08 17:38 ` Jeff Hostetler
  2017-03-08 17:38 ` [PATCH 08/10] fetch: add partial-by-size and partial-special arguments Jeff Hostetler
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:38 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Teach index-pack to optionally not complain when there are missing
blobs.  This is for use following a partial clone or fetch when
the server omitted certain blobs.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/index-pack.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f4b87c6..8f99408 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -13,7 +13,7 @@
 #include "thread-utils.h"
 
 static const char index_pack_usage[] =
-"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
+"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--allow-partial] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
 
 struct object_entry {
 	struct pack_idx_entry idx;
@@ -81,6 +81,9 @@ static int show_resolving_progress;
 static int show_stat;
 static int check_self_contained_and_connected;
 
+static int allow_partial;
+static struct trace_key trace_partial = TRACE_KEY_INIT(PARTIAL);
+
 static struct progress *progress;
 
 /* We always read in 4kB chunks. */
@@ -220,9 +223,18 @@ static unsigned check_object(struct object *obj)
 	if (!(obj->flags & FLAG_CHECKED)) {
 		unsigned long size;
 		int type = sha1_object_info(obj->oid.hash, &size);
-		if (type <= 0)
+		if (type <= 0) {
+			if (allow_partial > 0 && obj->type == OBJ_BLOB) {
+				/* Assume a previous partial clone/fetch omitted it. */
+				trace_printf_key(
+					&trace_partial, "omitted blob '%s'\n",
+					oid_to_hex(&obj->oid));
+				obj->flags |= FLAG_CHECKED;
+				return 0;
+			}
 			die(_("did not receive expected object %s"),
 			      oid_to_hex(&obj->oid));
+		}
 		if (type != obj->type)
 			die(_("object %s: expected type %s, found %s"),
 			    oid_to_hex(&obj->oid),
@@ -1718,6 +1730,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 					die(_("bad %s"), arg);
 			} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
 				max_input_size = strtoumax(arg, NULL, 10);
+			} else if (!strcmp(arg, "--allow-partial")) {
+				allow_partial = 1;
+			} else if (!strcmp(arg, "--no-allow-partial")) {
+				allow_partial = 0;
 			} else
 				usage(index_pack_usage);
 			continue;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/10] fetch: add partial-by-size and partial-special arguments
  2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
                   ` (6 preceding siblings ...)
  2017-03-08 17:38 ` [PATCH 07/10] index-pack: add --allow-partial option to relax blob existence checks Jeff Hostetler
@ 2017-03-08 17:38 ` Jeff Hostetler
  2017-03-08 17:38 ` [PATCH 09/10] clone: " Jeff Hostetler
  2017-03-08 17:38 ` [PATCH 10/10] ls-partial: created command to list missing blobs Jeff Hostetler
  9 siblings, 0 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:38 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Teach fetch to accept --partial-by-size=n and --partial-special
arguments and pass them to fetch-pack to request that the
server omit certain blobs from the generated packfile.
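
For illustration only, here is a rough standalone sketch of how a
size value such as "10k" or "1m" could be turned into a byte limit and
applied.  The patch itself validates the string with git_parse_ulong();
the names parse_size_limit() and blob_fits() below are hypothetical
and the 1024-based k/m/g factors are an assumption modeled on git's
usual unit suffixes.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse "n", "nk", "nm"
 * or "ng" into a byte count.  Returns 1 on success, 0 on malformed
 * input or overflow.
 */
static int parse_size_limit(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long n;
	uint64_t factor = 1;

	if (!s || *s < '0' || *s > '9')
		return 0;	/* reject empty strings, signs, whitespace */

	errno = 0;
	n = strtoull(s, &end, 10);
	if (errno)
		return 0;	/* out of range for unsigned long long */

	switch (*end) {
	case 'k': case 'K': factor = 1024; end++; break;
	case 'm': case 'M': factor = 1024 * 1024; end++; break;
	case 'g': case 'G': factor = 1024 * 1024 * 1024; end++; break;
	}
	if (*end)
		return 0;	/* trailing garbage after number or suffix */
	if (n > UINT64_MAX / factor)
		return 0;	/* multiplication would overflow */

	*out = n * factor;
	return 1;
}

/*
 * Only blobs strictly smaller than the limit are kept, so a limit of
 * zero keeps no blobs at all.
 */
static int blob_fits(uint64_t blob_size, uint64_t limit)
{
	return blob_size < limit;
}
```

With this reading, a limit of "1k" keeps a 1023-byte blob and omits a
1024-byte one.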

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fetch.c | 26 +++++++++++++++++++++++++-
 connected.c     |  3 +++
 connected.h     |  3 +++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b5ad09d..3d47107 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -52,6 +52,8 @@ static const char *recurse_submodules_default;
 static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
+static const char *partial_by_size;
+static int partial_special;
 
 static int option_parse_recurse_submodules(const struct option *opt,
 				   const char *arg, int unset)
@@ -141,6 +143,11 @@ static struct option builtin_fetch_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+	OPT_STRING(0, "partial-by-size", &partial_by_size,
+			   N_("size"),
+			   N_("only include blobs smaller than this")),
+	OPT_BOOL(0, "partial-special", &partial_special,
+			 N_("only include blobs for git special files")),
 	OPT_END()
 };
 
@@ -731,6 +738,10 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 	const char *filename = dry_run ? "/dev/null" : git_path_fetch_head();
 	int want_status;
 	int summary_width = transport_summary_width(ref_map);
+	struct check_connected_options opt = CHECK_CONNECTED_INIT;
+
+	if (partial_by_size || partial_special)
+		opt.allow_partial = 1;
 
 	fp = fopen(filename, "a");
 	if (!fp)
@@ -742,7 +753,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		url = xstrdup("foreign");
 
 	rm = ref_map;
-	if (check_connected(iterate_ref_map, &rm, NULL)) {
+	if (check_connected(iterate_ref_map, &rm, &opt)) {
 		rc = error(_("%s did not send all necessary objects\n"), url);
 		goto abort;
 	}
@@ -882,6 +893,9 @@ static int quickfetch(struct ref *ref_map)
 	struct ref *rm = ref_map;
 	struct check_connected_options opt = CHECK_CONNECTED_INIT;
 
+	if (partial_by_size || partial_special)
+		opt.allow_partial = 1;
+
 	/*
 	 * If we are deepening a shallow clone we already have these
 	 * objects reachable.  Running rev-list here will return with
@@ -1020,6 +1034,10 @@ static struct transport *prepare_transport(struct remote *remote, int deepen)
 		set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
 	if (update_shallow)
 		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
+	if (partial_by_size)
+		set_option(transport, TRANS_OPT_PARTIAL_BY_SIZE, partial_by_size);
+	if (partial_special)
+		set_option(transport, TRANS_OPT_PARTIAL_SPECIAL, "yes");
 	return transport;
 }
 
@@ -1314,6 +1332,12 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix,
 			     builtin_fetch_options, builtin_fetch_usage, 0);
 
+	if (partial_by_size) {
+		unsigned long s;
+		if (!git_parse_ulong(partial_by_size, &s))
+			die(_("invalid partial-by-size value"));
+	}
+
 	if (deepen_relative) {
 		if (deepen_relative < 0)
 			die(_("Negative depth in --deepen is not supported"));
diff --git a/connected.c b/connected.c
index 136c2ac..b07cbb5 100644
--- a/connected.c
+++ b/connected.c
@@ -62,6 +62,9 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
 		argv_array_pushf(&rev_list.args, "--progress=%s",
 				 _("Checking connectivity"));
 
+	if (opt->allow_partial)
+		argv_array_push(&rev_list.args, "--allow-partial");
+
 	rev_list.git_cmd = 1;
 	rev_list.env = opt->env;
 	rev_list.in = -1;
diff --git a/connected.h b/connected.h
index 4ca325f..756259e 100644
--- a/connected.h
+++ b/connected.h
@@ -34,6 +34,9 @@ struct check_connected_options {
 	/* If non-zero, show progress as we traverse the objects. */
 	int progress;
 
+	/* A previous partial clone/fetch may have omitted some blobs. */
+	int allow_partial;
+
 	/*
 	 * Insert these variables into the environment of the child process.
 	 */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/10] clone: add partial-by-size and partial-special arguments
  2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
                   ` (7 preceding siblings ...)
  2017-03-08 17:38 ` [PATCH 08/10] fetch: add partial-by-size and partial-special arguments Jeff Hostetler
@ 2017-03-08 17:38 ` Jeff Hostetler
  2017-03-08 17:38 ` [PATCH 10/10] ls-partial: created command to list missing blobs Jeff Hostetler
  9 siblings, 0 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:38 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Teach clone to accept --partial-by-size=n and --partial-special
arguments to request that the server omit certain blobs from
the generated packfile.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/clone.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 3f63edb..e5a5904 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -56,6 +56,8 @@ static struct string_list option_required_reference = STRING_LIST_INIT_NODUP;
 static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
+static const char *partial_by_size;
+static int partial_special;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -112,6 +114,11 @@ static struct option builtin_clone_options[] = {
 			TRANSPORT_FAMILY_IPV4),
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
+	OPT_STRING(0, "partial-by-size", &partial_by_size,
+			   N_("size"),
+			   N_("only include blobs smaller than this")),
+	OPT_BOOL(0, "partial-special", &partial_special,
+			 N_("only include blobs for git special files")),
 	OPT_END()
 };
 
@@ -625,6 +632,9 @@ static void update_remote_refs(const struct ref *refs,
 	if (check_connectivity) {
 		struct check_connected_options opt = CHECK_CONNECTED_INIT;
 
+		if (partial_by_size || partial_special)
+			opt.allow_partial = 1;
+
 		opt.transport = transport;
 		opt.progress = transport->progress;
 
@@ -1021,6 +1031,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 			warning(_("--shallow-since is ignored in local clones; use file:// instead."));
 		if (option_not.nr)
 			warning(_("--shallow-exclude is ignored in local clones; use file:// instead."));
+		if (partial_by_size)
+			warning(_("--partial-by-size is ignored in local clones; use file:// instead."));
+		if (partial_special)
+			warning(_("--partial-special is ignored in local clones; use file:// instead."));
 		if (!access(mkpath("%s/shallow", path), F_OK)) {
 			if (option_local > 0)
 				warning(_("source repository is shallow, ignoring --local"));
@@ -1052,6 +1066,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 				     option_upload_pack);
 
+	if (partial_by_size) {
+		transport_set_option(transport, TRANS_OPT_PARTIAL_BY_SIZE,
+				     partial_by_size);
+		if (transport->smart_options)
+			transport->smart_options->partial_by_size = partial_by_size;
+	}
+	if (partial_special) {
+		transport_set_option(transport, TRANS_OPT_PARTIAL_SPECIAL, "yes");
+		if (transport->smart_options)
+			transport->smart_options->partial_special = 1;
+	}
+
 	if (transport->smart_options && !deepen)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/10] ls-partial: created command to list missing blobs
  2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
                   ` (8 preceding siblings ...)
  2017-03-08 17:38 ` [PATCH 09/10] clone: " Jeff Hostetler
@ 2017-03-08 17:38 ` Jeff Hostetler
  9 siblings, 0 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 17:38 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Add a command to list the missing blobs for a commit.
This can be used after a partial clone or fetch to list
the omitted blobs that the client would need in order to
check out the given commit/branch, optionally respecting
or ignoring the current sparse-checkout definition.

This command prints a simple list of blob SHAs.  It is
expected that this would be piped into another command
with knowledge of the transport and/or blob store.
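
As a sketch of what such a downstream consumer might do with each
line, the helper below extracts the blob id from one line of output.
It is hypothetical (parse_ls_partial_line() is not part of this
series) and assumes 40-character SHA-1 hex ids, in either the plain
form "<sha1>" or the verbose form "<sha1>\t<path>".

```c
#include <ctype.h>
#include <stddef.h>

#define SHA1_HEXSZ 40

/*
 * Hypothetical consumer-side helper: copy the blob id from one line
 * of ls-partial output into hex[].  Returns 1 on success and 0 if the
 * line does not start with a well-formed id.
 */
static int parse_ls_partial_line(const char *line, char hex[SHA1_HEXSZ + 1])
{
	size_t i;

	for (i = 0; i < SHA1_HEXSZ; i++) {
		if (!isxdigit((unsigned char)line[i]))
			return 0;	/* too short or not hexadecimal */
		hex[i] = line[i];
	}
	hex[SHA1_HEXSZ] = '\0';

	/* After the id: end of line, or a tab followed by a path. */
	return line[i] == '\0' || line[i] == '\n' || line[i] == '\t';
}
```

A side effect of checking the id first is that the "commit" and
"branch" header lines printed in verbose mode are rejected rather
than mistaken for blobs.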

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Makefile             |   2 +
 builtin.h            |   1 +
 builtin/ls-partial.c | 110 ++++++++++++++++++++
 git.c                |   1 +
 partial-utils.c      | 279 +++++++++++++++++++++++++++++++++++++++++++++++++++
 partial-utils.h      |  93 +++++++++++++++++
 6 files changed, 486 insertions(+)
 create mode 100644 builtin/ls-partial.c
 create mode 100644 partial-utils.c
 create mode 100644 partial-utils.h

diff --git a/Makefile b/Makefile
index 9ec6065..96e9e1e 100644
--- a/Makefile
+++ b/Makefile
@@ -791,6 +791,7 @@ LIB_OBJS += pack-write.o
 LIB_OBJS += pager.o
 LIB_OBJS += parse-options.o
 LIB_OBJS += parse-options-cb.o
+LIB_OBJS += partial-utils.o
 LIB_OBJS += patch-delta.o
 LIB_OBJS += patch-ids.o
 LIB_OBJS += path.o
@@ -908,6 +909,7 @@ BUILTIN_OBJS += builtin/init-db.o
 BUILTIN_OBJS += builtin/interpret-trailers.o
 BUILTIN_OBJS += builtin/log.o
 BUILTIN_OBJS += builtin/ls-files.o
+BUILTIN_OBJS += builtin/ls-partial.o
 BUILTIN_OBJS += builtin/ls-remote.o
 BUILTIN_OBJS += builtin/ls-tree.o
 BUILTIN_OBJS += builtin/mailinfo.o
diff --git a/builtin.h b/builtin.h
index 9e4a898..df00c4b 100644
--- a/builtin.h
+++ b/builtin.h
@@ -79,6 +79,7 @@ extern int cmd_interpret_trailers(int argc, const char **argv, const char *prefi
 extern int cmd_log(int argc, const char **argv, const char *prefix);
 extern int cmd_log_reflog(int argc, const char **argv, const char *prefix);
 extern int cmd_ls_files(int argc, const char **argv, const char *prefix);
+extern int cmd_ls_partial(int argc, const char **argv, const char *prefix);
 extern int cmd_ls_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 extern int cmd_mailinfo(int argc, const char **argv, const char *prefix);
diff --git a/builtin/ls-partial.c b/builtin/ls-partial.c
new file mode 100644
index 0000000..8ebf045
--- /dev/null
+++ b/builtin/ls-partial.c
@@ -0,0 +1,110 @@
+#include "cache.h"
+#include "blob.h"
+#include "tree.h"
+#include "commit.h"
+#include "quote.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "pathspec.h"
+#include "dir.h"
+#include "partial-utils.h"
+
+static struct trace_key trace_partial = TRACE_KEY_INIT(PARTIAL);
+
+static int verbose;
+static int ignore_sparse;
+struct exclude_list el;
+
+static const char * const ls_partial_usage[] = {
+	N_("git ls-partial [<options>] <tree-ish>"),
+	NULL
+};
+
+/*
+ * map <tree-ish> arg into SHA1 and get the root treenode.
+ */
+static struct tree *lookup_tree_from_treeish(const char *arg)
+{
+	unsigned char sha1[20];
+	struct tree *tree;
+
+	if (get_sha1(arg, sha1))
+		die("not a valid object name '%s'", arg);
+
+	trace_printf_key(
+		&trace_partial,
+		"ls-partial: treeish '%s' '%s'\n",
+		arg, sha1_to_hex(sha1));
+
+	if (verbose) {
+		printf("commit\t%s\n", sha1_to_hex(sha1));
+		printf("branch\t%s\n", arg);
+	}
+
+	tree = parse_tree_indirect(sha1);
+	if (!tree)
+		die("not a tree object '%s'", arg);
+
+	return tree;
+}
+
+static void print_results(const struct pu_vec *vec)
+{
+	int k;
+
+	for (k = 0; k < vec->data_nr; k++)
+		printf("%s\n", oid_to_hex(&vec->data[k]->oid));
+}
+
+static void print_results_verbose(const struct pu_vec *vec)
+{
+	int k;
+
+	/* TODO Consider -z version */
+
+	for (k = 0; k < vec->data_nr; k++)
+		printf("%s\t%s\n", oid_to_hex(&vec->data[k]->oid), vec->data[k]->fullpath.buf);
+}
+
+int cmd_ls_partial(int argc, const char **argv, const char *prefix)
+{
+	struct exclude_list el;
+	struct tree *tree;
+	struct pu_vec *vec;
+	struct pu_vec *vec_all = NULL;
+	struct pu_vec *vec_sparse = NULL;
+	struct pu_vec *vec_missing = NULL;
+
+	const struct option ls_partial_options[] = {
+		OPT__VERBOSE(&verbose, N_("show verbose blob details")),
+		OPT_BOOL(0, "ignore-sparse", &ignore_sparse,
+				 N_("ignore sparse-checkout settings (scan whole tree)")),
+		OPT_END()
+	};
+
+	git_config(git_default_config, NULL);
+	argc = parse_options(argc, argv, prefix,
+						 ls_partial_options, ls_partial_usage, 0);
+	if (argc < 1)
+		usage_with_options(ls_partial_usage, ls_partial_options);
+
+	tree = lookup_tree_from_treeish(argv[0]);
+
+	vec_all = pu_vec_ls_tree(tree, prefix, argv + 1);
+	if (ignore_sparse || pu_load_sparse_definitions("info/sparse-checkout", &el) < 0)
+		vec = vec_all;
+	else {
+		vec_sparse = pu_vec_filter_sparse(vec_all, &el);
+		vec = vec_sparse;
+	}
+
+	vec_missing = pu_vec_filter_missing(vec);
+	vec = vec_missing;
+
+	if (verbose)
+		print_results_verbose(vec);
+	else
+		print_results(vec);
+
+	return 0;
+}
diff --git a/git.c b/git.c
index 33f52ac..ef1e019 100644
--- a/git.c
+++ b/git.c
@@ -444,6 +444,7 @@ static struct cmd_struct commands[] = {
 	{ "interpret-trailers", cmd_interpret_trailers, RUN_SETUP_GENTLY },
 	{ "log", cmd_log, RUN_SETUP },
 	{ "ls-files", cmd_ls_files, RUN_SETUP | SUPPORT_SUPER_PREFIX },
+	{ "ls-partial", cmd_ls_partial, RUN_SETUP },
 	{ "ls-remote", cmd_ls_remote, RUN_SETUP_GENTLY },
 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 	{ "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY },
diff --git a/partial-utils.c b/partial-utils.c
new file mode 100644
index 0000000..b75e91e
--- /dev/null
+++ b/partial-utils.c
@@ -0,0 +1,279 @@
+#include "cache.h"
+#include "blob.h"
+#include "tree.h"
+#include "commit.h"
+#include "quote.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "pathspec.h"
+#include "dir.h"
+#include "partial-utils.h"
+
+static struct trace_key trace_partial_utils = TRACE_KEY_INIT(PARTIAL_UTILS);
+
+void pu_row_trace(
+	const struct pu_row *row,
+	const char *label)
+{
+	trace_printf_key(
+		&trace_partial_utils,
+		"%s: %06o %s %.*s\n",
+		label,
+		row->mode,
+		oid_to_hex(&row->oid),
+		(int)row->fullpath.len,
+		row->fullpath.buf);
+}
+
+struct pu_row *pu_row_alloc(
+	const unsigned char *sha1,
+	const struct strbuf *base,
+	const char *entryname,
+	unsigned mode)
+{
+	struct pu_row *row = xcalloc(1, sizeof(struct pu_row));
+
+	hashcpy(row->oid.hash, sha1);
+	strbuf_init(&row->fullpath, base->len + strlen(entryname) + 1);
+	if (base->len)
+		strbuf_addbuf(&row->fullpath, base);
+	strbuf_addstr(&row->fullpath, entryname);
+	row->mode = mode;
+	row->entryname_offset = base->len;
+
+	pu_row_trace(row, "alloc");
+
+	return row;
+}
+
+struct pu_vec *pu_vec_alloc(
+	unsigned int nr_pre_alloc)
+{
+	struct pu_vec *vec = xcalloc(1, sizeof(struct pu_vec));
+
+	vec->data = xcalloc(nr_pre_alloc, sizeof(struct pu_row *));
+	vec->data_alloc = nr_pre_alloc;
+
+	return vec;
+}
+
+void pu_vec_append(
+	struct pu_vec *vec,
+	struct pu_row *row)
+{
+	ALLOC_GROW(vec->data, vec->data_nr + 1, vec->data_alloc);
+	vec->data[vec->data_nr++] = row;
+}
+
+static int ls_tree_cb(
+	const unsigned char *sha1,
+	struct strbuf *base,
+	const char *pathname,
+	unsigned mode,
+	int stage,
+	void *context)
+{
+	struct pu_vec *vec = (struct pu_vec *)context;
+
+	/* omit submodules */
+	if (S_ISGITLINK(mode))
+		return 0;
+
+	pu_vec_append(vec, pu_row_alloc(sha1, base, pathname, mode));
+
+	if (S_ISDIR(mode))
+		return READ_TREE_RECURSIVE;
+
+	return 0;
+}
+
+struct pu_vec *pu_vec_ls_tree(
+	struct tree *tree,
+	const char *prefix,
+	const char **argv)
+{
+	struct pu_vec *vec;
+	struct pathspec pathspec;
+	int k;
+
+	vec = pu_vec_alloc(PU_VEC_DEFAULT_SIZE);
+
+	parse_pathspec(
+		&pathspec, PATHSPEC_GLOB | PATHSPEC_ICASE | PATHSPEC_EXCLUDE,
+		PATHSPEC_PREFER_CWD, prefix, argv);
+	for (k = 0; k < pathspec.nr; k++)
+		pathspec.items[k].nowildcard_len = pathspec.items[k].len;
+	pathspec.has_wildcard = 0;
+
+	if (read_tree_recursive(tree, "", 0, 0, &pathspec, ls_tree_cb, vec) != 0)
+		die("Could not read tree");
+
+	return vec;
+}
+
+int pu_load_sparse_definitions(
+	const char *path,
+	struct exclude_list *pel)
+{
+	int result;
+	char *sparse = git_pathdup("%s", path);
+	memset(pel, 0, sizeof(*pel));
+	result = add_excludes_from_file_to_list(sparse, "", 0, pel, 0);
+	free(sparse);
+	return result;
+}
+
+static int mode_to_dtype(unsigned mode)
+{
+	if (S_ISREG(mode))
+		return DT_REG;
+	if (S_ISDIR(mode) || S_ISGITLINK(mode))
+		return DT_DIR;
+	if (S_ISLNK(mode))
+		return DT_LNK;
+	return DT_UNKNOWN;
+}
+
+static int apply_excludes_1(
+	struct pu_row **subset,
+	unsigned int nr,
+	struct strbuf *prefix,
+	struct exclude_list *pel,
+	int defval,
+	struct pu_vec *vec_out);
+
+/* apply directory rules. based on clear_ce_flags_dir() */
+static int apply_excludes_dir(
+	struct pu_row **subset,
+	unsigned int nr,
+	struct strbuf *prefix,
+	char *basename,
+	struct exclude_list *pel,
+	int defval,
+	struct pu_vec *vec_out)
+{
+	struct pu_row **subset_end;
+	int dtype = DT_DIR;
+	int ret = is_excluded_from_list(
+		prefix->buf, prefix->len, basename, &dtype, pel);
+	int rc;
+
+	strbuf_addch(prefix, '/');
+
+	if (ret < 0)
+		ret = defval;
+
+	for (subset_end = subset; subset_end != subset + nr; subset_end++) {
+		struct pu_row *row = *subset_end;
+		if (strncmp(row->fullpath.buf, prefix->buf, prefix->len))
+			break;
+	}
+
+	rc = apply_excludes_1(
+		subset, subset_end - subset,
+		prefix, pel, ret,
+		vec_out);
+	strbuf_setlen(prefix, prefix->len - 1);
+	return rc;
+}
+
+/* apply sparse rules to subset[0..nr). based on clear_ce_flags_1() */
+static int apply_excludes_1(
+	struct pu_row **subset,
+	unsigned int nr,
+	struct strbuf *prefix,
+	struct exclude_list *pel,
+	int defval,
+	struct pu_vec *vec_out)
+{
+	struct pu_row **subset_end = subset + nr;
+
+	while (subset != subset_end) {
+		struct pu_row *row = *subset;
+		const char *name, *slash;
+		int len, dtype, val;
+
+		if (prefix->len && strncmp(row->fullpath.buf, prefix->buf, prefix->len))
+			break;
+
+		name = row->fullpath.buf + prefix->len;
+		slash = strchr(name, '/');
+
+		if (slash) {
+			int processed;
+
+			len = slash - name;
+			strbuf_add(prefix, name, len);
+
+			processed = apply_excludes_dir(
+				subset, subset_end - subset,
+				prefix, prefix->buf + prefix->len - len,
+				pel, defval,
+				vec_out);
+
+			if (processed) {
+				subset += processed;
+				strbuf_setlen(prefix, prefix->len - len);
+				continue;
+			}
+
+			strbuf_addch(prefix, '/');
+			subset += apply_excludes_1(
+				subset, subset_end - subset,
+				prefix, pel, defval,
+				vec_out);
+			strbuf_setlen(prefix, prefix->len - len - 1);
+			continue;
+		}
+
+		dtype = mode_to_dtype(row->mode);
+		val = is_excluded_from_list(
+			row->fullpath.buf, row->fullpath.len, name, &dtype, pel);
+		if (val < 0)
+			val = defval;
+		if (val > 0) {
+			pu_row_trace(row, "sparse");
+			pu_vec_append(vec_out, row);
+		}
+		subset++;
+	}
+
+	return nr - (subset_end - subset);
+}
+
+struct pu_vec *pu_vec_filter_sparse(
+	const struct pu_vec *vec_in,
+	struct exclude_list *pel)
+{
+	struct pu_vec *vec_out;
+	struct strbuf prefix = STRBUF_INIT;
+	int defval = 0;
+
+	vec_out = pu_vec_alloc(vec_in->data_nr);
+
+	apply_excludes_1(
+		vec_in->data, vec_in->data_nr,
+		&prefix, pel, defval,
+		vec_out);
+
+	return vec_out;
+}
+
+struct pu_vec *pu_vec_filter_missing(
+	const struct pu_vec *vec_in)
+{
+	struct pu_vec *vec_out;
+	int k;
+
+	vec_out = pu_vec_alloc(vec_in->data_nr);
+
+	for (k = 0; k < vec_in->data_nr; k++) {
+		struct pu_row *row = vec_in->data[k];
+		if (!has_sha1_file(row->oid.hash)) {
+			pu_row_trace(row, "missing");
+			pu_vec_append(vec_out, row);
+		}
+	}
+
+	return vec_out;
+}
diff --git a/partial-utils.h b/partial-utils.h
new file mode 100644
index 0000000..3bdf2e4
--- /dev/null
+++ b/partial-utils.h
@@ -0,0 +1,93 @@
+#ifndef PARTIAL_UTILS_H
+#define PARTIAL_UTILS_H
+
+/*
+ * A 'partial-utils row' represents a single item in the tree.
+ * This is conceptually equivalent to a cache_entry, but does
+ * not require an index_state and lets us operate on any commit
+ * and not be tied to the current worktree.
+ */
+struct pu_row
+{
+	struct strbuf fullpath;
+	struct object_id oid;
+	unsigned mode;
+	unsigned entryname_offset;
+};
+
+/*
+ * A 'partial-utils vec' represents a vector of 'pu row'
+ * values using the normal vector machinery.
+ */
+struct pu_vec
+{
+	struct pu_row **data;
+	unsigned int data_nr;
+	unsigned int data_alloc;
+};
+
+#define PU_VEC_DEFAULT_SIZE (1024*1024)
+
+
+void pu_row_trace(
+	const struct pu_row *row,
+	const char *label);
+
+struct pu_row *pu_row_alloc(
+	const unsigned char *sha1,
+	const struct strbuf *base,
+	const char *entryname,
+	unsigned mode);
+
+struct pu_vec *pu_vec_alloc(
+	unsigned int nr_pre_alloc);
+
+/*
+ * Append the given row onto the vector WITHOUT
+ * assuming ownership of the pointer.
+ */
+void pu_vec_append(
+	struct pu_vec *vec,
+	struct pu_row *row);
+
+/*
+ * Enumerate the contents of the tree (recursively) into
+ * a vector of rows.  This is essentially "ls-tree -r -t"
+ * into a vector.
+ */ 
+struct pu_vec *pu_vec_ls_tree(
+	struct tree *tree,
+	const char *prefix,
+	const char **argv);
+
+/*
+ * Load a sparse-checkout file into (*pel).
+ * Returns -1 if none or error.
+ */
+int pu_load_sparse_definitions(
+	const char *path,
+	struct exclude_list *pel);
+
+/*
+ * Filter the given vector using the sparse-checkout
+ * definitions and return new vector of just the paths
+ * that WOULD BE populated.
+ *
+ * The returned vector BORROWS rows from the input vector.
+ *
+ * This is loosely based upon clear_ce_flags() in unpack-trees.c
+ */
+struct pu_vec *pu_vec_filter_sparse(
+	const struct pu_vec *vec_in,
+	struct exclude_list *pel);
+
+/*
+ * Filter the given vector and return the list of blobs
+ * missing from the local ODB.
+ *
+ * The returned vector BORROWS rows from the input vector.
+ */
+struct pu_vec *pu_vec_filter_missing(
+	const struct pu_vec *vec_in);
+
+#endif /* PARTIAL_UTILS_H */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special
  2017-03-08 17:37 ` [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special Jeff Hostetler
@ 2017-03-08 18:47   ` Junio C Hamano
  2017-03-08 20:21     ` Jeff Hostetler
  2017-03-09  7:31   ` Jeff King
  1 sibling, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2017-03-08 18:47 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, peff, markbt, benpeart, jonathantanmy, Jeff Hostetler

Jeff Hostetler <jeffhost@microsoft.com> writes:

> From: Jeff Hostetler <git@jeffhostetler.com>
>
> Teach pack-objects to omit blobs from the generated packfile.
>
> When the --partial-by-size=n[kmg] argument is used, only blobs
> smaller than the requested size are included.  When n is zero,
> no blobs are included.

Does this interact with a more traditional way of feeding output of
an external "rev-list --objects" to pack-objects via its standard
input, and if so, should it (and if not, shouldn't it)?  

It is perfectly OK if the answer is "this applies only to the case
where we generate the list of objects with internal traversal." but
that needs to be documented and discussed in the proposed log
message.

> When the --partial-special argument is used, git special files,
> such as ".gitattributes" and ".gitignores" are included.

And not ".gitmodules"?

What happens when we later add ".gitsomethingelse"?

Do we have to worry about the case where the set of git "special
files" (can we have a better name for them please, by the way?)
understood by the sending side and the receiving end is different?

I have a feeling that a mode that makes anything whose name begins
with ".git" exempt from the size based cutoff may generally be
easier to handle.
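
One possible reading of that rule, sketched as standalone C rather
than as anything in the series (keep_blob() and its parameters are
hypothetical, and matching on the last path component is an
assumption):

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical filter: keep a blob if the last component of its path
 * starts with ".git", regardless of size; otherwise keep it only if
 * it is strictly smaller than the limit.
 */
static int keep_blob(const char *path, uint64_t size, uint64_t limit)
{
	const char *base = strrchr(path, '/');

	base = base ? base + 1 : path;	/* basename of the path */
	if (!strncmp(base, ".git", 4))
		return 1;		/* exempt from the size cutoff */
	return size < limit;
}
```

Neither side would then need an agreed list of file names; a future
".gitsomethingelse" is covered by the prefix test.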

I am not sure how "back-filling" of a resulting narrow clone would
safely be done and how this impacts "git fsck" at this point, but if
they are solved within this effort, that would be a very welcome
change.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 04/10] upload-pack: add partial (sparse) fetch
  2017-03-08 18:50 [PATCH 00/10] RFC Partial Clone and Fetch git
@ 2017-03-08 18:50 ` git
  0 siblings, 0 replies; 32+ messages in thread
From: git @ 2017-03-08 18:50 UTC (permalink / raw)
  To: git
  Cc: jeffhost, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

From: Jeff Hostetler <git@jeffhostetler.com>

Teach upload-pack to advertise the "partial" capability
in the fetch-pack/upload-pack protocol header and to pass
the value of partial-by-size and partial-special on to
pack-objects.

Update protocol documentation.

This might be used in conjunction with a partial (sparse) clone
or fetch to omit various blobs from the generated packfile.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 Documentation/technical/pack-protocol.txt         | 14 ++++++++++
 Documentation/technical/protocol-capabilities.txt |  7 +++++
 upload-pack.c                                     | 32 ++++++++++++++++++++++-
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index c59ac99..0032729 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
   upload-request    =  want-list
 		       *shallow-line
 		       *1depth-request
+		       *partial
 		       flush-pkt
 
   want-list         =  first-want
@@ -223,10 +224,15 @@ out of what the server said it could do with the first 'want' line.
 		       PKT-LINE("deepen-since" SP timestamp) /
 		       PKT-LINE("deepen-not" SP ref)
 
+  partial           =  PKT-LINE("partial-by-size" SP magnitude) /
+		       PKT-LINE("partial-special")
+
   first-want        =  PKT-LINE("want" SP obj-id SP capability-list)
   additional-want   =  PKT-LINE("want" SP obj-id)
 
   depth             =  1*DIGIT
+
+  magnitude         =  1*DIGIT [ "k" | "m" | "g" ]
 ----
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +255,14 @@ complete those commits. Commits whose parents are not received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request a partial packfile that omits
+various blobs.  The value of "partial-by-size" is a non-negative
+integer with optional units and requests blobs smaller than this
+value.  The "partial-special" command requests git-special files,
+such as ".gitignore".  Using both requests the union of the two.
+These requests are only valid if the server advertises the "partial"
+capability.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..9aa2123 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,10 @@ to accept a signed push certificate, and asks the <nonce> to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+partial
+-------
+
+If the upload-pack server advertises this capability, fetch-pack
+may send various "partial-*" commands to request a partial clone
+or fetch where the server omits certain blobs from the packfile.
diff --git a/upload-pack.c b/upload-pack.c
index 7597ba3..74f9dfa 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -63,6 +63,11 @@ static int advertise_refs;
 static int stateless_rpc;
 static const char *pack_objects_hook;
 
+static struct strbuf partial_by_size = STRBUF_INIT;
+static int client_requested_partial_capability;
+static int have_partial_by_size;
+static int have_partial_special;
+
 static void reset_timeout(void)
 {
 	alarm(timeout);
@@ -130,6 +135,10 @@ static void create_pack_file(void)
 		argv_array_push(&pack_objects.args, "--delta-base-offset");
 	if (use_include_tag)
 		argv_array_push(&pack_objects.args, "--include-tag");
+	if (have_partial_by_size)
+		argv_array_push(&pack_objects.args, partial_by_size.buf);
+	if (have_partial_special)
+		argv_array_push(&pack_objects.args, "--partial-special");
 
 	pack_objects.in = -1;
 	pack_objects.out = -1;
@@ -793,6 +802,23 @@ static void receive_needs(void)
 			deepen_rev_list = 1;
 			continue;
 		}
+		if (skip_prefix(line, "partial-by-size ", &arg)) {
+			unsigned long s;
+			if (!client_requested_partial_capability)
+				die("git upload-pack: 'partial-by-size' option requires 'partial' capability");
+			if (!git_parse_ulong(arg, &s))
+				die("git upload-pack: invalid partial-by-size value: %s", line);
+			strbuf_addstr(&partial_by_size, "--partial-by-size=");
+			strbuf_addstr(&partial_by_size, arg);
+			have_partial_by_size = 1;
+			continue;
+		}
+		if (skip_prefix(line, "partial-special", &arg)) {
+			if (!client_requested_partial_capability)
+				die("git upload-pack: 'partial-special' option requires 'partial' capability");
+			have_partial_special = 1;
+			continue;
+		}
 		if (!skip_prefix(line, "want ", &arg) ||
 		    get_sha1_hex(arg, sha1_buf))
 			die("git upload-pack: protocol error, "
@@ -820,6 +846,8 @@ static void receive_needs(void)
 			no_progress = 1;
 		if (parse_feature_request(features, "include-tag"))
 			use_include_tag = 1;
+		if (parse_feature_request(features, "partial"))
+			client_requested_partial_capability = 1;
 
 		o = parse_object(sha1_buf);
 		if (!o)
@@ -924,7 +952,9 @@ static int send_ref(const char *refname, const struct object_id *oid,
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow deepen-since deepen-not"
-		" deepen-relative no-progress include-tag multi_ack_detailed";
+		" deepen-relative no-progress include-tag multi_ack_detailed"
+		" partial"
+		;
 	const char *refname_nons = strip_namespace(refname);
 	struct object_id peeled;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks
  2017-03-08 17:38 ` [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks Jeff Hostetler
@ 2017-03-08 18:55   ` Junio C Hamano
  2017-03-08 20:10     ` Jeff Hostetler
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2017-03-08 18:55 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git, peff, markbt, benpeart, jonathantanmy, Jeff Hostetler

Jeff Hostetler <jeffhost@microsoft.com> writes:

> From: Jeff Hostetler <git@jeffhostetler.com>
>
> Teach rev-list to optionally not complain when there are missing
> blobs.  This is for use following a partial clone or fetch when
> the server omitted certain blobs.

This makes it impossible to tell objects missing by design
(because we did a --partial-by-size clone earlier, expecting we can
later fetch from elsewhere when necessary) apart from objects
inaccessible by accident (because you have a repository corruption),
no?

Even though I do very much like the basic "high level" premise to
omit often useless large blobs that are buried deep in the history
we would not necessarily need from the initial cloning and
subsequent fetches, I find it somewhat disturbing that the code
"Assume"s that any missing blob is due to a previous partial clone.
Adding this option smells like telling the users that they are not
supposed to run "git fsck" because a partially cloned repository is
inherently a corrupt repository.

Can't we do a bit better?  If we want to make the world safer again,
what additional complexity is required to allow us to tell the
"missing by design" and "corrupt repository" apart?

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks
  2017-03-08 18:55   ` Junio C Hamano
@ 2017-03-08 20:10     ` Jeff Hostetler
  2017-03-09  7:56       ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 20:10 UTC (permalink / raw)
  To: Junio C Hamano, Jeff Hostetler; +Cc: git, peff, markbt, benpeart, jonathantanmy



On 3/8/2017 1:55 PM, Junio C Hamano wrote:
> Jeff Hostetler <jeffhost@microsoft.com> writes:
>
>> From: Jeff Hostetler <git@jeffhostetler.com>
>>
>> Teach rev-list to optionally not complain when there are missing
>> blobs.  This is for use following a partial clone or fetch when
>> the server omitted certain blobs.
>
> This makes it impossible to tell objects missing by design
> (because we did a --partial-by-size clone earlier, expecting we can
> later fetch from elsewhere when necessary) apart from objects
> inaccessible by accident (because you have a repository corruption),
> no?

Right.  It will effectively neuter several commands like
index-pack, gc, and fsck WRT missing blobs.

> Even though I do very much like the basic "high level" premise to
> omit often useless large blobs that are buried deep in the history
> we would not necessarily need from the initial cloning and
> subsequent fetches, I find it somewhat disturbing that the code
> "Assume"s that any missing blob is due to a previous partial clone.
> Adding this option smells like telling the users that they are not
> supposed to run "git fsck" because a partially cloned repository is
> inherently a corrupt repository.
>
> Can't we do a bit better?  If we want to make the world safer again,
> what additional complexity is required to allow us to tell the
> "missing by design" and "corrupt repository" apart?

I'm open to suggestions here.  It would be nice to extend the
fetch-pack/upload-pack protocol to return a list of the SHAs
(and maybe the sizes) of the omitted blobs, so that a partial
clone or fetch would still be able to be integrity checked.

Jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special
  2017-03-08 18:47   ` Junio C Hamano
@ 2017-03-08 20:21     ` Jeff Hostetler
  2017-03-09  7:04       ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-08 20:21 UTC (permalink / raw)
  To: Junio C Hamano, Jeff Hostetler; +Cc: git, peff, markbt, benpeart, jonathantanmy



On 3/8/2017 1:47 PM, Junio C Hamano wrote:
> Jeff Hostetler <jeffhost@microsoft.com> writes:
>
>> From: Jeff Hostetler <git@jeffhostetler.com>
>>
>> Teach pack-objects to omit blobs from the generated packfile.
>>
>> When the --partial-by-size=n[kmg] argument is used, only blobs
>> smaller than the requested size are included.  When n is zero,
>> no blobs are included.
>
> Does this interact with a more traditional way of feeding output of
> an external "rev-list --objects" to pack-objects via its standard
> input, and if so, should it (and if not, shouldn't it)?
>
> It is perfectly OK if the answer is "this applies only to the case
> where we generate the list of objects with internal traversal." but
> that needs to be documented and discussed in the proposed log
> message.
>

Let me study that and see.  I'm still thinking thru ways and
options for doing the sparse-checkout like filtering.


>> When the --partial-special argument is used, git special files,
>> such as ".gitattributes" and ".gitignore", are included.
>
> And not ".gitmodules"?
>
> What happens when we later add ".gitsomethingelse"?
>
> Do we have to worry about the case where the set of git "special
> files" (can we have a better name for them please, by the way?)
> understood by the sending side and the receiving end is different?
>
> I have a feeling that a mode that makes anything whose name begins
> with ".git" exempt from the size-based cutoff may generally be
> easier to handle.

I forgot about ".gitmodules".  The more I think about it, maybe
we should always include them (or anything starting with ".git*")
and ignore the size, since they are important for correct behavior.


> I am not sure how "back-filling" of a resulting narrow clone would
> safely be done and how this impacts "git fsck" at this point, but if
> they are solved within this effort, that would be a very welcome
> change.
>
> Thanks.
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/10] pack-objects: eat CR in addition to LF after fgets.
  2017-03-08 17:37 ` [PATCH 01/10] pack-objects: eat CR in addition to LF after fgets Jeff Hostetler
@ 2017-03-09  7:01   ` Jeff King
  2017-03-09 15:46     ` Jeff Hostetler
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2017-03-09  7:01 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, gitster, markbt, benpeart, jonathantanmy, Jeff Hostetler

On Wed, Mar 08, 2017 at 05:37:56PM +0000, Jeff Hostetler wrote:

> From: Jeff Hostetler <git@jeffhostetler.com>
> 
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  builtin/pack-objects.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index f294dcf..7e052bb 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2764,6 +2764,8 @@ static void get_object_list(int ac, const char **av)
>  		int len = strlen(line);
>  		if (len && line[len - 1] == '\n')
>  			line[--len] = 0;
> +		if (len && line[len - 1] == '\r')
> +			line[--len] = 0;

Rather than add features to this bespoke line-reader, can we switch this
to use strbuf_getline()? That handles line endings, and avoids the
awkward corner case where fgets "breaks" a long line across two calls.

Something like the patch below. I suspect read_object_list_from_stdin()
should get the same treatment.

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 76b1919ca..6b9fffe9c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2765,7 +2765,7 @@ static void record_recent_commit(struct commit *commit, void *data)
 static void get_object_list(int ac, const char **av)
 {
 	struct rev_info revs;
-	char line[1000];
+	struct strbuf buf = STRBUF_INIT;
 	int flags = 0;
 
 	init_revisions(&revs, NULL);
@@ -2775,12 +2775,12 @@ static void get_object_list(int ac, const char **av)
 	/* make sure shallows are read */
 	is_repository_shallow();
 
-	while (fgets(line, sizeof(line), stdin) != NULL) {
-		int len = strlen(line);
-		if (len && line[len - 1] == '\n')
-			line[--len] = 0;
-		if (!len)
+	while (strbuf_getline(&buf, stdin) != EOF) {
+		const char *line = buf.buf;
+
+		if (!buf.len)
 			break;
+
 		if (*line == '-') {
 			if (!strcmp(line, "--not")) {
 				flags ^= UNINTERESTING;
@@ -2800,6 +2800,7 @@ static void get_object_list(int ac, const char **av)
 		if (handle_revision_arg(line, &revs, flags, REVARG_CANNOT_BE_FILENAME))
 			die("bad revision '%s'", line);
 	}
+	strbuf_release(&buf);
 
 	if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
 		return;


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special
  2017-03-08 20:21     ` Jeff Hostetler
@ 2017-03-09  7:04       ` Jeff King
  2017-03-10 17:58         ` Brandon Williams
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2017-03-09  7:04 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Jeff Hostetler, git, markbt, benpeart,
	jonathantanmy

On Wed, Mar 08, 2017 at 03:21:11PM -0500, Jeff Hostetler wrote:

> > And not ".gitmodules"?
> > 
> > What happens when we later add ".gitsomethingelse"?
> > 
> > Do we have to worry about the case where the set of git "special
> > files" (can we have a better name for them please, by the way?)
> > understood by the sending side and the receiving end is different?
> > 
> > I have a feeling that a mode that makes anything whose name begins
> > with ".git" exempt from the size-based cutoff may generally be
> > easier to handle.
> 
> I forgot about ".gitmodules".  The more I think about it, maybe
> we should always include them (or anything starting with ".git*")
> and ignore the size, since they are important for correct behavior.

I'm also in favor of staking out ".git*" as "this is special and belongs
to Git".
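
A minimal sketch of what "anything under .git* is special" could look
like, assuming a plain prefix match on the last path component.  The
helper names is_dot_git_name() and path_has_dot_git_basename() are made
up for illustration and do not come from the series; note that a bare
prefix match would also catch names such as ".github".

```c
#include <string.h>

/*
 * Hypothetical helper: "name" is a single tree-entry name (no slashes).
 * Returns non-zero when it falls in the ".git*" namespace.
 */
static int is_dot_git_name(const char *name)
{
	return !strncmp(name, ".git", 4);
}

/*
 * Hypothetical helper: same test for a full path, where only the
 * final component is examined.
 */
static int path_has_dot_git_basename(const char *path)
{
	const char *slash = strrchr(path, '/');

	return is_dot_git_name(slash ? slash + 1 : path);
}
```

With something like this, ".gitmodules" and any future
".gitsomethingelse" are covered without either side having to agree on
a fixed list of names.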

A while back when we discussed whether to allow symlinks for
.gitattributes, etc, I think the consensus was to treat the whole
".git*" namespace consistently. I haven't followed up with patches yet,
but my plan was to go that route.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special
  2017-03-08 17:37 ` [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special Jeff Hostetler
  2017-03-08 18:47   ` Junio C Hamano
@ 2017-03-09  7:31   ` Jeff King
  2017-03-09 18:26     ` Jeff Hostetler
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff King @ 2017-03-09  7:31 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, gitster, markbt, benpeart, jonathantanmy, Jeff Hostetler

On Wed, Mar 08, 2017 at 05:37:57PM +0000, Jeff Hostetler wrote:

> From: Jeff Hostetler <git@jeffhostetler.com>
> 
> Teach pack-objects to omit blobs from the generated packfile.
> 
> When the --partial-by-size=n[kmg] argument is used, only blobs
> smaller than the requested size are included.  When n is zero,
> no blobs are included.
> 
> When the --partial-special argument is used, git special files,
> such as ".gitattributes" and ".gitignore", are included.
> 
> When both are given, the union of the two is included.

I understand why one would want to do:

  --partial-by-size=100 --partial-special

and get the union. The first one restricts, and the second one adds back
in. But I don't understand why "--partial-special" by itself makes any
sense. Wouldn't we already be including all blobs, and it would be a
noop?


Also, I was thinking a bit on Junio's comment elsewhere on whether
read_object_list_from_stdin() should do the same limiting. I think the
answer is "probably not", because whoever is generating that object list
can cull the set. You could do it today with something like:

  git rev-list --objects HEAD |
  git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)' |
  perl -lne 's/^(\d+) (\S+) //; print if $2 ne "blob" || $1 < 100' |
  git pack-objects

But if we are going to add this --partial-by-size for the pack-objects
traversal, shouldn't we just add it to rev-list? Then:

  git rev-list --objects --partial-by-size=100 --partial-special |
  git pack-objects

works, and you should get it in the pack-objects basically for free (I
think you'd have to allow through the "--partial" arguments on stdin,
and make sure the rev-list implementation is done via
traverse_commit_list).

As a bonus, I suspect it would make the --partial-special path-handling
easier, because you'd see each tree entry rather than the fully
constructed path (so no more monkeying around with "/").

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 7e052bb..2df2f49 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -77,6 +77,10 @@ static unsigned long cache_max_small_delta_size = 1000;
>  
>  static unsigned long window_memory_limit = 0;
>  
> +static signed long partial_by_size = -1;

I would have expected this to be an off_t, though I think
OPT_MAGNITUDE() forces you into "unsigned long". I guess it is nothing
new for Git; we use "unsigned long" for single object sizes elsewhere,
so systems with a 32-bit long are out of luck anyway until we fix that.

The signed "long" here is unfortunate, as it limits us to 2G on such
systems. Maybe it is not worth worrying too much about. The "big object"
threshold is usually around 500MB. I think the failure behavior is not
great, though (asking for "3G" would go negative and effectively be
ignored).

I think handling all cases would involve swapping out OPT_MAGNITUDE()
for a special callback that writes the "yes, the user set this" bit in a
separate variable.
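
Roughly what I have in mind, as a standalone sketch rather than an
actual parse-options callback.  The struct and function names
(size_limit, parse_size_limit) are hypothetical, and the suffix
handling (k/m/g as powers of 1024, either case) is my assumption about
what n[kmg] should mean here:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical holder: the limit plus an explicit "was it given" bit. */
struct size_limit {
	unsigned long bytes;
	int is_set;
};

/*
 * Parse "<digits>[k|m|g]" into lim.  Returns 0 on success, -1 on
 * malformed input or when the value does not fit in unsigned long.
 * lim is left untouched on failure.
 */
static int parse_size_limit(const char *arg, struct size_limit *lim)
{
	char *end;
	unsigned long long val;
	unsigned long long factor = 1;

	if (!arg || *arg < '0' || *arg > '9')
		return -1; /* rejects "", "-1", "+1" and leading spaces */

	errno = 0;
	val = strtoull(arg, &end, 10);
	if (errno == ERANGE)
		return -1;

	switch (*end) {
	case 'k': case 'K': factor = 1024ULL; end++; break;
	case 'm': case 'M': factor = 1024ULL * 1024; end++; break;
	case 'g': case 'G': factor = 1024ULL * 1024 * 1024; end++; break;
	}
	if (*end)
		return -1; /* trailing garbage after the optional unit */

	if (val > ULONG_MAX / factor)
		return -1; /* would overflow unsigned long */

	lim->bytes = (unsigned long)(val * factor);
	lim->is_set = 1;
	return 0;
}
```

Because "unset" is carried by is_set rather than by a negative
sentinel, a request for "3g" on a system with a 32-bit long keeps its
full unsigned value, and anything larger is rejected outright instead
of being silently ignored.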

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/10] pack-objects: test for --partial-by-size --partial-special
  2017-03-08 17:37 ` [PATCH 03/10] pack-objects: test for --partial-by-size --partial-special Jeff Hostetler
@ 2017-03-09  7:35   ` Jeff King
  2017-03-09 18:11   ` Johannes Sixt
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff King @ 2017-03-09  7:35 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, gitster, markbt, benpeart, jonathantanmy, Jeff Hostetler

On Wed, Mar 08, 2017 at 05:37:58PM +0000, Jeff Hostetler wrote:

> diff --git a/t/5316-pack-objects-partial.sh b/t/5316-pack-objects-partial.sh
> [...]
> +test_expect_success 'setup' '
> +	perl -e "print \"a\" x 11;"      > a &&
> +	perl -e "print \"a\" x 1100;"    > b &&
> +	perl -e "print \"a\" x 1100000;" > c &&
> +	echo "ignored"                   > .gitignore &&
> +	git add a b c .gitignore &&
> +	git commit -m test
> +	'

A few minor style nits. We usually prefer ">a" with no space, and the
closing single-quote isn't indented.

> +test_expect_success 'special' '
> +	git pack-objects --revs --thin --stdout --partial-special >spec.pack <<-EOF &&
> +	master
> +	
> +	EOF
> +	git index-pack spec.pack &&
> +	test 1 = $(git verify-pack -v spec.pack | grep blob | wc -l)
> +	'

All of the tests make sense to me except this one. I see from the code
in pack-objects why this returns only the .gitignore file. I'm just
not clear on whether that would ever be useful. I guess it lets you ask
"give me only the special files", but again, that seems kind of weird if
you are not otherwise limiting.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] upload-pack: add partial (sparse) fetch
  2017-03-08 17:37 ` [PATCH 04/10] upload-pack: add partial (sparse) fetch Jeff Hostetler
@ 2017-03-09  7:48   ` Jeff King
  2017-03-09 18:34     ` Jeff Hostetler
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2017-03-09  7:48 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, gitster, markbt, benpeart, jonathantanmy, Jeff Hostetler

On Wed, Mar 08, 2017 at 05:37:59PM +0000, Jeff Hostetler wrote:

> diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
> index c59ac99..0032729 100644
> --- a/Documentation/technical/pack-protocol.txt
> +++ b/Documentation/technical/pack-protocol.txt
> @@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
>    upload-request    =  want-list
>  		       *shallow-line
>  		       *1depth-request
> +		       *partial
>  		       flush-pkt
>  
>    want-list         =  first-want
> @@ -223,10 +224,15 @@ out of what the server said it could do with the first 'want' line.
>  		       PKT-LINE("deepen-since" SP timestamp) /
>  		       PKT-LINE("deepen-not" SP ref)
>  
> +  partial           =  PKT-LINE("partial-by-size" SP magnitude) /
> +		       PKT-LINE("partial-special")
> +

I probably would have added this as a capability coming back from the
client, since it only makes sense to send once (the same way we ask for
other features like include-tag or ofs-delta). I guess it's six of one,
half a dozen of the other, though.

I notice that you require the client to request the "partial" capability
_and_ to send these commands. I'm not sure what the client capability
response is helping. The server has said "I can do this" and the client
either asks for it or not.

> +		if (skip_prefix(line, "partial-by-size ", &arg)) {
> +			unsigned long s;
> +			if (!client_requested_partial_capability)
> +				die("git upload-pack: 'partial-by-size' option requires 'partial' capability");
> +			if (!git_parse_ulong(arg, &s))
> +				die("git upload-pack: invalid partial-by-size value: %s", line);
> +			strbuf_addstr(&partial_by_size, "--partial-by-size=");
> +			strbuf_addstr(&partial_by_size, arg);
> +			have_partial_by_size = 1;
> +			continue;

So we parse it here for validation, but then pass the original string on
to be parsed again by pack-objects. I think I'd rather see us use the
result of our parse here, just to avoid any bugs where the parsing isn't
identical (and there is such a bug currently due to the signed/unsigned
thing I mentioned).
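
Something along these lines, as a sketch only: once the server has
parsed the value into an unsigned long, it formats that number for the
child instead of echoing the client's string.  The helper name
format_partial_by_size_arg() is invented for this example; in
upload-pack itself the result would presumably go through the existing
argv_array calls.

```c
#include <stdio.h>

/*
 * Hypothetical helper: render an already-parsed limit as a plain byte
 * count, so pack-objects never has to re-interpret a unit suffix.
 * Returns 0 on success, -1 if "out" is too small.
 */
static int format_partial_by_size_arg(unsigned long bytes,
				      char *out, size_t outlen)
{
	int n = snprintf(out, outlen, "--partial-by-size=%lu", bytes);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

That way there is exactly one place where the client's text is
interpreted, and any disagreement between the two parsers cannot
change the result.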

I also wonder whether the magnitude suffixes are worth exposing across
the wire. Anybody touching the list of units in git_parse_ulong() would
probably be surprised that the protocol is dependent on them (not that I
expect us to really take any away, but it just seems like an unnecessary
protocol complication).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks
  2017-03-08 20:10     ` Jeff Hostetler
@ 2017-03-09  7:56       ` Jeff King
  2017-03-09 18:38         ` Jeff Hostetler
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2017-03-09  7:56 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Jeff Hostetler, git, markbt, benpeart,
	jonathantanmy

On Wed, Mar 08, 2017 at 03:10:54PM -0500, Jeff Hostetler wrote:

> > Even though I do very much like the basic "high level" premise to
> > omit often useless large blobs that are buried deep in the history
> > we would not necessarily need from the initial cloning and
> > subsequent fetches, I find it somewhat disturbing that the code
> > "Assume"s that any missing blob is due to a previous partial clone.
> > Adding this option smells like telling the users that they are not
> > supposed to run "git fsck" because a partially cloned repository is
> > inherently a corrupt repository.
> > 
> > Can't we do a bit better?  If we want to make the world safer again,
> > what additional complexity is required to allow us to tell the
> > "missing by design" and "corrupt repository" apart?
> 
> I'm open to suggestions here.  It would be nice to extend the
> > fetch-pack/upload-pack protocol to return a list of the SHAs
> (and maybe the sizes) of the omitted blobs, so that a partial
> clone or fetch would still be able to be integrity checked.

Yeah, the early external-odb patches did this. It lets you do a more
accurate fsck, and it also helps diff avoid faulting in large-object
cases (because we can mark them as binary for "free" by comparing the
size to big_file_threshold).

So I think it makes a lot of sense in the large-blob case, where
transmitting a type/size/sha1 tuple is way more efficient than sending
the blob itself. But it's less clear for "sparse" cases where just
enumerating the set of blobs may be prohibitively large.
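
To make the large-blob case concrete, here is a rough sketch of how a
client-side list of omitted blobs could separate "missing by design"
from "missing unexpectedly".  Everything here is an assumption for
illustration: the omitted_blob layout, the classify_missing() name, and
the idea that the list is kept sorted by hex object id are not part of
this series or of the external-odb patches.

```c
#include <stdlib.h>
#include <string.h>

#define HEX_OID_LEN 40 /* SHA-1 rendered as hex */

/* Hypothetical record for one blob the server left out on purpose. */
struct omitted_blob {
	char oid[HEX_OID_LEN + 1];
	unsigned long size;
};

enum missing_kind {
	MISSING_BY_DESIGN,  /* listed as omitted by the server */
	MISSING_UNEXPECTED  /* not listed: treat as possible corruption */
};

/* bsearch() comparator: key is a hex oid string, elem is a record. */
static int cmp_omitted(const void *key, const void *elem)
{
	const struct omitted_blob *e = elem;

	return strcmp((const char *)key, e->oid);
}

/*
 * Look up a missing object in the omitted list, which must be sorted
 * by oid.  On a hit, optionally report the recorded size so callers
 * can make size-based decisions without having the blob.
 */
static enum missing_kind classify_missing(const char *oid,
					  const struct omitted_blob *list,
					  size_t nr,
					  unsigned long *size_out)
{
	const struct omitted_blob *hit;

	hit = bsearch(oid, list, nr, sizeof(*list), cmp_omitted);
	if (!hit)
		return MISSING_UNEXPECTED;
	if (size_out)
		*size_out = hit->size;
	return MISSING_BY_DESIGN;
}
```

A checker built on this could stay quiet about blobs on the list and
still complain about anything else that is absent, and the recorded
size is what would let diff compare against big_file_threshold without
faulting the blob in.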

I have a feeling that the "sparse" thing needs to be handled separately
from "partial". IOW, the client needs to tell the server "I'm only
interested in the path foo/bar, so just send that". Then you don't find
out about the types and sizes outside of that path, but you don't need
to; the sparse path is stored locally and fsck knows to avoid looking
into it.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/10] pack-objects: eat CR in addition to LF after fgets.
  2017-03-09  7:01   ` Jeff King
@ 2017-03-09 15:46     ` Jeff Hostetler
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-09 15:46 UTC (permalink / raw)
  To: Jeff King, Jeff Hostetler; +Cc: git, gitster, markbt, benpeart, jonathantanmy



On 3/9/2017 2:01 AM, Jeff King wrote:
> On Wed, Mar 08, 2017 at 05:37:56PM +0000, Jeff Hostetler wrote:
>
>> From: Jeff Hostetler <git@jeffhostetler.com>
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>> ---
>>  builtin/pack-objects.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index f294dcf..7e052bb 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -2764,6 +2764,8 @@ static void get_object_list(int ac, const char **av)
>>  		int len = strlen(line);
>>  		if (len && line[len - 1] == '\n')
>>  			line[--len] = 0;
>> +		if (len && line[len - 1] == '\r')
>> +			line[--len] = 0;
>
> Rather than add features to this bespoke line-reader, can we switch this
> to use strbuf_getline()? That handles line endings, and avoids the
> awkward corner case where fgets "breaks" a long line across two calls.
>
> Something like the patch below. I suspect read_object_list_from_stdin()
> should get the same treatment.

Much nicer.  Will do.  Thanks!

>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 76b1919ca..6b9fffe9c 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2765,7 +2765,7 @@ static void record_recent_commit(struct commit *commit, void *data)
>  static void get_object_list(int ac, const char **av)
>  {
>  	struct rev_info revs;
> -	char line[1000];
> +	struct strbuf buf = STRBUF_INIT;
>  	int flags = 0;
>
>  	init_revisions(&revs, NULL);
> @@ -2775,12 +2775,12 @@ static void get_object_list(int ac, const char **av)
>  	/* make sure shallows are read */
>  	is_repository_shallow();
>
> -	while (fgets(line, sizeof(line), stdin) != NULL) {
> -		int len = strlen(line);
> -		if (len && line[len - 1] == '\n')
> -			line[--len] = 0;
> -		if (!len)
> +	while (strbuf_getline(&buf, stdin) != EOF) {
> +		const char *line = buf.buf;
> +
> +		if (!buf.len)
>  			break;
> +
>  		if (*line == '-') {
>  			if (!strcmp(line, "--not")) {
>  				flags ^= UNINTERESTING;
> @@ -2800,6 +2800,7 @@ static void get_object_list(int ac, const char **av)
>  		if (handle_revision_arg(line, &revs, flags, REVARG_CANNOT_BE_FILENAME))
>  			die("bad revision '%s'", line);
>  	}
> +	strbuf_release(&buf);
>
>  	if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
>  		return;
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/10] pack-objects: test for --partial-by-size --partial-special
  2017-03-08 17:37 ` [PATCH 03/10] pack-objects: test for --partial-by-size --partial-special Jeff Hostetler
  2017-03-09  7:35   ` Jeff King
@ 2017-03-09 18:11   ` Johannes Sixt
  1 sibling, 0 replies; 32+ messages in thread
From: Johannes Sixt @ 2017-03-09 18:11 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: git, peff, gitster, markbt, benpeart, jonathantanmy,
	Jeff Hostetler

Am 08.03.2017 um 18:37 schrieb Jeff Hostetler:
> +test_expect_success 'setup' '
> +	perl -e "print \"a\" x 11;"      > a &&
> +	perl -e "print \"a\" x 1100;"    > b &&
> +	perl -e "print \"a\" x 1100000;" > c &&

If the file contents do not matter, you can have the same without perl 
like this:

	printf "%011d" 0      >a &&
	printf "%01100d" 0    >b &&
	printf "%01100000d" 0 >c &&

-- Hannes


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special
  2017-03-09  7:31   ` Jeff King
@ 2017-03-09 18:26     ` Jeff Hostetler
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-09 18:26 UTC (permalink / raw)
  To: Jeff King, Jeff Hostetler; +Cc: git, gitster, markbt, benpeart, jonathantanmy



On 3/9/2017 2:31 AM, Jeff King wrote:
> On Wed, Mar 08, 2017 at 05:37:57PM +0000, Jeff Hostetler wrote:
>
>> From: Jeff Hostetler <git@jeffhostetler.com>
>>
>> Teach pack-objects to omit blobs from the generated packfile.
>>
>> When the --partial-by-size=n[kmg] argument is used, only blobs
>> smaller than the requested size are included.  When n is zero,
>> no blobs are included.
>>
>> When the --partial-special argument is used, git special files,
>> such as ".gitattributes" and ".gitignore", are included.
>>
>> When both are given, the union of the two is included.
>
> I understand why one would want to do:
>
>   --partial-by-size=100 --partial-special
>
> and get the union. The first one restricts, and the second one adds back
> in. But I don't understand why "--partial-special" by itself makes any
> sense. Wouldn't we already be including all blobs, and it would be a
> noop?

My thought was that the "--partial-special" when used by itself
would *only* give you the .git* files (and if we had something
like a .gitsparse/ directory, everything under it).  The client
could then do a "special" clone -- mainly to get the sparse checkout
templates under .gitsparse/ and then come back for a sparse fetch
using one of them.  Somewhat of a chicken-n-egg problem, unless the
user knows the template names in advance.
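
A minimal sketch of the selection rule as described above, assuming the
two options combine as a union and that "special" means any path
component starting with ".git".  The names partial_opts, is_special_path
and keep_blob are hypothetical and do not come from the patch series:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option bundle; not the variables used in the patch. */
struct partial_opts {
	int have_size_limit;           /* was --partial-by-size given? */
	unsigned long long size_limit; /* blobs smaller than this are kept */
	int special;                   /* was --partial-special given? */
};

/* True if any path component begins with ".git" (assumed definition). */
static int is_special_path(const char *path)
{
	const char *p = path;

	while (p) {
		if (!strncmp(p, ".git", 4))
			return 1;
		p = strchr(p, '/');
		if (p)
			p++; /* step past the slash to the next component */
	}
	return 0;
}

/* Decide whether a blob goes into the pack under the assumed rules. */
static int keep_blob(const struct partial_opts *o, const char *path,
		     unsigned long long size)
{
	assert(o && path);
	if (!o->have_size_limit && !o->special)
		return 1; /* no filtering requested */
	if (o->have_size_limit && size < o->size_limit)
		return 1; /* small enough */
	if (o->special && is_special_path(path))
		return 1; /* added back regardless of size */
	return 0;
}
```

With only "special" set, this keeps ".gitattributes" and anything under
".gitsparse/" and drops every other blob, which is the "special clone"
case described above.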

>
>
> Also, I was thinking a bit on Junio's comment elsewhere on whether
> read_object_list_from_stdin() should do the same limiting. I think the
> answer is "probably not", because whoever is generating that object list
> can cull the set. You could do it today with something like:
>
>   git rev-list --objects HEAD |
>   git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)' |
>   perl -lne 's/^(\d+) (\S+) //; print if $2 ne "blob" || $1 < 100' |
>   git pack-objects
>
> But if we are going to add this --partial-by-size for the pack-objects
> traversal, shouldn't we just add it to rev-list? Then:
>
>   git rev-list --objects --partial-by-size=100 --partial-special |
>   git pack-objects
>
> works, and you should get it in the pack-objects basically for free (I
> think you'd have to allow through the "--partial" arguments on stdin,
> and make sure the rev-list implementation is done via
> traverse_commit_list).
>
> As a bonus, I suspect it would make the --partial-special path-handling
> easier, because you'd see each tree entry rather than the fully
> constructed path (so no more monkeying around with "/").

Interesting.  Let me give that a try and see what it looks like.

>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index 7e052bb..2df2f49 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -77,6 +77,10 @@ static unsigned long cache_max_small_delta_size = 1000;
>>
>>  static unsigned long window_memory_limit = 0;
>>
>> +static signed long partial_by_size = -1;
>
> I would have expected this to be an off_t, though I think
> OPT_MAGNITUDE() forces you into "unsigned long". I guess it is nothing
> new for Git; we use "unsigned long" for single object sizes elsewhere,
> so systems with a 32-bit long are out of luck anyway until we fix that.
>
> The signed "long" here is unfortunate, as it limits us to 2G on such
> systems. Maybe it is not worth worrying too much about. The "big object"
> threshold is usually around 500MB. I think the failure behavior is not
> great, though (asking for "3G" would go negative and effectively be
> ignored).
>
> I think handling all cases would involve swapping out OPT_MAGNITUDE()
> for a special callback that writes the "yes, the user set this" bit in a
> separate variable.

Yeah, there is a bit of confusion there.  I used OPT_MAGNITUDE in
one place (for the argument checking), but couldn't in another place.
And I tried to pass the original string across the wire for sanity.
And I had to fight with the types a little.  It would probably be
simpler to replace that with a custom handler (or a uint64_t version
of magnitude) that would do the right thing and then use that numeric
value elsewhere.
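
A rough sketch of what such a handler could look like, assuming k/m/g
suffixes are powers of 1024 and that the "was it set" bit lives next to
the value.  The names size_limit and parse_magnitude_u64 are made up
for illustration and are not existing Git functions:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical holder: the flag replaces the "-1 means unset" trick. */
struct size_limit {
	int is_set;
	uint64_t value;
};

/* Parse "<digits>[kmg]" into out; return 0 on success, -1 on error. */
static int parse_magnitude_u64(const char *s, struct size_limit *out)
{
	char *end;
	unsigned long long n;
	uint64_t factor = 1;

	assert(out != NULL);
	if (!s || !isdigit((unsigned char)*s))
		return -1; /* rejects "", "-1", "+5", leading spaces */

	errno = 0;
	n = strtoull(s, &end, 10);
	if (errno == ERANGE)
		return -1;

	switch (tolower((unsigned char)*end)) {
	case 'k': factor = 1024ULL; end++; break;
	case 'm': factor = 1024ULL * 1024; end++; break;
	case 'g': factor = 1024ULL * 1024 * 1024; end++; break;
	case '\0': break;
	default: return -1;
	}
	if (*end != '\0')
		return -1; /* trailing junk after the unit */
	if (n > UINT64_MAX / factor)
		return -1; /* multiplication would overflow */

	out->is_set = 1;
	out->value = (uint64_t)n * factor;
	return 0;
}
```

With this shape, "3g" parses to 3221225472 on every platform instead of
going negative in a 32-bit signed long, and "0" stays distinguishable
from "not given".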

Thanks,
Jeff



* Re: [PATCH 04/10] upload-pack: add partial (sparse) fetch
  2017-03-09  7:48   ` Jeff King
@ 2017-03-09 18:34     ` Jeff Hostetler
  2017-03-09 19:09       ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-09 18:34 UTC (permalink / raw)
  To: Jeff King, Jeff Hostetler; +Cc: git, gitster, markbt, benpeart, jonathantanmy



On 3/9/2017 2:48 AM, Jeff King wrote:
> On Wed, Mar 08, 2017 at 05:37:59PM +0000, Jeff Hostetler wrote:
>
>> diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
>> index c59ac99..0032729 100644
>> --- a/Documentation/technical/pack-protocol.txt
>> +++ b/Documentation/technical/pack-protocol.txt
>> @@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line.
>>    upload-request    =  want-list
>>  		       *shallow-line
>>  		       *1depth-request
>> +		       *partial
>>  		       flush-pkt
>>
>>    want-list         =  first-want
>> @@ -223,10 +224,15 @@ out of what the server said it could do with the first 'want' line.
>>  		       PKT-LINE("deepen-since" SP timestamp) /
>>  		       PKT-LINE("deepen-not" SP ref)
>>
>> +  partial           =  PKT-LINE("partial-by-size" SP magnitude) /
>> +		       PKT-LINE("partial-special")
>> +
>
> I probably would have added this as a capability coming back from the
> client, since it only makes sense to send once (the same way we ask for
> other features like include-tag or ofs-delta). I guess it's six of one,
> half a dozen of the other, though.

True.  I wanted the size argument.  And later want to add
a sparse-file.  It seemed like they better belonged in a
PKT-LINE than in the capability header.

>
> I notice that you require the client to request the "partial" capability
> _and_ to send these commands. I'm not sure what the client capability
> response is helping. The server has said "I can do this" and the client
> either asks for it or not.

Yeah, I wasn't sure if that was necessary or not.
It looked like there were other fields where the
client advised that it wanted to use a capability
that the server had previously advertised.  If we
don't need it, that's fine.


>
>> +		if (skip_prefix(line, "partial-by-size ", &arg)) {
>> +			unsigned long s;
>> +			if (!client_requested_partial_capability)
>> +				die("git upload-pack: 'partial-by-size' option requires 'partial' capability");
>> +			if (!git_parse_ulong(arg, &s))
>> +				die("git upload-pack: invalid partial-by-size value: %s", line);
>> +			strbuf_addstr(&partial_by_size, "--partial-by-size=");
>> +			strbuf_addstr(&partial_by_size, arg);
>> +			have_partial_by_size = 1;
>> +			continue;
>
> So we parse it here for validation, but then pass the original string on
> to be parsed again by pack-objects. I think I'd rather see us use the
> result of our parse here, just to avoid any bugs where the parsing isn't
> identical (and there is such a bug currently due to the signed/unsigned
> thing I mentioned).
>
> I also wonder whether the magnitude suffixes are worth exposing across
> the wire. Anybody touching the list of units in git_parse_ulong() would
> probably be surprised that the protocol is dependent on them (not that I
> expect us to really take any away, but it just seems like an unnecessary
> protocol complication).

Yeah, I'll change this as I described in an earlier sub-thread.

Jeff



* Re: [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks
  2017-03-09  7:56       ` Jeff King
@ 2017-03-09 18:38         ` Jeff Hostetler
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Hostetler @ 2017-03-09 18:38 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Jeff Hostetler, git, markbt, benpeart,
	jonathantanmy



On 3/9/2017 2:56 AM, Jeff King wrote:
> On Wed, Mar 08, 2017 at 03:10:54PM -0500, Jeff Hostetler wrote:
>
>>> Even though I do very much like the basic "high level" premise to
>>> omit often useless large blobs that are buried deep in the history
>>> we would not necessarily need from the initial cloning and
>>> subsequent fetches, I find it somewhat disturbing that the code
>>> "Assume"s that any missing blob is due to a previous partial clone.
>>> Adding this option smells like telling the users that they are not
>>> supposed to run "git fsck" because a partially cloned repository is
>>> inherently a corrupt repository.
>>>
>>> Can't we do a bit better?  If we want to make the world safer again,
>>> what additional complexity is required to allow us to tell the
>>> "missing by design" and "corrupt repository" apart?
>>
>> I'm open to suggestions here.  It would be nice to extend the
>> fetch-pack/upload-pack protocol to return a list of the SHAs
>> (and maybe the sizes) of the omitted blobs, so that a partial
>> clone or fetch would still be able to be integrity checked.
>
> Yeah, the early external-odb patches did this. It lets you do a more
> accurate fsck, and it also helps diff avoid faulting in large-object
> cases (because we can mark them as binary for "free" by comparing the
> size to big_file_threshold).
>
> So I think it makes a lot of sense in the large-blob case, where
> transmitting a type/size/sha1 tuple is way more efficient than sending
> the blob itself. But it's less clear for "sparse" cases where just
> enumerating the set of blobs may be prohibitively large.
>
> I have a feeling that the "sparse" thing needs to be handled separately
> from "partial". IOW, the client needs to tell the server "I'm only
> interested in the path foo/bar, so just send that". Then you don't find
> out about the types and sizes outside of that path, but you don't need
> to; the sparse path is stored locally and fsck knows to avoid looking
> into it.
>
> -Peff
>

That makes sense.  I'd like to get both concepts (by-size/special vs
sparse-file) in, but they don't really overlap that much (internally).
So I could see doing this in 2 separate efforts.

Thanks,
Jeff


* Re: [PATCH 04/10] upload-pack: add partial (sparse) fetch
  2017-03-09 18:34     ` Jeff Hostetler
@ 2017-03-09 19:09       ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2017-03-09 19:09 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Jeff Hostetler, git, gitster, markbt, benpeart, jonathantanmy

On Thu, Mar 09, 2017 at 01:34:32PM -0500, Jeff Hostetler wrote:

> > > +  partial           =  PKT-LINE("partial-by-size" SP magnitude) /
> > > +		       PKT-LINE("partial-special")
> > > +
> > 
> > I probably would have added this as a capability coming back from the
> > client, since it only makes sense to send once (the same way we ask for
> > other features like include-tag or ofs-delta). I guess it's six of one,
> > half a dozen of the other, though.
> 
> True.  I wanted the size argument.  And later want to add
> a sparse-file.  It seemed like they better belonged in a
> PKT-LINE than in the capability header.

Yeah, at some point we will run out of room in the capabilities
response. :) If the sparse information might be arbitrarily long (e.g.,
a list of pathspecs) then it probably is better for it to get split into
its own pktline (or even a series of pktlines).
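
For reference, a small sketch of how one such pktline is framed: four
hex digits giving the total length (payload plus the four length bytes)
followed by the payload, with at most 65516 payload bytes.  The
function name format_pkt_line is hypothetical, and the
"partial-by-size" payload in the note below is the line proposed in
this RFC, sent here as a plain decimal rather than with a unit suffix:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PKT_MAX_PAYLOAD 65516 /* 65520 total minus 4 length bytes */

/*
 * Write one pkt-line for a NUL-terminated payload into out.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the payload is too long or the buffer is too small.
 */
static int format_pkt_line(char *out, size_t outsz, const char *payload)
{
	size_t len;

	assert(out && payload);
	len = strlen(payload);
	if (len > PKT_MAX_PAYLOAD || outsz < len + 5)
		return -1; /* need 4 length bytes + payload + NUL */

	/* The length field counts itself, hence len + 4. */
	return snprintf(out, outsz, "%04x%s", (unsigned int)(len + 4), payload);
}
```

Framing "partial-by-size 100\n" this way yields
"0018partial-by-size 100\n"; a long pathspec list would simply become
several such lines before the flush-pkt.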

-Peff


* Re: [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special
  2017-03-09  7:04       ` Jeff King
@ 2017-03-10 17:58         ` Brandon Williams
  2017-03-10 18:03           ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Brandon Williams @ 2017-03-10 17:58 UTC (permalink / raw)
  To: Jeff King
  Cc: Jeff Hostetler, Junio C Hamano, Jeff Hostetler, git, markbt,
	benpeart, jonathantanmy

On 03/09, Jeff King wrote:
> On Wed, Mar 08, 2017 at 03:21:11PM -0500, Jeff Hostetler wrote:
> 
> > > And not ".gitmodules"?
> > > 
> > > What happens when we later add ".gitsomethingelse"?
> > > 
> > > Do we have to worry about the case where the set of git "special
> > > files" (can we have a better name for them please, by the way?)
> > > understood by the sending side and the receiving end is different?
> > > 
> > > I have a feeling that a mode that makes anything whose name begins
> > > with ".git" exempt from the size based cutoff may generally be
> > > easier to handle.
> > 
> > I forgot about ".gitmodules".  The more I think about it, maybe
> > we should always include them (or anything starting with ".git*")
> > and ignore the size, since they are important for correct behavior.
> 
> I'm also in favor of staking out ".git*" as "this is special and belongs
> to Git".

I agree, .git* files should probably be the bare minimum of files
included.  Especially since things like .gitattributes can affect things
like checkout.

> 
> A while back when we discussed whether to allow symlinks for
> .gitattributes, etc, I think the consensus was to treat the whole
> ".git*" namespace consistently. I haven't followed up with patches yet,
> but my plan was to go that route.

Well if I remember correctly you sent out some patches for
.gitattributes but I got in the way with the refactoring work! :)

-- 
Brandon Williams


* Re: [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special
  2017-03-10 17:58         ` Brandon Williams
@ 2017-03-10 18:03           ` Jeff King
  2017-03-10 19:38             ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2017-03-10 18:03 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Jeff Hostetler, Junio C Hamano, Jeff Hostetler, git, markbt,
	benpeart, jonathantanmy

On Fri, Mar 10, 2017 at 09:58:23AM -0800, Brandon Williams wrote:

> > A while back when we discussed whether to allow symlinks for
> > .gitattributes, etc, I think the consensus was to treat the whole
> > ".git*" namespace consistently. I haven't followed up with patches yet,
> > but my plan was to go that route.
> 
> Well if I remember correctly you sent out some patches for
> .gitattributes but I got in the way with the refactoring work! :)

True. :) But those were the old method that tries to treat
.gitattributes specially, by using O_NOFOLLOW in the attribute code (but
only for in-tree files, naturally).

I think we ended up deciding that it would be better to just disallow
symlink .gitattributes (and .git*) from entering the index, the way we
disallow ".git".

-Peff


* Re: [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special
  2017-03-10 18:03           ` Jeff King
@ 2017-03-10 19:38             ` Junio C Hamano
  2017-03-10 19:47               ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2017-03-10 19:38 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Williams, Jeff Hostetler, Jeff Hostetler, git, markbt,
	benpeart, jonathantanmy

Jeff King <peff@peff.net> writes:

> I think we ended up deciding that it would be better to just disallow
> symlink .gitattributes (and .git*) from entering the index, the way we
> disallow ".git".

Hmph, I thought we would need both, though.  Or do we specifically
want to honor untracked .gitattributes that is left as a symlink
pointing to elsewhere in the filesystem or something like that?



* Re: [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special
  2017-03-10 19:38             ` Junio C Hamano
@ 2017-03-10 19:47               ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2017-03-10 19:47 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Brandon Williams, Jeff Hostetler, Jeff Hostetler, git, markbt,
	benpeart, jonathantanmy

On Fri, Mar 10, 2017 at 11:38:10AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I think we ended up deciding that it would be better to just disallow
> > symlink .gitattributes (and .git*) from entering the index, the way we
> > disallow ".git".
> 
> Hmph, I thought we would need both, though.  Or do we specifically
> want to honor untracked .gitattributes that is left as a symlink
> pointing to elsewhere in the filesystem or something like that?

I wasn't going to worry about an untracked .gitattributes. The reasons
for disallowing symlinked .gitattributes are:

  - it doesn't behave the same when accessed internally via the tree
    objects

  - malicious symlinks that try to leave the repository

Neither of those issues is at play if your symlink .gitattributes file
isn't tracked. So there's some inconsistency in the sense that it
"works" until you try to "git add" it. But either you aren't going to
add it (in which case it's a feature that it works), or you are going to
add it, and you'll get notified then that it's disallowed.
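
Roughly, the check at "git add" time could look like the sketch below.
This is only an illustration under simple assumptions (exact byte match
on the ".git" prefix of the last path component, tree-entry mode
120000 for symlinks); the names are hypothetical, and it ignores the
case-folding issues a real implementation would have to consider:

```c
#include <assert.h>
#include <string.h>

#define MODE_TYPE_MASK 0170000u
#define MODE_SYMLINK   0120000u /* symlink entry mode in trees/index */

/* True if the last path component starts with ".git" (assumed rule). */
static int is_dotgit_name(const char *path)
{
	const char *base = strrchr(path, '/');

	base = base ? base + 1 : path;
	return !strncmp(base, ".git", 4);
}

/* Return 1 if this entry should be refused when entering the index. */
static int reject_index_entry(const char *path, unsigned int mode)
{
	assert(path != NULL);
	return (mode & MODE_TYPE_MASK) == MODE_SYMLINK &&
	       is_dotgit_name(path);
}
```

An untracked symlink never reaches this check, which matches the
"works until you try to add it" behavior described above.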

-Peff

