git@vger.kernel.org mailing list mirror (one of many)
* [PATCH/RFC 0/7] Add possibility to clone specific subdirectories
@ 2016-07-28 16:02 Robin Ruede
  2016-07-28 16:02 ` [PATCH/RFC 1/7] list-objects: add sparse-prefix option to rev_info Robin Ruede
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Robin Ruede @ 2016-07-28 16:02 UTC (permalink / raw)
  To: git; +Cc: Robin Ruede

This patch series adds a `--sparse-prefix=` option to multiple commands,
allowing fetching repository contents from only a subdirectory of a remote.

This works along with sparse-checkout, and is especially useful for repositories
where a subdirectory has meaning when standing alone.

* Motivation (example use cases)

1. Git repositories used for managing large/binary files
  My university has a repository containing lecture slides etc.
  as PDFs, with a subdirectory for each lecture. Getting the whole
  repository (even with --depth=1) transfers 4GiB and takes significant
  processing time, whereas getting the complete history of a single lecture
  transfers 25MiB and completes instantly.
2. Package-manager-like repositories. Examples:
  a) Arch Linux package build files repository [1]
  b) Rust crates.io packages [2]
  c) TypeScript type definitions [3]
3. Excluding a specific directory containing e.g. large binary assets
  Not currently possible with this patch set, but could be added
  (see problem 2 below).
4. Getting the history of a single file
5. Other uses
  As a non-kernel developer, I wanted to quickly search through
  the code of only the btrfs filesystem using the git tools, but I do not have
  a local clone of the complete repository. Using `--depth=100` in combination
  with `--sparse-prefix=/fs/btrfs` keeps bandwidth usage low
  while still retaining some history.
6. This is trivial in SVN, and an internet search turns up multiple
questions about this feature [4-7].

* Example usage:

Getting the source of the btrfs filesystem with a bit of history:

    $ git clone git@server:linux --depth=100 # shallow, not sparse
    Receiving objects: 100% (814945/814945), 438.55 MiB | 35.21 MiB/s, done.
    ...
    $ git clone git@server:linux --depth=100 --sparse-prefix=/fs/btrfs # sparse and shallow
    Receiving objects: 100% (503747/503747), 121.45 MiB | 59.75 MiB/s, done.
    ...
    $ cd linux && ls ./
    fs
    $ ls fs/
    btrfs
    $ git log --oneline
    (repo behaves the same as a full clone with sparse-checkout /fs/btrfs)



* Open problems:

1. Currently all trees are still included. It would be possible to
include only the trees relevant to the sparse files, which would significantly
reduce the pack sizes for repositories containing a lot of small files that
change often, for example package managers using git. I am not sure in how
many places all trees are presumed to be present.

2. This patch set implements it as a simple single prefix check command line
option.
Using the exclude_list format (same as in sparse-checkout) might be useful.
The server needs to check these patterns for all files in history, so I'm not
sure if allowing multiple/complex patterns is a good idea.

3. This patch set assumes that the sparse-prefix and sparse-checkout do not
change. Both clone and every later fetch need the --sparse-prefix= option,
otherwise complete packs will be fetched. Not sure what the best way to store
the information is; possibly create a new file `.git/sparse` similar to
`.git/shallow` containing the path(s).

4. Bitmap indices cannot be used, because they do not contain the paths of the
objects. So for creating packs, the whole DAG has to be walked.

5. Fsck complains about missing blobs. Should be fairly easy to fix.

6. Tests and documentation are missing.

[1]: https://git.archlinux.org/svntogit/packages.git/
[2]: https://github.com/rust-lang/crates.io-index
[3]: https://github.com/DefinitelyTyped/DefinitelyTyped
[4]: https://stackoverflow.com/questions/600079/is-there-any-way-to-clone-a-git-repositorys-sub-directory-only
[5]: https://stackoverflow.com/questions/11834386/cloning-only-a-subdirectory-with-git
[6]: https://askubuntu.com/questions/460885/how-to-clone-git-repository-only-some-directories
[7]: https://coderwall.com/p/o2fasg/how-to-download-a-project-subdirectory-from-github

Robin Ruede (7):
  list-objects: add sparse-prefix option to rev_info
  pack-objects: add sparse-prefix
  Skip checking integrity of files ignored by sparse
  fetch-pack: add sparse prefix to smart protocol
  fetch: add sparse-prefix option
  clone: add sparse-prefix option
  remote-curl: add sparse prefix

 builtin/clone.c        | 27 ++++++++++++++++++++++++---
 builtin/fetch-pack.c   |  6 ++++++
 builtin/fetch.c        | 19 ++++++++++++++-----
 builtin/pack-objects.c | 11 +++++++++++
 cache-tree.c           |  3 ++-
 connected.c            |  7 ++++++-
 fetch-pack.c           |  4 ++++
 fetch-pack.h           |  1 +
 list-objects.c         |  4 +++-
 remote-curl.c          | 17 ++++++++++++++++-
 revision.c             |  4 ++++
 revision.h             |  1 +
 transport.c            |  4 ++++
 transport.h            |  4 ++++
 upload-pack.c          | 15 ++++++++++++++-
 15 files changed, 114 insertions(+), 13 deletions(-)

-- 
2.9.1.283.g3ca5b4c.dirty



* [PATCH/RFC 1/7] list-objects: add sparse-prefix option to rev_info
  2016-07-28 16:02 [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Robin Ruede
@ 2016-07-28 16:02 ` Robin Ruede
  2016-07-28 16:02 ` [PATCH/RFC 2/7] pack-objects: add sparse-prefix Robin Ruede
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Robin Ruede @ 2016-07-28 16:02 UTC (permalink / raw)
  To: git; +Cc: Robin Ruede

This skips all blob objects whose path does not begin with the specified
prefix.
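
A minimal, self-contained sketch of the check added to process_blob() may
help when reading the diff. The names `has_prefix` and `blob_is_wanted` are
hypothetical and exist only for this illustration; `has_prefix` stands in for
git's starts_with().

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for git's starts_with(); hypothetical helper for this sketch. */
int has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Mirrors the condition used in process_blob(): a NULL prefix disables
 * filtering; otherwise the leading '/' of the option value is skipped,
 * because paths seen during the object walk have no leading slash.
 */
int blob_is_wanted(const char *path, const char *sparse_prefix)
{
	if (!sparse_prefix)
		return 1;
	return has_prefix(path, sparse_prefix + 1);
}
```

With the prefix "/fs/btrfs", the path "fs/btrfs/ctree.c" is kept and
"fs/ext4/inode.c" is skipped. Because this is a plain string comparison, the
same prefix would also keep a sibling such as "fs/btrfs-old/x.c"; writing
the prefix with a trailing slash ("/fs/btrfs/") avoids that.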

Signed-off-by: Robin Ruede <r.ruede@gmail.com>
---
 list-objects.c | 4 +++-
 revision.c     | 4 ++++
 revision.h     | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/list-objects.c b/list-objects.c
index f3ca6aa..91a6091 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -28,7 +28,9 @@ static void process_blob(struct rev_info *revs,
 
 	pathlen = path->len;
 	strbuf_addstr(path, name);
-	show(obj, path->buf, cb_data);
+	if (!revs->sparse_prefix || starts_with(path->buf, revs->sparse_prefix + 1)) {
+		show(obj, path->buf, cb_data);
+	}
 	strbuf_setlen(path, pathlen);
 }
 
diff --git a/revision.c b/revision.c
index edba5b7..a36a796 100644
--- a/revision.c
+++ b/revision.c
@@ -1664,6 +1664,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if ((argcount = parse_long_opt("skip", argv, &optarg))) {
 		revs->skip_count = atoi(optarg);
 		return argcount;
+	} else if ((argcount = parse_long_opt("sparse-prefix", argv, &optarg))) {
+		if (optarg[0] != '/') return error(_("sparse prefix must start with /"));
+		revs->sparse_prefix = optarg;
+		return argcount;
 	} else if ((*arg == '-') && isdigit(arg[1])) {
 		/* accept -<digit>, like traditional "head" */
 		if (strtol_i(arg + 1, 10, &revs->max_count) < 0 ||
diff --git a/revision.h b/revision.h
index 9fac1a6..2c7c5f2 100644
--- a/revision.h
+++ b/revision.h
@@ -113,6 +113,7 @@ struct rev_info {
 			ancestry_path:1,
 			first_parent_only:1,
 			line_level_traverse:1;
+	const char *sparse_prefix;
 
 	/* Diff flags */
 	unsigned int	diff:1,
-- 
2.9.1.283.g3ca5b4c.dirty



* [PATCH/RFC 2/7] pack-objects: add sparse-prefix
  2016-07-28 16:02 [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Robin Ruede
  2016-07-28 16:02 ` [PATCH/RFC 1/7] list-objects: add sparse-prefix option to rev_info Robin Ruede
@ 2016-07-28 16:02 ` Robin Ruede
  2016-07-28 16:02 ` [PATCH/RFC 3/7] Skip checking integrity of files ignored by sparse Robin Ruede
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Robin Ruede @ 2016-07-28 16:02 UTC (permalink / raw)
  To: git; +Cc: Robin Ruede

This allows creating packs that contain only the blobs relevant to a
specific subdirectory, e.g.

    echo HEAD | git pack-objects --revs --sparse-prefix=/contrib/

will create a pack containing the complete history of HEAD, including
all commits, all trees, and the files in the contrib directory.

Signed-off-by: Robin Ruede <r.ruede@gmail.com>
---
 builtin/pack-objects.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a2f8cfd..0674f57 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2621,6 +2621,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	struct argv_array rp = ARGV_ARRAY_INIT;
 	int rev_list_unpacked = 0, rev_list_all = 0, rev_list_reflog = 0;
 	int rev_list_index = 0;
+	const char *sparse_prefix = NULL;
+
 	struct option pack_objects_options[] = {
 		OPT_SET_INT('q', "quiet", &progress,
 			    N_("do not show progress meter"), 0),
@@ -2685,6 +2687,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("create thin packs")),
 		OPT_BOOL(0, "shallow", &shallow,
 			 N_("create packs suitable for shallow fetches")),
+		OPT_STRING(0, "sparse-prefix", &sparse_prefix, N_("path"),
+		  N_("only include blobs relevant for sparse checkout (implies --revs)")),
 		OPT_BOOL(0, "honor-pack-keep", &ignore_packed_keep,
 			 N_("ignore packs that have companion .keep file")),
 		OPT_INTEGER(0, "compression", &pack_compression_level,
@@ -2725,6 +2729,13 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	} else
 		argv_array_push(&rp, "--objects");
 
+	if (sparse_prefix) {
+		use_internal_rev_list = 1;
+		/* with bitmaps the path of the blobs is not known */
+		use_bitmap_index = 0;
+		argv_array_push(&rp, "--sparse-prefix");
+		argv_array_push(&rp, sparse_prefix);
+	}
 	if (rev_list_all) {
 		use_internal_rev_list = 1;
 		argv_array_push(&rp, "--all");
-- 
2.9.1.283.g3ca5b4c.dirty



* [PATCH/RFC 3/7] Skip checking integrity of files ignored by sparse
  2016-07-28 16:02 [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Robin Ruede
  2016-07-28 16:02 ` [PATCH/RFC 1/7] list-objects: add sparse-prefix option to rev_info Robin Ruede
  2016-07-28 16:02 ` [PATCH/RFC 2/7] pack-objects: add sparse-prefix Robin Ruede
@ 2016-07-28 16:02 ` Robin Ruede
  2016-07-28 16:02 ` [PATCH/RFC 4/7] fetch-pack: add sparse prefix to smart protocol Robin Ruede
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Robin Ruede @ 2016-07-28 16:02 UTC (permalink / raw)
  To: git; +Cc: Robin Ruede

(This might be wrong; I am not sure if this is the right place.)

Signed-off-by: Robin Ruede <r.ruede@gmail.com>
---
 cache-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index f28b1f4..ab01ae5 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -354,7 +354,8 @@ static int update_one(struct cache_tree *it,
 			entlen = pathlen - baselen;
 			i++;
 		}
-		if (mode != S_IFGITLINK && !missing_ok && !has_sha1_file(sha1)) {
+		if (!ce_skip_worktree(ce) && mode != S_IFGITLINK
+				&& !missing_ok && !has_sha1_file(sha1)) {
 			strbuf_release(&buffer);
 			if (expected_missing)
 				return -1;
-- 
2.9.1.283.g3ca5b4c.dirty



* [PATCH/RFC 4/7] fetch-pack: add sparse prefix to smart protocol
  2016-07-28 16:02 [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Robin Ruede
                   ` (2 preceding siblings ...)
  2016-07-28 16:02 ` [PATCH/RFC 3/7] Skip checking integrity of files ignored by sparse Robin Ruede
@ 2016-07-28 16:02 ` Robin Ruede
  2016-07-28 16:02 ` [PATCH/RFC 5/7] fetch: add sparse-prefix option Robin Ruede
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Robin Ruede @ 2016-07-28 16:02 UTC (permalink / raw)
  To: git; +Cc: Robin Ruede

For example,

    git fetch-pack --sparse-prefix=/contrib/ origin HEAD

should fetch a pack equivalent to the one generated on the remote by

    echo HEAD | git pack-objects --revs --stdout --sparse-prefix=/contrib/
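
On the wire, the client sends one extra request line of the form
"sparse-prefix <path>", which the server picks up before spawning
pack-objects. The following is a rough standalone sketch of that parsing
step, not the code from this patch: the function name
`parse_sparse_prefix_line` is hypothetical, and rejecting values without a
leading '/' at this point is an assumption of the sketch (in the patch that
check happens on the client and in the revision option parser).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` is a "sparse-prefix <path>" request,
 * return a newly allocated copy of <path>; otherwise return NULL.
 * The caller owns the returned string and must free() it.
 */
char *parse_sparse_prefix_line(const char *line)
{
	static const char key[] = "sparse-prefix ";
	size_t keylen = sizeof(key) - 1;	/* 14, the offset used in upload-pack.c */
	const char *value;
	char *copy;

	if (strncmp(line, key, keylen) != 0)
		return NULL;		/* some other request line, e.g. "want ..." */
	value = line + keylen;
	if (value[0] != '/')
		return NULL;		/* assumption: validate on the server too */
	copy = malloc(strlen(value) + 1);
	if (copy)
		strcpy(copy, value);
	return copy;
}
```

A line such as "sparse-prefix /contrib/" yields "/contrib/", while "want"
lines and values without a leading slash yield NULL.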

Signed-off-by: Robin Ruede <r.ruede@gmail.com>
---
 builtin/fetch-pack.c |  6 ++++++
 fetch-pack.c         |  4 ++++
 fetch-pack.h         |  1 +
 remote-curl.c        |  2 ++
 upload-pack.c        | 15 ++++++++++++++-
 5 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index bfd0be4..7f10001 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -64,6 +64,12 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			args.uploadpack = arg + 14;
 			continue;
 		}
+		if (starts_with(arg, "--sparse-prefix=")) {
+			args.sparse_prefix = arg + 16;
+			if (args.sparse_prefix[0] != '/')
+				die(_("sparse prefix must start with /"));
+			continue;
+		}
 		if (starts_with(arg, "--exec=")) {
 			args.uploadpack = arg + 7;
 			continue;
diff --git a/fetch-pack.c b/fetch-pack.c
index b501d5c..8571b02 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -326,6 +326,8 @@ static int find_common(struct fetch_pack_args *args,
 		return 1;
 	}
 
+	if(args->sparse_prefix)
+		packet_buf_write(&req_buf, "sparse-prefix %s", args->sparse_prefix);
 	if (is_repository_shallow())
 		write_shallow_commits(&req_buf, 1, NULL);
 	if (args->depth > 0)
@@ -811,6 +813,8 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 
 	if ((args->depth > 0 || is_repository_shallow()) && !server_supports("shallow"))
 		die("Server does not support shallow clients");
+	if (args->sparse_prefix && !server_supports("sparse-prefix"))
+		die("Server does not support sparse prefix");
 	if (server_supports("multi_ack_detailed")) {
 		if (args->verbose)
 			fprintf(stderr, "Server supports multi_ack_detailed\n");
diff --git a/fetch-pack.h b/fetch-pack.h
index bb7fd76..8f36ef4 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -8,6 +8,7 @@ struct sha1_array;
 
 struct fetch_pack_args {
 	const char *uploadpack;
+	const char *sparse_prefix;
 	int unpacklimit;
 	int depth;
 	unsigned quiet:1;
diff --git a/remote-curl.c b/remote-curl.c
index 6b83b77..e181e62 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -727,6 +727,8 @@ static int fetch_dumb(int nr_heads, struct ref **to_fetch)
 	ALLOC_ARRAY(targets, nr_heads);
 	if (options.depth)
 		die("dumb http transport does not support --depth");
+	if (options.sparse_prefix)
+		die("dumb http transport does not support --sparse-prefix");
 	for (i = 0; i < nr_heads; i++)
 		targets[i] = xstrdup(oid_to_hex(&to_fetch[i]->old_oid));
 
diff --git a/upload-pack.c b/upload-pack.c
index d4cc414..56d8c1a 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -57,6 +57,7 @@ static int use_sideband;
 static int advertise_refs;
 static int stateless_rpc;
 static const char *pack_objects_hook;
+static char *sparse_prefix;
 
 static void reset_timeout(void)
 {
@@ -125,6 +126,12 @@ static void create_pack_file(void)
 		argv_array_push(&pack_objects.args, "--delta-base-offset");
 	if (use_include_tag)
 		argv_array_push(&pack_objects.args, "--include-tag");
+	if (sparse_prefix) {
+		argv_array_push(&pack_objects.args, "--sparse-prefix");
+		argv_array_push(&pack_objects.args, sparse_prefix);
+		free(sparse_prefix);
+		sparse_prefix = NULL;
+	}
 
 	pack_objects.in = -1;
 	pack_objects.out = -1;
@@ -582,6 +589,12 @@ static void receive_needs(void)
 				die("Invalid deepen: %s", line);
 			continue;
 		}
+		if (starts_with(line, "sparse-prefix ")) {
+			if(sparse_prefix)
+				die("Only single sparse-prefix is allowed");
+			sparse_prefix = xstrdup(line + 14);
+			continue;
+		}
 		if (!starts_with(line, "want ") ||
 		    get_sha1_hex(line+5, sha1_buf))
 			die("git upload-pack: protocol error, "
@@ -730,7 +743,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 {
 	static const char *capabilities = "multi_ack thin-pack side-band"
 		" side-band-64k ofs-delta shallow no-progress"
-		" include-tag multi_ack_detailed";
+		" include-tag multi_ack_detailed sparse-prefix";
 	const char *refname_nons = strip_namespace(refname);
 	struct object_id peeled;
 
-- 
2.9.1.283.g3ca5b4c.dirty



* [PATCH/RFC 5/7] fetch: add sparse-prefix option
  2016-07-28 16:02 [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Robin Ruede
                   ` (3 preceding siblings ...)
  2016-07-28 16:02 ` [PATCH/RFC 4/7] fetch-pack: add sparse prefix to smart protocol Robin Ruede
@ 2016-07-28 16:02 ` Robin Ruede
  2016-07-28 16:02 ` [PATCH/RFC 6/7] clone: " Robin Ruede
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Robin Ruede @ 2016-07-28 16:02 UTC (permalink / raw)
  To: git; +Cc: Robin Ruede

Also pass the sparse-prefix option from fetch to rev-list while checking
connectivity.

Signed-off-by: Robin Ruede <r.ruede@gmail.com>
---
 builtin/fetch.c | 19 ++++++++++++++-----
 connected.c     |  7 ++++++-
 transport.c     |  4 ++++
 transport.h     |  4 ++++
 4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index acd0cf1..b48537f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -41,6 +41,7 @@ static int tags = TAGS_DEFAULT, unshallow, update_shallow;
 static int max_children = -1;
 static enum transport_family family;
 static const char *depth;
+static const char *sparse_prefix;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
 static struct transport *gtransport;
@@ -117,6 +118,8 @@ static struct option builtin_fetch_options[] = {
 	OPT_BOOL(0, "progress", &progress, N_("force progress reporting")),
 	OPT_STRING(0, "depth", &depth, N_("depth"),
 		   N_("deepen history of shallow clone")),
+	OPT_STRING(0, "sparse-prefix", &sparse_prefix, N_("path-prefix"),
+		   N_("only fetch blobs for the specified path-prefix")),
 	{ OPTION_SET_INT, 0, "unshallow", &unshallow, NULL,
 		   N_("convert to a complete repository"),
 		   PARSE_OPT_NONEG | PARSE_OPT_NOARG, NULL, 1 },
@@ -706,9 +709,11 @@ static int iterate_ref_map(void *cb_data, unsigned char sha1[20])
 	return 0;
 }
 
-static int store_updated_refs(const char *raw_url, const char *remote_name,
+static int store_updated_refs(struct transport *transport,
 		struct ref *ref_map)
 {
+	const char *raw_url = transport->url;
+	const char *remote_name = transport->remote->name;
 	FILE *fp;
 	struct commit *commit;
 	int url_len, i, rc = 0;
@@ -729,7 +734,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		url = xstrdup("foreign");
 
 	rm = ref_map;
-	if (check_everything_connected(iterate_ref_map, 0, &rm)) {
+	if (check_everything_connected_with_transport(iterate_ref_map, 0, &rm,
+				transport)) {
 		rc = error(_("%s did not send all necessary objects\n"), url);
 		goto abort;
 	}
@@ -885,9 +891,7 @@ static int fetch_refs(struct transport *transport, struct ref *ref_map)
 	if (ret)
 		ret = transport_fetch_refs(transport, ref_map);
 	if (!ret)
-		ret |= store_updated_refs(transport->url,
-				transport->remote->name,
-				ref_map);
+		ret |= store_updated_refs(transport, ref_map);
 	transport_unlock_pack(transport);
 	return ret;
 }
@@ -993,6 +997,11 @@ static struct transport *prepare_transport(struct remote *remote)
 		set_option(transport, TRANS_OPT_KEEP, "yes");
 	if (depth)
 		set_option(transport, TRANS_OPT_DEPTH, depth);
+	if (sparse_prefix) {
+		if (sparse_prefix[0] != '/')
+			die(_("sparse prefix must start with /"));
+		set_option(transport, TRANS_OPT_SPARSE_PREFIX, sparse_prefix);
+	}
 	if (update_shallow)
 		set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
 	return transport;
diff --git a/connected.c b/connected.c
index bf1b12e..1534c5c 100644
--- a/connected.c
+++ b/connected.c
@@ -26,7 +26,7 @@ static int check_everything_connected_real(sha1_iterate_fn fn,
 					   const char *shallow_file)
 {
 	struct child_process rev_list = CHILD_PROCESS_INIT;
-	const char *argv[9];
+	const char *argv[11];
 	char commit[41];
 	unsigned char sha1[20];
 	int err = 0, ac = 0;
@@ -56,6 +56,11 @@ static int check_everything_connected_real(sha1_iterate_fn fn,
 	argv[ac++] = "--stdin";
 	argv[ac++] = "--not";
 	argv[ac++] = "--all";
+	if(transport && transport->smart_options && 
+			transport->smart_options->sparse_prefix) {
+		argv[ac++] = "--sparse-prefix";
+		argv[ac++] = transport->smart_options->sparse_prefix;
+	}
 	if (quiet)
 		argv[ac++] = "--quiet";
 	argv[ac] = NULL;
diff --git a/transport.c b/transport.c
index b233e3e..ce7e2e1 100644
--- a/transport.c
+++ b/transport.c
@@ -141,6 +141,9 @@ static int set_git_option(struct git_transport_options *opts,
 	} else if (!strcmp(name, TRANS_OPT_UPDATE_SHALLOW)) {
 		opts->update_shallow = !!value;
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_SPARSE_PREFIX)) {
+		opts->sparse_prefix = value;
+		return 0;
 	} else if (!strcmp(name, TRANS_OPT_DEPTH)) {
 		if (!value)
 			opts->depth = 0;
@@ -211,6 +214,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.quiet = (transport->verbose < 0);
 	args.no_progress = !transport->progress;
 	args.depth = data->options.depth;
+	args.sparse_prefix = data->options.sparse_prefix;
 	args.check_self_contained_and_connected =
 		data->options.check_self_contained_and_connected;
 	args.cloning = transport->cloning;
diff --git a/transport.h b/transport.h
index c681408..abee186 100644
--- a/transport.h
+++ b/transport.h
@@ -15,6 +15,7 @@ struct git_transport_options {
 	int depth;
 	const char *uploadpack;
 	const char *receivepack;
+	const char *sparse_prefix;
 	struct push_cas_option *cas;
 };
 
@@ -179,6 +180,9 @@ int transport_restrict_protocols(void);
 /* Limit the depth of the fetch if not null */
 #define TRANS_OPT_DEPTH "depth"
 
+/* Only fetch blobs whose referenced path begins with this if not null */
+#define TRANS_OPT_SPARSE_PREFIX "sparse-prefix"
+
 /* Aggressively fetch annotated tags if possible */
 #define TRANS_OPT_FOLLOWTAGS "followtags"
 
-- 
2.9.1.283.g3ca5b4c.dirty



* [PATCH/RFC 6/7] clone: add sparse-prefix option
  2016-07-28 16:02 [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Robin Ruede
                   ` (4 preceding siblings ...)
  2016-07-28 16:02 ` [PATCH/RFC 5/7] fetch: add sparse-prefix option Robin Ruede
@ 2016-07-28 16:02 ` Robin Ruede
  2016-07-28 16:02 ` [PATCH/RFC 7/7] remote-curl: add sparse prefix Robin Ruede
  2016-07-28 16:59 ` [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Duy Nguyen
  7 siblings, 0 replies; 12+ messages in thread
From: Robin Ruede @ 2016-07-28 16:02 UTC (permalink / raw)
  To: git; +Cc: Robin Ruede

For example

    git clone git@remote:repo --sparse-prefix=/contrib/

would create a repository, cloning only the relevant files for the
contrib subdirectory, and also setting the sparse-checkout option, so
only those files will be checked out.

Signed-off-by: Robin Ruede <r.ruede@gmail.com>
---
 builtin/clone.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 31ea247..dc0d364 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -41,7 +41,7 @@ static const char * const builtin_clone_usage[] = {
 static int option_no_checkout, option_bare, option_mirror, option_single_branch = -1;
 static int option_local = -1, option_no_hardlinks, option_shared, option_recursive;
 static int option_shallow_submodules;
-static char *option_template, *option_depth;
+static char *option_template, *option_depth, *option_sparse_prefix;
 static char *option_origin = NULL;
 static char *option_branch = NULL;
 static const char *real_git_dir;
@@ -91,6 +91,8 @@ static struct option builtin_clone_options[] = {
 		   N_("path to git-upload-pack on the remote")),
 	OPT_STRING(0, "depth", &option_depth, N_("depth"),
 		    N_("create a shallow clone of that depth")),
+	OPT_STRING(0, "sparse-prefix", &option_sparse_prefix, N_("path-prefix"),
+	            N_("only fetch the blobs for the specified path-prefix")),
 	OPT_BOOL(0, "single-branch", &option_single_branch,
 		    N_("clone only one branch, HEAD or --branch")),
 	OPT_BOOL(0, "shallow-submodules", &option_shallow_submodules,
@@ -959,9 +961,22 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 	init_db(option_template, INIT_DB_QUIET);
 	write_config(&option_config);
-
+	if (option_sparse_prefix) {
+		FILE *f;
+		const char *name;
+		if (option_sparse_prefix[0] != '/')
+			die(_("sparse prefix must start with /"));
+		name = mkpath("%s/info/sparse-checkout", git_dir);
+		git_config_set("core.sparsecheckout", "true");
+		safe_create_leading_directories_const(name);
+		f = fopen(name, "w");
+		if(f == NULL) die("Could not open %s", name);
+		fprintf(f, "%s\n", option_sparse_prefix);
+		fclose(f);
+	}
 	git_config(git_default_config, NULL);
 
+
 	if (option_bare) {
 		if (option_mirror)
 			src_ref_prefix = "refs/";
@@ -977,6 +992,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	git_config_set(key.buf, repo);
 	strbuf_reset(&key);
 
+
 	if (option_reference.nr)
 		setup_reference();
 
@@ -995,6 +1011,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (is_local) {
 		if (option_depth)
 			warning(_("--depth is ignored in local clones; use file:// instead."));
+		if (option_sparse_prefix)
+			warning(_("--sparse-prefix is ignored in local clones; use file:// instead."));
 		if (!access(mkpath("%s/shallow", path), F_OK)) {
 			if (option_local > 0)
 				warning(_("source repository is shallow, ignoring --local"));
@@ -1013,6 +1031,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_depth)
 		transport_set_option(transport, TRANS_OPT_DEPTH,
 				     option_depth);
+	if (option_sparse_prefix)
+		transport_set_option(transport, TRANS_OPT_SPARSE_PREFIX,
+		                     option_sparse_prefix);
 	if (option_single_branch)
 		transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1");
 
@@ -1020,7 +1041,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 				     option_upload_pack);
 
-	if (transport->smart_options && !option_depth)
+	if (transport->smart_options && !option_depth && !option_sparse_prefix)
 		transport->smart_options->check_self_contained_and_connected = 1;
 
 	refs = transport_get_remote_refs(transport);
-- 
2.9.1.283.g3ca5b4c.dirty



* [PATCH/RFC 7/7] remote-curl: add sparse prefix
  2016-07-28 16:02 [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Robin Ruede
                   ` (5 preceding siblings ...)
  2016-07-28 16:02 ` [PATCH/RFC 6/7] clone: " Robin Ruede
@ 2016-07-28 16:02 ` Robin Ruede
  2016-07-28 16:59 ` [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Duy Nguyen
  7 siblings, 0 replies; 12+ messages in thread
From: Robin Ruede @ 2016-07-28 16:02 UTC (permalink / raw)
  To: git; +Cc: Robin Ruede

Signed-off-by: Robin Ruede <r.ruede@gmail.com>
---
 remote-curl.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index e181e62..b9f7cf1 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -20,6 +20,7 @@ static struct strbuf url = STRBUF_INIT;
 struct options {
 	int verbosity;
 	unsigned long depth;
+	const char *sparse_prefix;
 	unsigned progress : 1,
 		check_self_contained_and_connected : 1,
 		cloning : 1,
@@ -60,6 +61,10 @@ static int set_option(const char *name, const char *value)
 		options.depth = v;
 		return 0;
 	}
+	else if (!strcmp(name, "sparse-prefix")) {
+		options.sparse_prefix = xstrdup(value);
+		return 0;
+	}
 	else if (!strcmp(name, "followtags")) {
 		if (!strcmp(value, "true"))
 			options.followtags = 1;
@@ -754,8 +759,9 @@ static int fetch_git(struct discovery *heads,
 	struct rpc_state rpc;
 	struct strbuf preamble = STRBUF_INIT;
 	char *depth_arg = NULL;
+	char *sparse_arg = NULL;
 	int argc = 0, i, err;
-	const char *argv[17];
+	const char *argv[19];
 
 	argv[argc++] = "fetch-pack";
 	argv[argc++] = "--stateless-rpc";
@@ -783,6 +789,12 @@ static int fetch_git(struct discovery *heads,
 		depth_arg = strbuf_detach(&buf, NULL);
 		argv[argc++] = depth_arg;
 	}
+	if (options.sparse_prefix) {
+		struct strbuf buf = STRBUF_INIT;
+		strbuf_addf(&buf, "--sparse-prefix=%s", options.sparse_prefix);
+		sparse_arg = strbuf_detach(&buf, NULL);
+		argv[argc++] = sparse_arg;
+	}
 	argv[argc++] = url.buf;
 	argv[argc++] = NULL;
 
@@ -807,6 +819,7 @@ static int fetch_git(struct discovery *heads,
 	strbuf_release(&rpc.result);
 	strbuf_release(&preamble);
 	free(depth_arg);
+	free(sparse_arg);
 	return err;
 }
 
-- 
2.9.1.283.g3ca5b4c.dirty



* Re: [PATCH/RFC 0/7] Add possibility to clone specific subdirectories
  2016-07-28 16:02 [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Robin Ruede
                   ` (6 preceding siblings ...)
  2016-07-28 16:02 ` [PATCH/RFC 7/7] remote-curl: add sparse prefix Robin Ruede
@ 2016-07-28 16:59 ` Duy Nguyen
  2016-07-28 17:03   ` Duy Nguyen
  2016-07-28 20:33   ` Junio C Hamano
  7 siblings, 2 replies; 12+ messages in thread
From: Duy Nguyen @ 2016-07-28 16:59 UTC (permalink / raw)
  To: Robin Ruede; +Cc: Git Mailing List

On Thu, Jul 28, 2016 at 6:02 PM, Robin Ruede <r.ruede@gmail.com> wrote:
> This patch series adds a `--sparse-prefix=` option to multiple commands,
> allowing fetching repository contents from only a subdirectory of a remote.
>
> This works along with sparse-checkout, and is especially useful for repositories
> where a subdirectory has meaning when standing alone.

Ah.. this is what I call narrow checkout [1] (but gmane is down at the moment)

[1] http://thread.gmane.org/gmane.comp.version-control.git/155427

> * Motivation (example use cases)
>
> ...

nods nods.. all good stuff

> * Open problems:
>
> 1. Currently all trees are still included. It would be possible to
> include only the trees relevant to the sparse files, which would significantly
> reduce the pack sizes for repositories containing a lot of small files that
> change often, for example package managers using git. I am not sure in how
> many places all trees are presumed to be present.

You can limit some trees by passing a pathspec to "git rev-list" (in
your "list-objects" patch). All trees completely outside sub/dir will
be excluded. Trees leading to it (e.g. root tree and "sub") are still
included. Not having all trees opens up a new set of problems. This is
what I did in narrow clone: pass some directories (as pathspec) to
rev-list on the server side, then deal with lack of trees on client
side.
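
To make the three cases concrete (trees outside the prefix, trees leading to
it, trees inside it), here is a small standalone sketch of the
classification. It is an illustration only, not how git's pathspec code is
implemented; `tree_relation` and `classify_tree` are made-up names.

```c
#include <stddef.h>
#include <string.h>

enum tree_relation {
	TREE_OUTSIDE,	/* unrelated to the prefix: can be left out */
	TREE_LEADING,	/* ancestor of the prefix: needed to reach it */
	TREE_INSIDE	/* at or below the prefix: fully included */
};

/*
 * Classify directory `dir` against `prefix`. Both are repository-relative
 * paths without leading or trailing slashes, e.g. "sub" and "sub/dir";
 * the root tree is the empty string.
 */
enum tree_relation classify_tree(const char *dir, const char *prefix)
{
	size_t dlen = strlen(dir), plen = strlen(prefix);

	/* dir equals prefix, or lies below it ("sub/dir/x" vs "sub/dir") */
	if (dlen >= plen && !strncmp(dir, prefix, plen) &&
	    (plen == 0 || dir[plen] == '\0' || dir[plen] == '/'))
		return TREE_INSIDE;
	/* dir is a proper ancestor of prefix ("" or "sub" vs "sub/dir") */
	if (dlen < plen && !strncmp(dir, prefix, dlen) &&
	    (dlen == 0 || prefix[dlen] == '/'))
		return TREE_LEADING;
	return TREE_OUTSIDE;
}
```

For the prefix "sub/dir", the root tree and "sub" are TREE_LEADING,
"sub/dir" and everything below it are TREE_INSIDE, and siblings such as
"sub/dirx" or "other" are TREE_OUTSIDE. Comparing at path-component
boundaries is what keeps "sub/dirx" out.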

> 2. This patch set implements it as a simple single prefix check command line
> option.
> Using the exclude_list format (same as in sparse-checkout) might be useful.
> The server needs to check these patterns for all files in history, so I'm not
> sure if allowing multiple/complex patterns is a good idea.

I would go with something else than sparse-checkout, which I call
narrow checkout: instead of flattening the entire tree in the index and
keeping only files there, we keep trees that we don't have as trees.
Those trees get the same "sparse checkout" attributes (e.g. ignore
worktree) and some of the submodule ones (e.g. don't bother checking the
associated hash). This approach [2] eliminates changes in cache-tree.c
(i.e. 3/7).

And you would need something like that, when you don't have all the
trees (from open problem 1), because you just can't flatten trees when
you don't have them.

[2] https://github.com/pclouds/git/commits/lanh/narrow-checkout (I
think core functionality is in place, but narrow operation still needs
more work)

> 3. This patch set assumes the sparse-prefix and sparse-checkout does not change.
> running clone and fetch both need to have the --sparse-prefix= option, otherwise
> complete packs will be fetched. Not sure what the best way to store the
> information is, possibly create a new file `.git/sparse` similar to
> `.git/shallow` containing the path(s).

Something like .git/shallow, yes. It's similar in nature anyway
(shallow cuts depth, you cut the side)

> 3. Bitmap indices cannot be used, because they do not contain the paths of the
> objects. So for creating packs, the whole DAG has to be walked.

And shallow clones have this same problem. Something to be sorted out :)

> 4. Fsck complains about missing blobs. Should be fairly easy to fix.

Not really. You'll have to associate path information with blobs
before you decide that a blob should exist or not. Sparse patterns are
just not designed for that (tree walking). If you narrow (heh) down to
just a path prefix, not full-blown sparse patterns, then it's feasible to
walk the trees and filter. A subset of pathspec would be good because we
can already filter by pathspec, but I would not go full pathspec as
the first step.

> 5. Tests and documentation is missing.

Personally I would go with my narrow clone approach, but the ability
to selectively exclude some large blobs is still good, I think.
However, another approach to excluding some blobs is the external
object database [3]. It gives you what you need with a lot less code
impact (but you will not be able to work offline 100% of the time, as
you can now with git).

[3] https://public-inbox.org/git/20160613085546.11784-1-chriscool%40tuxfamily.org/
-- 
Duy


* Re: [PATCH/RFC 0/7] Add possibility to clone specific subdirectories
  2016-07-28 16:59 ` [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Duy Nguyen
@ 2016-07-28 17:03   ` Duy Nguyen
  2016-07-28 20:33   ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Duy Nguyen @ 2016-07-28 17:03 UTC (permalink / raw)
  To: Robin Ruede; +Cc: Git Mailing List

Corrections..

On Thu, Jul 28, 2016 at 6:59 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> Ah.. this is what I call narrow checkout [1] (but gmane is down at the moment)

s/checkout/clone/

> [2] https://github.com/pclouds/git/commits/lanh/narrow-checkout

s,lanh/,,
-- 
Duy


* Re: [PATCH/RFC 0/7] Add possibility to clone specific subdirectories
  2016-07-28 16:59 ` [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Duy Nguyen
  2016-07-28 17:03   ` Duy Nguyen
@ 2016-07-28 20:33   ` Junio C Hamano
  2016-07-29 15:51     ` Duy Nguyen
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2016-07-28 20:33 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Robin Ruede, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

>> 4. Fsck complains about missing blobs. Should be fairly easy to fix.
>
> Not really. You'll have to associate path information with blobs
> before you decide that a blob should exist or not.

Also the same blob or the tree can exist both inside and outside the
narrowed area, as people reorganize their trees all the time.  I am
not quite convinced a path-based approach (either yours or Robin's)
is workable in the longer term.


* Re: [PATCH/RFC 0/7] Add possibility to clone specific subdirectories
  2016-07-28 20:33   ` Junio C Hamano
@ 2016-07-29 15:51     ` Duy Nguyen
  0 siblings, 0 replies; 12+ messages in thread
From: Duy Nguyen @ 2016-07-29 15:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin Ruede, Git Mailing List

On Thu, Jul 28, 2016 at 10:33 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>>> 4. Fsck complains about missing blobs. Should be fairly easy to fix.
>>
>> Not really. You'll have to associate path information with blobs
>> before you decide that a blob should exist or not.
>
> Also the same blob or the tree can exist both inside and outside the
> narrowed area, as people reorganize their trees all the time.  I am
> not quite convinced a path-based approach (either yours or Robin's)
> is workable in the longer term.

I think it should be ok. What I meant was when we traverse the trees to
find connected blobs, if a tree points to paths outside the narrowed
area, we do not add those blobs to our fsck list. Trees inside the
narrow area work as usual, so those shared blobs are added to the fsck
list anyway. Object islands are going to be a problem (because we can't
assign paths to them)...
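
To make the shared-blob case concrete, here is a minimal sketch in
Python. It is a toy model, not fsck code: trees are dicts, blobs are
represented by invented object ids, and `expected_blobs` is a
hypothetical helper. The same blob ("blob-1") sits both inside and
outside the narrowed area, and it still ends up on the list because
the walk reaches it through the inside path.

```python
# Toy model, not git's fsck: trees are dicts, blobs are represented by their
# (made-up) object ids as strings.

def expected_blobs(tree, prefix, path=()):
    """Collect ids of blobs reachable through paths inside `prefix`."""
    found = set()
    for name, entry in tree.items():
        child = path + (name,)
        n = min(len(child), len(prefix))
        if child[:n] != prefix[:n]:
            continue  # outside the narrowed area: do not require it
        if isinstance(entry, dict):
            found |= expected_blobs(entry, prefix, child)
        elif len(child) >= len(prefix):
            found.add(entry)  # a blob at or below the narrowed prefix
    return found


repo = {
    "sub": {"dir": {"logo.png": "blob-1", "notes.txt": "blob-2"}},
    "assets": {"logo.png": "blob-1", "video.mp4": "blob-3"},
}

needed = expected_blobs(repo, ("sub", "dir"))
print(sorted(needed))
```

Only "blob-3" is reachable solely from outside, so it is the only one
that is allowed to be missing. An object with no path at all (an
island) never enters this walk, which is the remaining problem.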
-- 
Duy


Thread overview: 12+ messages
2016-07-28 16:02 [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Robin Ruede
2016-07-28 16:02 ` [PATCH/RFC 1/7] list-objects: add sparse-prefix option to rev_info Robin Ruede
2016-07-28 16:02 ` [PATCH/RFC 2/7] pack-objects: add sparse-prefix Robin Ruede
2016-07-28 16:02 ` [PATCH/RFC 3/7] Skip checking integrity of files ignored by sparse Robin Ruede
2016-07-28 16:02 ` [PATCH/RFC 4/7] fetch-pack: add sparse prefix to smart protocol Robin Ruede
2016-07-28 16:02 ` [PATCH/RFC 5/7] fetch: add sparse-prefix option Robin Ruede
2016-07-28 16:02 ` [PATCH/RFC 6/7] clone: " Robin Ruede
2016-07-28 16:02 ` [PATCH/RFC 7/7] remote-curl: add sparse prefix Robin Ruede
2016-07-28 16:59 ` [PATCH/RFC 0/7] Add possibility to clone specific subdirectories Duy Nguyen
2016-07-28 17:03   ` Duy Nguyen
2016-07-28 20:33   ` Junio C Hamano
2016-07-29 15:51     ` Duy Nguyen
