git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] index-pack: correct --keep[=<msg>]
@ 2016-03-03 19:14 Junio C Hamano
  2016-03-03 19:47 ` Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Junio C Hamano @ 2016-03-03 19:14 UTC (permalink / raw)
  To: git

When 592ce208 (index-pack: use strip_suffix to avoid magic numbers,
2014-06-30) refactored the code to derive names of .idx and .keep
files from the name of .pack file, a copy-and-paste typo crept in,
mistakingly attempting to create and store the keep message file in
the .idx file we just created, instead of .keep file.

As we create the .keep file with O_CREAT|O_EXCL, and we do so after
we write the .idx file, we luckily do not clobber the .idx file, but
because we deliberately ignored EEXIST when creating .keep file
(which is justifiable because only the existence of .keep file
matters), nobody noticed this mistake so far.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * In a later patch, I'll be adding another file to the family of
   .pack/.idx/.keep files, and the body of these if() statements
   would be made into a helper function when it happens.  This is to
   fix it directly on top of the problematic commit without such a
   helper, the result of which could be merged to 2.1.x and later
   maintenance series.

 builtin/index-pack.c   | 2 +-
 t/t5300-pack-object.sh | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index d4b77fd..3814731 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1617,7 +1617,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 			die(_("packfile name '%s' does not end with '.pack'"),
 			    pack_name);
 		strbuf_add(&keep_name_buf, pack_name, len);
-		strbuf_addstr(&keep_name_buf, ".idx");
+		strbuf_addstr(&keep_name_buf, ".keep");
 		keep_name = keep_name_buf.buf;
 	}
 	if (verify) {
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 20c1961..0e3cadf 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -284,6 +284,12 @@ test_expect_success \
      git index-pack test-3.pack &&
      cmp test-3.idx test-3-${packname_3}.idx &&
 
+     cat test-1-${packname_1}.pack >test-4.pack &&
+     rm -f test-4.keep &&
+     git index-pack --keep=why test-4.pack &&
+     cmp test-1-${packname_1}.idx test-4.idx &&
+     test -f test-4.keep &&
+
      :'
 
 test_expect_success 'unpacking with --strict' '
-- 
v2.0.1-6-g592ce20

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] index-pack: correct --keep[=<msg>]
  2016-03-03 19:14 [PATCH] index-pack: correct --keep[=<msg>] Junio C Hamano
@ 2016-03-03 19:47 ` Jeff King
  2016-03-03 21:37 ` [PATCH] index-pack: add a helper function to derive .idx/.keep filename Junio C Hamano
  2016-03-03 21:44 ` [PATCH] index-pack: correct --keep[=<msg>] Eric Sunshine
  2 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2016-03-03 19:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Mar 03, 2016 at 11:14:46AM -0800, Junio C Hamano wrote:

> When 592ce208 (index-pack: use strip_suffix to avoid magic numbers,
> 2014-06-30) refactored the code to derive names of .idx and .keep
> files from the name of .pack file, a copy-and-paste typo crept in,
> mistakingly attempting to create and store the keep message file in
> the .idx file we just created, instead of .keep file.
> 
> As we create the .keep file with O_CREAT|O_EXCL, and we do so after
> we write the .idx file, we luckily do not clobber the .idx file, but
> because we deliberately ignored EEXIST when creating .keep file
> (which is justifiable because only the existence of .keep file
> matters), nobody noticed this mistake so far.

Eek. Sorry about that. Your explanation and fix looks obviously correct.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] index-pack: add a helper function to derive .idx/.keep filename
  2016-03-03 19:14 [PATCH] index-pack: correct --keep[=<msg>] Junio C Hamano
  2016-03-03 19:47 ` Jeff King
@ 2016-03-03 21:37 ` Junio C Hamano
  2016-03-03 22:29   ` Jeff King
  2016-03-03 21:44 ` [PATCH] index-pack: correct --keep[=<msg>] Eric Sunshine
  2 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2016-03-03 21:37 UTC (permalink / raw)
  To: git

These are automatically named by replacing .pack suffix in the
name of the packfile.  Add a small helper to do so, as I'll be
adding another one soonish.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/index-pack.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 285424f..a5588a2 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1598,6 +1598,18 @@ static void show_pack_info(int stat_only)
 	}
 }
 
+static const char *derive_filename(const char *pack_name, const char *suffix,
+				   struct strbuf *buf)
+{
+	size_t len;
+	if (!strip_suffix(pack_name, ".pack", &len))
+		die(_("packfile name '%s' does not end with '.pack'"),
+		    pack_name);
+	strbuf_add(buf, pack_name, len);
+	strbuf_addstr(buf, suffix);
+	return buf->buf;
+}
+
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
@@ -1706,24 +1718,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		usage(index_pack_usage);
 	if (fix_thin_pack && !from_stdin)
 		die(_("--fix-thin cannot be used without --stdin"));
-	if (!index_name && pack_name) {
-		size_t len;
-		if (!strip_suffix(pack_name, ".pack", &len))
-			die(_("packfile name '%s' does not end with '.pack'"),
-			    pack_name);
-		strbuf_add(&index_name_buf, pack_name, len);
-		strbuf_addstr(&index_name_buf, ".idx");
-		index_name = index_name_buf.buf;
-	}
-	if (keep_msg && !keep_name && pack_name) {
-		size_t len;
-		if (!strip_suffix(pack_name, ".pack", &len))
-			die(_("packfile name '%s' does not end with '.pack'"),
-			    pack_name);
-		strbuf_add(&keep_name_buf, pack_name, len);
-		strbuf_addstr(&keep_name_buf, ".keep");
-		keep_name = keep_name_buf.buf;
-	}
+	if (!index_name && pack_name)
+		index_name = derive_filename(pack_name, ".idx", &index_name_buf);
+	if (keep_msg && !keep_name && pack_name)
+		keep_name = derive_filename(pack_name, ".keep", &keep_name_buf);
+
 	if (verify) {
 		if (!index_name)
 			die(_("--verify with no packfile name given"));
-- 
2.8.0-rc0-116-g5beb0d5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] index-pack: correct --keep[=<msg>]
  2016-03-03 19:14 [PATCH] index-pack: correct --keep[=<msg>] Junio C Hamano
  2016-03-03 19:47 ` Jeff King
  2016-03-03 21:37 ` [PATCH] index-pack: add a helper function to derive .idx/.keep filename Junio C Hamano
@ 2016-03-03 21:44 ` Eric Sunshine
  2 siblings, 0 replies; 9+ messages in thread
From: Eric Sunshine @ 2016-03-03 21:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List

On Thu, Mar 3, 2016 at 2:14 PM, Junio C Hamano <gitster@pobox.com> wrote:
> When 592ce208 (index-pack: use strip_suffix to avoid magic numbers,
> 2014-06-30) refactored the code to derive names of .idx and .keep
> files from the name of .pack file, a copy-and-paste typo crept in,
> mistakingly attempting to create and store the keep message file in

s/mistakingly/mistakenly/

> the .idx file we just created, instead of .keep file.
>
> As we create the .keep file with O_CREAT|O_EXCL, and we do so after
> we write the .idx file, we luckily do not clobber the .idx file, but
> because we deliberately ignored EEXIST when creating .keep file
> (which is justifiable because only the existence of .keep file
> matters), nobody noticed this mistake so far.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] index-pack: add a helper function to derive .idx/.keep filename
  2016-03-03 21:37 ` [PATCH] index-pack: add a helper function to derive .idx/.keep filename Junio C Hamano
@ 2016-03-03 22:29   ` Jeff King
  2016-03-03 22:57     ` [PATCH] index-pack: --clone-bundle option Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2016-03-03 22:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Mar 03, 2016 at 01:37:18PM -0800, Junio C Hamano wrote:

> These are automatically named by replacing .pack suffix in the
> name of the packfile.  Add a small helper to do so, as I'll be
> adding another one soonish.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/index-pack.c | 35 +++++++++++++++++------------------
>  1 file changed, 17 insertions(+), 18 deletions(-)

Even without the upcoming "another one", this is a nice readability
improvement.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] index-pack: --clone-bundle option
  2016-03-03 22:29   ` Jeff King
@ 2016-03-03 22:57     ` Junio C Hamano
  2016-03-03 23:20       ` Junio C Hamano
  2016-03-04 15:34       ` Jeff King
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2016-03-03 22:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Teach a new option "--clone-bundle" to "git index-pack" to create a
split bundle file that uses an existing packfile as its data part.

The expected "typical" preparation for helping initial clone would
start by preparing a packfile that contains most of the history and
add another packfile that contains the remainder (e.g. the objects
that are only reachable from reflog entries).  The first pack can
then be used as the data part of a split bundle and these two files
can be served as static files to bootstrap the clients without
incurring any more CPU cycles to the server side.

Among the objects in the packfile, the ones that are not referenced
by no other objects are identified and recorded as the "references"
in the resulting bundle.  As the packfile does not record any ref
information, however, the names of the "references" recorded in the
bundle need to be synthesized; we arbitrarily choose to record the
object whose name is $SHA1 as refs/objects/$SHA1.

Note that this name choice does not matter very much in the larger
picture.  As an initial clone that bootstraps from a clone-bundle is
expected to do a rough equivalent of:

    # create a new repository
    git init new-repository &&
    git remote add origin $URL &&

    # prime the object store and anchor the history to temporary
    # references
    git fetch $bundle 'refs/*:refs/temporary/*' &&

    # fetch the more recent history from the true origin
    git fetch origin &&
    git checkout -f &&

    # remove the temporary refs
    git for-each-ref -z --format=%(refname) refs/temporary/ |
    xargs -0 git update-ref -d

the names recorded in the bundle will not really matter to the end
result.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * The upcoming "another one" looks like this.

 Documentation/git-index-pack.txt | 10 ++++++-
 builtin/index-pack.c             | 57 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 7a4e055..ade2812 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -9,7 +9,7 @@ git-index-pack - Build pack index file for an existing packed archive
 SYNOPSIS
 --------
 [verse]
-'git index-pack' [-v] [-o <index-file>] <pack-file>
+'git index-pack' [-v] [-o <index-file>] [--clone-bundle] <pack-file>
 'git index-pack' --stdin [--fix-thin] [--keep] [-v] [-o <index-file>]
                  [<pack-file>]
 
@@ -35,6 +35,14 @@ OPTIONS
 	fails if the name of packed archive does not end
 	with .pack).
 
+--clone-bundle::
+	Write a split bundle file that uses the <pack-file> as its
+	data.  The <pack-file> must not contain any broken links, or
+	the bundle file will not be written.  The prerequisite list
+	of the resulting bundle will be empty.  The reference list
+	of the resulting bundle points at tips of the history in the
+	<pack-file>.
+
 --stdin::
 	When this flag is provided, the pack is read from stdin
 	instead and a copy is then written to <pack-file>. If
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index a5588a2..8fc04b0 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -13,7 +13,7 @@
 #include "thread-utils.h"
 
 static const char index_pack_usage[] =
-"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
+"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--verify] [--strict] [--clone-bundle] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
 
 struct object_entry {
 	struct pack_idx_entry idx;
@@ -1373,9 +1373,40 @@ static void fix_unresolved_deltas(struct sha1file *f)
 	free(sorted_by_pos);
 }
 
+static void write_bundle_file(const char *filename, unsigned char *sha1,
+			      const char *packname)
+{
+	int fd = open(filename, O_CREAT|O_EXCL|O_WRONLY, 0600);
+	struct strbuf buf = STRBUF_INIT;
+	struct stat st;
+	int i;
+
+	if (stat(packname, &st))
+		die(_("cannot stat %s"), packname);
+
+	strbuf_addstr(&buf, "# v3 git bundle\n");
+	strbuf_addf(&buf, "size: %lu\n", (unsigned long) st.st_size);
+	strbuf_addf(&buf, "sha1: %s\n", sha1_to_hex(sha1));
+	strbuf_addf(&buf, "data: %s\n\n", packname);
+
+	for (i = 0; i < nr_objects; i++) {
+		struct object *o = lookup_object(objects[i].idx.sha1);
+		if (!o || (o->flags & FLAG_LINK))
+			continue;
+		strbuf_addf(&buf, "%s refs/objects/%s\n",
+			    sha1_to_hex(o->oid.hash),
+			    sha1_to_hex(o->oid.hash));
+	}
+	if (write_in_full(fd, buf.buf, buf.len) != buf.len)
+		die(_("cannot write bundle header for %s"), packname);
+	close(fd);
+	strbuf_release(&buf);
+}
+
 static void final(const char *final_pack_name, const char *curr_pack_name,
 		  const char *final_index_name, const char *curr_index_name,
 		  const char *keep_name, const char *keep_msg,
+		  const char *bundle_name, int foreign_nr,
 		  unsigned char *sha1)
 {
 	const char *report = "pack";
@@ -1457,6 +1488,13 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
 			input_offset += err;
 		}
 	}
+
+	if (bundle_name) {
+		if (foreign_nr)
+			warning(_("not writing bundle for an incomplete pack"));
+		else
+			write_bundle_file(bundle_name, sha1, final_pack_name);
+	}
 }
 
 static int git_index_pack_config(const char *k, const char *v, void *cb)
@@ -1612,12 +1650,14 @@ static const char *derive_filename(const char *pack_name, const char *suffix,
 
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
-	int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
+	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, clone_bundle = 0;
 	const char *curr_index;
 	const char *index_name = NULL, *pack_name = NULL;
 	const char *keep_name = NULL, *keep_msg = NULL;
-	struct strbuf index_name_buf = STRBUF_INIT,
-		      keep_name_buf = STRBUF_INIT;
+	const char *bundle_name = NULL;
+	struct strbuf index_name_buf = STRBUF_INIT;
+	struct strbuf keep_name_buf = STRBUF_INIT;
+	struct strbuf bundle_name_buf = STRBUF_INIT;
 	struct pack_idx_entry **idx_objects;
 	struct pack_idx_option opts;
 	unsigned char pack_sha1[20];
@@ -1661,6 +1701,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				verify = 1;
 				show_stat = 1;
 				stat_only = 1;
+			} else if (!strcmp(arg, "--clone-bundle")) {
+				strict = 1;
+				clone_bundle = 1;
+				check_self_contained_and_connected = 1;
 			} else if (!strcmp(arg, "--keep")) {
 				keep_msg = "";
 			} else if (starts_with(arg, "--keep=")) {
@@ -1718,10 +1762,14 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		usage(index_pack_usage);
 	if (fix_thin_pack && !from_stdin)
 		die(_("--fix-thin cannot be used without --stdin"));
+	if (clone_bundle && from_stdin)
+		die(_("--clone-bundle cannot be used with --stdin"));
 	if (!index_name && pack_name)
 		index_name = derive_filename(pack_name, ".idx", &index_name_buf);
 	if (keep_msg && !keep_name && pack_name)
 		keep_name = derive_filename(pack_name, ".keep", &keep_name_buf);
+	if (clone_bundle)
+		bundle_name = derive_filename(pack_name, ".bndl", &bundle_name_buf);
 
 	if (verify) {
 		if (!index_name)
@@ -1768,6 +1816,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		final(pack_name, curr_pack,
 		      index_name, curr_index,
 		      keep_name, keep_msg,
+		      bundle_name, foreign_nr,
 		      pack_sha1);
 	else
 		close(input_fd);
-- 
2.8.0-rc0-133-g6d295b5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] index-pack: --clone-bundle option
  2016-03-03 22:57     ` [PATCH] index-pack: --clone-bundle option Junio C Hamano
@ 2016-03-03 23:20       ` Junio C Hamano
  2016-03-04 15:51         ` Jeff King
  2016-03-04 15:34       ` Jeff King
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2016-03-03 23:20 UTC (permalink / raw)
  To: git; +Cc: Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Note that this name choice does not matter very much in the larger
> picture.  As an initial clone that bootstraps from a clone-bundle is
> expected to do a rough equivalent of:
>
>     # create a new repository
>     git init new-repository &&
>     git remote add origin $URL &&
>
>     # prime the object store and anchor the history to temporary
>     # references
>     git fetch $bundle 'refs/*:refs/temporary/*' &&
>
>     # fetch the more recent history from the true origin
>     git fetch origin &&
>     git checkout -f &&
>
>     # remove the temporary refs
>     git for-each-ref -z --format=%(refname) refs/temporary/ |
>     xargs -0 git update-ref -d
>
> the names recorded in the bundle will not really matter to the end
> result.

Actually, the real implementation of "bootstrap with clone-bundle"
is more likely to go like this:

    * The client gets redirected to $name.bndl file, and obtains a
      fairly full $name.pack file by downloading them as static
      files;

    * The client initializes an empty repository;

    * The pack file is stored at .git/objects/pack/pack-$sha1.pack;

    * When the client does a "git fetch origin" to fill the more
      recent part, fetch-pack.c::find_common() would read from the
      "git bundle list-heads $name.bndl" to learn the "reference"
      objects.  These are thrown at rev_list_insert_ref() and are
      advertised as "have"s, just like we advertise objects at the
      tip of refs in alternate repository.

So there will be no refs/temporary/* hierarchy we would need to
worry about cleaning up.

Another possible variant is to redirect the client directly to
download pack-$sha1.pack; "index-pack" needs to be run on the client
side anyway to create pack-$sha1.idx, so at that time it could do
the equivalent of "--clone-bundle" processing (it is not strictly
necessary to create a split bundle) to find the tips of histories,
and use that information when running "git fetch origin".

So, even though I started working from "split bundle", we may not
have to have such a feature after all to support CDN offloadable and
resumable clone.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] index-pack: --clone-bundle option
  2016-03-03 22:57     ` [PATCH] index-pack: --clone-bundle option Junio C Hamano
  2016-03-03 23:20       ` Junio C Hamano
@ 2016-03-04 15:34       ` Jeff King
  1 sibling, 0 replies; 9+ messages in thread
From: Jeff King @ 2016-03-04 15:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Mar 03, 2016 at 02:57:08PM -0800, Junio C Hamano wrote:

> Teach a new option "--clone-bundle" to "git index-pack" to create a
> split bundle file that uses an existing packfile as its data part.
> 
> The expected "typical" preparation for helping initial clone would
> start by preparing a packfile that contains most of the history and
> add another packfile that contains the remainder (e.g. the objects
> that are only reachable from reflog entries).  The first pack can
> then be used as the data part of a split bundle and these two files
> can be served as static files to bootstrap the clients without
> incurring any more CPU cycles to the server side.

Just FYI, because I know our setup is anything but typical, but this
probably _won't_ be what we do at GitHub. We try to keep everything in a
single packfile that contains many "islands", which are not allowed to
delta between each other[1]. Individual forks of a repo get their own
islands, unreachable objects are in another one, and so forth. So we'd
probably never want to provide a whole packfile, as it has way too much
data. But we _would_ be able to take a bitmap over the set of packed
objects such that if you blindly spew out every object with its bit set,
the resulting pack has the desired objects.

That's not as turn-key for serving with a dumb http server, obviously.
And I don't expect you to write that part. I just wanted to let you know
where I foresee having to take this for GitHub to get it deployed.

[1] It's a little more complicated than "don't delta with each other".
    Each object has its own bitmap of which islands it is in, and the
    rule is that you can use a delta base iff it is in a subset of your
    own islands. The point is that a clone of a particular island should
    never have to throw away on-disk deltas.

    Cleaning up and sharing that code upstream has been on my todo list
    for about 2 years, but somehow there's always new code to be
    written. :-/ I'm happy to share if people want to look at the messy
    state.

> Among the objects in the packfile, the ones that are not referenced
> by no other objects are identified and recorded as the "references"

s/no/any/ (double negative)

> in the resulting bundle.  As the packfile does not record any ref
> information, however, the names of the "references" recorded in the
> bundle need to be synthesized; we arbitrarily choose to record the
> object whose name is $SHA1 as refs/objects/$SHA1.

Makes sense. In an "island" world I'd want to write one bitmap per
island, and then reconstruct that list on the fly by referencing the
island bitmap with the sha1 names in the pack revidx.

> +static void write_bundle_file(const char *filename, unsigned char *sha1,
> +			      const char *packname)
> +{
> +	int fd = open(filename, O_CREAT|O_EXCL|O_WRONLY, 0600);
> +	struct strbuf buf = STRBUF_INIT;
> +	struct stat st;
> +	int i;
> +

We should probably bail here if "fd < 0", though I guess technically we
will notice in write_in_full(). :)

> +	if (stat(packname, &st))
> +		die(_("cannot stat %s"), packname);
> +
> +	strbuf_addstr(&buf, "# v3 git bundle\n");
> +	strbuf_addf(&buf, "size: %lu\n", (unsigned long) st.st_size);
> +	strbuf_addf(&buf, "sha1: %s\n", sha1_to_hex(sha1));
> +	strbuf_addf(&buf, "data: %s\n\n", packname);

This "sha1" field is the sha1 of the packfile, right? If so, I think
it's redundant with the pack trailer found in the pack at "packname".
I'd prefer not to include that here, because it makes it harder to
generate these v3 bundle headers dynamically (you have to actually
checksum the pack subset to come up with sha1 up front, as opposed to
checksumming as you stream the pack out).

It _could_ be used as an http etag, but I think it would make more sense
to push implementations toward using a unique value for the "packname"
(i.e., so that fetching "https://example.com/foo/1234abcd.pack" is
basically immutable). And I think that comes basically for free, because
the packname has that same hash in it already.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] index-pack: --clone-bundle option
  2016-03-03 23:20       ` Junio C Hamano
@ 2016-03-04 15:51         ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2016-03-04 15:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Mar 03, 2016 at 03:20:20PM -0800, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Note that this name choice does not matter very much in the larger
> > picture.  As an initial clone that bootstraps from a clone-bundle is
> > expected to do a rough equivalent of:
> >
> >     # create a new repository
> >     git init new-repository &&
> >     git remote add origin $URL &&
> >
> >     # prime the object store and anchor the history to temporary
> >     # references
> >     git fetch $bundle 'refs/*:refs/temporary/*' &&
> >
> >     # fetch the more recent history from the true origin
> >     git fetch origin &&
> >     git checkout -f &&
> >
> >     # remove the temporary refs
> >     git for-each-ref -z --format=%(refname) refs/temporary/ |
> >     xargs -0 git update-ref -d
> >
> > the names recorded in the bundle will not really matter to the end
> > result.
> 
> Actually, the real implementation of "bootstrap with clone-bundle"
> is more likely to go like this:
> 
>     * The client gets redirected to $name.bndl file, and obtains a
>       fairly full $name.pack file by downloading them as static
>       files;
> 
>     * The client initializes an empty repository;
> 
>     * The pack file is stored at .git/objects/pack/pack-$sha1.pack;
> 
>     * When the client does a "git fetch origin" to fill the more
>       recent part, fetch-pack.c::find_common() would read from the
>       "git bundle list-heads $name.bndl" to learn the "reference"
>       objects.  These are thrown at rev_list_insert_ref() and are
>       advertised as "have"s, just like we advertise objects at the
>       tip of refs in alternate repository.
> 
> So there will be no refs/temporary/* hierarchy we would need to
> worry about cleaning up.

I don't think details like this matter much to the bundle-generation
side, so this is pretty academic at this point. But I think unless we
want to do a lot of surgery to git-clone, we'll end up more with
something like:

  1. init empty repository

  2. contact the other side; find out they can redirect us to an
     alternate url

  3. fetch the alternate url; it turns out to be a split bundle. Grab
     the header, and then spool the data into a temp packfile. When it's
     all there, we can "index-pack --fix-thin" it in-place.

The reason I think we'll end up with this approach is that it keeps the
details of split-bundle fetching inside remote-curl. That keeps clone
cleaner, and also means we can grab a split-bundle for a fetch, too.

> Another possible variant is to redirect the client directly to
> download pack-$sha1.pack; "index-pack" needs to be run on the client
> side anyway to create pack-$sha1.idx, so at that time it could do
> the equivalent of "--clone-bundle" processing (it is not strictly
> necessary to create a split bundle) to find the tips of histories,
> and use that information when running "git fetch origin".
> 
> So, even though I started working from "split bundle", we may not
> have to have such a feature after all to support CDN offloadable and
> resumable clone.

Yeah. And I think we'd support this in my step (3) by responding to what
we get at the URL. I.e., "it turns out to be..." can have many outcomes,
and one of them is "a packfile".

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-03-04 15:52 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-03 19:14 [PATCH] index-pack: correct --keep[=<msg>] Junio C Hamano
2016-03-03 19:47 ` Jeff King
2016-03-03 21:37 ` [PATCH] index-pack: add a helper function to derive .idx/.keep filename Junio C Hamano
2016-03-03 22:29   ` Jeff King
2016-03-03 22:57     ` [PATCH] index-pack: --clone-bundle option Junio C Hamano
2016-03-03 23:20       ` Junio C Hamano
2016-03-04 15:51         ` Jeff King
2016-03-04 15:34       ` Jeff King
2016-03-03 21:44 ` [PATCH] index-pack: correct --keep[=<msg>] Eric Sunshine

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).