git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Subject: [PATCH v2 4/4] bundle v3: the beginning
Date: Wed,  2 Mar 2016 12:32:41 -0800
Message-ID: <1456950761-19759-5-git-send-email-gitster@pobox.com>
In-Reply-To: <1456950761-19759-1-git-send-email-gitster@pobox.com>

The bundle v3 format makes it possible to keep the bundle header
(which describes which references in the bundled history can be
fetched, and which objects the receiving repository must already
have in order to unbundle it successfully) in one file, and the
bundled pack stream data in a separate file.

A v3 bundle file begins with the line "# v3 git bundle", followed
by zero or more "extended header" lines and an empty line, and
finally by the list of prerequisites and references in the same
format as a v2 bundle.  If it uses the "split bundle" feature, there
is a "data: $URL" extended header line, and nothing follows the list
of prerequisites and references.  An optional "sha1: " extended
header line may also be present to help validate that the pack
stream data matches the bundle header.
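To make the layout concrete, here is a minimal, self-contained C sketch that parses such a header from an in-memory string. The struct and function names (`v3_header_sketch`, `parse_header_sketch`) are hypothetical and not part of Git. Values are kept in fixed-size buffers for simplicity, and this sketch treats input that ends before the empty line as malformed. Like the patch, it rejects unknown extended headers and requires the "sha1: " value to be exactly 40 hex digits.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical, simplified result of parsing a bundle header (not Git's struct). */
struct v3_header_sketch {
	int version;          /* 2 or 3 */
	char data[256];       /* value of "data: ", or "" if absent */
	char sha1_hex[41];    /* value of "sha1: ", or "" if absent */
	size_t refs_offset;   /* offset where the prerequisite/reference list starts */
};

/* Copy n bytes of src into dst (capacity cap) as a C string; -1 if it does not fit. */
static int copy_value(char *dst, size_t cap, const char *src, size_t n)
{
	if (n >= cap)
		return -1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return 0;
}

/* Parse the signature and, for v3, the extended headers; 0 on success, -1 if malformed. */
static int parse_header_sketch(const char *buf, struct v3_header_sketch *h)
{
	static const char sig_v2[] = "# v2 git bundle\n";
	static const char sig_v3[] = "# v3 git bundle\n";
	const char *line;

	assert(buf && h);
	memset(h, 0, sizeof(*h));
	if (!strncmp(buf, sig_v2, strlen(sig_v2)))
		h->version = 2;
	else if (!strncmp(buf, sig_v3, strlen(sig_v3)))
		h->version = 3;
	else
		return -1;
	line = buf + strlen(sig_v3); /* both signatures have the same length */

	if (h->version == 3) {
		for (;;) {
			const char *nl = strchr(line, '\n');
			size_t len;

			if (!nl)
				return -1; /* input ended before the empty line */
			len = (size_t)(nl - line);
			if (!len) {
				line = nl + 1; /* the empty line ends the extended headers */
				break;
			}
			if (len >= 6 && !strncmp(line, "data: ", 6)) {
				if (copy_value(h->data, sizeof(h->data), line + 6, len - 6))
					return -1;
			} else if (len >= 6 && !strncmp(line, "sha1: ", 6)) {
				size_t i;

				if (len - 6 != 40)
					return -1; /* must be exactly 40 hex digits */
				for (i = 6; i < len; i++)
					if (!isxdigit((unsigned char)line[i]))
						return -1;
				copy_value(h->sha1_hex, sizeof(h->sha1_hex), line + 6, 40);
			} else {
				return -1; /* unknown extended header: reject, as the patch does */
			}
			line = nl + 1;
		}
	}
	h->refs_offset = (size_t)(line - buf);
	return 0;
}
```

For a split bundle, `refs_offset` points at the prerequisite and reference list, and the pack data is found through `data`. For a v2 or "inline" v3 bundle, the pack stream follows that list in the same file.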

A typical expected use of a split bundle is to help an initial clone
that involves a huge data transfer.  It would go like this:

 - Any repository people would clone and fetch from would regularly
   be repacked, and it is expected that there would be a packfile
   without prerequisites that holds all (or at least most) of its
   history (call it pack-$name.pack).

 - After making that packfile downloadable as $URL/pack-$name.pack
   over popular, easily resumable transfer methods used for serving
   static files (such as HTTP or HTTPS), a v3 bundle file (call it
   $name.bndl) can be prepared with an extended header
   "data: $URL/pack-$name.pack" pointing at the download location
   of the packfile, and be served at "$URL/$name.bndl".

 - An updated Git client, when trying to "git clone" from such a
   repository, may be redirected to "$URL/$name.bndl", which would
   be a tiny text file (when the split bundle feature is used).

 - The client would then inspect the downloaded $name.bndl, learn
   that the corresponding packfile exists at $URL/pack-$name.pack,
   and download it as pack-$name.pack, retrying until the download
   succeeds.  Over an unreliable link this can easily be done with
   the equivalent of "wget --continue".  The checksum recorded in
   the "sha1: " header line is expected to be used by this
   downloader (not written yet).

 - After fully downloading $name.bndl and pack-$name.pack and
   storing them next to each other, the client would clone from the
   $name.bndl; this would populate the newly created repository with
   reasonably recent history.

 - Then the client can issue "git fetch" against the original
   repository to obtain the most recent part of the history created
   since the bundle was made.
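Two of the client-side steps above can be sketched in C. Both helpers are hypothetical and do not exist in Git. `local_data_path` computes where the packfile should be stored so that cloning from the bundle finds it: the bundle's directory plus the basename of the `data:` value, which is what the comment on `open_bundle_data()` in this patch describes. It assumes '/' is the only directory separator. `format_resume_range` shows the HTTP Range header that a "wget --continue"-style downloader might send. Verifying the download against the "sha1: " header is not shown, because this message does not yet specify exactly what that checksum covers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: the path under which a client would store the
 * packfile named by "data: $URL/pack-$name.pack", i.e. the directory part
 * of the bundle file plus the basename of the data URL.  Assumes '/' is
 * the only directory separator.  The caller frees the result.
 */
static char *local_data_path(const char *bundle_path, const char *data_url)
{
	const char *dir_end, *base;
	size_t dir_len;
	char *out;

	assert(bundle_path && data_url);
	dir_end = strrchr(bundle_path, '/');
	dir_len = dir_end ? (size_t)(dir_end - bundle_path) + 1 : 0; /* keep the '/' */
	base = strrchr(data_url, '/');
	base = base ? base + 1 : data_url; /* skip the '/' itself */

	out = malloc(dir_len + strlen(base) + 1);
	if (!out)
		return NULL;
	memcpy(out, bundle_path, dir_len);
	strcpy(out + dir_len, base);
	return out;
}

/*
 * Hypothetical helper for a "wget --continue"-style resume: given how many
 * bytes of the packfile are already on disk, format the HTTP Range header
 * that requests the remainder.  Returns -1 on bad input or a short buffer.
 */
static int format_resume_range(long long have, char *buf, size_t cap)
{
	int n;

	if (have < 0)
		return -1;
	n = snprintf(buf, cap, "Range: bytes=%lld-", have);
	return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

For example, a bundle saved as "dl/v3.bndl" whose header says "data: https://example.com/repo/pack-1234.pack" would have its packfile stored as "dl/pack-1234.pack".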

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 bundle.c          | 103 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 bundle.h          |   3 ++
 t/t5704-bundle.sh |  64 +++++++++++++++++++++++++++++++++
 3 files changed, 161 insertions(+), 9 deletions(-)

diff --git a/bundle.c b/bundle.c
index 32bdb01..480630d 100644
--- a/bundle.c
+++ b/bundle.c
@@ -10,7 +10,8 @@
 #include "refs.h"
 #include "argv-array.h"
 
-static const char bundle_signature[] = "# v2 git bundle\n";
+static const char bundle_signature_v2[] = "# v2 git bundle\n";
+static const char bundle_signature_v3[] = "# v3 git bundle\n";
 
 static void add_to_ref_list(const unsigned char *sha1, const char *name,
 		struct ref_list *list)
@@ -33,16 +34,55 @@ static int parse_bundle_header(int fd, struct bundle_header *header, int quiet)
 	int status = 0;
 
 	/* The bundle header begins with the signature */
-	if (strbuf_getwholeline_fd(&buf, fd, '\n') ||
-	    strcmp(buf.buf, bundle_signature)) {
+	if (strbuf_getwholeline_fd(&buf, fd, '\n')) {
+	bad_bundle:
 		if (!quiet)
-			error(_("'%s' does not look like a v2 bundle file"),
+			error(_("'%s' does not look like a supported bundle file"),
 			      header->filename);
 		status = -1;
 		goto abort;
 	}
 
-	/* The bundle header ends with an empty line */
+	if (!strcmp(buf.buf, bundle_signature_v2))
+		header->bundle_version = 2;
+	else if (!strcmp(buf.buf, bundle_signature_v3))
+		header->bundle_version = 3;
+	else
+		goto bad_bundle;
+
+	if (header->bundle_version == 3) {
+		/*
+		 * bundle version v3 has extended headers before the
+		 * list of prerequisites and references.  The extended
+		 * headers end with an empty line.
+		 */
+		while (!strbuf_getwholeline_fd(&buf, fd, '\n')) {
+			const char *cp;
+			if (buf.len && buf.buf[buf.len - 1] == '\n')
+				buf.buf[--buf.len] = '\0';
+			if (!buf.len)
+				break;
+			if (skip_prefix(buf.buf, "data: ", &cp)) {
+				header->datafile = xstrdup(cp);
+				continue;
+			}
+			if (skip_prefix(buf.buf, "sha1: ", &cp)) {
+				unsigned char sha1[GIT_SHA1_RAWSZ];
+				if (get_sha1_hex(cp, sha1) ||
+				    cp[GIT_SHA1_HEXSZ])
+					goto bad_bundle;
+				hashcpy(header->csum, sha1);
+				continue;
+			}
+
+			goto bad_bundle;
+		}
+	}
+
+	/*
+	 * The bundle header lists prerequisites and
+	 * references, and the list ends with an empty line.
+	 */
 	while (!strbuf_getwholeline_fd(&buf, fd, '\n') &&
 	       buf.len && buf.buf[0] != '\n') {
 		unsigned char sha1[20];
@@ -77,7 +117,8 @@ static int parse_bundle_header(int fd, struct bundle_header *header, int quiet)
 
  abort:
 	if (status) {
-		close(fd);
+		if (0 <= fd)
+			close(fd);
 		fd = -1;
 	}
 	strbuf_release(&buf);
@@ -426,6 +467,7 @@ int create_bundle(struct bundle_header *header, const char *path,
 	int bundle_to_stdout;
 	int ref_count = 0;
 	struct rev_info revs;
+	const char *bundle_signature = bundle_signature_v2;
 
 	bundle_to_stdout = !strcmp(path, "-");
 	if (bundle_to_stdout)
@@ -480,22 +522,65 @@ int create_bundle(struct bundle_header *header, const char *path,
 	return 0;
 }
 
+/*
+ * v3 "split bundle" allows a separate packfile to be named
+ * as "data: $URL/$name_of_the_packfile".  This file is expected
+ * to be downloaded next to the bundle header file when the
+ * bundle is used.  Hence we find the path to the directory
+ * that contains the bundle header file, and append the basename
+ * part of the bundle_data_file to it, to form the name of the
+ * file that holds the pack data stream.
+ */
+static int open_bundle_data(struct bundle_header *header)
+{
+	const char *cp;
+	struct strbuf filename = STRBUF_INIT;
+	int fd;
+
+	assert(header->datafile);
+
+	cp = find_last_dir_sep(header->filename);
+	if (cp)
+		strbuf_add(&filename, header->filename,
+			   (cp - header->filename) + 1);
+	cp = find_last_dir_sep(header->datafile);
+	/* skip the separator itself; use the whole name if there is none */
+	cp = cp ? cp + 1 : header->datafile;
+	strbuf_addstr(&filename, cp);
+
+	fd = open(filename.buf, O_RDONLY);
+	strbuf_release(&filename);
+	return fd;
+}
+
 int unbundle(struct bundle_header *header, int bundle_fd, int flags)
 {
 	const char *argv_index_pack[] = {"index-pack",
 					 "--fix-thin", "--stdin", NULL, NULL};
 	struct child_process ip = CHILD_PROCESS_INIT;
+	int status = 0, data_fd = -1;
 
 	if (flags & BUNDLE_VERBOSE)
 		argv_index_pack[3] = "-v";
 
 	if (verify_bundle(header, 0))
 		return -1;
+
+	if (header->datafile) {
+		data_fd = open_bundle_data(header);
+		if (data_fd < 0)
+			return error(_("bundle data not found"));
+		ip.in = data_fd;
+	} else {
+		ip.in = bundle_fd;
+	}
+
 	ip.argv = argv_index_pack;
-	ip.in = bundle_fd;
 	ip.no_stdout = 1;
 	ip.git_cmd = 1;
 	if (run_command(&ip))
-		return error(_("index-pack died"));
-	return 0;
+		status = error(_("index-pack died"));
+	if (0 <= data_fd)
+		close(data_fd);
+	return status;
 }
diff --git a/bundle.h b/bundle.h
index e059ccf..db55dc7 100644
--- a/bundle.h
+++ b/bundle.h
@@ -10,7 +10,10 @@ struct ref_list {
 };
 
 struct bundle_header {
+	int bundle_version;
 	const char *filename;
+	const char *datafile;
+	unsigned char csum[GIT_SHA1_RAWSZ];
 	struct ref_list prerequisites;
 	struct ref_list references;
 };
diff --git a/t/t5704-bundle.sh b/t/t5704-bundle.sh
index 348d9b3..e68523c 100755
--- a/t/t5704-bundle.sh
+++ b/t/t5704-bundle.sh
@@ -71,4 +71,68 @@ test_expect_success 'prerequisites with an empty commit message' '
 	git bundle verify bundle
 '
 
+# bundle v3 (experimental)
+test_expect_success 'clone from v3' '
+
+	# as "bundle create" does not exist yet for v3
+	# prepare it by hand here
+	head=$(git rev-parse HEAD) &&
+	name=$(echo $head | git pack-objects --revs v3) &&
+	test_when_finished "rm v3-$name.pack v3-$name.idx" &&
+	cat >v3.bndl <<-EOF &&
+	# v3 git bundle
+	data: v3-$name.pack
+
+	$head HEAD
+	$head refs/heads/master
+	EOF
+
+	git bundle verify v3.bndl &&
+	git bundle list-heads v3.bndl >actual &&
+	cat >expect <<-EOF &&
+	$head HEAD
+	$head refs/heads/master
+	EOF
+	test_cmp expect actual &&
+
+	git clone v3.bndl v3dst &&
+	git -C v3dst for-each-ref --format="%(objectname) %(refname)" >actual &&
+	cat >expect <<-EOF &&
+	$head refs/heads/master
+	$head refs/remotes/origin/HEAD
+	$head refs/remotes/origin/master
+	EOF
+	test_cmp expect actual &&
+	git -C v3dst fsck &&
+
+	# an "inline" v3 is still possible.
+	cat >v3i.bndl <<-EOF &&
+	# v3 git bundle
+
+	$head HEAD
+	$head refs/heads/master
+
+	EOF
+	cat v3-$name.pack >>v3i.bndl &&
+	test_when_finished "rm v3i.bndl" &&
+
+	git bundle verify v3i.bndl &&
+	git bundle list-heads v3i.bndl >actual &&
+	cat >expect <<-EOF &&
+	$head HEAD
+	$head refs/heads/master
+	EOF
+	test_cmp expect actual &&
+
+	git clone v3i.bndl v3idst &&
+	git -C v3idst for-each-ref --format="%(objectname) %(refname)" >actual &&
+	cat >expect <<-EOF &&
+	$head refs/heads/master
+	$head refs/remotes/origin/HEAD
+	$head refs/remotes/origin/master
+	EOF
+	test_cmp expect actual &&
+	git -C v3idst fsck
+'
+
 test_done
-- 
2.8.0-rc0-114-g0b3e5e5

Thread overview: 37+ messages
2016-03-01 23:35 [PATCH 1/2] bundle: plug resource leak Junio C Hamano
2016-03-01 23:36 ` [PATCH 2/2] bundle: keep a copy of bundle file name in the in-core bundle header Junio C Hamano
2016-03-02  9:01   ` Jeff King
2016-03-02 18:15     ` Junio C Hamano
2016-03-02 20:32       ` [PATCH v2 0/4] "split bundle" preview Junio C Hamano
2016-03-02 20:32         ` [PATCH v2 1/4] bundle doc: 'verify' is not about verifying the bundle Junio C Hamano
2016-03-02 20:32         ` [PATCH v2 2/4] bundle: plug resource leak Junio C Hamano
2016-03-02 20:32         ` [PATCH v2 3/4] bundle: keep a copy of bundle file name in the in-core bundle header Junio C Hamano
2016-03-02 20:49           ` Jeff King
2016-03-02 20:32         ` Junio C Hamano [this message]
2016-03-03  1:36           ` [PATCH v2 4/4] bundle v3: the beginning Duy Nguyen
2016-03-03  2:57             ` Junio C Hamano
2016-03-03  5:15               ` Duy Nguyen
2016-05-20 12:39           ` Christian Couder
2016-05-31 12:43             ` Duy Nguyen
2016-05-31 13:18               ` Christian Couder
2016-06-01 13:37                 ` Duy Nguyen
2016-06-07 14:49                   ` Christian Couder
2016-06-01 14:00                 ` Duy Nguyen
2016-06-07  8:46                   ` Christian Couder
2016-06-07  8:53                     ` Mike Hommey
2016-06-07 10:22                     ` Duy Nguyen
2016-06-07 19:23                     ` Junio C Hamano
2016-06-07 20:23                       ` Jeff King
2016-06-08 10:44                         ` Duy Nguyen
2016-06-08 16:19                           ` Jeff King
2016-06-09  8:53                             ` Duy Nguyen
2016-06-09 17:23                               ` Jeff King
2016-06-08 18:05                         ` Junio C Hamano
2016-06-08 19:00                           ` Jeff King
2016-05-31 22:23               ` Jeff King
2016-05-31 22:31             ` Jeff King
2016-06-07 13:19               ` Christian Couder
2016-06-07 20:35                 ` Jeff King
2016-03-02  8:54 ` [PATCH 1/2] bundle: plug resource leak Jeff King
2016-03-02  9:00   ` Junio C Hamano
2016-03-02  9:02     ` Jeff King
