git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCHv2] Add support for subversion dump format v3
@ 2010-10-15 12:54 David Barr
  2010-10-15 12:54 ` [PATCH 1/5] fast-import: Let importers retrieve blobs David Barr
                   ` (5 more replies)
  0 siblings, 6 replies; 34+ messages in thread
From: David Barr @ 2010-10-15 12:54 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier

This series follows Jonathan Nieder's svn diff applier series.

Patch 1 adds the required infrastructure to fast-import.
This features the addition of the cat-blob command to
fast-import. This allows access to blobs written to the
the current pack prior to a checkpoint and is critical to
retrieving full-texts to drive the diff applier.

Patch 2 adds the basic parsing necessary to process the v3 format.

Patch 3 adds logic around decoding prop deltas.

Patch 5 integrates svn-fe with svn-da to decode text deltas.
It is based on a large patch authored by Jonathan and inspired by Ram.
It has been heavily trimmed to reduce code bloat and enabled me to
determine the logic for the previous patches.

A bit shout-out to Jonathan Nieder and Ramkumar Ramachandra for
their help in bring this series into existence.

I have tried to incorporate all the feedback on the list and over
at #git-devel. I hope to impress.

--
David Barr.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/5] fast-import: Let importers retrieve blobs
  2010-10-15 12:54 [PATCHv2] Add support for subversion dump format v3 David Barr
@ 2010-10-15 12:54 ` David Barr
  2010-10-18  7:36   ` Ramkumar Ramachandra
                     ` (3 more replies)
  2010-10-15 12:54 ` [PATCH 2/5] vcs-svn: Extend svndump to parse version 3 format David Barr
                   ` (4 subsequent siblings)
  5 siblings, 4 replies; 34+ messages in thread
From: David Barr @ 2010-10-15 12:54 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	David Barr, Shawn O. Pearce

As the description of the "progress" option in git-fast-import.1
hints, there is no convenient way to immediately access the blobs
written to a new repository through fast-import.  Until a checkpoint
has been started and finishes writing the pack index, any new blobs
will not be accessible using standard git tools.

So introduce another way: a "cat-blob" command introduced in the
command stream requests for fast-import to print a blob to stdout
or a file descriptor specified by the argument --cat-blob-fd.

The output uses the same format as "git cat-file --batch".

Cc: Shawn O. Pearce <spearce@spearce.org>
Cc: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: Sverre Rabbelier <srabbelier@gmail.com>
Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
---
 Documentation/git-fast-import.txt |   34 ++++++++++++++
 fast-import.c                     |   92 +++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 966ba4f..42a100b 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -92,6 +92,17 @@ OPTIONS
 	--(no-)-relative-marks= with the --(import|export)-marks=
 	options.
 
+--cat-blob-fd=<fd>::
+	Specify the file descriptor that will be written to
+	when the `cat-blob` command is encountered in the stream.
+	The default behaviour is to write to `stdout`.
++
+The described objects are not necessarily accessible
+using standard git plumbing tools until a little while
+after the next checkpoint.  To request access to the
+blobs before then, use `cat-blob` lines in the command
+stream.
+
 --export-pack-edges=<file>::
 	After creating a packfile, print a line of data to
 	<file> listing the filename of the packfile and the last
@@ -320,6 +331,11 @@ and control the current import process.  More detailed discussion
 	standard output.  This command is optional and is not needed
 	to perform an import.
 
+`cat-blob`::
+	Causes fast-import to print a blob in 'cat-file --batch'
+	format to the file descriptor set with `--cat-blob-fd` or
+	`stdout` if unspecified.
+
 `feature`::
 	Require that fast-import supports the specified feature, or
 	abort if it does not.
@@ -876,6 +892,23 @@ Placing a `progress` command immediately after a `checkpoint` will
 inform the reader when the `checkpoint` has been completed and it
 can safely access the refs that fast-import updated.
 
+`cat-blob`
+~~~~~
+Causes fast-import to print a blob to a file descriptor previously
+arranged with the `--cat-blob-fd` argument.  The command otherwise
+has no impact on the current import; its main purpose is to
+retrieve blobs that may be in fast-import's memory but not
+accessible from the target repository a little quicker than by the
+method suggested by the description of the `progress` option.
+
+....
+	'cat-blob' SP <dataref> LF
+....
+
+The `<dataref>` can be either a mark reference (`:<idnum>`)
+set previously, or a full 40-byte SHA-1 of any Git blob,
+preexisting or ready to be written.
+
 `feature`
 ~~~~~~~~~
 Require that fast-import supports the specified feature, or abort if
@@ -896,6 +929,7 @@ The following features are currently supported:
 * date-format
 * import-marks
 * export-marks
+* cat-blob
 * relative-marks
 * no-relative-marks
 * force
diff --git a/fast-import.c b/fast-import.c
index 2317b0f..ea3e529 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -55,6 +55,8 @@ Format of STDIN stream:
     ('from' sp committish lf)?
     lf?;
 
+  cat_blob ::= 'cat-blob' sp (hexsha1 | idnum) lf;
+
   checkpoint ::= 'checkpoint' lf
     lf?;
 
@@ -361,6 +363,9 @@ static uintmax_t next_mark;
 static struct strbuf new_data = STRBUF_INIT;
 static int seen_data_command;
 
+/* Where to write output of cat-blob commands */
+static int cat_blob_fd = 1;
+
 static void parse_argv(void);
 
 static void write_branch_report(FILE *rpt, struct branch *b)
@@ -2680,6 +2685,77 @@ static void parse_reset_branch(void)
 		unread_command_buf = 1;
 }
 
+static void cat_blob_write(const char *buf, unsigned long size)
+{
+	if (write_in_full(cat_blob_fd, buf, size) != size)
+		die_errno("Write to frontend failed");
+}
+
+static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
+{
+	struct strbuf line = STRBUF_INIT;
+	unsigned long size;
+	enum object_type type = 0;
+	char *buf;
+
+	if (oe && oe->pack_id != MAX_PACK_ID) {
+		type = oe->type;
+		buf = gfi_unpack_entry(oe, &size);
+	} else {
+		buf = read_sha1_file(sha1, &type, &size);
+	}
+	if (!buf)
+		die("Can't read object %s", sha1_to_hex(sha1));
+
+	/*
+	 * Output based on batch_one_object() from cat-file.c.
+	 */
+	if (type <= 0) {
+		strbuf_reset(&line);
+		strbuf_addf(&line, "%s missing\n", sha1_to_hex(sha1));
+		cat_blob_write(line.buf, line.len);
+		return;
+	} else if (type != OBJ_BLOB) {
+		die("Object %s is a %s but a blob was expected.",
+		    sha1_to_hex(sha1), typename(type));
+	}
+	strbuf_reset(&line);
+	strbuf_addf(&line, "%s %s %lu\n", sha1_to_hex(sha1),
+						typename(type), size);
+	cat_blob_write(line.buf, line.len);
+	cat_blob_write(buf, size);
+	cat_blob_write("\n", 1);
+	free(buf);
+}
+
+
+static void parse_cat_blob(void)
+{
+	const char *p;
+	struct object_entry *oe = oe;
+	unsigned char sha1[20];
+
+	/* cat SP <object> */
+	p = command_buf.buf + strlen("cat-blob ");
+	if (*p == ':') {
+		char *x;
+		oe = find_mark(strtoumax(p + 1, &x, 10));
+		if (x == p + 1)
+			die("Invalid mark: %s", command_buf.buf);
+		if (!oe)
+			die("Unknown mark: %s", command_buf.buf);
+		p = x;
+		hashcpy(sha1, oe->idx.sha1);
+	} else {
+		if (get_sha1_hex(p, sha1))
+			die("Invalid SHA1: %s", command_buf.buf);
+		p += 40;
+		oe = find_object(sha1);
+	}
+
+	cat_blob(oe, sha1);
+}
+
 static void parse_checkpoint(void)
 {
 	if (object_count) {
@@ -2755,6 +2831,14 @@ static void option_export_marks(const char *marks)
 	safe_create_leading_directories_const(export_marks_file);
 }
 
+static void option_cat_blob_fd(const char *fd)
+{
+	unsigned long n = strtoul(fd, NULL, 0);
+	if (n > (unsigned long) INT_MAX)
+		die("--cat-blob-fd cannot exceed %d", INT_MAX);
+	cat_blob_fd = (int) n;
+}
+
 static void option_export_pack_edges(const char *edges)
 {
 	if (pack_edges)
@@ -2808,6 +2892,8 @@ static int parse_one_feature(const char *feature, int from_stream)
 		option_import_marks(feature + 13, from_stream);
 	} else if (!prefixcmp(feature, "export-marks=")) {
 		option_export_marks(feature + 13);
+	} else if (!prefixcmp(feature, "cat-blob")) {
+		/* Don't die - this feature is supported */
 	} else if (!prefixcmp(feature, "relative-marks")) {
 		relative_marks_paths = 1;
 	} else if (!prefixcmp(feature, "no-relative-marks")) {
@@ -2896,6 +2982,10 @@ static void parse_argv(void)
 		if (*a != '-' || !strcmp(a, "--"))
 			break;
 
+		if (!prefixcmp(a + 2, "cat-blob-fd=")) {
+			option_cat_blob_fd(a + 2 + 12);
+		}
+
 		if (parse_one_option(a + 2))
 			continue;
 
@@ -2953,6 +3043,8 @@ int main(int argc, const char **argv)
 			parse_new_tag();
 		else if (!prefixcmp(command_buf.buf, "reset "))
 			parse_reset_branch();
+		else if (!prefixcmp(command_buf.buf, "cat-blob "))
+			parse_cat_blob();
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!prefixcmp(command_buf.buf, "progress "))
-- 
1.7.3.32.g634ef

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/5] vcs-svn: Extend svndump to parse version 3 format
  2010-10-15 12:54 [PATCHv2] Add support for subversion dump format v3 David Barr
  2010-10-15 12:54 ` [PATCH 1/5] fast-import: Let importers retrieve blobs David Barr
@ 2010-10-15 12:54 ` David Barr
  2010-10-15 12:54 ` [PATCH 3/5] vcs-svn: Implement prop-delta handling David Barr
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 34+ messages in thread
From: David Barr @ 2010-10-15 12:54 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	David Barr

Add 1 new dump header, SVN-fs-dump-format-version.
Add 6 new node headers:
* Text-delta: true|false
* Prop-delta: true|false
* Text-delta-base-md5: <32 hex digits>
* Text-delta-base-sha1: <40 hex digits>
* Text-copy-source-sha1: <40 hex digits>
* Text-content-sha1: <40 hex digits>

This change simply populates the context.
Further changes will be needed to handle text and prop deltas.

Signed-off-by: David Barr <david.barr@cordelta.com>
---
 vcs-svn/svndump.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 3bba0fe..458053e 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -27,6 +27,9 @@
 #define LENGTH_UNKNOWN (~0)
 #define DATE_RFC2822_LEN 31
 
+#define MD5_HEX_LENGTH 32
+#define SHA1_HEX_LENGTH 40
+
 /* Create memory pool for log messages */
 obj_pool_gen(log, char, 4096)
 
@@ -44,6 +47,11 @@ static char* log_copy(uint32_t length, char *log)
 static struct {
 	uint32_t action, propLength, textLength, srcRev, srcMode, mark, type;
 	uint32_t src[REPO_MAX_PATH_DEPTH], dst[REPO_MAX_PATH_DEPTH];
+	uint32_t text_delta, prop_delta;
+	char text_delta_base_md5[MD5_HEX_LENGTH + 1];
+	char text_content_sha1[SHA1_HEX_LENGTH + 1];
+	char text_delta_base_sha1[SHA1_HEX_LENGTH + 1];
+	char text_copy_source_sha1[SHA1_HEX_LENGTH + 1];
 } node_ctx;
 
 static struct {
@@ -53,14 +61,20 @@ static struct {
 } rev_ctx;
 
 static struct {
-	uint32_t uuid, url;
+	uint32_t version, uuid, url;
 } dump_ctx;
 
 static struct {
-	uint32_t svn_log, svn_author, svn_date, svn_executable, svn_special, uuid,
+	uint32_t svn_log, svn_author, svn_date, svn_executable, svn_special,
 		revision_number, node_path, node_kind, node_action,
 		node_copyfrom_path, node_copyfrom_rev, text_content_length,
-		prop_content_length, content_length;
+		prop_content_length, content_length,
+		/* SVN dump version 2 */
+		uuid, svn_fs_dump_format_version,
+		/* SVN dump version 3 */
+		text_delta, prop_delta, text_content_sha1,
+		text_delta_base_md5, text_delta_base_sha1,
+		text_copy_source_sha1;
 } keys;
 
 static void reset_node_ctx(char *fname)
@@ -74,6 +88,12 @@ static void reset_node_ctx(char *fname)
 	node_ctx.srcMode = 0;
 	pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.dst, "/", fname);
 	node_ctx.mark = 0;
+	node_ctx.text_delta = 0;
+	node_ctx.prop_delta = 0;
+	*node_ctx.text_delta_base_md5 = '\0';
+	*node_ctx.text_content_sha1 = '\0';
+	*node_ctx.text_delta_base_sha1 = '\0';
+	*node_ctx.text_copy_source_sha1 = '\0';
 }
 
 static void reset_rev_ctx(uint32_t revision)
@@ -87,6 +107,7 @@ static void reset_rev_ctx(uint32_t revision)
 static void reset_dump_ctx(uint32_t url)
 {
 	dump_ctx.url = url;
+	dump_ctx.version = 1;
 	dump_ctx.uuid = ~0;
 }
 
@@ -97,7 +118,6 @@ static void init_keys(void)
 	keys.svn_date = pool_intern("svn:date");
 	keys.svn_executable = pool_intern("svn:executable");
 	keys.svn_special = pool_intern("svn:special");
-	keys.uuid = pool_intern("UUID");
 	keys.revision_number = pool_intern("Revision-number");
 	keys.node_path = pool_intern("Node-path");
 	keys.node_kind = pool_intern("Node-kind");
@@ -107,6 +127,16 @@ static void init_keys(void)
 	keys.text_content_length = pool_intern("Text-content-length");
 	keys.prop_content_length = pool_intern("Prop-content-length");
 	keys.content_length = pool_intern("Content-length");
+	/* SVN dump version 2 */
+	keys.svn_fs_dump_format_version = pool_intern("SVN-fs-dump-format-version");
+	keys.uuid = pool_intern("UUID");
+	/* SVN dump version 3 */
+	keys.text_delta = pool_intern("Text-delta");
+	keys.prop_delta = pool_intern("Prop-delta");
+	keys.text_delta_base_md5 = pool_intern("Text-delta-base-md5");
+	keys.text_delta_base_sha1 = pool_intern("Text-delta-base-sha1");
+	keys.text_copy_source_sha1 = pool_intern("Text-copy-source-sha1");
+	keys.text_content_sha1 = pool_intern("Text-content-sha1");
 }
 
 static void read_props(void)
@@ -209,7 +239,9 @@ void svndump_read(const char *url)
 		*val++ = '\0';
 		key = pool_intern(t);
 
-		if (key == keys.uuid) {
+		if (key == keys.svn_fs_dump_format_version) {
+			dump_ctx.version = atoi(val);
+		} else if (key == keys.uuid) {
 			dump_ctx.uuid = pool_intern(val);
 		} else if (key == keys.revision_number) {
 			if (active_ctx == NODE_CTX)
@@ -251,6 +283,22 @@ void svndump_read(const char *url)
 			node_ctx.textLength = atoi(val);
 		} else if (key == keys.prop_content_length) {
 			node_ctx.propLength = atoi(val);
+		} else if (key == keys.text_delta) {
+			node_ctx.text_delta = !strcmp(val, "true");
+		} else if (key == keys.prop_delta) {
+			node_ctx.prop_delta = !strcmp(val, "true");
+		} else if (key == keys.text_delta_base_md5) {
+			strncpy(node_ctx.text_delta_base_md5, val,
+				MD5_HEX_LENGTH + 1);
+		} else if (key == keys.text_delta_base_sha1) {
+			strncpy(node_ctx.text_delta_base_sha1, val,
+				SHA1_HEX_LENGTH + 1);
+		} else if (key == keys.text_copy_source_sha1) {
+			strncpy(node_ctx.text_copy_source_sha1, val,
+				SHA1_HEX_LENGTH + 1);
+		} else if (key == keys.text_content_sha1) {
+			strncpy(node_ctx.text_content_sha1, val,
+				SHA1_HEX_LENGTH + 1);
 		} else if (key == keys.content_length) {
 			len = atoi(val);
 			buffer_read_line(&input);
-- 
1.7.3.32.g634ef

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/5] vcs-svn: Implement prop-delta handling.
  2010-10-15 12:54 [PATCHv2] Add support for subversion dump format v3 David Barr
  2010-10-15 12:54 ` [PATCH 1/5] fast-import: Let importers retrieve blobs David Barr
  2010-10-15 12:54 ` [PATCH 2/5] vcs-svn: Extend svndump to parse version 3 format David Barr
@ 2010-10-15 12:54 ` David Barr
  2010-10-18 15:10   ` Ramkumar Ramachandra
  2010-10-15 12:54 ` [PATCH 4/5] vcs-svn: Add outfile option to buffer_copy_bytes() David Barr
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: David Barr @ 2010-10-15 12:54 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	David Barr

By testing against the Apache Software Foundation
repository, some simple rules for decoding prop
deltas were derived.

'Node-action: replace' implies the empty prop set
as the base for the delta.
Otherwise, if a copyfrom source is given that node
forms the basis for the delta.
Lastly, if the destination path exists in the active
revision it forms the basis.

The same rules ought to apply to text deltas as well.

Apply these rules to prop handling.
Add a placeholder srcMark parameter to fast_export_blob().

Signed-off-by: David Barr <david.barr@cordelta.com>
---
 vcs-svn/fast_export.c |    4 +++-
 vcs-svn/fast_export.h |    3 ++-
 vcs-svn/repo_tree.c   |   23 +++++++++++++++++++++++
 vcs-svn/repo_tree.h   |    2 ++
 vcs-svn/svndump.c     |   35 +++++++++++++++++++++++++++++++----
 5 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 260cf50..d984aaa 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -63,7 +63,9 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 	printf("progress Imported commit %"PRIu32".\n\n", revision);
 }
 
-void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len, struct line_buffer *input)
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
+			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
+			struct line_buffer *input)
 {
 	if (mode == REPO_MODE_LNK) {
 		/* svn symlink blobs start with "link " */
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
index 054e7d5..634d9c6 100644
--- a/vcs-svn/fast_export.h
+++ b/vcs-svn/fast_export.h
@@ -9,6 +9,7 @@ void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
 void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 			uint32_t uuid, uint32_t url, unsigned long timestamp);
 void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
-		      struct line_buffer *input);
+			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
+			struct line_buffer *input);
 
 #endif
diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index e94d91d..b616bda 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -157,6 +157,29 @@ static void repo_write_dirent(uint32_t *path, uint32_t mode,
 		dent_remove(&dir_pointer(parent_dir_o)->entries, dent);
 }
 
+uint32_t repo_read_mark(uint32_t revision, uint32_t *path)
+{
+	uint32_t mode = 0, content_offset = 0;
+	struct repo_dirent *src_dent;
+	src_dent = repo_read_dirent(revision, path);
+	if (src_dent != NULL) {
+		mode = src_dent->mode;
+		content_offset = src_dent->content_offset;
+	}
+	return mode && mode != REPO_MODE_DIR ? content_offset : 0;
+}
+
+uint32_t repo_read_mode(uint32_t revision, uint32_t *path)
+{
+	uint32_t mode = 0;
+	struct repo_dirent *src_dent;
+	src_dent = repo_read_dirent(revision, path);
+	if (src_dent != NULL) {
+		mode = src_dent->mode;
+	}
+	return mode;
+}
+
 uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
 {
 	uint32_t mode = 0, content_offset = 0;
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
index 5476175..bd6a3f7 100644
--- a/vcs-svn/repo_tree.h
+++ b/vcs-svn/repo_tree.h
@@ -12,6 +12,8 @@
 #define REPO_MAX_PATH_DEPTH 1000
 
 uint32_t next_blob_mark(void);
+uint32_t repo_read_mark(uint32_t revision, uint32_t *path);
+uint32_t repo_read_mode(uint32_t revision, uint32_t *path);
 uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst);
 void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark);
 uint32_t repo_replace(uint32_t *path, uint32_t blob_mark);
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 458053e..3431c22 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -45,7 +45,7 @@ static char* log_copy(uint32_t length, char *log)
 }
 
 static struct {
-	uint32_t action, propLength, textLength, srcRev, srcMode, mark, type;
+	uint32_t action, propLength, textLength, srcRev, srcMode, srcMark, mark, type;
 	uint32_t src[REPO_MAX_PATH_DEPTH], dst[REPO_MAX_PATH_DEPTH];
 	uint32_t text_delta, prop_delta;
 	char text_delta_base_md5[MD5_HEX_LENGTH + 1];
@@ -86,6 +86,7 @@ static void reset_node_ctx(char *fname)
 	node_ctx.src[0] = ~0;
 	node_ctx.srcRev = 0;
 	node_ctx.srcMode = 0;
+	node_ctx.srcMark = 0;
 	pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.dst, "/", fname);
 	node_ctx.mark = 0;
 	node_ctx.text_delta = 0;
@@ -168,17 +169,42 @@ static void read_props(void)
 			}
 			key = ~0;
 			buffer_read_line(&input);
+		} else if (!strncmp(t, "D ", 2)) {
+			len = atoi(&t[2]);
+			key = pool_intern(buffer_read_string(&input, len));
+			buffer_read_line(&input);
+			if (key == keys.svn_executable) {
+				if (node_ctx.type == REPO_MODE_EXE)
+					node_ctx.type = REPO_MODE_BLB;
+			} else if (key == keys.svn_special) {
+				if (node_ctx.type == REPO_MODE_LNK)
+					node_ctx.type = REPO_MODE_BLB;
+			}
+			key = ~0;
 		}
 	}
 }
 
 static void handle_node(void)
 {
+	if (node_ctx.prop_delta) {
+		if (node_ctx.srcRev)
+			node_ctx.srcMode = repo_read_mode(node_ctx.srcRev, node_ctx.src);
+		else
+			node_ctx.srcMode = repo_read_mode(rev_ctx.revision, node_ctx.dst);
+		if (node_ctx.srcMode && node_ctx.action != NODEACT_REPLACE)
+			node_ctx.type = node_ctx.srcMode;
+	}
+
 	if (node_ctx.propLength != LENGTH_UNKNOWN && node_ctx.propLength)
 		read_props();
 
-	if (node_ctx.srcRev)
+	if (node_ctx.srcRev) {
+		node_ctx.srcMark = repo_read_mark(node_ctx.srcRev, node_ctx.src);
 		node_ctx.srcMode = repo_copy(node_ctx.srcRev, node_ctx.src, node_ctx.dst);
+	} else {
+		node_ctx.srcMark = repo_read_mark(rev_ctx.revision, node_ctx.dst);
+	}
 
 	if (node_ctx.textLength != LENGTH_UNKNOWN &&
 	    node_ctx.type != REPO_MODE_DIR)
@@ -209,8 +235,9 @@ static void handle_node(void)
 		node_ctx.type = node_ctx.srcMode;
 
 	if (node_ctx.mark)
-		fast_export_blob(node_ctx.type,
-				 node_ctx.mark, node_ctx.textLength, &input);
+		fast_export_blob(node_ctx.type, node_ctx.mark, node_ctx.textLength,
+				node_ctx.text_delta, node_ctx.srcMark, node_ctx.srcMode,
+				&input);
 	else if (node_ctx.textLength != LENGTH_UNKNOWN)
 		buffer_skip_bytes(&input, node_ctx.textLength);
 }
-- 
1.7.3.32.g634ef

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/5] vcs-svn: Add outfile option to buffer_copy_bytes()
  2010-10-15 12:54 [PATCHv2] Add support for subversion dump format v3 David Barr
                   ` (2 preceding siblings ...)
  2010-10-15 12:54 ` [PATCH 3/5] vcs-svn: Implement prop-delta handling David Barr
@ 2010-10-15 12:54 ` David Barr
  2010-10-18  8:59   ` Jonathan Nieder
  2010-10-15 12:54 ` [PATCH 5/5] svn-fe: Use the cat-blob command to apply deltas David Barr
  2010-10-18  9:54 ` [PATCHv2] Add support for subversion dump format v3 Jonathan Nieder
  5 siblings, 1 reply; 34+ messages in thread
From: David Barr @ 2010-10-15 12:54 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	David Barr

Explicitly declare that output is to stdout for existing use.
Allow users of buffer_copy_bytes() to specify the output file.

Signed-off-by: David Barr <david.barr@cordelta.com>
---
 test-line-buffer.c    |    2 +-
 vcs-svn/fast_export.c |    2 +-
 vcs-svn/line_buffer.c |    6 +++---
 vcs-svn/line_buffer.h |    2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/test-line-buffer.c b/test-line-buffer.c
index f9af892..adc23e8 100644
--- a/test-line-buffer.c
+++ b/test-line-buffer.c
@@ -36,7 +36,7 @@ int main(int argc, char *argv[])
 		buffer_skip_bytes(&buf, 1);
 		if (!(s = buffer_read_line(&buf)))
 			break;
-		buffer_copy_bytes(&buf, strtouint32(s) + 1);
+		buffer_copy_bytes(&buf, stdout, strtouint32(s) + 1);
 	}
 	if (buffer_deinit(&buf))
 		die("input error");
diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index d984aaa..b017dfb 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -73,6 +73,6 @@ void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
 		len -= 5;
 	}
 	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
-	buffer_copy_bytes(input, len);
+	buffer_copy_bytes(input, stdout, len);
 	fputc('\n', stdout);
 }
diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index c54031b..676cb62 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -82,7 +82,7 @@ void buffer_read_binary(struct strbuf *sb, uint32_t size,
 	strbuf_fread(sb, size, buf->infile);
 }
 
-void buffer_copy_bytes(struct line_buffer *buf, off_t len)
+void buffer_copy_bytes(struct line_buffer *buf, FILE *outfile, off_t len)
 {
 	char byte_buffer[COPY_BUFFER_LEN];
 	uint32_t in;
@@ -90,8 +90,8 @@ void buffer_copy_bytes(struct line_buffer *buf, off_t len)
 		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
 		in = fread(byte_buffer, 1, in, buf->infile);
 		len -= in;
-		fwrite(byte_buffer, 1, in, stdout);
-		if (ferror(stdout)) {
+		fwrite(byte_buffer, 1, in, outfile);
+		if (ferror(outfile)) {
 			buffer_skip_bytes(buf, len);
 			return;
 		}
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 2375ee1..23f4931 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -20,7 +20,7 @@ char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
 int buffer_read_char(struct line_buffer *buf);
 void buffer_read_binary(struct strbuf *sb, uint32_t len, struct line_buffer *f);
-void buffer_copy_bytes(struct line_buffer *buf, off_t len);
+void buffer_copy_bytes(struct line_buffer *buf, FILE *outfile, off_t len);
 off_t buffer_skip_bytes(struct line_buffer *buf, off_t len);
 void buffer_reset(struct line_buffer *buf);
 
-- 
1.7.3.32.g634ef

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 5/5] svn-fe: Use the cat-blob command to apply deltas
  2010-10-15 12:54 [PATCHv2] Add support for subversion dump format v3 David Barr
                   ` (3 preceding siblings ...)
  2010-10-15 12:54 ` [PATCH 4/5] vcs-svn: Add outfile option to buffer_copy_bytes() David Barr
@ 2010-10-15 12:54 ` David Barr
  2010-10-18  6:57   ` Ramkumar Ramachandra
  2010-10-18  9:54 ` [PATCHv2] Add support for subversion dump format v3 Jonathan Nieder
  5 siblings, 1 reply; 34+ messages in thread
From: David Barr @ 2010-10-15 12:54 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	David Barr

Use the new cat-blob command for fast-import to extract
blobs so that text-deltas may be applied.

The backchannel should only need to be configured when
parsing v3 svn dump streams.

Based-on-patch-by: Ramkumar Ramachandra <artagnon@gmail.com>
Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Tested-by: David Barr <david.barr@cordelta.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
---
 contrib/svn-fe/svn-fe.txt |    6 +++-
 t/t9010-svn-fe.sh         |    6 ++--
 vcs-svn/fast_export.c     |   86 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 92 insertions(+), 6 deletions(-)

diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index 35f84bd..39ffa07 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -7,7 +7,11 @@ svn-fe - convert an SVN "dumpfile" to a fast-import stream
 
 SYNOPSIS
 --------
-svnadmin dump --incremental REPO | svn-fe [url] | git fast-import
+[verse]
+mkfifo backchannel &&
+svnadmin dump --incremental REPO |
+	svn-fe [url] 3<backchannel |
+	git fast-import --cat-blob-fd=3 3>backchannel
 
 DESCRIPTION
 -----------
diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
index de976ed..d750c7a 100755
--- a/t/t9010-svn-fe.sh
+++ b/t/t9010-svn-fe.sh
@@ -34,10 +34,10 @@ test_dump () {
 		svnadmin load "$label-svn" < "$TEST_DIRECTORY/$dump" &&
 		svn_cmd export "file://$PWD/$label-svn" "$label-svnco" &&
 		git init "$label-git" &&
-		test-svn-fe "$TEST_DIRECTORY/$dump" >"$label.fe" &&
 		(
-			cd "$label-git" &&
-			git fast-import < ../"$label.fe"
+			cd "$label-git" && mkfifo backchannel && \
+			test-svn-fe "$TEST_DIRECTORY/$dump" 3< backchannel | \
+			git fast-import --cat-blob-fd=3 3> backchannel
 		) &&
 		(
 			cd "$label-svnco" &&
diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index b017dfb..812563d 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -8,10 +8,17 @@
 #include "line_buffer.h"
 #include "repo_tree.h"
 #include "string_pool.h"
+#include "svndiff.h"
 
 #define MAX_GITSVN_LINE_LEN 4096
+#define REPORT_FILENO 3
+
+#define SHA1_HEX_LENGTH 40
 
 static uint32_t first_commit_done;
+static struct line_buffer preimage = LINE_BUFFER_INIT;
+static struct line_buffer postimage = LINE_BUFFER_INIT;
+static struct line_buffer backchannel = LINE_BUFFER_INIT;
 
 void fast_export_delete(uint32_t depth, uint32_t *path)
 {
@@ -63,16 +70,91 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 	printf("progress Imported commit %"PRIu32".\n\n", revision);
 }
 
+static int fast_export_save_blob(FILE *out)
+{
+	size_t len;
+	char *header;
+	char *end;
+	char *tail;
+
+	if (!backchannel.infile)
+		backchannel.infile = fdopen(REPORT_FILENO, "r");
+	if (!backchannel.infile)
+		return error("Could not open backchannel fd: %d", REPORT_FILENO);
+	header = buffer_read_line(&backchannel);
+	if (header == NULL)
+		return 1;
+	end = strchr(header, '\0');
+	if (end - header > 7 && !strcmp(end - 7, "missing"))
+		return error("cat-blob reports missing blob: %s", header);
+	if (end - header < SHA1_HEX_LENGTH)
+		return error("cat-blob header too short for SHA1: %s", header);
+	if (strncmp(header + SHA1_HEX_LENGTH, " blob ", 6))
+		return error("cat-blob header has wrong object type: %s", header);
+	len = strtoumax(header + SHA1_HEX_LENGTH + 6, &end, 10);
+	if (end == header + SHA1_HEX_LENGTH + 6)
+		return error("cat-blob header did not contain length: %s", header);
+	if (*end)
+		return error("cat-blob header contained garbage after length: %s", header);
+	buffer_copy_bytes(&backchannel, out, len);
+	tail = buffer_read_line(&backchannel);
+	if (!tail)
+		return 1;
+	if (*tail)
+		return error("cat-blob trailing line contained garbage: %s", tail);
+	return 0;
+}
+
 void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
 			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
 			struct line_buffer *input)
 {
+	long preimage_len = 0;
+
+	if (delta) {
+		if (!preimage.infile)
+			preimage.infile = tmpfile();
+		if (!preimage.infile)
+			die("Unable to open temp file for blob retrieval");
+		if (srcMark) {
+			printf("cat-blob :%"PRIu32"\n", srcMark);
+			fflush(stdout);
+			if (srcMode == REPO_MODE_LNK)
+				fwrite("link ", 1, 5, preimage.infile);
+			if (fast_export_save_blob(preimage.infile))
+				die("Failed to retrieve blob for delta application");
+		}
+		preimage_len = ftell(preimage.infile);
+		fseek(preimage.infile, 0, SEEK_SET);
+		if (!postimage.infile)
+			postimage.infile = tmpfile();
+		if (!postimage.infile)
+			die("Unable to open temp file for blob application");
+		svndiff0_apply(input, len, &preimage, postimage.infile);
+		len = ftell(postimage.infile);
+		fseek(postimage.infile, 0, SEEK_SET);
+	}
+
 	if (mode == REPO_MODE_LNK) {
 		/* svn symlink blobs start with "link " */
-		buffer_skip_bytes(input, 5);
+		if (delta)
+			buffer_skip_bytes(&postimage, 5);
+		else
+			buffer_skip_bytes(input, 5);
 		len -= 5;
 	}
 	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
-	buffer_copy_bytes(input, stdout, len);
+	if (!delta)
+		buffer_copy_bytes(input, stdout, len);
+	else
+		buffer_copy_bytes(&postimage, stdout, len);
 	fputc('\n', stdout);
+
+	if (preimage.infile) {
+		fseek(preimage.infile, 0, SEEK_SET);
+	}
+
+	if (postimage.infile) {
+		fseek(postimage.infile, 0, SEEK_SET);
+	}
 }
-- 
1.7.3.32.g634ef

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] svn-fe: Use the cat-blob command to apply deltas
  2010-10-15 12:54 ` [PATCH 5/5] svn-fe: Use the cat-blob command to apply deltas David Barr
@ 2010-10-18  6:57   ` Ramkumar Ramachandra
  2010-10-18  9:24     ` Jonathan Nieder
  0 siblings, 1 reply; 34+ messages in thread
From: Ramkumar Ramachandra @ 2010-10-18  6:57 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Jonathan Nieder, Sverre Rabbelier

Hi David,

David Barr writes:
> Use the new cat-blob command for fast-import to extract
> blobs so that text-deltas may be applied.

I like this straightforward approach, and I like the name 'cat-blob'.

> The backchannel should only need to be configured when
> parsing v3 svn dump streams.

Maybe get the synopsis to say this as well?

> Based-on-patch-by: Ramkumar Ramachandra <artagnon@gmail.com>
> Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
> Tested-by: David Barr <david.barr@cordelta.com>
> Signed-off-by: David Barr <david.barr@cordelta.com>
> ---
>  contrib/svn-fe/svn-fe.txt |    6 +++-
>  t/t9010-svn-fe.sh         |    6 ++--
>  vcs-svn/fast_export.c     |   86 +++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 92 insertions(+), 6 deletions(-)
> 
> diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
> index 35f84bd..39ffa07 100644
> --- a/contrib/svn-fe/svn-fe.txt
> +++ b/contrib/svn-fe/svn-fe.txt
> @@ -7,7 +7,11 @@ svn-fe - convert an SVN "dumpfile" to a fast-import stream
>  
>  SYNOPSIS
>  --------
> -svnadmin dump --incremental REPO | svn-fe [url] | git fast-import
> +[verse]
> +mkfifo backchannel &&
> +svnadmin dump --incremental REPO |
> +	svn-fe [url] 3<backchannel |
> +	git fast-import --cat-blob-fd=3 3>backchannel

See above.

>  DESCRIPTION
>  -----------
> diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
> index de976ed..d750c7a 100755
> --- a/t/t9010-svn-fe.sh
> +++ b/t/t9010-svn-fe.sh
> @@ -34,10 +34,10 @@ test_dump () {
>  		svnadmin load "$label-svn" < "$TEST_DIRECTORY/$dump" &&
>  		svn_cmd export "file://$PWD/$label-svn" "$label-svnco" &&
>  		git init "$label-git" &&
> -		test-svn-fe "$TEST_DIRECTORY/$dump" >"$label.fe" &&
>  		(
> -			cd "$label-git" &&
> -			git fast-import < ../"$label.fe"
> +			cd "$label-git" && mkfifo backchannel && \
> +			test-svn-fe "$TEST_DIRECTORY/$dump" 3< backchannel | \
> +			git fast-import --cat-blob-fd=3 3> backchannel
>  		) &&
>  		(
>  			cd "$label-svnco" &&

Ok.

> diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
> index b017dfb..812563d 100644
> --- a/vcs-svn/fast_export.c
> +++ b/vcs-svn/fast_export.c
> @@ -8,10 +8,17 @@
>  #include "line_buffer.h"
>  #include "repo_tree.h"
>  #include "string_pool.h"
> +#include "svndiff.h"
>  
>  #define MAX_GITSVN_LINE_LEN 4096
> +#define REPORT_FILENO 3
> +
> +#define SHA1_HEX_LENGTH 40
>  
>  static uint32_t first_commit_done;
> +static struct line_buffer preimage = LINE_BUFFER_INIT;
> +static struct line_buffer postimage = LINE_BUFFER_INIT;
> +static struct line_buffer backchannel = LINE_BUFFER_INIT;

Elegant :)

>  void fast_export_delete(uint32_t depth, uint32_t *path)
>  {
> @@ -63,16 +70,91 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
>  	printf("progress Imported commit %"PRIu32".\n\n", revision);
>  }
>  
> +static int fast_export_save_blob(FILE *out)
> +{
> +	size_t len;
> +	char *header;
> +	char *end;
> +	char *tail;
> +
> +	if (!backchannel.infile)
> +		backchannel.infile = fdopen(REPORT_FILENO, "r");
> +	if (!backchannel.infile)
> +		return error("Could not open backchannel fd: %d", REPORT_FILENO);

REPORT_FILENO = 3 is hard-coded. Is this intended? Maybe a
command-line option to specify the fd?

> +	header = buffer_read_line(&backchannel);
> +	if (header == NULL)
> +		return 1;

Note to self: This prints the error "Failed to retrieve blob for delta
application" in the caller.

> +	end = strchr(header, '\0');
> +	if (end - header > 7 && !strcmp(end - 7, "missing"))
> +		return error("cat-blob reports missing blob: %s", header);
> +	if (end - header < SHA1_HEX_LENGTH)
> +		return error("cat-blob header too short for SHA1: %s", header);
> +	if (strncmp(header + SHA1_HEX_LENGTH, " blob ", 6))
> +		return error("cat-blob header has wrong object type: %s", header);
> +	len = strtoumax(header + SHA1_HEX_LENGTH + 6, &end, 10);
> +	if (end == header + SHA1_HEX_LENGTH + 6)
> +		return error("cat-blob header did not contain length: %s", header);
> +	if (*end)
> +		return error("cat-blob header contained garbage after length: %s", header);
> +	buffer_copy_bytes(&backchannel, out, len);
> +	tail = buffer_read_line(&backchannel);
> +	if (!tail)
> +		return 1;

Could you clarify when exactly will this happen?

> +	if (*tail)
> +		return error("cat-blob trailing line contained garbage: %s", tail);
> +	return 0;
> +}
> +
>  void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
>  			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
>  			struct line_buffer *input)
>  {

Note to reviewers: The function looks like this in `master`:
void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len)

New parameters intrduced in the svn-fe3 series: srcMark, srcMode,
delta, input.

> +	long preimage_len = 0;
> +
> +	if (delta) {
> +		if (!preimage.infile)
> +			preimage.infile = tmpfile();

Didn't you later decide against this and use one tmpfile instead? In
this case, the temporary file will be automatically deleted when
`preimage.infile` goes out of scope.

> +		if (!preimage.infile)
> +			die("Unable to open temp file for blob retrieval");
> +		if (srcMark) {
> +			printf("cat-blob :%"PRIu32"\n", srcMark);
> +			fflush(stdout);
> +			if (srcMode == REPO_MODE_LNK)
> +				fwrite("link ", 1, 5, preimage.infile);

Special handling for symbolic links. Perhaps you should mention it in
a comment here?

> +			if (fast_export_save_blob(preimage.infile))
> +				die("Failed to retrieve blob for delta application");
> +		}
> +		preimage_len = ftell(preimage.infile);
> +		fseek(preimage.infile, 0, SEEK_SET);
> +		if (!postimage.infile)
> +			postimage.infile = tmpfile();

One tmpfile?

> +		if (!postimage.infile)
> +			die("Unable to open temp file for blob application");
> +		svndiff0_apply(input, len, &preimage, postimage.infile);
> +		len = ftell(postimage.infile);

Since you already have a preimage_len, perhaps name this postimage_len
to avoid confusion?

> +		fseek(postimage.infile, 0, SEEK_SET);
> +	}
> +
>  	if (mode == REPO_MODE_LNK) {
>  		/* svn symlink blobs start with "link " */
> -		buffer_skip_bytes(input, 5);
> +		if (delta)
> +			buffer_skip_bytes(&postimage, 5);
> +		else
> +			buffer_skip_bytes(input, 5);
>  		len -= 5;
>  	}
>  	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
> -	buffer_copy_bytes(input, stdout, len);
> +	if (!delta)
> +		buffer_copy_bytes(input, stdout, len);
> +	else
> +		buffer_copy_bytes(&postimage, stdout, len);
>  	fputc('\n', stdout);

I should have asked this a long time ago: why the extra newline?

> +
> +	if (preimage.infile) {
> +		fseek(preimage.infile, 0, SEEK_SET);
> +	}
> +
> +	if (postimage.infile) {
> +		fseek(postimage.infile, 0, SEEK_SET);
> +	}

Style nits: The extra braces around the `if` statement are unnecessary.

Overall, pleasant read. Thanks for taking this forward.

-- Ram

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] fast-import: Let importers retrieve blobs
  2010-10-15 12:54 ` [PATCH 1/5] fast-import: Let importers retrieve blobs David Barr
@ 2010-10-18  7:36   ` Ramkumar Ramachandra
  2010-10-18  8:50     ` Jonathan Nieder
  2010-10-18  8:26   ` Jonathan Nieder
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Ramkumar Ramachandra @ 2010-10-18  7:36 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Jonathan Nieder, Sverre Rabbelier,
	Shawn O. Pearce

Hi David,

David Barr writes:
> As the description of the "progress" option in git-fast-import.1
> hints, there is no convenient way to immediately access the blobs
> written to a new repository through fast-import.  Until a checkpoint
> has been started and finishes writing the pack index, any new blobs
> will not be accessible using standard git tools.
> 
> So introduce another way: a "cat-blob" command introduced in the
> command stream requests for fast-import to print a blob to stdout
> or a file descriptor specified by the argument --cat-blob-fd.
> 
> The output uses the same format as "git cat-file --batch".

Nice :) It looks like we finally have a nice polished version.
Caution: Most of the review are just notes to self, or style
nitpicks. Feel free to ignore.

> Cc: Shawn O. Pearce <spearce@spearce.org>
> Cc: Ramkumar Ramachandra <artagnon@gmail.com>
> Helped-by: Sverre Rabbelier <srabbelier@gmail.com>
> Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
> Signed-off-by: David Barr <david.barr@cordelta.com>
> ---
>  Documentation/git-fast-import.txt |   34 ++++++++++++++
>  fast-import.c                     |   92 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 126 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> index 966ba4f..42a100b 100644
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -92,6 +92,17 @@ OPTIONS
>  	--(no-)-relative-marks= with the --(import|export)-marks=
>  	options.
>  
> +--cat-blob-fd=<fd>::
> +	Specify the file descriptor that will be written to
> +	when the `cat-blob` command is encountered in the stream.
> +	The default behaviour is to write to `stdout`.
> ++
> +The described objects are not necessarily accessible
> +using standard git plumbing tools until a little while
> +after the next checkpoint.  To request access to the
> +blobs before then, use `cat-blob` lines in the command
> +stream.
> +
>  --export-pack-edges=<file>::
>  	After creating a packfile, print a line of data to
>  	<file> listing the filename of the packfile and the last
> @@ -320,6 +331,11 @@ and control the current import process.  More detailed discussion
>  	standard output.  This command is optional and is not needed
>  	to perform an import.
>  
> +`cat-blob`::
> +	Causes fast-import to print a blob in 'cat-file --batch'
> +	format to the file descriptor set with `--cat-blob-fd` or
> +	`stdout` if unspecified.
> +
>  `feature`::
>  	Require that fast-import supports the specified feature, or
>  	abort if it does not.
> @@ -876,6 +892,23 @@ Placing a `progress` command immediately after a `checkpoint` will
>  inform the reader when the `checkpoint` has been completed and it
>  can safely access the refs that fast-import updated.
>  
> +`cat-blob`
> +~~~~~
> +Causes fast-import to print a blob to a file descriptor previously
> +arranged with the `--cat-blob-fd` argument.  The command otherwise
> +has no impact on the current import; its main purpose is to
> +retrieve blobs that may be in fast-import's memory but not
> +accessible from the target repository a little quicker than by the
> +method suggested by the description of the `progress` option.
> +
> +....
> +	'cat-blob' SP <dataref> LF
> +....
> +
> +The `<dataref>` can be either a mark reference (`:<idnum>`)
> +set previously, or a full 40-byte SHA-1 of any Git blob,
> +preexisting or ready to be written.
> +
>  `feature`
>  ~~~~~~~~~
>  Require that fast-import supports the specified feature, or abort if
> @@ -896,6 +929,7 @@ The following features are currently supported:
>  * date-format
>  * import-marks
>  * export-marks
> +* cat-blob
>  * relative-marks
>  * no-relative-marks
>  * force

Looks perfect.

> diff --git a/fast-import.c b/fast-import.c
> index 2317b0f..ea3e529 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -55,6 +55,8 @@ Format of STDIN stream:
>      ('from' sp committish lf)?
>      lf?;
>  
> +  cat_blob ::= 'cat-blob' sp (hexsha1 | idnum) lf;
> +
>    checkpoint ::= 'checkpoint' lf
>      lf?;
>  
> @@ -361,6 +363,9 @@ static uintmax_t next_mark;
>  static struct strbuf new_data = STRBUF_INIT;
>  static int seen_data_command;
>  
> +/* Where to write output of cat-blob commands */
> +static int cat_blob_fd = 1;
> +

Right. Defaults to stdout, as described in documentation.

>  static void parse_argv(void);
>  
>  static void write_branch_report(FILE *rpt, struct branch *b)
> @@ -2680,6 +2685,77 @@ static void parse_reset_branch(void)
>  		unread_command_buf = 1;
>  }
>  
> +static void cat_blob_write(const char *buf, unsigned long size)
> +{
> +	if (write_in_full(cat_blob_fd, buf, size) != size)
> +		die_errno("Write to frontend failed");
> +}

Note to self: I didn't notice write_in_full in wrapper.c
earlier. Returns the actual size written to the fd from the buffer.

> +static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
> +{
> +	struct strbuf line = STRBUF_INIT;
> +	unsigned long size;
> +	enum object_type type = 0;
> +	char *buf;
> +
> +	if (oe && oe->pack_id != MAX_PACK_ID) {
> +		type = oe->type;
> +		buf = gfi_unpack_entry(oe, &size);
> +	} else {
> +		buf = read_sha1_file(sha1, &type, &size);
> +	}
> +	if (!buf)
> +		die("Can't read object %s", sha1_to_hex(sha1));

Ok. Looks similar enough to the code in the other functions.

> +
> +	/*
> +	 * Output based on batch_one_object() from cat-file.c.
> +	 */
> +	if (type <= 0) {
> +		strbuf_reset(&line);
> +		strbuf_addf(&line, "%s missing\n", sha1_to_hex(sha1));
> +		cat_blob_write(line.buf, line.len);
> +		return;
> +	} else if (type != OBJ_BLOB) {
> +		die("Object %s is a %s but a blob was expected.",
> +		    sha1_to_hex(sha1), typename(type));
> +	}
> +	strbuf_reset(&line);
> +	strbuf_addf(&line, "%s %s %lu\n", sha1_to_hex(sha1),
> +						typename(type), size);
> +	cat_blob_write(line.buf, line.len);
> +	cat_blob_write(buf, size);
> +	cat_blob_write("\n", 1);
> +	free(buf);

Cool.

> +}
> +
> +
> +static void parse_cat_blob(void)
> +{
> +	const char *p;
> +	struct object_entry *oe = oe;
> +	unsigned char sha1[20];
> +
> +	/* cat SP <object> */
> +	p = command_buf.buf + strlen("cat-blob ");
> +	if (*p == ':') {
> +		char *x;
> +		oe = find_mark(strtoumax(p + 1, &x, 10));
> +		if (x == p + 1)
> +			die("Invalid mark: %s", command_buf.buf);
> +		if (!oe)
> +			die("Unknown mark: %s", command_buf.buf);
> +		p = x;
> +		hashcpy(sha1, oe->idx.sha1);

The mark case.

> +	} else {
> +		if (get_sha1_hex(p, sha1))
> +			die("Invalid SHA1: %s", command_buf.buf);
> +		p += 40;
> +		oe = find_object(sha1);
> +	}

The SHA1 case.
Looks similar enough to the code in the other functions.

> +
> +	cat_blob(oe, sha1);
> +}
> +
>  static void parse_checkpoint(void)
>  {
>  	if (object_count) {
> @@ -2755,6 +2831,14 @@ static void option_export_marks(const char *marks)
>  	safe_create_leading_directories_const(export_marks_file);
>  }
>  
> +static void option_cat_blob_fd(const char *fd)
> +{
> +	unsigned long n = strtoul(fd, NULL, 0);
> +	if (n > (unsigned long) INT_MAX)
> +		die("--cat-blob-fd cannot exceed %d", INT_MAX);
> +	cat_blob_fd = (int) n;
> +}
> +

You don't display an appropriate error when n < 0.

>  static void option_export_pack_edges(const char *edges)
>  {
>  	if (pack_edges)
> @@ -2808,6 +2892,8 @@ static int parse_one_feature(const char *feature, int from_stream)
>  		option_import_marks(feature + 13, from_stream);
>  	} else if (!prefixcmp(feature, "export-marks=")) {
>  		option_export_marks(feature + 13);
> +	} else if (!prefixcmp(feature, "cat-blob")) {
> +		/* Don't die - this feature is supported */

Style nit: There are no statements in this branch. Maybe put a
semicolon?

>  	} else if (!prefixcmp(feature, "relative-marks")) {
>  		relative_marks_paths = 1;
>  	} else if (!prefixcmp(feature, "no-relative-marks")) {
> @@ -2896,6 +2982,10 @@ static void parse_argv(void)
>  		if (*a != '-' || !strcmp(a, "--"))
>  			break;
>  
> +		if (!prefixcmp(a + 2, "cat-blob-fd=")) {
> +			option_cat_blob_fd(a + 2 + 12);
> +		}
> +

Style nit: Unnecessary braces around `if` statement.

>  		if (parse_one_option(a + 2))
>  			continue;
>  
> @@ -2953,6 +3043,8 @@ int main(int argc, const char **argv)
>  			parse_new_tag();
>  		else if (!prefixcmp(command_buf.buf, "reset "))
>  			parse_reset_branch();
> +		else if (!prefixcmp(command_buf.buf, "cat-blob "))
> +			parse_cat_blob();
>  		else if (!strcmp("checkpoint", command_buf.buf))
>  			parse_checkpoint();
>  		else if (!prefixcmp(command_buf.buf, "progress "))
> -- 
> 1.7.3.32.g634ef
> 

Ok. I'm eager to see this go through to `master`.
Reviewed-by: Ramkumar Ramachandra <artagnon@gmail.com>

-- Ram

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] fast-import: Let importers retrieve blobs
  2010-10-15 12:54 ` [PATCH 1/5] fast-import: Let importers retrieve blobs David Barr
  2010-10-18  7:36   ` Ramkumar Ramachandra
@ 2010-10-18  8:26   ` Jonathan Nieder
       [not found]   ` <20101119093530.GA19061@burratino>
  2010-11-28 19:41   ` [PATCH/RFC v3 resend 0/4] fast-import: Let importers retrieve blobs Jonathan Nieder
  3 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-10-18  8:26 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, Sam Vilain

(+cc: Sam)

Hi,

David Barr wrote:

> So introduce another way: a "cat-blob" command introduced in the
> command stream requests for fast-import to print a blob to stdout
> or a file descriptor specified by the argument --cat-blob-fd.

Yes, please!

> Cc: Shawn O. Pearce <spearce@spearce.org>
> Cc: Ramkumar Ramachandra <artagnon@gmail.com>

It turns out these Cc tags are not supposed to be used except in
some very weird circumstances.  See [1] if curious.

[...]
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -92,6 +92,17 @@ OPTIONS
>  	--(no-)-relative-marks= with the --(import|export)-marks=
>  	options.
>  
> +--cat-blob-fd=<fd>::
> +	Specify the file descriptor that will be written to
> +	when the `cat-blob` command is encountered in the stream.
> +	The default behaviour is to write to `stdout`.

Sounds good.

> ++
> +The described objects are not necessarily accessible
> +using standard git plumbing tools until a little while
> +after the next checkpoint.  To request access to the
> +blobs before then, use `cat-blob` lines in the command
> +stream.

This is stale explanation from --report-fd, I think, to explain
why the commit ids it printed were not very useful.  It would
be possible to reword it to describe cat-blob-fd but since the
frontend does not have easy access to blob names as it is, I
think cat-blob motivates itself on its own.

[...]
> @@ -876,6 +892,23 @@ Placing a `progress` command immediately after a `checkpoint` will
>  inform the reader when the `checkpoint` has been completed and it
>  can safely access the refs that fast-import updated.
>  
> +`cat-blob`
> +~~~~~

   ~~~~~~~~~~

> @@ -896,6 +929,7 @@ The following features are currently supported:
>  * date-format
>  * import-marks
>  * export-marks
> +* cat-blob

The explanation says (paraphrased) "Features work identically to their
option counterparts, with the exception of import-marks as described
below".

Maybe ought to be reworded?

 date-format::
 export-marks::
 relative-marks::
 no-relative-marks::
 force::
	See the corresponding command-line option.

 import-marks::
	Like --import-marks, except in two respects.  First, only one
	"feature import-marks" command is allowed per stream.  Second,
	an --import-marks= specified on the command line will override it.

 cat-blob::
	No-op to check that the importer supports the cat-blob command.

By the way, it might be nice to make cat-blob not just check the importer
but the environment in which it was invoked, like this:

	exporter says:
		feature this
		feature that
		feature cat-blob
		feature another
		...

	importer says:
		feature cat-blob

That is, after writing "feature cat-blob\n", an exporter could tell if
the backchannel was set up correctly by reading for "feature cat-blob\n"
from the importer.

> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -2680,6 +2685,77 @@ static void parse_reset_branch(void)
>  		unread_command_buf = 1;
>  }
>  
> +static void cat_blob_write(const char *buf, unsigned long size)
> +{
> +	if (write_in_full(cat_blob_fd, buf, size) != size)
> +		die_errno("Write to frontend failed");
> +}

An odd operation, since if the pipe_buf gets filled then it blocks
until the exporter finds time to read.  Maybe in some future version
this would write to a private ring buffer and there would be an
event loop or seperate thread to flush it out when the exporter is
ready.

Upshot: I am happy with this as a separate function.

[...]
> @@ -2808,6 +2892,8 @@ static int parse_one_feature(const char *feature, int from_stream)
>  		option_import_marks(feature + 13, from_stream);
>  	} else if (!prefixcmp(feature, "export-marks=")) {
>  		option_export_marks(feature + 13);
> +	} else if (!prefixcmp(feature, "cat-blob")) {
> +		/* Don't die - this feature is supported */

Maybe if (!strcmp(...?

[...]
> @@ -2896,6 +2982,10 @@ static void parse_argv(void)
>  		if (*a != '-' || !strcmp(a, "--"))
>  			break;
>  
> +		if (!prefixcmp(a + 2, "cat-blob-fd=")) {
> +			option_cat_blob_fd(a + 2 + 12);
> +		}
> +
>  		if (parse_one_option(a + 2))
>  			continue;

Probably worth mentioning in the manual, under the option command:

  The following command-line option describes the environment
  in which fast-import was executed and may not be passed to
  'option':

   * cat-blob-fd

> @@ -2953,6 +3043,8 @@ int main(int argc, const char **argv)
>  			parse_new_tag();
>  		else if (!prefixcmp(command_buf.buf, "reset "))
>  			parse_reset_branch();
> +		else if (!prefixcmp(command_buf.buf, "cat-blob "))
> +			parse_cat_blob();

Thanks.  That was simple. :)

Tests?

[1] http://thread.gmane.org/gmane.comp.version-control.git/157711

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] fast-import: Let importers retrieve blobs
  2010-10-18  7:36   ` Ramkumar Ramachandra
@ 2010-10-18  8:50     ` Jonathan Nieder
  0 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-10-18  8:50 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: David Barr, Git Mailing List, Sverre Rabbelier, Shawn O. Pearce

Ramkumar Ramachandra wrote:

> Note to self: I didn't notice write_in_full in wrapper.c
> earlier. Returns the actual size written to the fd from the buffer.

Yeah, write_in_full() is write() without the partial-read semantics
when interrupted by a signal, on Windows, or encountering an empty
pipe.

>> +static void option_cat_blob_fd(const char *fd)
>> +{
>> +	unsigned long n = strtoul(fd, NULL, 0);
>> +	if (n > (unsigned long) INT_MAX)
>> +		die("--cat-blob-fd cannot exceed %d", INT_MAX);
>> +	cat_blob_fd = (int) n;
>> +}
>> +
>
> You don't display an appropriate error when n < 0.

How can n be < 0?  strtoul returns an unsigned long.

But more to the point, yes, this does not return an appropriate
error when "--cat-blob-fd=" is not followed by an unsigned
integer.  At least it's consistent with --depth=nonsense et al.

Rough patch below (needs tests).

> Ok. I'm eager to see this go through to `master`.
> Reviewed-by: Ramkumar Ramachandra <artagnon@gmail.com>

Thanks.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
diff --git a/fast-import.c b/fast-import.c
index eb6860d..20023c1 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2822,16 +2822,25 @@ static void option_date_format(const char *fmt)
 		die("unknown --date-format argument %s", fmt);
 }
 
+static unsigned long ulong_arg(const char *arg)
+{
+	char *endptr;
+	unsigned long rv = strtoul(arg, &endptr, 0);
+	if (endptr == arg || *endptr)
+		die("%s: argument must be an unsigned integer", arg);
+	return rv;
+}
+
 static void option_depth(const char *depth)
 {
-	max_depth = strtoul(depth, NULL, 0);
+	max_depth = ulong_arg(depth);
 	if (max_depth > MAX_DEPTH)
 		die("--depth cannot exceed %u", MAX_DEPTH);
 }
 
 static void option_active_branches(const char *branches)
 {
-	max_active_branches = strtoul(branches, NULL, 0);
+	max_active_branches = ulong_arg(branches);
 }
 
 static void option_export_marks(const char *marks)
@@ -2842,7 +2851,7 @@ static void option_export_marks(const char *marks)
 
 static void option_cat_blob_fd(const char *fd)
 {
-	unsigned long n = strtoul(fd, NULL, 0);
+	unsigned long n = ulong_arg(fd);
 	if (n > (unsigned long) INT_MAX)
 		die("--cat-blob-fd cannot exceed %d", INT_MAX);
 	cat_blob_fd = (int) n;

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] vcs-svn: Add outfile option to buffer_copy_bytes()
  2010-10-15 12:54 ` [PATCH 4/5] vcs-svn: Add outfile option to buffer_copy_bytes() David Barr
@ 2010-10-18  8:59   ` Jonathan Nieder
  0 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-10-18  8:59 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier

David Barr wrote:

> Explicitly declare that output is to stdout for existing use.
> Allow users of buffer_copy_bytes() to specify the output file.

Probably worth mentioning the motivation, which is presumably
that svn-fe will be streaming the preimage for files expressed as
deltas from the cat-file-fd to a temporary file.

> Signed-off-by: David Barr <david.barr@cordelta.com>

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks.  (It's a good API change.)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] svn-fe: Use the cat-blob command to apply deltas
  2010-10-18  6:57   ` Ramkumar Ramachandra
@ 2010-10-18  9:24     ` Jonathan Nieder
  2010-10-18 12:18       ` Ramkumar Ramachandra
  0 siblings, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2010-10-18  9:24 UTC (permalink / raw)
  To: Ramkumar Ramachandra; +Cc: David Barr, Git Mailing List, Sverre Rabbelier

Hi Ram,

Glad to see you are feeling a little better.

Ramkumar Ramachandra wrote:
> David Barr writes:

>> +	if (!backchannel.infile)
>> +		backchannel.infile = fdopen(REPORT_FILENO, "r");
>> +	if (!backchannel.infile)
>> +		return error("Could not open backchannel fd: %d", REPORT_FILENO);
>
> REPORT_FILENO = 3 is hard-coded. Is this intended? Maybe a
> command-line option to specify the fd?

fast-import gets the --cat-file-fd parameter to choose between stdout,
stdin-as-socket, stderr, or another fd (not necessarily 3 because it
might have to compete with other similar features some day).

For svn-fe, it is just like another stdin.  stdin is always fd 0,
so...

For callers other than svn-fe, it would be especially useful to
make it configurable, yes.

>> +	tail = buffer_read_line(&backchannel);
>> +	if (!tail)
>> +		return 1;
>
> Could you clarify when exactly will this happen?

buffer_read_line() returns NULL on error and when data is exhausted
without the trailing newline appearing.  The input here is supposed to
be just a single newline (trimmed to an empty string).

>> +	long preimage_len = 0;
>> +
>> +	if (delta) {
>> +		if (!preimage.infile)
>> +			preimage.infile = tmpfile();
>
> Didn't you later decide against this and use one tmpfile instead?

This is a single tempfile (because static).  Or am I missing
something?

>> +		if (!preimage.infile)
>> +			die("Unable to open temp file for blob retrieval");
>> +		if (srcMark) {
>> +			printf("cat-blob :%"PRIu32"\n", srcMark);
>> +			fflush(stdout);
>> +			if (srcMode == REPO_MODE_LNK)
>> +				fwrite("link ", 1, 5, preimage.infile);
>
> Special handling for symbolic links. Perhaps you should mention it in
> a comment here?

Or better yet, a comment in the commit message. :)

>> +			if (fast_export_save_blob(preimage.infile))
>> +				die("Failed to retrieve blob for delta application");
>> +		}
>> +		preimage_len = ftell(preimage.infile);
>> +		fseek(preimage.infile, 0, SEEK_SET);
>> +		if (!postimage.infile)
>> +			postimage.infile = tmpfile();
>
> One tmpfile?

Do you mean letting the preimage and postimage share a file?

[...]
>>  	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
>> -	buffer_copy_bytes(input, stdout, len);
>> +	if (!delta)
>> +		buffer_copy_bytes(input, stdout, len);
>> +	else
>> +		buffer_copy_bytes(&postimage, stdout, len);
>>  	fputc('\n', stdout);
>
> I should have asked this a long time ago: why the extra newline?

>From the fast-import manual:

	The LF after <raw> is optional (it used to be required)
	but recommended. Always including it makes debugging a
	fast-import stream easier as the next command always
	starts in column 0 of the next line, even if <raw> did
	not end with an LF.

> Overall, pleasant read. Thanks for taking this forward.

Seconded.  Thanks, both.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv2] Add support for subversion dump format v3
  2010-10-15 12:54 [PATCHv2] Add support for subversion dump format v3 David Barr
                   ` (4 preceding siblings ...)
  2010-10-15 12:54 ` [PATCH 5/5] svn-fe: Use the cat-blob command to apply deltas David Barr
@ 2010-10-18  9:54 ` Jonathan Nieder
  5 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-10-18  9:54 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Junio C Hamano

David Barr wrote:

> Patch 1 adds the required infrastructure to fast-import.
> This features the addition of the cat-blob command

Patch 1: maybe someone wants to pick this up and make the minor
changes it needs (a test or two to maintain sanity)?

> Patch 2 adds the basic parsing necessary to process the v3 format.

The log message doesn't give context but the patch is good
and safe.  Unknown keys are ignored so it is basically a
no-op except for using a little more memory.

> Patch 3 adds logic around decoding prop deltas.

It would be nice if someone who is not Junio cleans up the style.

Patch 4 (unmentioned for some reason): the log message doesn't give
context but the patch is good.  I think this could be picked up right
away.  There would be semantically unimportant merge conflicts if
cherry-picking without the patches introducing buffer_read_binary()
and changing buffer_copy_bytes() to take an off_t.

> Patch 5 integrates svn-fe with svn-da to decode text deltas.

I like it a lot but am interested in the follow-ups to Ram's comments.
Of course this requires the svn-da series so I'd prefer to give it
a few more days' cooking.

Summary:

 - patch 4 could be picked up right away imho
 - the rest need some work, but not much
 - the series is available from
   git://github.com/barrbrain/git.git svn-fe3

Regards,
Jonathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] svn-fe: Use the cat-blob command to apply deltas
  2010-10-18  9:24     ` Jonathan Nieder
@ 2010-10-18 12:18       ` Ramkumar Ramachandra
  0 siblings, 0 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2010-10-18 12:18 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: David Barr, Git Mailing List, Sverre Rabbelier

Hi Jonathan,

Jonathan Nieder writes:
> Ramkumar Ramachandra wrote:
> > David Barr writes:
> 
> >> +	if (!backchannel.infile)
> >> +		backchannel.infile = fdopen(REPORT_FILENO, "r");
> >> +	if (!backchannel.infile)
> >> +		return error("Could not open backchannel fd: %d", REPORT_FILENO);
> >
> > REPORT_FILENO = 3 is hard-coded. Is this intended? Maybe a
> > command-line option to specify the fd?
> 
> fast-import gets the --cat-file-fd parameter to choose between stdout,
> stdin-as-socket, stderr, or another fd (not necessarily 3 because it
> might have to compete with other similar features some day).
> 
> For svn-fe, it is just like another stdin.  stdin is always fd 0,
> so...
> 
> For callers other than svn-fe, it would be especially useful to
> make it configurable, yes.

Right, got it.

> >> +	tail = buffer_read_line(&backchannel);
> >> +	if (!tail)
> >> +		return 1;
> >
> > Could you clarify when exactly will this happen?
> 
> buffer_read_line() returns NULL on error and when data is exhausted
> without the trailing newline appearing.  The input here is supposed to
> be just a single newline (trimmed to an empty string).

Thanks for the clarification.

> >> +	long preimage_len = 0;
> >> +
> >> +	if (delta) {
> >> +		if (!preimage.infile)
> >> +			preimage.infile = tmpfile();
> >
> > Didn't you later decide against this and use one tmpfile instead?
> 
> This is a single tempfile (because static).  Or am I missing
> something?

Er, sorry about that. When I saw this code, it immediately reminded me
of one of David's commits that used several temporary files- a later
one made it a global variable. I didn't notice the static here.

> >> +		if (!preimage.infile)
> >> +			die("Unable to open temp file for blob retrieval");
> >> +		if (srcMark) {
> >> +			printf("cat-blob :%"PRIu32"\n", srcMark);
> >> +			fflush(stdout);
> >> +			if (srcMode == REPO_MODE_LNK)
> >> +				fwrite("link ", 1, 5, preimage.infile);
> >
> > Special handling for symbolic links. Perhaps you should mention it in
> > a comment here?
> 
> Or better yet, a comment in the commit message. :)

*nod*

> >> +			if (fast_export_save_blob(preimage.infile))
> >> +				die("Failed to retrieve blob for delta application");
> >> +		}
> >> +		preimage_len = ftell(preimage.infile);
> >> +		fseek(preimage.infile, 0, SEEK_SET);
> >> +		if (!postimage.infile)
> >> +			postimage.infile = tmpfile();
> >
> > One tmpfile?
> 
> Do you mean letting the preimage and postimage share a file?

No :)

> [...]
> >>  	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
> >> -	buffer_copy_bytes(input, stdout, len);
> >> +	if (!delta)
> >> +		buffer_copy_bytes(input, stdout, len);
> >> +	else
> >> +		buffer_copy_bytes(&postimage, stdout, len);
> >>  	fputc('\n', stdout);
> >
> > I should have asked this a long time ago: why the extra newline?
> 
> From the fast-import manual:
> 
> 	The LF after <raw> is optional (it used to be required)
> 	but recommended. Always including it makes debugging a
> 	fast-import stream easier as the next command always
> 	starts in column 0 of the next line, even if <raw> did
> 	not end with an LF.

Thanks for the explanation. I really should have looked this up
earlier, but I suppose it's not a biggie.

-- Ram

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/5] vcs-svn: Implement prop-delta handling.
  2010-10-15 12:54 ` [PATCH 3/5] vcs-svn: Implement prop-delta handling David Barr
@ 2010-10-18 15:10   ` Ramkumar Ramachandra
  0 siblings, 0 replies; 34+ messages in thread
From: Ramkumar Ramachandra @ 2010-10-18 15:10 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Jonathan Nieder, Sverre Rabbelier

Hi again,

Here's another review.

David Barr writes:
> By testing against the Apache Software Foundation
> repository, some simple rules for decoding prop
> deltas were derived.
> 
> 'Node-action: replace' implies the empty prop set
> as the base for the delta.
> Otherwise, if a copyfrom source is given that node
> forms the basis for the delta.
> Lastly, if the destination path exists in the active
> revision it forms the basis.
> 
> The same rules ought to apply to text deltas as well.
> 
> Apply these rules to prop handling.

> Add a placeholder srcMark parameter to fast_export_blob().

Is this related to the Prop-delta handling? Why is it in this patch?

> Signed-off-by: David Barr <david.barr@cordelta.com>

Nit: I don't know how you managed to wrap your commit message like
that- it looks like you did it by hand. My Emacs wraps at 70
characters, and that seems to be the convention in git.git as well.

Also, I don't like the commit message. Maybe something like this would
be clearer?

-- 8< --
Handle property deltas that occur in dumpfile v3. While "Prop-delta:
false" trivially implies that all the properties are given in full,
"Prop-delta: true" implies a delta against:

1. The props of the previous revision of the node when `Node-action`
   is `change`.
2. Nothing when `Node-action` is `add`. However, when
   `Node-copyfrom-path`/ `Node-copyfrom-rev` headers are present, the
   delta is against the node being copied.
3. Nothing when `Node-action` is `replace` and the destination path
   doesn't already exist in the current revision. If
   `Node-copyfrom-path`/ `Node-copyfrom-rev` headers are present, the
   delta is against the node being copied. Finally, if the destination
   path already exists in the current revision, the delta is against
   the props of that node.

> diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
> index 260cf50..d984aaa 100644
> --- a/vcs-svn/fast_export.c
> +++ b/vcs-svn/fast_export.c
> @@ -63,7 +63,9 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
>  	printf("progress Imported commit %"PRIu32".\n\n", revision);
>  }
>  
> -void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len, struct line_buffer *input)
> +void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
> +			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
> +			struct line_buffer *input)

Note to self: You've switched indentation style from "tabs to align +
spaces to indent" to the "Linux tabs only" style used in linux.git.

Wait, does this change belong here?

> --- a/vcs-svn/fast_export.h
> +++ b/vcs-svn/fast_export.h
> @@ -9,6 +9,7 @@ void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
>  void fast_export_commit(uint32_t revision, uint32_t author, char *log,
>  			uint32_t uuid, uint32_t url, unsigned long timestamp);
>  void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
> -		      struct line_buffer *input);
> +			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
> +			struct line_buffer *input);

And this?

> diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
> index e94d91d..b616bda 100644
> --- a/vcs-svn/repo_tree.c
> +++ b/vcs-svn/repo_tree.c
> @@ -157,6 +157,29 @@ static void repo_write_dirent(uint32_t *path, uint32_t mode,
>  		dent_remove(&dir_pointer(parent_dir_o)->entries, dent);
>  }
>  
> +uint32_t repo_read_mark(uint32_t revision, uint32_t *path)
> +{
> +	uint32_t mode = 0, content_offset = 0;
> +	struct repo_dirent *src_dent;
> +	src_dent = repo_read_dirent(revision, path);
> +	if (src_dent != NULL) {
> +		mode = src_dent->mode;
> +		content_offset = src_dent->content_offset;
> +	}
> +	return mode && mode != REPO_MODE_DIR ? content_offset : 0;

Make this clearer with an `if` statement perhaps? This looks ugly,
especially with the reader having to parse it with the correct
operator precedence in mind.

> +uint32_t repo_read_mode(uint32_t revision, uint32_t *path)
> +{
> +	uint32_t mode = 0;
> +	struct repo_dirent *src_dent;
> +	src_dent = repo_read_dirent(revision, path);
> +	if (src_dent != NULL) {
> +		mode = src_dent->mode;
> +	}

Style nit: Unnecessary braces around `if` statement.

Wait, what does all this have to do with prop deltas? Ok, I found this
slightly confusing -- I just went through the rest of the patch and
found that repo_read_mode and repo_read_mark are dependencies of the
prop-delta handling code. Maybe put them in a separate patch
immediately preceeding this one?

> diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
> index 458053e..3431c22 100644
> --- a/vcs-svn/svndump.c
> +++ b/vcs-svn/svndump.c
> @@ -45,7 +45,7 @@ static char* log_copy(uint32_t length, char *log)
>  }
>  
>  static struct {
> -	uint32_t action, propLength, textLength, srcRev, srcMode, mark, type;
> +	uint32_t action, propLength, textLength, srcRev, srcMode, srcMark, mark, type;
>  	uint32_t src[REPO_MAX_PATH_DEPTH], dst[REPO_MAX_PATH_DEPTH];
>  	uint32_t text_delta, prop_delta;
>  	char text_delta_base_md5[MD5_HEX_LENGTH + 1];
> @@ -86,6 +86,7 @@ static void reset_node_ctx(char *fname)
>  	node_ctx.src[0] = ~0;
>  	node_ctx.srcRev = 0;
>  	node_ctx.srcMode = 0;
> +	node_ctx.srcMark = 0;
>  	pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.dst, "/", fname);
>  	node_ctx.mark = 0;
>  	node_ctx.text_delta = 0;
> @@ -168,17 +169,42 @@ static void read_props(void)
>  			}
>  			key = ~0;
>  			buffer_read_line(&input);
> +		} else if (!strncmp(t, "D ", 2)) {
> +			len = atoi(&t[2]);
> +			key = pool_intern(buffer_read_string(&input, len));
> +			buffer_read_line(&input);
> +			if (key == keys.svn_executable) {
> +				if (node_ctx.type == REPO_MODE_EXE)
> +					node_ctx.type = REPO_MODE_BLB;
> +			} else if (key == keys.svn_special) {
> +				if (node_ctx.type == REPO_MODE_LNK)
> +					node_ctx.type = REPO_MODE_BLB;
> +			}
> +			key = ~0;

Deleted props have to be printed in dumpfile v3 (for obvious reasons:
how else would we indicate a delta?). I know this, but I don't know
what another reviewer would make of this. You should mention it
explicitly in a comment or in the commit message.

Note to other reviewers:

Props are printed as
V <key length>
<key>
K <value length>
<value>

Deleted props are printed as
D <key length>
<key>

Yes, value is omitted.

This change populates the context with deleted props (more precisely:
changes the context when deleted props are encountered). I'm not
entirely happy that it's part of this patch: why not squash it into
2/5?

>  static void handle_node(void)
>  {
> +	if (node_ctx.prop_delta) {
> +		if (node_ctx.srcRev)
> +			node_ctx.srcMode = repo_read_mode(node_ctx.srcRev, node_ctx.src);
> +		else
> +			node_ctx.srcMode = repo_read_mode(rev_ctx.revision, node_ctx.dst);
> +		if (node_ctx.srcMode && node_ctx.action != NODEACT_REPLACE)
> +			node_ctx.type = node_ctx.srcMode;
> +	}
> +

Okay, the code to handle prop deltas. As a note to self and for the
benefit of other reviewers, here's the English version of the above:
0. Mode can be one of REPO_MODE_DIR, REPO_MODE_BLB, REPO_MODE_EXE, and
   REPO_MODE_LNK.
1. If Node-copyfrom-rev/ Node-copyfrom-path are present, set srcMode
   to the mode of the source node (as present in the source revision
   ofcourse).
2. If not, set the srcMode to the mode of the dst path in the previous
   revision.

After doing this, if srcMode is present and if Node-action is not
replace, set the mode (called `type` for historical reasons*?) of the
destination node to that of the source node. Now that we've copied the
props successfully, the call to read_props() in line 200 will reads
the props of the current revision and update the mode accordingly.

* Wait, why must we be stuck with this historical cruft?

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
-- 8< --
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 3431c22..9878e3a 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -45,7 +45,7 @@ static char* log_copy(uint32_t length, char *log)
 }
 
 static struct {
-	uint32_t action, propLength, textLength, srcRev, srcMode, srcMark, mark, type;
+	uint32_t action, propLength, textLength, srcRev, srcMode, srcMark, mark, mode;
 	uint32_t src[REPO_MAX_PATH_DEPTH], dst[REPO_MAX_PATH_DEPTH];
 	uint32_t text_delta, prop_delta;
 	char text_delta_base_md5[MD5_HEX_LENGTH + 1];
@@ -79,7 +79,7 @@ static struct {
 
 static void reset_node_ctx(char *fname)
 {
-	node_ctx.type = 0;
+	node_ctx.mode = 0;
 	node_ctx.action = NODEACT_UNKNOWN;
 	node_ctx.propLength = LENGTH_UNKNOWN;
 	node_ctx.textLength = LENGTH_UNKNOWN;
@@ -163,9 +163,9 @@ static void read_props(void)
 				if (parse_date_basic(val, &rev_ctx.timestamp, NULL))
 					fprintf(stderr, "Invalid timestamp: %s\n", val);
 			} else if (key == keys.svn_executable) {
-				node_ctx.type = REPO_MODE_EXE;
+				node_ctx.mode = REPO_MODE_EXE;
 			} else if (key == keys.svn_special) {
-				node_ctx.type = REPO_MODE_LNK;
+				node_ctx.mode = REPO_MODE_LNK;
 			}
 			key = ~0;
 			buffer_read_line(&input);
@@ -174,11 +174,11 @@ static void read_props(void)
 			key = pool_intern(buffer_read_string(&input, len));
 			buffer_read_line(&input);
 			if (key == keys.svn_executable) {
-				if (node_ctx.type == REPO_MODE_EXE)
-					node_ctx.type = REPO_MODE_BLB;
+				if (node_ctx.mode == REPO_MODE_EXE)
+					node_ctx.mode = REPO_MODE_BLB;
 			} else if (key == keys.svn_special) {
-				if (node_ctx.type == REPO_MODE_LNK)
-					node_ctx.type = REPO_MODE_BLB;
+				if (node_ctx.mode == REPO_MODE_LNK)
+					node_ctx.mode = REPO_MODE_BLB;
 			}
 			key = ~0;
 		}
@@ -193,7 +193,7 @@ static void handle_node(void)
 		else
 			node_ctx.srcMode = repo_read_mode(rev_ctx.revision, node_ctx.dst);
 		if (node_ctx.srcMode && node_ctx.action != NODEACT_REPLACE)
-			node_ctx.type = node_ctx.srcMode;
+			node_ctx.mode = node_ctx.srcMode;
 	}
 
 	if (node_ctx.propLength != LENGTH_UNKNOWN && node_ctx.propLength)
@@ -207,7 +207,7 @@ static void handle_node(void)
 	}
 
 	if (node_ctx.textLength != LENGTH_UNKNOWN &&
-	    node_ctx.type != REPO_MODE_DIR)
+	    node_ctx.mode != REPO_MODE_DIR)
 		node_ctx.mark = next_blob_mark();
 
 	if (node_ctx.action == NODEACT_DELETE) {
@@ -215,27 +215,27 @@ static void handle_node(void)
 	} else if (node_ctx.action == NODEACT_CHANGE ||
 			   node_ctx.action == NODEACT_REPLACE) {
 		if (node_ctx.action == NODEACT_REPLACE &&
-		    node_ctx.type == REPO_MODE_DIR)
+		    node_ctx.mode == REPO_MODE_DIR)
 			repo_replace(node_ctx.dst, node_ctx.mark);
 		else if (node_ctx.propLength != LENGTH_UNKNOWN)
-			repo_modify(node_ctx.dst, node_ctx.type, node_ctx.mark);
+			repo_modify(node_ctx.dst, node_ctx.mode, node_ctx.mark);
 		else if (node_ctx.textLength != LENGTH_UNKNOWN)
 			node_ctx.srcMode = repo_replace(node_ctx.dst, node_ctx.mark);
 	} else if (node_ctx.action == NODEACT_ADD) {
 		if (node_ctx.srcRev && node_ctx.propLength != LENGTH_UNKNOWN)
-			repo_modify(node_ctx.dst, node_ctx.type, node_ctx.mark);
+			repo_modify(node_ctx.dst, node_ctx.mode, node_ctx.mark);
 		else if (node_ctx.srcRev && node_ctx.textLength != LENGTH_UNKNOWN)
 			node_ctx.srcMode = repo_replace(node_ctx.dst, node_ctx.mark);
-		else if ((node_ctx.type == REPO_MODE_DIR && !node_ctx.srcRev) ||
+		else if ((node_ctx.mode == REPO_MODE_DIR && !node_ctx.srcRev) ||
 			 node_ctx.textLength != LENGTH_UNKNOWN)
-			repo_add(node_ctx.dst, node_ctx.type, node_ctx.mark);
+			repo_add(node_ctx.dst, node_ctx.mode, node_ctx.mark);
 	}
 
 	if (node_ctx.propLength == LENGTH_UNKNOWN && node_ctx.srcMode)
-		node_ctx.type = node_ctx.srcMode;
+		node_ctx.mode = node_ctx.srcMode;
 
 	if (node_ctx.mark)
-		fast_export_blob(node_ctx.type, node_ctx.mark, node_ctx.textLength,
+		fast_export_blob(node_ctx.mode, node_ctx.mark, node_ctx.textLength,
 				node_ctx.text_delta, node_ctx.srcMark, node_ctx.srcMode,
 				&input);
 	else if (node_ctx.textLength != LENGTH_UNKNOWN)
@@ -284,9 +284,9 @@ void svndump_read(const char *url)
 			reset_node_ctx(val);
 		} else if (key == keys.node_kind) {
 			if (!strcmp(val, "dir"))
-				node_ctx.type = REPO_MODE_DIR;
+				node_ctx.mode = REPO_MODE_DIR;
 			else if (!strcmp(val, "file"))
-				node_ctx.type = REPO_MODE_BLB;
+				node_ctx.mode = REPO_MODE_BLB;
 			else
 				fprintf(stderr, "Unknown node-kind: %s\n", val);
 		} else if (key == keys.node_action) {


>  	if (node_ctx.propLength != LENGTH_UNKNOWN && node_ctx.propLength)
>  		read_props();
>  
> -	if (node_ctx.srcRev)
> +	if (node_ctx.srcRev) {
> +		node_ctx.srcMark = repo_read_mark(node_ctx.srcRev, node_ctx.src);
>  		node_ctx.srcMode = repo_copy(node_ctx.srcRev, node_ctx.src, node_ctx.dst);
> +	} else {
> +		node_ctx.srcMark = repo_read_mark(rev_ctx.revision, node_ctx.dst);
> +	}

Nit: Braces around `else` branch.
Again, does this change belong here?

Overall, a pleasant read. Thanks.

-- Ram

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/4] fast-import: let importers retrieve blobs
       [not found]   ` <20101119093530.GA19061@burratino>
@ 2010-11-19  9:47     ` Jonathan Nieder
  2010-11-19  9:51     ` [PATCH 4/4] fast-import: Allow cat-blob requests at arbitrary points in stream Jonathan Nieder
       [not found]     ` <20101119094045.GC19061@burratino>
  2 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-11-19  9:47 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, vcs-fast-import-devs

From: David Barr <david.barr@cordelta.com>

New objects written by fast-import are not available immediately.
Until a checkpoint has been started and finishes writing the pack
index, any new blobs will not be accessible using standard git tools.

So introduce a new way to access them: a "cat-blob" command in the
command stream requests for fast-import to print a blob to stdout or a
file descriptor specified by the argument to --cat-blob-fd.  The value
for cat-blob-fd cannot be specified in the stream because that would
be a layering violation: the decision of where to direct a stream has
to be made when fast-import is started anyway, so we might as well
make the stream format is independent of that detail.

Output uses the same format as "git cat-file --batch".

Thanks to Sverre Rabbelier and Sam Vilain for guidance in designing
the protocol.

Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
If you use this, you might want to do something like:

	blob 
	mark :1
	data <<EOT
	testing 1 2 3
	EOT
	cat-blob :1

before proceeding with the rest of the stream.  This allows wiring
mistakes to be caught early.

 Documentation/git-fast-import.txt |   41 ++++++++
 fast-import.c                     |   95 ++++++++++++++++++
 t/t9300-fast-import.sh            |  193 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 327 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 3bf04e3..be444da 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -92,6 +92,11 @@ OPTIONS
 	--(no-)-relative-marks= with the --(import|export)-marks=
 	options.
 
+--cat-blob-fd=<fd>::
+	Specify the file descriptor that will be written to
+	when the `cat-blob` command is encountered in the stream.
+	The default behaviour is to write to `stdout`.
+
 --export-pack-edges=<file>::
 	After creating a packfile, print a line of data to
 	<file> listing the filename of the packfile and the last
@@ -320,6 +325,11 @@ and control the current import process.  More detailed discussion
 	standard output.  This command is optional and is not needed
 	to perform an import.
 
+`cat-blob`::
+	Causes fast-import to print a blob in 'cat-file --batch'
+	format to the file descriptor set with `--cat-blob-fd` or
+	`stdout` if unspecified.
+
 `feature`::
 	Require that fast-import supports the specified feature, or
 	abort if it does not.
@@ -872,6 +882,29 @@ Placing a `progress` command immediately after a `checkpoint` will
 inform the reader when the `checkpoint` has been completed and it
 can safely access the refs that fast-import updated.
 
+`cat-blob`
+~~~~~~~~~~
+Causes fast-import to print a blob to a file descriptor previously
+arranged with the `--cat-blob-fd` argument.  The command otherwise
+has no impact on the current import; its main purpose is to
+retrieve blobs that may be in fast-import's memory but not
+accessible from the target repository.
+
+....
+	'cat-blob' SP <dataref> LF
+....
+
+The `<dataref>` can be either a mark reference (`:<idnum>`)
+set previously or a full 40-byte SHA-1 of a Git blob, preexisting or
+ready to be written.
+
+output uses the same format as `git cat-file --batch`:
+
+====
+	<sha1> SP 'blob' SP <size> LF
+	<contents> LF
+====
+
 `feature`
 ~~~~~~~~~
 Require that fast-import supports the specified feature, or abort if
@@ -898,6 +931,13 @@ import-marks::
 	second, an --import-marks= command-line option overrides
 	any "feature import-marks" command in the stream.
 
+cat-blob::
+	Ignored.  Versions of fast-import not supporting the
+	"cat-blob" command will exit with a message indicating so.
+	This lets the import error out early with a clear message,
+	rather than wasting time on the early part of an import
+	before the unsupported command is detected.
+
 `option`
 ~~~~~~~~
 Processes the specified option so that git fast-import behaves in a
@@ -923,6 +963,7 @@ not be passed as option:
 * date-format
 * import-marks
 * export-marks
+* cat-blob-fd
 * force
 
 Crash Reports
diff --git a/fast-import.c b/fast-import.c
index 959afef..88547c6 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -55,6 +55,8 @@ Format of STDIN stream:
     ('from' sp committish lf)?
     lf?;
 
+  cat_blob ::= 'cat-blob' sp (hexsha1 | idnum) lf;
+
   checkpoint ::= 'checkpoint' lf
     lf?;
 
@@ -361,6 +363,9 @@ static uintmax_t next_mark;
 static struct strbuf new_data = STRBUF_INIT;
 static int seen_data_command;
 
+/* Where to write output of cat-blob commands */
+static int cat_blob_fd = STDOUT_FILENO;
+
 static void parse_argv(void);
 
 static void write_branch_report(FILE *rpt, struct branch *b)
@@ -2689,6 +2694,79 @@ static void parse_reset_branch(void)
 		unread_command_buf = 1;
 }
 
+static void cat_blob_write(const char *buf, unsigned long size)
+{
+	if (write_in_full(cat_blob_fd, buf, size) != size)
+		die_errno("Write to frontend failed");
+}
+
+static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
+{
+	struct strbuf line = STRBUF_INIT;
+	unsigned long size;
+	enum object_type type = 0;
+	char *buf;
+
+	if (!oe || oe->pack_id == MAX_PACK_ID) {
+		buf = read_sha1_file(sha1, &type, &size);
+	} else {
+		type = oe->type;
+		buf = gfi_unpack_entry(oe, &size);
+	}
+
+	/*
+	 * Output based on batch_one_object() from cat-file.c.
+	 */
+	if (type <= 0) {
+		strbuf_reset(&line);
+		strbuf_addf(&line, "%s missing\n", sha1_to_hex(sha1));
+		cat_blob_write(line.buf, line.len);
+		free(buf);
+		return;
+	}
+	if (!buf)
+		die("Can't read object %s", sha1_to_hex(sha1));
+	if (type != OBJ_BLOB)
+		die("Object %s is a %s but a blob was expected.",
+		    sha1_to_hex(sha1), typename(type));
+	strbuf_reset(&line);
+	strbuf_addf(&line, "%s %s %lu\n", sha1_to_hex(sha1),
+						typename(type), size);
+	cat_blob_write(line.buf, line.len);
+	cat_blob_write(buf, size);
+	cat_blob_write("\n", 1);
+	free(buf);
+}
+
+static void parse_cat_blob(void)
+{
+	const char *p;
+	struct object_entry *oe = oe;
+	unsigned char sha1[20];
+
+	/* cat-blob SP <object> LF */
+	p = command_buf.buf + strlen("cat-blob ");
+	if (*p == ':') {
+		char *x;
+		oe = find_mark(strtoumax(p + 1, &x, 10));
+		if (x == p + 1)
+			die("Invalid mark: %s", command_buf.buf);
+		if (!oe)
+			die("Unknown mark: %s", command_buf.buf);
+		if (*x)
+			die("Garbage after mark: %s", command_buf.buf);
+		hashcpy(sha1, oe->idx.sha1);
+	} else {
+		if (get_sha1_hex(p, sha1))
+			die("Invalid SHA1: %s", command_buf.buf);
+		if (p[40])
+			die("Garbage after SHA1: %s", command_buf.buf);
+		oe = find_object(sha1);
+	}
+
+	cat_blob(oe, sha1);
+}
+
 static void parse_checkpoint(void)
 {
 	if (object_count) {
@@ -2771,6 +2849,14 @@ static void option_export_marks(const char *marks)
 	export_marks_file = make_fast_import_path(marks);
 }
 
+static void option_cat_blob_fd(const char *fd)
+{
+	unsigned long n = ulong_arg("--cat-blob-fd", fd);
+	if (n > (unsigned long) INT_MAX)
+		die("--cat-blob-fd cannot exceed %d", INT_MAX);
+	cat_blob_fd = (int) n;
+}
+
 static void option_export_pack_edges(const char *edges)
 {
 	if (pack_edges)
@@ -2824,6 +2910,8 @@ static int parse_one_feature(const char *feature, int from_stream)
 		option_import_marks(feature + 13, from_stream);
 	} else if (!prefixcmp(feature, "export-marks=")) {
 		option_export_marks(feature + 13);
+	} else if (!strcmp(feature, "cat-blob")) {
+		; /* Don't die - this feature is supported */
 	} else if (!prefixcmp(feature, "relative-marks")) {
 		relative_marks_paths = 1;
 	} else if (!prefixcmp(feature, "no-relative-marks")) {
@@ -2918,6 +3006,11 @@ static void parse_argv(void)
 		if (parse_one_feature(a + 2, 0))
 			continue;
 
+		if (!prefixcmp(a + 2, "cat-blob-fd=")) {
+			option_cat_blob_fd(a + 2 + strlen("cat-blob-fd="));
+			continue;
+		}
+
 		die("unknown option %s", a);
 	}
 	if (i != global_argc)
@@ -2969,6 +3062,8 @@ int main(int argc, const char **argv)
 			parse_new_tag();
 		else if (!prefixcmp(command_buf.buf, "reset "))
 			parse_reset_branch();
+		else if (!prefixcmp(command_buf.buf, "cat-blob "))
+			parse_cat_blob();
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!prefixcmp(command_buf.buf, "progress "))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 2c27da6..3e2741b 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -23,11 +23,18 @@ file5_data='an inline file.
 file6_data='#!/bin/sh
 echo "$@"'
 
+>empty
+
 ###
 ### series A
 ###
 
 test_tick
+
+test_expect_success 'empty stream succeeds' '
+	git fast-import </dev/null
+'
+
 cat >input <<INPUT_END
 blob
 mark :2
@@ -1501,6 +1508,190 @@ test_expect_success 'R: feature no-relative-marks should be honoured' '
     test_cmp marks.new non-relative.out
 '
 
+test_expect_success 'R: feature cat-blob supported' '
+	echo "feature cat-blob" |
+	git fast-import
+'
+
+test_expect_success 'R: cat-blob-fd must be a nonnegative integer' '
+	test_must_fail git fast-import --cat-blob-fd=-1 </dev/null
+'
+
+test_expect_success 'R: print old blob' '
+	blob=$(echo "yes it can" | git hash-object -w --stdin) &&
+	cat >expect <<-EOF &&
+	${blob} blob 11
+	yes it can
+
+	EOF
+	echo "cat-blob $blob" |
+	git fast-import --cat-blob-fd=6 6>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'R: in-stream cat-blob-fd not respected' '
+	echo hello >greeting &&
+	blob=$(git hash-object -w greeting) &&
+	cat >expect <<-EOF &&
+	${blob} blob 6
+	hello
+
+	EOF
+	git fast-import --cat-blob-fd=3 3>actual.3 >actual.1 <<-EOF &&
+	cat-blob $blob
+	EOF
+	test_cmp expect actual.3 &&
+	test_cmp empty actual.1 &&
+	git fast-import 3>actual.3 >actual.1 <<-EOF &&
+	option cat-blob-fd=3
+	cat-blob $blob
+	EOF
+	test_cmp empty actual.3 &&
+	test_cmp expect actual.1
+'
+
+test_expect_success 'R: print new blob' '
+	blob=$(echo "yep yep yep" | git hash-object --stdin) &&
+	cat >expect <<-EOF &&
+	${blob} blob 12
+	yep yep yep
+
+	EOF
+	git fast-import --cat-blob-fd=6 6>actual <<-\EOF &&
+	blob
+	mark :1
+	data <<BLOB_END
+	yep yep yep
+	BLOB_END
+	cat-blob :1
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'R: print new blob by sha1' '
+	blob=$(echo "a new blob named by sha1" | git hash-object --stdin) &&
+	cat >expect <<-EOF &&
+	${blob} blob 25
+	a new blob named by sha1
+
+	EOF
+	git fast-import --cat-blob-fd=6 6>actual <<-EOF &&
+	blob
+	data <<BLOB_END
+	a new blob named by sha1
+	BLOB_END
+	cat-blob $blob
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'setup: big file' '
+	(
+		echo "the quick brown fox jumps over the lazy dog" >big &&
+		for i in 1 2 3
+		do
+			cat big big big big >bigger &&
+			cat bigger bigger bigger bigger >big ||
+			exit
+		done
+	)
+'
+
+test_expect_success 'R: print two blobs to stdout' '
+	blob1=$(git hash-object big) &&
+	blob1_len=$(wc -c <big) &&
+	blob2=$(echo hello | git hash-object --stdin) &&
+	{
+		echo ${blob1} blob $blob1_len &&
+		cat big &&
+		cat <<-EOF
+
+		${blob2} blob 6
+		hello
+
+		EOF
+	} >expect &&
+	{
+		cat <<-\END_PART1 &&
+			blob
+			mark :1
+			data <<data_end
+		END_PART1
+		cat big &&
+		cat <<-\EOF
+			data_end
+			blob
+			mark :2
+			data <<data_end
+			hello
+			data_end
+			cat-blob :1
+			cat-blob :2
+		EOF
+	} |
+	git fast-import >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'setup: have pipes?' '
+	rm -f frob &&
+	if mkfifo frob
+	then
+		test_set_prereq PIPE
+	fi
+'
+
+test_expect_success PIPE 'R: copy using cat-file' '
+	expect_id=$(git hash-object big) &&
+	expect_len=$(wc -c <big) &&
+	echo $expect_id blob $expect_len >expect.response &&
+
+	rm -f blobs &&
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+	cat <<EOF &&
+	feature cat-blob
+	blob
+	mark :1
+	data <<BLOB
+	EOF
+	cat big
+	cat <<EOF
+	BLOB
+	cat-blob :1
+	EOF
+
+	read blob_id type size <&3 &&
+	echo "$blob_id $type $size" >response &&
+	dd if=/dev/stdin of=blob bs=$size count=1 <&3 &&
+	read newline <&3 &&
+
+	cat <<EOF &&
+	commit refs/heads/copied
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	copy big file as file3
+	COMMIT
+	M 644 inline file3
+	data <<BLOB
+	EOF
+	cat blob &&
+	cat <<EOF
+	BLOB
+	EOF
+	FRONTEND_END
+
+	mkfifo blobs &&
+	(
+		export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE &&
+		sh frontend 3<blobs |
+		git fast-import --cat-blob-fd=3 3>blobs
+	) &&
+	git show copied:file3 >actual &&
+	test_cmp expect.response response &&
+	test_cmp big actual
+'
+
 cat >input << EOF
 option git quiet
 blob
@@ -1509,8 +1700,6 @@ hi
 
 EOF
 
-touch empty
-
 test_expect_success 'R: quiet option results in no stats being output' '
     cat input | git fast-import 2> output &&
     test_cmp empty output
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/4] fast-import: Allow cat-blob requests at arbitrary points in stream
       [not found]   ` <20101119093530.GA19061@burratino>
  2010-11-19  9:47     ` [PATCH 3/4] fast-import: let " Jonathan Nieder
@ 2010-11-19  9:51     ` Jonathan Nieder
       [not found]     ` <20101119094045.GC19061@burratino>
  2 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-11-19  9:51 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, vcs-fast-import-devs

The new rule: a "cat-blob" can be inserted wherever a comment is
allowed, which means at the start of any line except in the middle of
a "data" command.

This saves frontends from having to loop over everything they want to
commit in the next commit and cat-ing the necessary objects in
advance.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
That's the end of the series.  Thanks for reading.

The early history is at [1] if that's your kind of thing.

[1] http://thread.gmane.org/gmane.comp.version-control.git/150005/focus=155417

 Documentation/git-fast-import.txt |    4 ++
 fast-import.c                     |   28 +++++++++-------
 t/t9300-fast-import.sh            |   66 +++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index be444da..d569564 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -905,6 +905,10 @@ output uses the same format as `git cat-file --batch`:
 	<contents> LF
 ====
 
+This command can be used anywhere in the stream that comments are
+accepted.  In particular, the `cat-blob` command can be used in the
+middle of a commit but not in the middle of a `data` command.
+
 `feature`
 ~~~~~~~~~
 Require that fast-import supports the specified feature, or abort if
diff --git a/fast-import.c b/fast-import.c
index 88547c6..33c6981 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -55,8 +55,6 @@ Format of STDIN stream:
     ('from' sp committish lf)?
     lf?;
 
-  cat_blob ::= 'cat-blob' sp (hexsha1 | idnum) lf;
-
   checkpoint ::= 'checkpoint' lf
     lf?;
 
@@ -134,14 +132,17 @@ Format of STDIN stream:
   ts    ::= # time since the epoch in seconds, ascii base10 notation;
   tz    ::= # GIT style timezone;
 
-     # note: comments may appear anywhere in the input, except
-     # within a data command.  Any form of the data command
-     # always escapes the related input from comment processing.
+     # note: comments and cat requests may appear anywhere
+     # in the input, except within a data command.  Any form
+     # of the data command always escapes the related input
+     # from comment processing.
      #
      # In case it is not clear, the '#' that starts the comment
      # must be the first character on that line (an lf
      # preceded it).
      #
+  cat_blob ::= 'cat-blob' sp (hexsha1 | idnum) lf;
+
   comment ::= '#' not_lf* lf;
   not_lf  ::= # Any byte that is not ASCII newline (LF);
 */
@@ -367,6 +368,7 @@ static int seen_data_command;
 static int cat_blob_fd = STDOUT_FILENO;
 
 static void parse_argv(void);
+static void parse_cat_blob(void);
 
 static void write_branch_report(FILE *rpt, struct branch *b)
 {
@@ -1791,7 +1793,6 @@ static void read_marks(void)
 	fclose(f);
 }
 
-
 static int read_next_command(void)
 {
 	static int stdin_eof = 0;
@@ -1801,7 +1802,7 @@ static int read_next_command(void)
 		return EOF;
 	}
 
-	do {
+	for (;;) {
 		if (unread_command_buf) {
 			unread_command_buf = 0;
 		} else {
@@ -1834,9 +1835,14 @@ static int read_next_command(void)
 			rc->prev->next = rc;
 			cmd_tail = rc;
 		}
-	} while (command_buf.buf[0] == '#');
-
-	return 0;
+		if (!prefixcmp(command_buf.buf, "cat-blob ")) {
+			parse_cat_blob();
+			continue;
+		}
+		if (command_buf.buf[0] == '#')
+			continue;
+		return 0;
+	}
 }
 
 static void skip_optional_lf(void)
@@ -3062,8 +3068,6 @@ int main(int argc, const char **argv)
 			parse_new_tag();
 		else if (!prefixcmp(command_buf.buf, "reset "))
 			parse_reset_branch();
-		else if (!prefixcmp(command_buf.buf, "cat-blob "))
-			parse_cat_blob();
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!prefixcmp(command_buf.buf, "progress "))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 3e2741b..2a050c7 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1692,6 +1692,72 @@ test_expect_success PIPE 'R: copy using cat-file' '
 	test_cmp big actual
 '
 
+test_expect_success PIPE 'R: print blob mid-commit' '
+	rm -f blobs &&
+	echo "A blob from _before_ the commit." >expect &&
+	mkfifo blobs &&
+	(
+		exec 3<blobs &&
+		cat <<-EOF &&
+		feature cat-blob
+		blob
+		mark :1
+		data <<BLOB
+		A blob from _before_ the commit.
+		BLOB
+		commit refs/heads/temporary
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		Empty commit
+		COMMIT
+		cat-blob :1
+		EOF
+
+		read blob_id type size <&3 &&
+		dd if=/dev/stdin of=actual bs=$size count=1 <&3 &&
+		read newline <&3 &&
+
+		echo
+	) |
+	git fast-import --cat-blob-fd=3 3>blobs &&
+	test_cmp expect actual
+'
+
+test_expect_success PIPE 'R: print staged blob within commit' '
+	rm -f blobs &&
+	echo "A blob from _within_ the commit." >expect &&
+	mkfifo blobs &&
+	(
+		exec 3<blobs &&
+		cat <<-EOF &&
+		feature cat-blob
+		commit refs/heads/within
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		Empty commit
+		COMMIT
+		M 644 inline within
+		data <<BLOB
+		A blob from _within_ the commit.
+		BLOB
+		EOF
+
+		to_get=$(
+			echo "A blob from _within_ the commit." |
+			git hash-object --stdin
+		) &&
+		echo "cat-blob $to_get" &&
+
+		read blob_id type size <&3 &&
+		dd if=/dev/stdin of=actual bs=$size count=1 <&3 &&
+		read newline <&3 &&
+
+		echo deleteall
+	) |
+	git fast-import --cat-blob-fd=3 3>blobs &&
+	test_cmp expect actual
+'
+
 cat >input << EOF
 option git quiet
 blob
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] fast-import: clarify documentation of "feature" command
       [not found]     ` <20101119094045.GC19061@burratino>
@ 2010-11-19 11:58       ` Sverre Rabbelier
  0 siblings, 0 replies; 34+ messages in thread
From: Sverre Rabbelier @ 2010-11-19 11:58 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce, vcs-fast-import-devs

Heya,

On Fri, Nov 19, 2010 at 10:40, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Make this more obvious by being more explicit about how the analogy
> between most "feature" commands and command-line options works.  Treat
> the feature (import-marks) that does not fit this analogy separately.

Acked-by: Sverre Rabbelier <srabbelier@gmail.com>

> In particular, it is not obvious to me whether cat-blob, ls-tree,
> and so on ought to be considered a single feature but with the
> feature command syntax, we could dodge the issue. :)  Sane?

Yes, I like that idea, although I'm not sure how clean the
implementation would be :)

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH/RFC v3 resend 0/4] fast-import: Let importers retrieve blobs
  2010-10-15 12:54 ` [PATCH 1/5] fast-import: Let importers retrieve blobs David Barr
                     ` (2 preceding siblings ...)
       [not found]   ` <20101119093530.GA19061@burratino>
@ 2010-11-28 19:41   ` Jonathan Nieder
  2010-11-28 19:42     ` [PATCH 1/4] fast-import: stricter parsing of integer options Jonathan Nieder
                       ` (3 more replies)
  3 siblings, 4 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-11-28 19:41 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, Junio C Hamano

[resending since git@vger doesn't seem to have accepted the previous
copy.  Sorry for the noise.]

David Barr wrote:

> So introduce another way: a "cat-blob" command introduced in the
> command stream requests for fast-import to print a blob to stdout
> or a file descriptor specified by the argument --cat-blob-fd.
> 
> The output uses the same format as "git cat-file --batch".

I am very fond of this patch.  Still, the fact remains that until this
command is implemented by some other fast-import backend, it is hard
to know what git-specific concepts are encoded in its current
implementation.  (I am in particular worried about what should be
gauranteed about the blob_identifier in the

	blob blob_identifier 1823

line and whether

	cat-blob blob_name
	cat-blob :11

are going to overlap and cause trouble for some backends.)

On the other hand, development of svn-fe continues to benefit from
cat-blob and its cousins ls-tree and ls[1].  and there has been brief
discussion of using cat-blob to make cvs2git more friendly.  So here's
a reroll, since it seems clear that this feature should be in future
versions of fast-import in some form.

Thoughts welcome, as always (even as simple as "I have a bad feeling
about this" or "everything in this patch looks ready to go").

Patches based against v1.7.0.7 for no particular reason.

David Barr (1):
  fast-import: let importers retrieve blobs

Jonathan Nieder (3):
  fast-import: stricter parsing of integer options
  fast-import: clarify documentation of "feature" command
  fast-import: Allow cat-blob requests at arbitrary points in stream

 Documentation/git-fast-import.txt |   76 ++++++++---
 fast-import.c                     |  128 ++++++++++++++++--
 t/t9300-fast-import.sh            |  267 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 442 insertions(+), 29 deletions(-)

[1] ls-tree commit_name "path/to/file" would have output format
100644 blob blob_identifier	path/to/file and within a commit
command, ls "path/to/file" would produce output in the same
format.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/4] fast-import: stricter parsing of integer options
  2010-11-28 19:41   ` [PATCH/RFC v3 resend 0/4] fast-import: Let importers retrieve blobs Jonathan Nieder
@ 2010-11-28 19:42     ` Jonathan Nieder
  2010-11-30  1:01       ` Junio C Hamano
  2010-11-28 19:43     ` [PATCH 2/4] fast-import: clarify documentation of "feature" command Jonathan Nieder
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2010-11-28 19:42 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, Junio C Hamano

Check the result from strtoul to avoid accepting arguments like
--depth=-1 and --active-branches=foo,bar,baz.

Requested-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
See http://thread.gmane.org/gmane.comp.version-control.git/159117/focus=159236
for context.

 fast-import.c          |   13 +++++++++++--
 t/t9300-fast-import.sh |    8 ++++++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 74f08bd..959afef 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2745,16 +2745,25 @@ static void option_date_format(const char *fmt)
 		die("unknown --date-format argument %s", fmt);
 }
 
+static unsigned long ulong_arg(const char *option, const char *arg)
+{
+	char *endptr;
+	unsigned long rv = strtoul(arg, &endptr, 0);
+	if (strchr(arg, '-') || endptr == arg || *endptr)
+		die("%s: argument must be an unsigned integer", option);
+	return rv;
+}
+
 static void option_depth(const char *depth)
 {
-	max_depth = strtoul(depth, NULL, 0);
+	max_depth = ulong_arg("--depth", depth);
 	if (max_depth > MAX_DEPTH)
 		die("--depth cannot exceed %u", MAX_DEPTH);
 }
 
 static void option_active_branches(const char *branches)
 {
-	max_active_branches = strtoul(branches, NULL, 0);
+	max_active_branches = ulong_arg("--active-branches", branches);
 }
 
 static void option_export_marks(const char *marks)
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 131f032..2c27da6 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1528,6 +1528,14 @@ test_expect_success 'R: unknown commandline options are rejected' '\
     test_must_fail git fast-import --non-existing-option < /dev/null
 '
 
+test_expect_success 'R: die on invalid option argument' '
+	echo "option git active-branches=-5" |
+	test_must_fail git fast-import &&
+	echo "option git depth=" |
+	test_must_fail git fast-import &&
+	test_must_fail git fast-import --depth="5 elephants" </dev/null
+'
+
 cat >input <<EOF
 option non-existing-vcs non-existing-option
 EOF

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/4] fast-import: clarify documentation of "feature" command
  2010-11-28 19:41   ` [PATCH/RFC v3 resend 0/4] fast-import: Let importers retrieve blobs Jonathan Nieder
  2010-11-28 19:42     ` [PATCH 1/4] fast-import: stricter parsing of integer options Jonathan Nieder
@ 2010-11-28 19:43     ` Jonathan Nieder
  2010-11-28 19:45     ` [PATCH 3/4] fast-import: let importers retrieve blobs Jonathan Nieder
  2010-11-28 19:45     ` [PATCH 4/4] fast-import: Allow cat-blob requests at arbitrary points in stream Jonathan Nieder
  3 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-11-28 19:43 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, Junio C Hamano

The "feature" command allows streams to specify options for the import
that must not be ignored.  Logically, they are part of the stream,
even though technically most supported features are synonyms to
command-line options.

Make this more obvious by being more explicit about how the analogy
between most "feature" commands and command-line options works.  Treat
the feature (import-marks) that does not fit this analogy separately.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Sverre Rabbelier <srabbelier@gmail.com>
---
Side note: I am thinking of introducing a syntax

	'feature' SP 'command' SP <command name> LF

which would just check if <command name> is a recognized command.
This way, when a feature introduces a new command, it would get
a feature name to go along with that with no extra effort.

In particular, it is not obvious to me whether cat-blob, ls-tree,
and so on ought to be considered a single feature but with the
feature command syntax, we could dodge the issue. :)  Sane?

 Documentation/git-fast-import.txt |   33 +++++++++++++++------------------
 1 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 19082b0..3bf04e3 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -878,28 +878,25 @@ Require that fast-import supports the specified feature, or abort if
 it does not.
 
 ....
-	'feature' SP <feature> LF
+	'feature' SP <feature> ('=' <argument>)? LF
 ....
 
-The <feature> part of the command may be any string matching
-^[a-zA-Z][a-zA-Z-]*$ and should be understood by fast-import.
+The <feature> part of the command may be any one of the following:
 
-Feature work identical as their option counterparts with the
-exception of the import-marks feature, see below.
+date-format::
+export-marks::
+relative-marks::
+no-relative-marks::
+force::
+	Act as though the corresponding command-line option with
+	a leading '--' was passed on the command line
+	(see OPTIONS, above).
 
-The following features are currently supported:
-
-* date-format
-* import-marks
-* export-marks
-* relative-marks
-* no-relative-marks
-* force
-
-The import-marks behaves differently from when it is specified as
-commandline option in that only one "feature import-marks" is allowed
-per stream. Also, any --import-marks= specified on the commandline
-will override those from the stream (if any).
+import-marks::
+	Like --import-marks except in two respects: first, only one
+	"feature import-marks" command is allowed per stream;
+	second, an --import-marks= command-line option overrides
+	any "feature import-marks" command in the stream.
 
 `option`
 ~~~~~~~~

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/4] fast-import: let importers retrieve blobs
  2010-11-28 19:41   ` [PATCH/RFC v3 resend 0/4] fast-import: Let importers retrieve blobs Jonathan Nieder
  2010-11-28 19:42     ` [PATCH 1/4] fast-import: stricter parsing of integer options Jonathan Nieder
  2010-11-28 19:43     ` [PATCH 2/4] fast-import: clarify documentation of "feature" command Jonathan Nieder
@ 2010-11-28 19:45     ` Jonathan Nieder
  2010-11-29 23:48       ` [PATCH] fixup! " David Barr
                         ` (3 more replies)
  2010-11-28 19:45     ` [PATCH 4/4] fast-import: Allow cat-blob requests at arbitrary points in stream Jonathan Nieder
  3 siblings, 4 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-11-28 19:45 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, Junio C Hamano

From: David Barr <david.barr@cordelta.com>

New objects written by fast-import are not available immediately.
Until a checkpoint has been started and finishes writing the pack
index, any new blobs will not be accessible using standard git tools.

So introduce a new way to access them: a "cat-blob" command in the
command stream requests for fast-import to print a blob to stdout or a
file descriptor specified by the argument to --cat-blob-fd.  The value
for cat-blob-fd cannot be specified in the stream because that would
be a layering violation: the decision of where to direct a stream has
to be made when fast-import is started anyway, so we might as well
make the stream format is independent of that detail.

Output uses the same format as "git cat-file --batch".

Thanks to Sverre Rabbelier and Sam Vilain for guidance in designing
the protocol.

Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
If you use this, you might want to do something like:

	blob 
	mark :1
	data <<EOT
	testing 1 2 3
	EOT
	cat-blob :1

before proceeding with the rest of the stream.  This allows wiring
mistakes to be caught early.

 Documentation/git-fast-import.txt |   41 ++++++++
 fast-import.c                     |   95 ++++++++++++++++++
 t/t9300-fast-import.sh            |  193 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 327 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 3bf04e3..be444da 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -92,6 +92,11 @@ OPTIONS
 	--(no-)-relative-marks= with the --(import|export)-marks=
 	options.
 
+--cat-blob-fd=<fd>::
+	Specify the file descriptor that will be written to
+	when the `cat-blob` command is encountered in the stream.
+	The default behaviour is to write to `stdout`.
+
 --export-pack-edges=<file>::
 	After creating a packfile, print a line of data to
 	<file> listing the filename of the packfile and the last
@@ -320,6 +325,11 @@ and control the current import process.  More detailed discussion
 	standard output.  This command is optional and is not needed
 	to perform an import.
 
+`cat-blob`::
+	Causes fast-import to print a blob in 'cat-file --batch'
+	format to the file descriptor set with `--cat-blob-fd` or
+	`stdout` if unspecified.
+
 `feature`::
 	Require that fast-import supports the specified feature, or
 	abort if it does not.
@@ -872,6 +882,29 @@ Placing a `progress` command immediately after a `checkpoint` will
 inform the reader when the `checkpoint` has been completed and it
 can safely access the refs that fast-import updated.
 
+`cat-blob`
+~~~~~~~~~~
+Causes fast-import to print a blob to a file descriptor previously
+arranged with the `--cat-blob-fd` argument.  The command otherwise
+has no impact on the current import; its main purpose is to
+retrieve blobs that may be in fast-import's memory but not
+accessible from the target repository.
+
+....
+	'cat-blob' SP <dataref> LF
+....
+
+The `<dataref>` can be either a mark reference (`:<idnum>`)
+set previously or a full 40-byte SHA-1 of a Git blob, preexisting or
+ready to be written.
+
+output uses the same format as `git cat-file --batch`:
+
+====
+	<sha1> SP 'blob' SP <size> LF
+	<contents> LF
+====
+
 `feature`
 ~~~~~~~~~
 Require that fast-import supports the specified feature, or abort if
@@ -898,6 +931,13 @@ import-marks::
 	second, an --import-marks= command-line option overrides
 	any "feature import-marks" command in the stream.
 
+cat-blob::
+	Ignored.  Versions of fast-import not supporting the
+	"cat-blob" command will exit with a message indicating so.
+	This lets the import error out early with a clear message,
+	rather than wasting time on the early part of an import
+	before the unsupported command is detected.
+
 `option`
 ~~~~~~~~
 Processes the specified option so that git fast-import behaves in a
@@ -923,6 +963,7 @@ not be passed as option:
 * date-format
 * import-marks
 * export-marks
+* cat-blob-fd
 * force
 
 Crash Reports
diff --git a/fast-import.c b/fast-import.c
index 959afef..88547c6 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -55,6 +55,8 @@ Format of STDIN stream:
     ('from' sp committish lf)?
     lf?;
 
+  cat_blob ::= 'cat-blob' sp (hexsha1 | idnum) lf;
+
   checkpoint ::= 'checkpoint' lf
     lf?;
 
@@ -361,6 +363,9 @@ static uintmax_t next_mark;
 static struct strbuf new_data = STRBUF_INIT;
 static int seen_data_command;
 
+/* Where to write output of cat-blob commands */
+static int cat_blob_fd = STDOUT_FILENO;
+
 static void parse_argv(void);
 
 static void write_branch_report(FILE *rpt, struct branch *b)
@@ -2689,6 +2694,79 @@ static void parse_reset_branch(void)
 		unread_command_buf = 1;
 }
 
+static void cat_blob_write(const char *buf, unsigned long size)
+{
+	if (write_in_full(cat_blob_fd, buf, size) != size)
+		die_errno("Write to frontend failed");
+}
+
+static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
+{
+	struct strbuf line = STRBUF_INIT;
+	unsigned long size;
+	enum object_type type = 0;
+	char *buf;
+
+	if (!oe || oe->pack_id == MAX_PACK_ID) {
+		buf = read_sha1_file(sha1, &type, &size);
+	} else {
+		type = oe->type;
+		buf = gfi_unpack_entry(oe, &size);
+	}
+
+	/*
+	 * Output based on batch_one_object() from cat-file.c.
+	 */
+	if (type <= 0) {
+		strbuf_reset(&line);
+		strbuf_addf(&line, "%s missing\n", sha1_to_hex(sha1));
+		cat_blob_write(line.buf, line.len);
+		free(buf);
+		return;
+	}
+	if (!buf)
+		die("Can't read object %s", sha1_to_hex(sha1));
+	if (type != OBJ_BLOB)
+		die("Object %s is a %s but a blob was expected.",
+		    sha1_to_hex(sha1), typename(type));
+	strbuf_reset(&line);
+	strbuf_addf(&line, "%s %s %lu\n", sha1_to_hex(sha1),
+						typename(type), size);
+	cat_blob_write(line.buf, line.len);
+	cat_blob_write(buf, size);
+	cat_blob_write("\n", 1);
+	free(buf);
+}
+
+static void parse_cat_blob(void)
+{
+	const char *p;
+	struct object_entry *oe = oe;
+	unsigned char sha1[20];
+
+	/* cat-blob SP <object> LF */
+	p = command_buf.buf + strlen("cat-blob ");
+	if (*p == ':') {
+		char *x;
+		oe = find_mark(strtoumax(p + 1, &x, 10));
+		if (x == p + 1)
+			die("Invalid mark: %s", command_buf.buf);
+		if (!oe)
+			die("Unknown mark: %s", command_buf.buf);
+		if (*x)
+			die("Garbage after mark: %s", command_buf.buf);
+		hashcpy(sha1, oe->idx.sha1);
+	} else {
+		if (get_sha1_hex(p, sha1))
+			die("Invalid SHA1: %s", command_buf.buf);
+		if (p[40])
+			die("Garbage after SHA1: %s", command_buf.buf);
+		oe = find_object(sha1);
+	}
+
+	cat_blob(oe, sha1);
+}
+
 static void parse_checkpoint(void)
 {
 	if (object_count) {
@@ -2771,6 +2849,14 @@ static void option_export_marks(const char *marks)
 	export_marks_file = make_fast_import_path(marks);
 }
 
+static void option_cat_blob_fd(const char *fd)
+{
+	unsigned long n = ulong_arg("--cat-blob-fd", fd);
+	if (n > (unsigned long) INT_MAX)
+		die("--cat-blob-fd cannot exceed %d", INT_MAX);
+	cat_blob_fd = (int) n;
+}
+
 static void option_export_pack_edges(const char *edges)
 {
 	if (pack_edges)
@@ -2824,6 +2910,8 @@ static int parse_one_feature(const char *feature, int from_stream)
 		option_import_marks(feature + 13, from_stream);
 	} else if (!prefixcmp(feature, "export-marks=")) {
 		option_export_marks(feature + 13);
+	} else if (!strcmp(feature, "cat-blob")) {
+		; /* Don't die - this feature is supported */
 	} else if (!prefixcmp(feature, "relative-marks")) {
 		relative_marks_paths = 1;
 	} else if (!prefixcmp(feature, "no-relative-marks")) {
@@ -2918,6 +3006,11 @@ static void parse_argv(void)
 		if (parse_one_feature(a + 2, 0))
 			continue;
 
+		if (!prefixcmp(a + 2, "cat-blob-fd=")) {
+			option_cat_blob_fd(a + 2 + strlen("cat-blob-fd="));
+			continue;
+		}
+
 		die("unknown option %s", a);
 	}
 	if (i != global_argc)
@@ -2969,6 +3062,8 @@ int main(int argc, const char **argv)
 			parse_new_tag();
 		else if (!prefixcmp(command_buf.buf, "reset "))
 			parse_reset_branch();
+		else if (!prefixcmp(command_buf.buf, "cat-blob "))
+			parse_cat_blob();
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!prefixcmp(command_buf.buf, "progress "))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 2c27da6..3e2741b 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -23,11 +23,18 @@ file5_data='an inline file.
 file6_data='#!/bin/sh
 echo "$@"'
 
+>empty
+
 ###
 ### series A
 ###
 
 test_tick
+
+test_expect_success 'empty stream succeeds' '
+	git fast-import </dev/null
+'
+
 cat >input <<INPUT_END
 blob
 mark :2
@@ -1501,6 +1508,190 @@ test_expect_success 'R: feature no-relative-marks should be honoured' '
     test_cmp marks.new non-relative.out
 '
 
+test_expect_success 'R: feature cat-blob supported' '
+	echo "feature cat-blob" |
+	git fast-import
+'
+
+test_expect_success 'R: cat-blob-fd must be a nonnegative integer' '
+	test_must_fail git fast-import --cat-blob-fd=-1 </dev/null
+'
+
+test_expect_success 'R: print old blob' '
+	blob=$(echo "yes it can" | git hash-object -w --stdin) &&
+	cat >expect <<-EOF &&
+	${blob} blob 11
+	yes it can
+
+	EOF
+	echo "cat-blob $blob" |
+	git fast-import --cat-blob-fd=6 6>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'R: in-stream cat-blob-fd not respected' '
+	echo hello >greeting &&
+	blob=$(git hash-object -w greeting) &&
+	cat >expect <<-EOF &&
+	${blob} blob 6
+	hello
+
+	EOF
+	git fast-import --cat-blob-fd=3 3>actual.3 >actual.1 <<-EOF &&
+	cat-blob $blob
+	EOF
+	test_cmp expect actual.3 &&
+	test_cmp empty actual.1 &&
+	git fast-import 3>actual.3 >actual.1 <<-EOF &&
+	option cat-blob-fd=3
+	cat-blob $blob
+	EOF
+	test_cmp empty actual.3 &&
+	test_cmp expect actual.1
+'
+
+test_expect_success 'R: print new blob' '
+	blob=$(echo "yep yep yep" | git hash-object --stdin) &&
+	cat >expect <<-EOF &&
+	${blob} blob 12
+	yep yep yep
+
+	EOF
+	git fast-import --cat-blob-fd=6 6>actual <<-\EOF &&
+	blob
+	mark :1
+	data <<BLOB_END
+	yep yep yep
+	BLOB_END
+	cat-blob :1
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'R: print new blob by sha1' '
+	blob=$(echo "a new blob named by sha1" | git hash-object --stdin) &&
+	cat >expect <<-EOF &&
+	${blob} blob 25
+	a new blob named by sha1
+
+	EOF
+	git fast-import --cat-blob-fd=6 6>actual <<-EOF &&
+	blob
+	data <<BLOB_END
+	a new blob named by sha1
+	BLOB_END
+	cat-blob $blob
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'setup: big file' '
+	(
+		echo "the quick brown fox jumps over the lazy dog" >big &&
+		for i in 1 2 3
+		do
+			cat big big big big >bigger &&
+			cat bigger bigger bigger bigger >big ||
+			exit
+		done
+	)
+'
+
+test_expect_success 'R: print two blobs to stdout' '
+	blob1=$(git hash-object big) &&
+	blob1_len=$(wc -c <big) &&
+	blob2=$(echo hello | git hash-object --stdin) &&
+	{
+		echo ${blob1} blob $blob1_len &&
+		cat big &&
+		cat <<-EOF
+
+		${blob2} blob 6
+		hello
+
+		EOF
+	} >expect &&
+	{
+		cat <<-\END_PART1 &&
+			blob
+			mark :1
+			data <<data_end
+		END_PART1
+		cat big &&
+		cat <<-\EOF
+			data_end
+			blob
+			mark :2
+			data <<data_end
+			hello
+			data_end
+			cat-blob :1
+			cat-blob :2
+		EOF
+	} |
+	git fast-import >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'setup: have pipes?' '
+	rm -f frob &&
+	if mkfifo frob
+	then
+		test_set_prereq PIPE
+	fi
+'
+
+test_expect_success PIPE 'R: copy using cat-file' '
+	expect_id=$(git hash-object big) &&
+	expect_len=$(wc -c <big) &&
+	echo $expect_id blob $expect_len >expect.response &&
+
+	rm -f blobs &&
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+	cat <<EOF &&
+	feature cat-blob
+	blob
+	mark :1
+	data <<BLOB
+	EOF
+	cat big
+	cat <<EOF
+	BLOB
+	cat-blob :1
+	EOF
+
+	read blob_id type size <&3 &&
+	echo "$blob_id $type $size" >response &&
+	dd if=/dev/stdin of=blob bs=$size count=1 <&3 &&
+	read newline <&3 &&
+
+	cat <<EOF &&
+	commit refs/heads/copied
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	copy big file as file3
+	COMMIT
+	M 644 inline file3
+	data <<BLOB
+	EOF
+	cat blob &&
+	cat <<EOF
+	BLOB
+	EOF
+	FRONTEND_END
+
+	mkfifo blobs &&
+	(
+		export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE &&
+		sh frontend 3<blobs |
+		git fast-import --cat-blob-fd=3 3>blobs
+	) &&
+	git show copied:file3 >actual &&
+	test_cmp expect.response response &&
+	test_cmp big actual
+'
+
 cat >input << EOF
 option git quiet
 blob
@@ -1509,8 +1700,6 @@ hi
 
 EOF
 
-touch empty
-
 test_expect_success 'R: quiet option results in no stats being output' '
     cat input | git fast-import 2> output &&
     test_cmp empty output

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/4] fast-import: Allow cat-blob requests at arbitrary points in stream
  2010-11-28 19:41   ` [PATCH/RFC v3 resend 0/4] fast-import: Let importers retrieve blobs Jonathan Nieder
                       ` (2 preceding siblings ...)
  2010-11-28 19:45     ` [PATCH 3/4] fast-import: let importers retrieve blobs Jonathan Nieder
@ 2010-11-28 19:45     ` Jonathan Nieder
  3 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-11-28 19:45 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, Junio C Hamano

The new rule: a "cat-blob" can be inserted wherever a comment is
allowed, which means at the start of any line except in the middle of
a "data" command.

This saves frontends from having to loop over everything they want to
commit in the next commit and cat-ing the necessary objects in
advance.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
That's the end of the series.  Thanks for reading.

The early history is at [1] if that's your kind of thing.

[1] http://thread.gmane.org/gmane.comp.version-control.git/150005/focus=155417

 Documentation/git-fast-import.txt |    4 ++
 fast-import.c                     |   28 +++++++++-------
 t/t9300-fast-import.sh            |   66 +++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index be444da..d569564 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -905,6 +905,10 @@ output uses the same format as `git cat-file --batch`:
 	<contents> LF
 ====
 
+This command can be used anywhere in the stream that comments are
+accepted.  In particular, the `cat-blob` command can be used in the
+middle of a commit but not in the middle of a `data` command.
+
 `feature`
 ~~~~~~~~~
 Require that fast-import supports the specified feature, or abort if
diff --git a/fast-import.c b/fast-import.c
index 88547c6..33c6981 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -55,8 +55,6 @@ Format of STDIN stream:
     ('from' sp committish lf)?
     lf?;
 
-  cat_blob ::= 'cat-blob' sp (hexsha1 | idnum) lf;
-
   checkpoint ::= 'checkpoint' lf
     lf?;
 
@@ -134,14 +132,17 @@ Format of STDIN stream:
   ts    ::= # time since the epoch in seconds, ascii base10 notation;
   tz    ::= # GIT style timezone;
 
-     # note: comments may appear anywhere in the input, except
-     # within a data command.  Any form of the data command
-     # always escapes the related input from comment processing.
+     # note: comments and cat requests may appear anywhere
+     # in the input, except within a data command.  Any form
+     # of the data command always escapes the related input
+     # from comment processing.
      #
      # In case it is not clear, the '#' that starts the comment
      # must be the first character on that line (an lf
      # preceded it).
      #
+  cat_blob ::= 'cat-blob' sp (hexsha1 | idnum) lf;
+
   comment ::= '#' not_lf* lf;
   not_lf  ::= # Any byte that is not ASCII newline (LF);
 */
@@ -367,6 +368,7 @@ static int seen_data_command;
 static int cat_blob_fd = STDOUT_FILENO;
 
 static void parse_argv(void);
+static void parse_cat_blob(void);
 
 static void write_branch_report(FILE *rpt, struct branch *b)
 {
@@ -1791,7 +1793,6 @@ static void read_marks(void)
 	fclose(f);
 }
 
-
 static int read_next_command(void)
 {
 	static int stdin_eof = 0;
@@ -1801,7 +1802,7 @@ static int read_next_command(void)
 		return EOF;
 	}
 
-	do {
+	for (;;) {
 		if (unread_command_buf) {
 			unread_command_buf = 0;
 		} else {
@@ -1834,9 +1835,14 @@ static int read_next_command(void)
 			rc->prev->next = rc;
 			cmd_tail = rc;
 		}
-	} while (command_buf.buf[0] == '#');
-
-	return 0;
+		if (!prefixcmp(command_buf.buf, "cat-blob ")) {
+			parse_cat_blob();
+			continue;
+		}
+		if (command_buf.buf[0] == '#')
+			continue;
+		return 0;
+	}
 }
 
 static void skip_optional_lf(void)
@@ -3062,8 +3068,6 @@ int main(int argc, const char **argv)
 			parse_new_tag();
 		else if (!prefixcmp(command_buf.buf, "reset "))
 			parse_reset_branch();
-		else if (!prefixcmp(command_buf.buf, "cat-blob "))
-			parse_cat_blob();
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!prefixcmp(command_buf.buf, "progress "))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 3e2741b..2a050c7 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1692,6 +1692,72 @@ test_expect_success PIPE 'R: copy using cat-file' '
 	test_cmp big actual
 '
 
+test_expect_success PIPE 'R: print blob mid-commit' '
+	rm -f blobs &&
+	echo "A blob from _before_ the commit." >expect &&
+	mkfifo blobs &&
+	(
+		exec 3<blobs &&
+		cat <<-EOF &&
+		feature cat-blob
+		blob
+		mark :1
+		data <<BLOB
+		A blob from _before_ the commit.
+		BLOB
+		commit refs/heads/temporary
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		Empty commit
+		COMMIT
+		cat-blob :1
+		EOF
+
+		read blob_id type size <&3 &&
+		dd if=/dev/stdin of=actual bs=$size count=1 <&3 &&
+		read newline <&3 &&
+
+		echo
+	) |
+	git fast-import --cat-blob-fd=3 3>blobs &&
+	test_cmp expect actual
+'
+
+test_expect_success PIPE 'R: print staged blob within commit' '
+	rm -f blobs &&
+	echo "A blob from _within_ the commit." >expect &&
+	mkfifo blobs &&
+	(
+		exec 3<blobs &&
+		cat <<-EOF &&
+		feature cat-blob
+		commit refs/heads/within
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		Empty commit
+		COMMIT
+		M 644 inline within
+		data <<BLOB
+		A blob from _within_ the commit.
+		BLOB
+		EOF
+
+		to_get=$(
+			echo "A blob from _within_ the commit." |
+			git hash-object --stdin
+		) &&
+		echo "cat-blob $to_get" &&
+
+		read blob_id type size <&3 &&
+		dd if=/dev/stdin of=actual bs=$size count=1 <&3 &&
+		read newline <&3 &&
+
+		echo deleteall
+	) |
+	git fast-import --cat-blob-fd=3 3>blobs &&
+	test_cmp expect actual
+'
+
 cat >input << EOF
 option git quiet
 blob

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH] fixup! fast-import: let importers retrieve blobs
  2010-11-28 19:45     ` [PATCH 3/4] fast-import: let importers retrieve blobs Jonathan Nieder
@ 2010-11-29 23:48       ` David Barr
  2010-11-30  0:16         ` David Barr
  2010-11-30  1:22         ` Jonathan Nieder
  2010-12-03 10:30       ` [PATCH 3/4] " Thomas Rast
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 34+ messages in thread
From: David Barr @ 2010-11-29 23:48 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Mailing List, Junio C Hamano, David Barr

Signed-off-by: David Barr <david.barr@cordelta.com>
---
 fast-import.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 4dfea07..aa8f260 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2752,6 +2752,7 @@ static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
 		strbuf_reset(&line);
 		strbuf_addf(&line, "%s missing\n", sha1_to_hex(sha1));
 		cat_blob_write(line.buf, line.len);
+		strbuf_release(&line);
 		free(buf);
 		return;
 	}
@@ -2764,6 +2765,7 @@ static void cat_blob(struct object_entry *oe, unsigned char sha1[20])
 	strbuf_addf(&line, "%s %s %lu\n", sha1_to_hex(sha1),
 						typename(type), size);
 	cat_blob_write(line.buf, line.len);
+	strbuf_release(&line);
 	cat_blob_write(buf, size);
 	cat_blob_write("\n", 1);
 	free(buf);
-- 
1.7.3.2.846.gf4b062

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH] fixup! fast-import: let importers retrieve blobs
  2010-11-29 23:48       ` [PATCH] fixup! " David Barr
@ 2010-11-30  0:16         ` David Barr
  2010-11-30  1:22         ` Jonathan Nieder
  1 sibling, 0 replies; 34+ messages in thread
From: David Barr @ 2010-11-30  0:16 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Mailing List, Junio C Hamano

> David Barr wrote:
> 
> [plug two memory leaks in "[PATCH 3/4] fast-import: let importers retr..."]
> >
> > Signed-off-by: David Barr <david.barr@cordelta.com>
> 
> Good eyes, thanks!
> 
> Acked-by: Jonathan Nieder <jrnieder@gmail.com>

I only caught it because I was copying and adapting the same bit of code for 
my 'ls' implementation. The svndiff0 implementation taught me to feel nervous
wherever I see STRBUF_INIT ;)

--
David Barr

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] fast-import: stricter parsing of integer options
  2010-11-28 19:42     ` [PATCH 1/4] fast-import: stricter parsing of integer options Jonathan Nieder
@ 2010-11-30  1:01       ` Junio C Hamano
  0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2010-11-30  1:01 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Sverre Rabbelier, Shawn O. Pearce, Junio C Hamano

Jonathan Nieder <jrnieder@gmail.com> writes:

> +static unsigned long ulong_arg(const char *option, const char *arg)
> +{
> +	char *endptr;
> +	unsigned long rv = strtoul(arg, &endptr, 0);
> +	if (strchr(arg, '-') || endptr == arg || *endptr)
> +		die("%s: argument must be an unsigned integer", option);

Micronit.

It probably is Ok for the target audience, but it might be more proper to
call it "non-negative integer" ("unsigned integer" is a container to hold
such quantity).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] fixup! fast-import: let importers retrieve blobs
  2010-11-29 23:48       ` [PATCH] fixup! " David Barr
  2010-11-30  0:16         ` David Barr
@ 2010-11-30  1:22         ` Jonathan Nieder
  1 sibling, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-11-30  1:22 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Junio C Hamano

David Barr wrote:

[plug two memory leaks in "[PATCH 3/4] fast-import: let importers retr..."]
>
> Signed-off-by: David Barr <david.barr@cordelta.com>

Good eyes, thanks!

Acked-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] fast-import: let importers retrieve blobs
  2010-11-28 19:45     ` [PATCH 3/4] fast-import: let importers retrieve blobs Jonathan Nieder
  2010-11-29 23:48       ` [PATCH] fixup! " David Barr
@ 2010-12-03 10:30       ` Thomas Rast
  2010-12-03 19:06         ` Jonathan Nieder
                           ` (2 more replies)
  2010-12-04  2:35       ` Jonathan Nieder
  2011-01-16  2:16       ` [PATCH] Documentation/fast-import: capitalize beginning of sentence Jonathan Nieder
  3 siblings, 3 replies; 34+ messages in thread
From: Thomas Rast @ 2010-12-03 10:30 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Sverre Rabbelier, Shawn O. Pearce, Junio C Hamano

Jonathan Nieder wrote:
> +test_expect_success PIPE 'R: copy using cat-file' '
[...]
> +	dd if=/dev/stdin of=blob bs=$size count=1 <&3 &&

This breaks my automated tester, though I am not sure exactly why.  It
runs RHEL5, and I have

  $ ls -l /dev/std*
  lrwxrwxrwx 1 root root 15 Sep  1 09:25 /dev/stderr -> /proc/self/fd/2
  lrwxrwxrwx 1 root root 15 Sep  1 09:25 /dev/stdin -> /proc/self/fd/0
  lrwxrwxrwx 1 root root 15 Sep  1 09:25 /dev/stdout -> /proc/self/fd/1

But from the tests I get back

  dd: opening `/dev/stdin': No such file or directory
  error: git-fast-import died of signal 13
  not ok - 110 R: copy using cat-file

In any case I cannot see a reason to use this construct: 'dd' reads
from stdin by default, so you could just leave away the option.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] fast-import: let importers retrieve blobs
  2010-12-03 10:30       ` [PATCH 3/4] " Thomas Rast
@ 2010-12-03 19:06         ` Jonathan Nieder
  2010-12-03 20:17         ` Junio C Hamano
  2010-12-04 13:24         ` Thomas Rast
  2 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-12-03 19:06 UTC (permalink / raw)
  To: Thomas Rast
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Sverre Rabbelier, Shawn O. Pearce, Junio C Hamano

Thomas Rast wrote:

> In any case I cannot see a reason to use this construct: 'dd' reads
> from stdin by default, so you could just leave away the option.

Yes, sorry about that.  I had confused dd with 'tar' which defaults to
rmt0 on some systems.  Fixed locally.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] fast-import: let importers retrieve blobs
  2010-12-03 10:30       ` [PATCH 3/4] " Thomas Rast
  2010-12-03 19:06         ` Jonathan Nieder
@ 2010-12-03 20:17         ` Junio C Hamano
  2010-12-03 20:26           ` Jonathan Nieder
  2010-12-04 13:24         ` Thomas Rast
  2 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2010-12-03 20:17 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Jonathan Nieder, David Barr, Git Mailing List,
	Ramkumar Ramachandra, Sverre Rabbelier, Shawn O. Pearce

Thomas Rast <trast@student.ethz.ch> writes:

> But from the tests I get back
>
>   dd: opening `/dev/stdin': No such file or directory
>   error: git-fast-import died of signal 13
>   not ok - 110 R: copy using cat-file
>
> In any case I cannot see a reason to use this construct: 'dd' reads
> from stdin by default, so you could just leave away the option.

Thanks for testing and reporting.
-- >8 --
Subject: t9300: remove unnecessary use of /dev/stdin

We really shouldn't be using these funny /dev/* files that did not exist
in V7 UNIX in our tests when we do not have to.

Output from

    $ git grep -n -e /dev/ --and --not -e /dev/null t/

tells us that, aside from use of /dev/urandom in apache.conf used in http
tests, "dd if=/dev/stdin" added recently to t/t9300-fast-import.sh are the
only offenders, so removing them should be straightforward.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 t/t9300-fast-import.sh |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index d615d04..055ddc6 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1794,7 +1794,7 @@ test_expect_success PIPE 'R: copy using cat-file' '
 
 	read blob_id type size <&3 &&
 	echo "$blob_id $type $size" >response &&
-	dd if=/dev/stdin of=blob bs=$size count=1 <&3 &&
+	dd of=blob bs=$size count=1 <&3 &&
 	read newline <&3 &&
 
 	cat <<EOF &&
@@ -1845,7 +1845,7 @@ test_expect_success PIPE 'R: print blob mid-commit' '
 		EOF
 
 		read blob_id type size <&3 &&
-		dd if=/dev/stdin of=actual bs=$size count=1 <&3 &&
+		dd of=actual bs=$size count=1 <&3 &&
 		read newline <&3 &&
 
 		echo
@@ -1880,7 +1880,7 @@ test_expect_success PIPE 'R: print staged blob within commit' '
 		echo "cat-blob $to_get" &&
 
 		read blob_id type size <&3 &&
-		dd if=/dev/stdin of=actual bs=$size count=1 <&3 &&
+		dd of=actual bs=$size count=1 <&3 &&
 		read newline <&3 &&
 
 		echo deleteall

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] fast-import: let importers retrieve blobs
  2010-12-03 20:17         ` Junio C Hamano
@ 2010-12-03 20:26           ` Jonathan Nieder
  0 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-12-03 20:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Thomas Rast, David Barr, Git Mailing List, Ramkumar Ramachandra,
	Sverre Rabbelier, Shawn O. Pearce

Junio C Hamano wrote:

> Output from
> 
>     $ git grep -n -e /dev/ --and --not -e /dev/null t/

FWIW

	$ git grep -e 'dd if='

shows a few missed harmless examples.  Perhaps

	$ git grep -e '/dev/[^n]'

would have been the simplest way to catch them.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] fast-import: let importers retrieve blobs
  2010-11-28 19:45     ` [PATCH 3/4] fast-import: let importers retrieve blobs Jonathan Nieder
  2010-11-29 23:48       ` [PATCH] fixup! " David Barr
  2010-12-03 10:30       ` [PATCH 3/4] " Thomas Rast
@ 2010-12-04  2:35       ` Jonathan Nieder
  2011-01-16  2:16       ` [PATCH] Documentation/fast-import: capitalize beginning of sentence Jonathan Nieder
  3 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-12-04  2:35 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, Junio C Hamano

Jonathan Nieder wrote:
>                                                              The value
> for cat-blob-fd cannot be specified in the stream because that would
> be a layering violation: the decision of where to direct a stream has
> to be made when fast-import is started anyway, so we might as well
> make the stream format is independent of that detail.

Ungrammatical.  I think I meant:

 There is no POSIX facility to open a file descriptor from outside
 after a process has already started; therefore, the frontend has to
 prepare a file descriptor for writing blobs before executing
 git fast-import.  The --cat-blob-fd command line option indicates
 which file descriptor that is, defaulting to 1.

 It does not make sense to wait until the stream starts to specify
 which fd so it is not allowed, avoiding a potential layering
 violation.  Other fast-import backends might provide other ways to
 specify where the blob stream should be written.

> +++ b/fast-import.c
> @@ -2824,6 +2910,8 @@ static int parse_one_feature(const char *feature, int from_stream)
>  		option_import_marks(feature + 13, from_stream);
>  	} else if (!prefixcmp(feature, "export-marks=")) {
>  		option_export_marks(feature + 13);
> +	} else if (!strcmp(feature, "cat-blob")) {
> +		; /* Don't die - this feature is supported */

Implies support for a "--cat-blob" command line option
that checks for cat-blob support.  Is this wanted?

(If so, it should be documented.  If not, the condition should be 
"from_stream && !strcmp(...)".)

> @@ -2918,6 +3006,11 @@ static void parse_argv(void)
>  		if (parse_one_feature(a + 2, 0))
>  			continue;
>  
> +		if (!prefixcmp(a + 2, "cat-blob-fd=")) {
> +			option_cat_blob_fd(a + 2 + strlen("cat-blob-fd="));
> +			continue;
> +		}
> +

Would be simpler and more explicit to put in parse_one_feature:

	} else if (!from_stream && !prefixcmp(feature, "cat-blob-fd=")) {

Sorry this is taking so long to get right. :-/
Jonathan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/4] fast-import: let importers retrieve blobs
  2010-12-03 10:30       ` [PATCH 3/4] " Thomas Rast
  2010-12-03 19:06         ` Jonathan Nieder
  2010-12-03 20:17         ` Junio C Hamano
@ 2010-12-04 13:24         ` Thomas Rast
  2 siblings, 0 replies; 34+ messages in thread
From: Thomas Rast @ 2010-12-04 13:24 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Sverre Rabbelier, Shawn O. Pearce, Junio C Hamano

Thomas Rast wrote:
> Jonathan Nieder wrote:
> > +test_expect_success PIPE 'R: copy using cat-file' '
> [...]
> > +	dd if=/dev/stdin of=blob bs=$size count=1 <&3 &&
> 
> This breaks my automated tester, though I am not sure exactly why.  It
> runs RHEL5, and I have
> 
>   lrwxrwxrwx 1 root root 15 Sep  1 09:25 /dev/stdin -> /proc/self/fd/0

Ah, answering my own question: on my normal box, strace'ing dd[1] in
such an invocation uses

  open("/dev/stdin", O_RDONLY)            = 3
  dup2(3, 0)                              = 0
  close(3)                                = 0

OTOH on RHEL5[2] it tries a different order:

  close(0)                                = 0
  open("/dev/stdin", O_RDONLY)            = -1 ENOENT (No such file or directory)

Oops.


[1] dd --version says: dd (coreutils) 7.1
[2] dd (coreutils) 5.97

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] Documentation/fast-import: capitalize beginning of sentence
  2010-11-28 19:45     ` [PATCH 3/4] fast-import: let importers retrieve blobs Jonathan Nieder
                         ` (2 preceding siblings ...)
  2010-12-04  2:35       ` Jonathan Nieder
@ 2011-01-16  2:16       ` Jonathan Nieder
  3 siblings, 0 replies; 34+ messages in thread
From: Jonathan Nieder @ 2011-01-16  2:16 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, Junio C Hamano

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Noticed by skimming through the v1.7.3..v1.7.4-rc2 diff.

 Documentation/git-fast-import.txt |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index e2a46a5..43d2174 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -905,7 +905,7 @@ The `<dataref>` can be either a mark reference (`:<idnum>`)
 set previously or a full 40-byte SHA-1 of a Git blob, preexisting or
 ready to be written.
 
-output uses the same format as `git cat-file --batch`:
+Output uses the same format as `git cat-file --batch`:
 
 ====
 	<sha1> SP 'blob' SP <size> LF
-- 
1.7.4.rc2

^ permalink raw reply related	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2011-01-16  2:16 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-15 12:54 [PATCHv2] Add support for subversion dump format v3 David Barr
2010-10-15 12:54 ` [PATCH 1/5] fast-import: Let importers retrieve blobs David Barr
2010-10-18  7:36   ` Ramkumar Ramachandra
2010-10-18  8:50     ` Jonathan Nieder
2010-10-18  8:26   ` Jonathan Nieder
     [not found]   ` <20101119093530.GA19061@burratino>
2010-11-19  9:47     ` [PATCH 3/4] fast-import: let " Jonathan Nieder
2010-11-19  9:51     ` [PATCH 4/4] fast-import: Allow cat-blob requests at arbitrary points in stream Jonathan Nieder
     [not found]     ` <20101119094045.GC19061@burratino>
2010-11-19 11:58       ` [PATCH 2/4] fast-import: clarify documentation of "feature" command Sverre Rabbelier
2010-11-28 19:41   ` [PATCH/RFC v3 resend 0/4] fast-import: Let importers retrieve blobs Jonathan Nieder
2010-11-28 19:42     ` [PATCH 1/4] fast-import: stricter parsing of integer options Jonathan Nieder
2010-11-30  1:01       ` Junio C Hamano
2010-11-28 19:43     ` [PATCH 2/4] fast-import: clarify documentation of "feature" command Jonathan Nieder
2010-11-28 19:45     ` [PATCH 3/4] fast-import: let importers retrieve blobs Jonathan Nieder
2010-11-29 23:48       ` [PATCH] fixup! " David Barr
2010-11-30  0:16         ` David Barr
2010-11-30  1:22         ` Jonathan Nieder
2010-12-03 10:30       ` [PATCH 3/4] " Thomas Rast
2010-12-03 19:06         ` Jonathan Nieder
2010-12-03 20:17         ` Junio C Hamano
2010-12-03 20:26           ` Jonathan Nieder
2010-12-04 13:24         ` Thomas Rast
2010-12-04  2:35       ` Jonathan Nieder
2011-01-16  2:16       ` [PATCH] Documentation/fast-import: capitalize beginning of sentence Jonathan Nieder
2010-11-28 19:45     ` [PATCH 4/4] fast-import: Allow cat-blob requests at arbitrary points in stream Jonathan Nieder
2010-10-15 12:54 ` [PATCH 2/5] vcs-svn: Extend svndump to parse version 3 format David Barr
2010-10-15 12:54 ` [PATCH 3/5] vcs-svn: Implement prop-delta handling David Barr
2010-10-18 15:10   ` Ramkumar Ramachandra
2010-10-15 12:54 ` [PATCH 4/5] vcs-svn: Add outfile option to buffer_copy_bytes() David Barr
2010-10-18  8:59   ` Jonathan Nieder
2010-10-15 12:54 ` [PATCH 5/5] svn-fe: Use the cat-blob command to apply deltas David Barr
2010-10-18  6:57   ` Ramkumar Ramachandra
2010-10-18  9:24     ` Jonathan Nieder
2010-10-18 12:18       ` Ramkumar Ramachandra
2010-10-18  9:54 ` [PATCHv2] Add support for subversion dump format v3 Jonathan Nieder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).