git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 1/7] Teach fast-import to print the id of each imported commit
  2010-10-12 13:40 David Barr
@ 2010-10-12 13:40 ` David Barr
  0 siblings, 0 replies; 29+ messages in thread
From: David Barr @ 2010-10-12 13:40 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce

From: Jonathan Nieder <jrnieder@gmail.com>

For the svn importer, it would be useful to build a map from
subversion revision numbers to git commits as the import takes place.
This is particularly relevant because the subversion api sometimes
represents as "copy this directory from this revision", and the
importer needs to be able to access the corresponding trees.  So
(optionally) print each commit id when the corresponding object is
written.

Unfortunately when each commit object is written, it is not yet
accessible to the caller until a checkpoint has finished.  A later
patch will teach fast-import to directly pass on the relevant data on
request, using the same channel.

Cc: Shawn O. Pearce <spearce@spearce.org>
Cc: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-fast-import.txt |   13 +++++++++++++
 fast-import.c                     |   18 ++++++++++++++++++
 t/t9300-fast-import.sh            |   36 ++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 966ba4f..e217635 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -92,6 +92,18 @@ OPTIONS
 	--(no-)-relative-marks= with the --(import|export)-marks=
 	options.
 
+--report-fd=<fd>::
+	Print the 40-character object name for each commit to
+	the specified file descriptor before writing it to the
+	pack.  This information may be useful if the importer
+	needs to maintain a map from revisions in the source
+	repository to commit ids in the target repository
+	during the import.
++
+The described objects are not necessarily accessible
+using standard git plumbing tools until a little while
+after the next checkpoint.
+
 --export-pack-edges=<file>::
 	After creating a packfile, print a line of data to
 	<file> listing the filename of the packfile and the last
@@ -896,6 +908,7 @@ The following features are currently supported:
 * date-format
 * import-marks
 * export-marks
+* report-fd
 * relative-marks
 * no-relative-marks
 * force
diff --git a/fast-import.c b/fast-import.c
index 2317b0f..ef0cee7 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -361,6 +361,9 @@ static uintmax_t next_mark;
 static struct strbuf new_data = STRBUF_INIT;
 static int seen_data_command;
 
+/* Where to report commits */
+static int report_fd = -1;
+
 static void parse_argv(void);
 
 static void write_branch_report(FILE *rpt, struct branch *b)
@@ -2571,6 +2574,11 @@ static void parse_new_commit(void)
 
 	if (!store_object(OBJ_COMMIT, &new_data, NULL, b->sha1, next_mark))
 		b->pack_id = pack_id;
+	if (report_fd != -1) {
+		char *buf = sha1_to_hex(b->sha1);
+		buf[40] = '\n';
+		write_or_die(report_fd, buf, 41);
+	}
 	b->last_commit = object_count_by_type[OBJ_COMMIT];
 }
 
@@ -2755,6 +2763,14 @@ static void option_export_marks(const char *marks)
 	safe_create_leading_directories_const(export_marks_file);
 }
 
+static void option_report_fd(const char *fd)
+{
+	unsigned long n = strtoul(fd, NULL, 0);
+	if (n > (unsigned long) INT_MAX)
+		die("--report-fd cannot exceed %d", INT_MAX);
+	report_fd = (int) n;
+}
+
 static void option_export_pack_edges(const char *edges)
 {
 	if (pack_edges)
@@ -2808,6 +2824,8 @@ static int parse_one_feature(const char *feature, int from_stream)
 		option_import_marks(feature + 13, from_stream);
 	} else if (!prefixcmp(feature, "export-marks=")) {
 		option_export_marks(feature + 13);
+	} else if (!prefixcmp(feature, "report-fd=")) {
+		option_report_fd(feature + strlen("report-fd="));
 	} else if (!prefixcmp(feature, "relative-marks")) {
 		relative_marks_paths = 1;
 	} else if (!prefixcmp(feature, "no-relative-marks")) {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 7c05920..4371308 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1611,6 +1611,42 @@ test_expect_success 'R: feature no-relative-marks should be honoured' '
     test_cmp marks.new non-relative.out
 '
 
+test_expect_success 'setup: have pipes?' '
+	test_when_finished "rm -f frob" &&
+	if mkfifo frob
+	then
+		test_set_prereq PIPE
+	fi
+'
+
+test_expect_success PIPE 'R: fast-import --report-fd' '
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+	cat <<EOF &&
+	commit refs/heads/printed
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	to be printed
+	COMMIT
+
+	from refs/heads/master
+	D file3
+
+	EOF
+
+	read cid <&3 &&
+	echo "$cid" >received
+	EOF
+	FRONTEND_END
+
+	mkfifo commits &&
+	test_when_finished "rm -f commits" &&
+	sh frontend 3<commits |
+	git fast-import --report-fd=3 3>commits &&
+	git rev-parse --verify printed >real &&
+	test_cmp real received
+'
+
 cat >input << EOF
 option git quiet
 blob
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH/RFC] Add support for subversion dump format v3
@ 2010-10-12 13:50 David Barr
  2010-10-12 13:50 ` [PATCH 1/7] Teach fast-import to print the id of each imported commit David Barr
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: David Barr @ 2010-10-12 13:50 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier

This series follows Jonathan Nieder's svn diff applier series.

Patches 1 to 4 add the required infrastructure to fast-import.
The primary feature is the addition of the cat-file command to
fast-import. This allows access to objects written to the
the current pack prior to a checkpoint and is critical to
retrieving full-texts to drive the diff applier.

Patch 5 adds the basic parsing necessary to process the v3 format.

Patch 6 adds logic around decoding prop deltas.

Patch 7 integrates svn-fe with svn-da to decode text deltas.
It was primarily authored by Jonathan but inspired by Ram and
completed by myself.
This is probably the most sprawling of the series and likely the
heaviest target of review.

 Documentation/git-fast-import.txt |   49 +++++++++
 contrib/svn-fe/svn-fe.txt         |    6 +-
 fast-import.c                     |  147 +++++++++++++++++++++++++-
 t/t9010-svn-fe.sh                 |    6 +-
 t/t9300-fast-import.sh            |  210 +++++++++++++++++++++++++++++++++++++
 test-svn-fe.c                     |    3 +-
 vcs-svn/fast_export.c             |  187 ++++++++++++++++++++++++++++++++-
 vcs-svn/fast_export.h             |   13 ++-
 vcs-svn/repo_tree.c               |   37 +++++++-
 vcs-svn/repo_tree.h               |    3 +
 vcs-svn/svndiff.c                 |   11 +--
 vcs-svn/svndiff.h                 |    3 +-
 vcs-svn/svndump.c                 |   99 ++++++++++++++++--
 13 files changed, 739 insertions(+), 35 deletions(-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/7] Teach fast-import to print the id of each imported commit
  2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
@ 2010-10-12 13:50 ` David Barr
  2010-10-12 14:42   ` Sverre Rabbelier
  2010-10-12 22:06   ` Jonathan Nieder
  2010-10-12 13:50 ` [PATCH 2/7] fast-import: Let importers retrieve the objects being written David Barr
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 29+ messages in thread
From: David Barr @ 2010-10-12 13:50 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce

From: Jonathan Nieder <jrnieder@gmail.com>

For the svn importer, it would be useful to build a map from
subversion revision numbers to git commits as the import takes place.
This is particularly relevant because the subversion api sometimes
represents as "copy this directory from this revision", and the
importer needs to be able to access the corresponding trees.  So
(optionally) print each commit id when the corresponding object is
written.

Unfortunately when each commit object is written, it is not yet
accessible to the caller until a checkpoint has finished.  A later
patch will teach fast-import to directly pass on the relevant data on
request, using the same channel.

Cc: Shawn O. Pearce <spearce@spearce.org>
Cc: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-fast-import.txt |   13 +++++++++++++
 fast-import.c                     |   18 ++++++++++++++++++
 t/t9300-fast-import.sh            |   36 ++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 966ba4f..e217635 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -92,6 +92,18 @@ OPTIONS
 	--(no-)-relative-marks= with the --(import|export)-marks=
 	options.
 
+--report-fd=<fd>::
+	Print the 40-character object name for each commit to
+	the specified file descriptor before writing it to the
+	pack.  This information may be useful if the importer
+	needs to maintain a map from revisions in the source
+	repository to commit ids in the target repository
+	during the import.
++
+The described objects are not necessarily accessible
+using standard git plumbing tools until a little while
+after the next checkpoint.
+
 --export-pack-edges=<file>::
 	After creating a packfile, print a line of data to
 	<file> listing the filename of the packfile and the last
@@ -896,6 +908,7 @@ The following features are currently supported:
 * date-format
 * import-marks
 * export-marks
+* report-fd
 * relative-marks
 * no-relative-marks
 * force
diff --git a/fast-import.c b/fast-import.c
index 2317b0f..ef0cee7 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -361,6 +361,9 @@ static uintmax_t next_mark;
 static struct strbuf new_data = STRBUF_INIT;
 static int seen_data_command;
 
+/* Where to report commits */
+static int report_fd = -1;
+
 static void parse_argv(void);
 
 static void write_branch_report(FILE *rpt, struct branch *b)
@@ -2571,6 +2574,11 @@ static void parse_new_commit(void)
 
 	if (!store_object(OBJ_COMMIT, &new_data, NULL, b->sha1, next_mark))
 		b->pack_id = pack_id;
+	if (report_fd != -1) {
+		char *buf = sha1_to_hex(b->sha1);
+		buf[40] = '\n';
+		write_or_die(report_fd, buf, 41);
+	}
 	b->last_commit = object_count_by_type[OBJ_COMMIT];
 }
 
@@ -2755,6 +2763,14 @@ static void option_export_marks(const char *marks)
 	safe_create_leading_directories_const(export_marks_file);
 }
 
+static void option_report_fd(const char *fd)
+{
+	unsigned long n = strtoul(fd, NULL, 0);
+	if (n > (unsigned long) INT_MAX)
+		die("--report-fd cannot exceed %d", INT_MAX);
+	report_fd = (int) n;
+}
+
 static void option_export_pack_edges(const char *edges)
 {
 	if (pack_edges)
@@ -2808,6 +2824,8 @@ static int parse_one_feature(const char *feature, int from_stream)
 		option_import_marks(feature + 13, from_stream);
 	} else if (!prefixcmp(feature, "export-marks=")) {
 		option_export_marks(feature + 13);
+	} else if (!prefixcmp(feature, "report-fd=")) {
+		option_report_fd(feature + strlen("report-fd="));
 	} else if (!prefixcmp(feature, "relative-marks")) {
 		relative_marks_paths = 1;
 	} else if (!prefixcmp(feature, "no-relative-marks")) {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 7c05920..4371308 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1611,6 +1611,42 @@ test_expect_success 'R: feature no-relative-marks should be honoured' '
     test_cmp marks.new non-relative.out
 '
 
+test_expect_success 'setup: have pipes?' '
+	test_when_finished "rm -f frob" &&
+	if mkfifo frob
+	then
+		test_set_prereq PIPE
+	fi
+'
+
+test_expect_success PIPE 'R: fast-import --report-fd' '
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+	cat <<EOF &&
+	commit refs/heads/printed
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	to be printed
+	COMMIT
+
+	from refs/heads/master
+	D file3
+
+	EOF
+
+	read cid <&3 &&
+	echo "$cid" >received
+	EOF
+	FRONTEND_END
+
+	mkfifo commits &&
+	test_when_finished "rm -f commits" &&
+	sh frontend 3<commits |
+	git fast-import --report-fd=3 3>commits &&
+	git rev-parse --verify printed >real &&
+	test_cmp real received
+'
+
 cat >input << EOF
 option git quiet
 blob
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
  2010-10-12 13:50 ` [PATCH 1/7] Teach fast-import to print the id of each imported commit David Barr
@ 2010-10-12 13:50 ` David Barr
  2010-10-12 22:38   ` Jonathan Nieder
  2010-10-12 13:50 ` [PATCH 3/7] fast-import: Allow cat command with empty path David Barr
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: David Barr @ 2010-10-12 13:50 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, David Barr

From: Jonathan Nieder <jrnieder@gmail.com>

As the description of the "progress" option in git-fast-import.1
hints, there is no convenient way to immediately access the objects
written to a new repository through fast-import.  Until a checkpoint
has been started and finishes writing the pack index, any new blobs,
trees, and commits will not be accessible using standard git tools.

So introduce another way: a "cat" command introduced in the command
stream requests for fast-import to print an object to the same
report-fd stream used to report commits being written.

The output uses the same format as "git cat-file --batch".

Like cat-file --batch, this does not provide an option to dereference
objects to a type of the requestor's choosing.  Tags are presented
as tags, commits as commits, and trees as trees.

Objects can be specified by path within a tree as well, using a

 cat TREE "PATH"

syntax.  With this syntax, also, the tree can only be specified by
:n marker or 40-digit tree id.

Cc: Shawn O. Pearce <spearce@spearce.org>
Cc: David Barr <david.barr@cordelta.com>
Cc: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-fast-import.txt |   34 +++++++++++-
 fast-import.c                     |  108 +++++++++++++++++++++++++++++++++++++
 t/t9300-fast-import.sh            |  102 ++++++++++++++++++++++++++++++++++
 3 files changed, 243 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index e217635..2cf48f5 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -102,7 +102,8 @@ OPTIONS
 +
 The described objects are not necessarily accessible
 using standard git plumbing tools until a little while
-after the next checkpoint.
+after the next checkpoint.  To request access to the
+objects before then, use `cat` lines in the command stream.
 
 --export-pack-edges=<file>::
 	After creating a packfile, print a line of data to
@@ -332,6 +333,10 @@ and control the current import process.  More detailed discussion
 	standard output.  This command is optional and is not needed
 	to perform an import.
 
+`cat`::
+	Causes fast-import to print an object in 'cat-file --batch'
+	format to the file descriptor set with "feature report-fd".
+
 `feature`::
 	Require that fast-import supports the specified feature, or
 	abort if it does not.
@@ -888,6 +893,33 @@ Placing a `progress` command immediately after a `checkpoint` will
 inform the reader when the `checkpoint` has been completed and it
 can safely access the refs that fast-import updated.
 
+`cat`
+~~~~~
+Causes fast-import to print an object to a file descriptor
+previously arranged with the `--report-fd` option.  The command
+otherwise has no impact on the current import; its main purpose is to
+retrieve objects that may be in fast-import's memory but not
+accessible from the target repository a little quicker than by the
+method suggested by the description of the `progress` option.
+
+....
+	'cat' SP <dataref> LF
+....
+
+The `<dataref>` can be either a mark reference (`:<idnum>`)
+set previously, or a full 40-byte SHA-1 of any Git object,
+preexisting or ready to be written.
+
+If `<dataref>` refers to a tree object, it may be followed by
+a path within that tree to retrieve a subtree or blob.
+
+....
+	'cat' SP <treeref> SP <path> LF
+....
+
+A `<path>` string should be surrounded with quotation marks and
+use C-style escaping.
+
 `feature`
 ~~~~~~~~~
 Require that fast-import supports the specified feature, or abort if
diff --git a/fast-import.c b/fast-import.c
index ef0cee7..099f63e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -55,6 +55,9 @@ Format of STDIN stream:
     ('from' sp committish lf)?
     lf?;
 
+  cat_request ::= 'cat' sp (hexsha1 | idnum) lf
+    | 'cat' sp hexsha1 sp path_str lf;
+
   checkpoint ::= 'checkpoint' lf
     lf?;
 
@@ -2688,6 +2691,109 @@ static void parse_reset_branch(void)
 		unread_command_buf = 1;
 }
 
+static void quoted_path_sha1(unsigned char sha1[20], struct tree_entry *root,
+				const char *path, const char *line)
+{
+	struct strbuf uq = STRBUF_INIT;
+	struct tree_entry leaf = {0};
+	const char *x;
+
+	if (unquote_c_style(&uq, path, &x))
+		die("Invalid path: %s", line);
+	if (*x)
+		die("Garbage after path: %s", line);
+	tree_content_get(root, uq.buf, &leaf);
+	if (!leaf.versions[1].mode)
+		die("Path %s not in branch", uq.buf);
+	hashcpy(sha1, leaf.versions[1].sha1);
+}
+
+static void sendreport(const char *buf, unsigned long size)
+{
+	if (write_in_full(report_fd, buf, size) != size)
+		die_errno("Write to frontend failed");
+}
+
+static void cat_object(struct object_entry *oe, unsigned char sha1[20])
+{
+	struct strbuf line = STRBUF_INIT;
+	unsigned long size;
+	enum object_type type = 0;
+	char *buf;
+
+	if (report_fd < 0)
+		die("Internal error: bad report_fd %d", report_fd);
+	if (oe && oe->pack_id != MAX_PACK_ID) {
+		type = oe->type;
+		buf = gfi_unpack_entry(oe, &size);
+	} else {
+		buf = read_sha1_file(sha1, &type, &size);
+	}
+	if (!buf)
+		die("Can't read object %s", sha1_to_hex(sha1));
+
+	/*
+	 * Output based on batch_one_object() from cat-file.c.
+	 */
+	if (type <= 0) {
+		strbuf_reset(&line);
+		strbuf_addf(&line, "%s missing\n", sha1_to_hex(sha1));
+		if (write_in_full(report_fd, line.buf, line.len) != line.len)
+			die_errno("Write to frontend failed 1");
+	}
+	strbuf_reset(&line);
+	strbuf_addf(&line, "%s %s %lu\n", sha1_to_hex(sha1),
+						typename(type), size);
+	sendreport(line.buf, line.len);
+	sendreport(buf, size);
+	sendreport("\n", 1);
+	free(buf);
+}
+
+
+static void parse_cat_request(void)
+{
+	const char *p;
+	struct object_entry *oe = oe;
+	unsigned char sha1[20];
+	struct tree_entry root = {0};
+
+	/* cat SP <object> */
+	p = command_buf.buf + strlen("cat ");
+	if (report_fd < 0)
+		die("The cat command requires the report-fd feature.");
+	if (*p == ':') {
+		char *x;
+		oe = find_mark(strtoumax(p + 1, &x, 10));
+		if (x == p + 1)
+			die("Invalid mark: %s", command_buf.buf);
+		if (!oe)
+			die("Unknown mark: %s", command_buf.buf);
+		p = x;
+		hashcpy(sha1, oe->idx.sha1);
+	} else {
+		if (get_sha1_hex(p, sha1))
+			die("Invalid SHA1: %s", command_buf.buf);
+		p += 40;
+		if (!*p)
+			oe = find_object(sha1);
+	}
+
+	/* [ SP "<path>" ] */
+	if (*p) {
+		if (*p++ != ' ')
+			die("Missing space after SHA1: %s", command_buf.buf);
+
+		/* cat <tree> "<path>" form. */
+		hashcpy(root.versions[1].sha1, sha1);
+		load_tree(&root);
+		quoted_path_sha1(sha1, &root, p, command_buf.buf);
+		oe = find_object(sha1);
+	}
+
+	cat_object(oe, sha1);
+}
+
 static void parse_checkpoint(void)
 {
 	if (object_count) {
@@ -2971,6 +3077,8 @@ int main(int argc, const char **argv)
 			parse_new_tag();
 		else if (!prefixcmp(command_buf.buf, "reset "))
 			parse_reset_branch();
+		else if (!prefixcmp(command_buf.buf, "cat "))
+			parse_cat_request();
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!prefixcmp(command_buf.buf, "progress "))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 4371308..8155b85 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1647,6 +1647,108 @@ test_expect_success PIPE 'R: fast-import --report-fd' '
 	test_cmp real received
 '
 
+test_expect_success PIPE 'R: report-fd: can feed back printed tree' '
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+	cat <<EOF &&
+	commit refs/heads/printed
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	to be printed
+	COMMIT
+
+	from refs/heads/master
+	D file3
+
+	EOF
+
+	read commit_id <&3 &&
+	echo "$commit_id" >printed &&
+	echo "$commit_id commit" >expect.response &&
+	echo "cat $commit_id" &&
+	read cid2 type size <&3 &&
+	echo "$cid2 $type" >response &&
+	dd if=/dev/stdin of=commit bs=1 count=$size <&3 &&
+	read newline <&3 &&
+	read tree tree_id <commit &&
+
+	cat <<EOF &&
+	commit refs/heads/printed
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	to be printed
+	COMMIT
+
+	from refs/heads/printed^0
+	M 040000 $tree_id old
+
+	EOF
+	read cid <&3
+	FRONTEND_END
+
+	mkfifo commits &&
+	test_when_finished "rm -f commits" &&
+	sh frontend 3<commits |
+	git fast-import --report-fd=3 3>commits &&
+	git rev-parse printed^ >expect.printed &&
+	git cat-file commit printed^ >expect.commit &&
+
+	test_cmp expect.printed printed &&
+	test_cmp expect.response response &&
+	test_cmp expect.commit commit
+'
+
+test_expect_success PIPE 'R: report-fd: can feed back printed blob' '
+	file6_id=$(echo "$file6_data" | git hash-object --stdin) &&
+	cat >expect <<-EOF &&
+	:100755 100644 $file6_id $file6_id C100	newdir/exec.sh	file6
+	EOF
+
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+
+	branch=$(git rev-parse --verify refs/heads/branch) &&
+	cat <<EOF &&
+	feature report-fd=3
+	cat $branch
+	EOF
+
+	read commit_id type size <&3 &&
+	dd if=/dev/stdin of=commit bs=1 count=$size <&3 &&
+	read newline <&3 &&
+	read tree tree_id <commit &&
+
+	echo "cat $tree_id \"newdir/exec.sh\"" &&
+	read blob_id type size <&3 &&
+	dd if=/dev/stdin of=blob bs=1 count=$size <&3 &&
+	read newline <&3 &&
+
+	cat <<EOF &&
+	commit refs/heads/copyblob
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	copy file6 to top level
+	COMMIT
+
+	from refs/heads/branch^0
+	M 644 inline "file6"
+	data $size
+	EOF
+	cat blob &&
+	echo &&
+	echo &&
+
+	read cid <&3
+	FRONTEND_END
+
+	mkfifo commits &&
+	test_when_finished "rm -f commits" &&
+	sh frontend 3<commits |
+	git fast-import 3>commits &&
+	git diff-tree -C --find-copies-harder -r copyblob^ copyblob >actual &&
+	compare_diff_raw expect actual
+'
+
 cat >input << EOF
 option git quiet
 blob
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/7] fast-import: Allow cat command with empty path
  2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
  2010-10-12 13:50 ` [PATCH 1/7] Teach fast-import to print the id of each imported commit David Barr
  2010-10-12 13:50 ` [PATCH 2/7] fast-import: Let importers retrieve the objects being written David Barr
@ 2010-10-12 13:50 ` David Barr
  2010-10-12 22:47   ` Jonathan Nieder
  2010-10-12 13:50 ` [PATCH 4/7] fast-import: Allow cat requests at arbitrary points in stream David Barr
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: David Barr @ 2010-10-12 13:50 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	David Barr

From: Jonathan Nieder <jrnieder@gmail.com>

Rather than erroring out, treat an empty path as the path to
the root of a tree so frontends can be simplified a little.

While at it, fix a typo in an error message: the cat command is used
to examine paths within trees, not branches.

Cc: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 fast-import.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 099f63e..f3c4123 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2702,9 +2702,13 @@ static void quoted_path_sha1(unsigned char sha1[20], struct tree_entry *root,
 		die("Invalid path: %s", line);
 	if (*x)
 		die("Garbage after path: %s", line);
+	if (uq.len == 0) {
+		hashcpy(sha1, root->versions[1].sha1);
+		return;
+	}
 	tree_content_get(root, uq.buf, &leaf);
 	if (!leaf.versions[1].mode)
-		die("Path %s not in branch", uq.buf);
+		die("Path %s not in tree", uq.buf);
 	hashcpy(sha1, leaf.versions[1].sha1);
 }
 
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/7] fast-import: Allow cat requests at arbitrary points in stream
  2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
                   ` (2 preceding siblings ...)
  2010-10-12 13:50 ` [PATCH 3/7] fast-import: Allow cat command with empty path David Barr
@ 2010-10-12 13:50 ` David Barr
  2010-10-12 22:50   ` Jonathan Nieder
  2010-10-12 13:50 ` [PATCH 5/7] vcs-svn: extend svndump to parse version 3 format David Barr
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: David Barr @ 2010-10-12 13:50 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	David Barr

From: Jonathan Nieder <jrnieder@gmail.com>

Rather than planning in advance and saving up a bunch of objects
before a "commit" command, frontends might want to wait until the
middle of a commit and request objects then.

Allow them to.

The new rule: a "cat" request can be inserted wherever a comment
is allowed, which means at the start of any line except in the
middle of a "data" command.

Cc: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: David Barr <david.barr@cordelta.com>
Cc: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-fast-import.txt |    4 ++
 fast-import.c                     |   27 ++++++++------
 t/t9300-fast-import.sh            |   72 +++++++++++++++++++++++++++++++++++++
 3 files changed, 92 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 2cf48f5..b670c7d 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -920,6 +920,10 @@ a path within that tree to retrieve a subtree or blob.
 A `<path>` string should be surrounded with quotation marks and
 use C-style escaping.
 
+The `cat` command can be used anywhere that comments are
+acceptable.  In particular, the `cat` command can be used in the
+middle of a commit, but not in the middle of a `data` stream.
+
 `feature`
 ~~~~~~~~~
 Require that fast-import supports the specified feature, or abort if
diff --git a/fast-import.c b/fast-import.c
index f3c4123..87246d5 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -55,9 +55,6 @@ Format of STDIN stream:
     ('from' sp committish lf)?
     lf?;
 
-  cat_request ::= 'cat' sp (hexsha1 | idnum) lf
-    | 'cat' sp hexsha1 sp path_str lf;
-
   checkpoint ::= 'checkpoint' lf
     lf?;
 
@@ -135,14 +132,18 @@ Format of STDIN stream:
   ts    ::= # time since the epoch in seconds, ascii base10 notation;
   tz    ::= # GIT style timezone;
 
-     # note: comments may appear anywhere in the input, except
-     # within a data command.  Any form of the data command
-     # always escapes the related input from comment processing.
+     # note: comments and cat requests may appear anywhere
+     # in the input, except within a data command.  Any form
+     # of the data command always escapes the related input
+     # from comment processing.
      #
      # In case it is not clear, the '#' that starts the comment
      # must be the first character on that line (an lf
      # preceded it).
      #
+  cat_request ::= 'cat' sp (hexsha1 | idnum) lf
+    | 'cat' sp hexsha1 sp path_str lf;
+
   comment ::= '#' not_lf* lf;
   not_lf  ::= # Any byte that is not ASCII newline (LF);
 */
@@ -368,6 +369,7 @@ static int seen_data_command;
 static int report_fd = -1;
 
 static void parse_argv(void);
+static void parse_cat_request(void);
 
 static void write_branch_report(FILE *rpt, struct branch *b)
 {
@@ -1777,7 +1779,6 @@ static void read_marks(void)
 	fclose(f);
 }
 
-
 static int read_next_command(void)
 {
 	static int stdin_eof = 0;
@@ -1787,7 +1788,7 @@ static int read_next_command(void)
 		return EOF;
 	}
 
-	do {
+	for (;;) {
 		if (unread_command_buf) {
 			unread_command_buf = 0;
 		} else {
@@ -1820,7 +1821,13 @@ static int read_next_command(void)
 			rc->prev->next = rc;
 			cmd_tail = rc;
 		}
-	} while (command_buf.buf[0] == '#');
+		if (!prefixcmp(command_buf.buf, "cat ")) {
+			parse_cat_request();
+			continue;
+		}
+		if (command_buf.buf[0] != '#')
+			return 0;
+	}
 
 	return 0;
 }
@@ -3081,8 +3088,6 @@ int main(int argc, const char **argv)
 			parse_new_tag();
 		else if (!prefixcmp(command_buf.buf, "reset "))
 			parse_reset_branch();
-		else if (!prefixcmp(command_buf.buf, "cat "))
-			parse_cat_request();
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!prefixcmp(command_buf.buf, "progress "))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 8155b85..a85d068 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1749,6 +1749,78 @@ test_expect_success PIPE 'R: report-fd: can feed back printed blob' '
 	compare_diff_raw expect actual
 '
 
+test_expect_success PIPE 'R: report-fd: cat commands mid-commit' '
+	file4_id=$(
+		printf "%s" "$file4_data" |
+		git hash-object --stdin
+	) &&
+	cat >expect <<-EOF &&
+	:100755 100644 $file4_id $file4_id C100 file4	file4-copy
+	EOF
+	git cat-file commit refs/heads/branch >expect.commit &&
+	echo tree >expect.type &&
+	git cat-file tree refs/heads/branch >expect.tree &&
+
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+
+	branch=$(git rev-parse --verify refs/heads/branch) &&
+	cat <<EOF &&
+	feature report-fd=3
+	commit refs/heads/catdemo
+	cat $branch
+	EOF
+
+	read commit_id type size <&3 &&
+	dd if=/dev/stdin of=commit bs=1 count=$size <&3 &&
+	read newline <&3 &&
+	read tree tree_id <commit &&
+
+	echo "cat $tree_id" &&
+
+	read tree_id2 type size <&3 &&
+	dd if=/dev/stdin of=tree bs=1 count=$size <&3 &&
+	read newline <&3 &&
+	echo $type >type &&
+
+	grep ^committer commit &&
+	cat <<EOF &&
+	data <<COMMIT
+	copy file4 as file4-copy
+	COMMIT
+
+	from refs/heads/branch^0
+	cat $tree_id "file4"
+	EOF
+
+	read blob_id type size <&3 &&
+	dd if=/dev/stdin of=/dev/null bs=1 count=$size <&3 &&
+	read newline <&3 &&
+
+	cat <<EOF &&
+	M 644 $blob_id "file4-copy"
+	cat $tree_id ""
+
+	EOF
+
+	read tree_id3 type size <&3 &&
+	dd if=/dev/stdin of=tree2 bs=1 count=$size <&3 &&
+	read newline <&3 &&
+	read cid <&3
+	FRONTEND_END
+
+	mkfifo commits &&
+	test_when_finished "rm -f commits" &&
+	sh frontend 3<commits |
+	git fast-import 3>commits &&
+	git diff-tree -C catdemo^ catdemo >actual &&
+	compare_diff_raw expect actual
+	test_cmp expect.commit commit &&
+	test_cmp expect.type type &&
+	test_cmp expect.tree tree &&
+	test_cmp expect.tree tree2
+'
+
 cat >input << EOF
 option git quiet
 blob
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/7] vcs-svn: extend svndump to parse version 3 format
  2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
                   ` (3 preceding siblings ...)
  2010-10-12 13:50 ` [PATCH 4/7] fast-import: Allow cat requests at arbitrary points in stream David Barr
@ 2010-10-12 13:50 ` David Barr
  2010-10-12 22:56   ` Jonathan Nieder
  2010-10-12 13:50 ` [PATCH 6/7] vcs-svn: implement prop-delta handling David Barr
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: David Barr @ 2010-10-12 13:50 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	David Barr

Added 1 new dump header, SVN-fs-dump-format-version.
Added 6 new node headers:
* Text-delta: true|false
* Prop-delta: true|false
* Text-delta-base-md5: <32 hex digits>
* Text-delta-base-sha1: <40 hex digits>
* Text-copy-source-sha1: <40 hex digits>
* Text-content-sha1: <40 hex digits>

This change simply populates the context.
Further changes will be needed to handle text and prop deltas.

Signed-off-by: David Barr <david.barr@cordelta.com>
---
 vcs-svn/svndump.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 3bba0fe..458053e 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -27,6 +27,9 @@
 #define LENGTH_UNKNOWN (~0)
 #define DATE_RFC2822_LEN 31
 
+#define MD5_HEX_LENGTH 32
+#define SHA1_HEX_LENGTH 40
+
 /* Create memory pool for log messages */
 obj_pool_gen(log, char, 4096)
 
@@ -44,6 +47,11 @@ static char* log_copy(uint32_t length, char *log)
 static struct {
 	uint32_t action, propLength, textLength, srcRev, srcMode, mark, type;
 	uint32_t src[REPO_MAX_PATH_DEPTH], dst[REPO_MAX_PATH_DEPTH];
+	uint32_t text_delta, prop_delta;
+	char text_delta_base_md5[MD5_HEX_LENGTH + 1];
+	char text_content_sha1[SHA1_HEX_LENGTH + 1];
+	char text_delta_base_sha1[SHA1_HEX_LENGTH + 1];
+	char text_copy_source_sha1[SHA1_HEX_LENGTH + 1];
 } node_ctx;
 
 static struct {
@@ -53,14 +61,20 @@ static struct {
 } rev_ctx;
 
 static struct {
-	uint32_t uuid, url;
+	uint32_t version, uuid, url;
 } dump_ctx;
 
 static struct {
-	uint32_t svn_log, svn_author, svn_date, svn_executable, svn_special, uuid,
+	uint32_t svn_log, svn_author, svn_date, svn_executable, svn_special,
 		revision_number, node_path, node_kind, node_action,
 		node_copyfrom_path, node_copyfrom_rev, text_content_length,
-		prop_content_length, content_length;
+		prop_content_length, content_length,
+		/* SVN dump version 2 */
+		uuid, svn_fs_dump_format_version,
+		/* SVN dump version 3 */
+		text_delta, prop_delta, text_content_sha1,
+		text_delta_base_md5, text_delta_base_sha1,
+		text_copy_source_sha1;
 } keys;
 
 static void reset_node_ctx(char *fname)
@@ -74,6 +88,12 @@ static void reset_node_ctx(char *fname)
 	node_ctx.srcMode = 0;
 	pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.dst, "/", fname);
 	node_ctx.mark = 0;
+	node_ctx.text_delta = 0;
+	node_ctx.prop_delta = 0;
+	*node_ctx.text_delta_base_md5 = '\0';
+	*node_ctx.text_content_sha1 = '\0';
+	*node_ctx.text_delta_base_sha1 = '\0';
+	*node_ctx.text_copy_source_sha1 = '\0';
 }
 
 static void reset_rev_ctx(uint32_t revision)
@@ -87,6 +107,7 @@ static void reset_rev_ctx(uint32_t revision)
 static void reset_dump_ctx(uint32_t url)
 {
 	dump_ctx.url = url;
+	dump_ctx.version = 1;
 	dump_ctx.uuid = ~0;
 }
 
@@ -97,7 +118,6 @@ static void init_keys(void)
 	keys.svn_date = pool_intern("svn:date");
 	keys.svn_executable = pool_intern("svn:executable");
 	keys.svn_special = pool_intern("svn:special");
-	keys.uuid = pool_intern("UUID");
 	keys.revision_number = pool_intern("Revision-number");
 	keys.node_path = pool_intern("Node-path");
 	keys.node_kind = pool_intern("Node-kind");
@@ -107,6 +127,16 @@ static void init_keys(void)
 	keys.text_content_length = pool_intern("Text-content-length");
 	keys.prop_content_length = pool_intern("Prop-content-length");
 	keys.content_length = pool_intern("Content-length");
+	/* SVN dump version 2 */
+	keys.svn_fs_dump_format_version = pool_intern("SVN-fs-dump-format-version");
+	keys.uuid = pool_intern("UUID");
+	/* SVN dump version 3 */
+	keys.text_delta = pool_intern("Text-delta");
+	keys.prop_delta = pool_intern("Prop-delta");
+	keys.text_delta_base_md5 = pool_intern("Text-delta-base-md5");
+	keys.text_delta_base_sha1 = pool_intern("Text-delta-base-sha1");
+	keys.text_copy_source_sha1 = pool_intern("Text-copy-source-sha1");
+	keys.text_content_sha1 = pool_intern("Text-content-sha1");
 }
 
 static void read_props(void)
@@ -209,7 +239,9 @@ void svndump_read(const char *url)
 		*val++ = '\0';
 		key = pool_intern(t);
 
-		if (key == keys.uuid) {
+		if (key == keys.svn_fs_dump_format_version) {
+			dump_ctx.version = atoi(val);
+		} else if (key == keys.uuid) {
 			dump_ctx.uuid = pool_intern(val);
 		} else if (key == keys.revision_number) {
 			if (active_ctx == NODE_CTX)
@@ -251,6 +283,22 @@ void svndump_read(const char *url)
 			node_ctx.textLength = atoi(val);
 		} else if (key == keys.prop_content_length) {
 			node_ctx.propLength = atoi(val);
+		} else if (key == keys.text_delta) {
+			node_ctx.text_delta = !strcmp(val, "true");
+		} else if (key == keys.prop_delta) {
+			node_ctx.prop_delta = !strcmp(val, "true");
+		} else if (key == keys.text_delta_base_md5) {
+			strncpy(node_ctx.text_delta_base_md5, val,
+				MD5_HEX_LENGTH + 1);
+		} else if (key == keys.text_delta_base_sha1) {
+			strncpy(node_ctx.text_delta_base_sha1, val,
+				SHA1_HEX_LENGTH + 1);
+		} else if (key == keys.text_copy_source_sha1) {
+			strncpy(node_ctx.text_copy_source_sha1, val,
+				SHA1_HEX_LENGTH + 1);
+		} else if (key == keys.text_content_sha1) {
+			strncpy(node_ctx.text_content_sha1, val,
+				SHA1_HEX_LENGTH + 1);
 		} else if (key == keys.content_length) {
 			len = atoi(val);
 			buffer_read_line(&input);
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 6/7] vcs-svn: implement prop-delta handling.
  2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
                   ` (4 preceding siblings ...)
  2010-10-12 13:50 ` [PATCH 5/7] vcs-svn: extend svndump to parse version 3 format David Barr
@ 2010-10-12 13:50 ` David Barr
  2010-10-12 22:59   ` Jonathan Nieder
  2010-10-12 13:50 ` [PATCH 7/7] svn-fe: Use the --report-fd feature David Barr
  2010-10-13  2:24 ` [PATCH/RFC] Add support for subversion dump format v3 Jonathan Nieder
  7 siblings, 1 reply; 29+ messages in thread
From: David Barr @ 2010-10-12 13:50 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	David Barr

By testing against the Apache Software Foundation
repository, some simple rules for decoding prop
deltas were derived.

'Node-action: replace' implies the empty prop set
as the base for the delta.
Otherwise, if a copyfrom source is given that node
forms the basis for the delta.
Lastly, if the destination path exists in the active
revision it forms the basis.

Signed-off-by: David Barr <david.barr@cordelta.com>
---
 vcs-svn/fast_export.c |    4 +++-
 vcs-svn/fast_export.h |    3 ++-
 vcs-svn/repo_tree.c   |   23 +++++++++++++++++++++++
 vcs-svn/repo_tree.h   |    2 ++
 vcs-svn/svndump.c     |   35 +++++++++++++++++++++++++++++++----
 5 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 260cf50..d984aaa 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -63,7 +63,9 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 	printf("progress Imported commit %"PRIu32".\n\n", revision);
 }
 
-void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len, struct line_buffer *input)
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
+			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
+			struct line_buffer *input)
 {
 	if (mode == REPO_MODE_LNK) {
 		/* svn symlink blobs start with "link " */
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
index 054e7d5..634d9c6 100644
--- a/vcs-svn/fast_export.h
+++ b/vcs-svn/fast_export.h
@@ -9,6 +9,7 @@ void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
 void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 			uint32_t uuid, uint32_t url, unsigned long timestamp);
 void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
-		      struct line_buffer *input);
+			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
+			struct line_buffer *input);
 
 #endif
diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index e94d91d..b616bda 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -157,6 +157,29 @@ static void repo_write_dirent(uint32_t *path, uint32_t mode,
 		dent_remove(&dir_pointer(parent_dir_o)->entries, dent);
 }
 
+uint32_t repo_read_mark(uint32_t revision, uint32_t *path)
+{
+	uint32_t mode = 0, content_offset = 0;
+	struct repo_dirent *src_dent;
+	src_dent = repo_read_dirent(revision, path);
+	if (src_dent != NULL) {
+		mode = src_dent->mode;
+		content_offset = src_dent->content_offset;
+	}
+	return mode && mode != REPO_MODE_DIR ? content_offset : 0;
+}
+
+uint32_t repo_read_mode(uint32_t revision, uint32_t *path)
+{
+	uint32_t mode = 0;
+	struct repo_dirent *src_dent;
+	src_dent = repo_read_dirent(revision, path);
+	if (src_dent != NULL) {
+		mode = src_dent->mode;
+	}
+	return mode;
+}
+
 uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
 {
 	uint32_t mode = 0, content_offset = 0;
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
index 5476175..bd6a3f7 100644
--- a/vcs-svn/repo_tree.h
+++ b/vcs-svn/repo_tree.h
@@ -12,6 +12,8 @@
 #define REPO_MAX_PATH_DEPTH 1000
 
 uint32_t next_blob_mark(void);
+uint32_t repo_read_mark(uint32_t revision, uint32_t *path);
+uint32_t repo_read_mode(uint32_t revision, uint32_t *path);
 uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst);
 void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark);
 uint32_t repo_replace(uint32_t *path, uint32_t blob_mark);
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 458053e..3431c22 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -45,7 +45,7 @@ static char* log_copy(uint32_t length, char *log)
 }
 
 static struct {
-	uint32_t action, propLength, textLength, srcRev, srcMode, mark, type;
+	uint32_t action, propLength, textLength, srcRev, srcMode, srcMark, mark, type;
 	uint32_t src[REPO_MAX_PATH_DEPTH], dst[REPO_MAX_PATH_DEPTH];
 	uint32_t text_delta, prop_delta;
 	char text_delta_base_md5[MD5_HEX_LENGTH + 1];
@@ -86,6 +86,7 @@ static void reset_node_ctx(char *fname)
 	node_ctx.src[0] = ~0;
 	node_ctx.srcRev = 0;
 	node_ctx.srcMode = 0;
+	node_ctx.srcMark = 0;
 	pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.dst, "/", fname);
 	node_ctx.mark = 0;
 	node_ctx.text_delta = 0;
@@ -168,17 +169,42 @@ static void read_props(void)
 			}
 			key = ~0;
 			buffer_read_line(&input);
+		} else if (!strncmp(t, "D ", 2)) {
+			len = atoi(&t[2]);
+			key = pool_intern(buffer_read_string(&input, len));
+			buffer_read_line(&input);
+			if (key == keys.svn_executable) {
+				if (node_ctx.type == REPO_MODE_EXE)
+					node_ctx.type = REPO_MODE_BLB;
+			} else if (key == keys.svn_special) {
+				if (node_ctx.type == REPO_MODE_LNK)
+					node_ctx.type = REPO_MODE_BLB;
+			}
+			key = ~0;
 		}
 	}
 }
 
 static void handle_node(void)
 {
+	if (node_ctx.prop_delta) {
+		if (node_ctx.srcRev)
+			node_ctx.srcMode = repo_read_mode(node_ctx.srcRev, node_ctx.src);
+		else
+			node_ctx.srcMode = repo_read_mode(rev_ctx.revision, node_ctx.dst);
+		if (node_ctx.srcMode && node_ctx.action != NODEACT_REPLACE)
+			node_ctx.type = node_ctx.srcMode;
+	}
+
 	if (node_ctx.propLength != LENGTH_UNKNOWN && node_ctx.propLength)
 		read_props();
 
-	if (node_ctx.srcRev)
+	if (node_ctx.srcRev) {
+		node_ctx.srcMark = repo_read_mark(node_ctx.srcRev, node_ctx.src);
 		node_ctx.srcMode = repo_copy(node_ctx.srcRev, node_ctx.src, node_ctx.dst);
+	} else {
+		node_ctx.srcMark = repo_read_mark(rev_ctx.revision, node_ctx.dst);
+	}
 
 	if (node_ctx.textLength != LENGTH_UNKNOWN &&
 	    node_ctx.type != REPO_MODE_DIR)
@@ -209,8 +235,9 @@ static void handle_node(void)
 		node_ctx.type = node_ctx.srcMode;
 
 	if (node_ctx.mark)
-		fast_export_blob(node_ctx.type,
-				 node_ctx.mark, node_ctx.textLength, &input);
+		fast_export_blob(node_ctx.type, node_ctx.mark, node_ctx.textLength,
+				node_ctx.text_delta, node_ctx.srcMark, node_ctx.srcMode,
+				&input);
 	else if (node_ctx.textLength != LENGTH_UNKNOWN)
 		buffer_skip_bytes(&input, node_ctx.textLength);
 }
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 7/7] svn-fe: Use the --report-fd feature
  2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
                   ` (5 preceding siblings ...)
  2010-10-12 13:50 ` [PATCH 6/7] vcs-svn: implement prop-delta handling David Barr
@ 2010-10-12 13:50 ` David Barr
  2010-10-12 14:55   ` Sverre Rabbelier
  2010-10-12 23:59   ` Jonathan Nieder
  2010-10-13  2:24 ` [PATCH/RFC] Add support for subversion dump format v3 Jonathan Nieder
  7 siblings, 2 replies; 29+ messages in thread
From: David Barr @ 2010-10-12 13:50 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	David Barr

From: Jonathan Nieder <jrnieder@gmail.com>

On one hand, this makes the interface much uglier.  But on the
other hand, it makes it possible to retrieve blobs by name, a
facility we will be using soon.

Based-on-patch-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Helped-by: David Barr <david.barr@cordelta.com>
Tested-by: David Barr <david.barr@cordelta.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
---
 contrib/svn-fe/svn-fe.txt |    6 +-
 t/t9010-svn-fe.sh         |    6 +-
 test-svn-fe.c             |    3 +-
 vcs-svn/fast_export.c     |  183 ++++++++++++++++++++++++++++++++++++++++++++-
 vcs-svn/fast_export.h     |   10 ++-
 vcs-svn/repo_tree.c       |   14 +++-
 vcs-svn/repo_tree.h       |    1 +
 vcs-svn/svndiff.c         |   11 +--
 vcs-svn/svndiff.h         |    3 +-
 vcs-svn/svndump.c         |    6 ++
 10 files changed, 225 insertions(+), 18 deletions(-)

diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index 35f84bd..62c57dd 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -7,7 +7,11 @@ svn-fe - convert an SVN "dumpfile" to a fast-import stream
 
 SYNOPSIS
 --------
-svnadmin dump --incremental REPO | svn-fe [url] | git fast-import
+[verse]
+mkfifo backchannel &&
+svnadmin dump --incremental REPO |
+	svn-fe [url] 3<backchannel |
+	git fast-import --report-fd=3 3>backchannel
 
 DESCRIPTION
 -----------
diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
index de976ed..c0eb956 100755
--- a/t/t9010-svn-fe.sh
+++ b/t/t9010-svn-fe.sh
@@ -34,10 +34,10 @@ test_dump () {
 		svnadmin load "$label-svn" < "$TEST_DIRECTORY/$dump" &&
 		svn_cmd export "file://$PWD/$label-svn" "$label-svnco" &&
 		git init "$label-git" &&
-		test-svn-fe "$TEST_DIRECTORY/$dump" >"$label.fe" &&
 		(
-			cd "$label-git" &&
-			git fast-import < ../"$label.fe"
+			cd "$label-git" && mkfifo backflow
+			test-svn-fe "$TEST_DIRECTORY/$dump" 3< backflow | \
+			git fast-import --report-fd=3 3> backflow
 		) &&
 		(
 			cd "$label-svnco" &&
diff --git a/test-svn-fe.c b/test-svn-fe.c
index 658c2a7..9a1338c 100644
--- a/test-svn-fe.c
+++ b/test-svn-fe.c
@@ -23,10 +23,11 @@ int main(int argc, char *argv[])
 	if (argc == 5 && !strcmp(argv[1], "-d")) {
 		struct line_buffer preimage = LINE_BUFFER_INIT;
 		struct line_buffer delta = LINE_BUFFER_INIT;
+		struct view preimage_view = {&preimage, 0, STRBUF_INIT};
 		buffer_init(&preimage, argv[2]);
 		buffer_init(&delta, argv[3]);
 		if (svndiff0_apply(&delta, (off_t) strtoull(argv[4], NULL, 0),
-				   &preimage, stdout))
+				   &preimage_view, stdout))
 			return 1;
 		if (buffer_deinit(&preimage))
 			die_errno("cannot close preimage");
diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index d984aaa..77ddea9 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -8,8 +8,10 @@
 #include "line_buffer.h"
 #include "repo_tree.h"
 #include "string_pool.h"
+#include "svndiff.h"
 
 #define MAX_GITSVN_LINE_LEN 4096
+#define REPORT_FILENO 3
 
 static uint32_t first_commit_done;
 
@@ -30,10 +32,109 @@ void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
 	putchar('\n');
 }
 
+static void fast_export_read_bytes(size_t len, char buf[len])
+{
+	if (read_in_full(REPORT_FILENO, buf, len) != len)
+		warning("early eof. Expecting %"PRIu64" bytes", (uint64_t) len);
+}
+
+static void fast_export_skip_bytes(unsigned int len)
+{
+	off_t nread = 0;
+	off_t toread;
+	char buf[4096];
+	while (nread != len) {
+		toread = len - nread < 4096 ? len - nread : 4096;
+		if (read_in_full(REPORT_FILENO, buf, toread) != toread) {
+			warning("early eof. Expecting %u bytes", len);
+			break;
+		}
+		nread += toread;
+	}
+}
+
+static void fast_export_copy_bytes(FILE *out, size_t len)
+{
+	while (len > 0) {
+		char buf[4096];
+		ssize_t nread = xread(REPORT_FILENO, buf,
+					len < sizeof(buf) ? len : sizeof(buf));
+		if (nread <= 0) {
+			warning("read failure (early eof?) "
+				" expecting %"PRIu64" bytes", (uint64_t) len);
+			return;
+		}
+		if (fwrite(buf, 1, nread, out) != nread)
+			warning("write failure, "
+				"attempting %"PRIu64" bytes", (uint64_t) nread);
+		len -= nread;
+	}
+}
+
+static void fast_export_expect(const char *string)
+{
+	const char *p;
+	for (p = string; *p; p++) {
+		char buf[1];
+		if (xread(REPORT_FILENO, buf, 1) <= 0) {
+			warning("read failure (early eof?). Expecting %s", string);
+			return;
+		}
+		if (*buf != *p) {
+			warning("char mismatch %c != %c. Expecting %s",
+						*buf, *p, string);
+			return;
+		}
+	}
+}
+
+static size_t fast_export_read_integer_at_eol(void)
+{
+	size_t result = 0;
+	for (;;) {
+		char buf[1];
+		if (xread(REPORT_FILENO, buf, 1) <= 0) {
+			warning("read failure (early eof?). Expecting digit");
+			return result;
+		}
+		if (*buf == '\n')	/* success */
+			return result;
+		if (!isdigit(*buf)) {
+			warning("expecting digit, found nondigit %c", *buf);
+			return result;
+		}
+		if (result >= SIZE_MAX / 10 - 9) {
+			warning("too many digits");
+			return result;
+		}
+		result *= 10;
+		result += *buf - '0';
+	}
+}
+
+void fast_export_parse_commit(char tree_id[SHA1_HEX_LENGTH])
+{
+	size_t len;
+
+	fast_export_skip_bytes(SHA1_HEX_LENGTH);
+	fast_export_expect(" commit ");
+	len = fast_export_read_integer_at_eol();
+
+	fast_export_expect("tree ");
+	len -= strlen("tree ");
+	fast_export_read_bytes(SHA1_HEX_LENGTH, tree_id);
+	len -= SHA1_HEX_LENGTH;
+	fast_export_expect("\n");
+	len--;
+
+	fast_export_skip_bytes(len);
+	fast_export_expect("\n");
+}
+
 static char gitsvnline[MAX_GITSVN_LINE_LEN];
 void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 			uint32_t uuid, uint32_t url,
-			unsigned long timestamp)
+			unsigned long timestamp, char tree_id[SHA1_HEX_LENGTH])
 {
 	if (!log)
 		log = "";
@@ -60,19 +161,95 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 	repo_diff(revision - 1, revision);
 	fputc('\n', stdout);
 
+	fflush(stdout);
 	printf("progress Imported commit %"PRIu32".\n\n", revision);
+
+	/* Now fast-import returns the commitsha1 */
+	fast_export_read_bytes(SHA1_HEX_LENGTH, tree_id);
+	fast_export_expect("\n");
+	printf("cat %.*s\n", SHA1_HEX_LENGTH, tree_id);
+	fflush(stdout);
+	fast_export_parse_commit(tree_id);
+}
+
+void fast_export_save_blob(FILE *out)
+{
+	size_t len;
+
+	if (!out) {
+		warning("cannot open temporary: %s", strerror(errno));
+		out = fopen("/dev/null", "w");
+	}
+	if (!out) {
+		warning("cannot open /dev/null: %s", strerror(errno));
+		return;
+	}
+	fast_export_skip_bytes(SHA1_HEX_LENGTH);
+	fast_export_expect(" blob ");
+	len = fast_export_read_integer_at_eol();
+
+	fast_export_copy_bytes(out, len);
+	fast_export_expect("\n");
+	if (ferror(out))
+		warning("cannot write temporary: %s", strerror(errno));
 }
 
+FILE *preimage = NULL;
+FILE *postimage = NULL;
+
 void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
 			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
 			struct line_buffer *input)
 {
+	struct line_buffer preimage_buf = LINE_BUFFER_INIT;
+	struct line_buffer postimage_buf = LINE_BUFFER_INIT;
+	struct view preimage_view = {&preimage_buf, 0, STRBUF_INIT};
+	long preimage_len = 0;
+
+	if (delta) {
+		if (!preimage)
+			preimage = tmpfile();
+		if (srcMark) {
+			printf("cat :%"PRIu32"\n", srcMark);
+			fflush(stdout);
+			if (srcMode == REPO_MODE_LNK)
+				fwrite("link ", 1, 5, preimage);
+			fast_export_save_blob(preimage);
+		}
+		preimage_len = ftell(preimage);
+		fseek(preimage, 0, SEEK_SET);
+		preimage_buf.infile = preimage;
+		if (!postimage)
+			postimage = tmpfile();
+		svndiff0_apply(input, len, &preimage_view, postimage);
+		strbuf_release(&preimage_view.buf);
+		len = ftell(postimage);
+		fseek(postimage, 0, SEEK_SET);
+		postimage_buf.infile = postimage;
+	}
+
 	if (mode == REPO_MODE_LNK) {
 		/* svn symlink blobs start with "link " */
-		buffer_skip_bytes(input, 5);
+		if (delta)
+			buffer_skip_bytes(&postimage_buf, 5);
+		else
+			buffer_skip_bytes(input, 5);
 		len -= 5;
 	}
 	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
-	buffer_copy_bytes(input, len);
+	if (!delta)
+		buffer_copy_bytes(input, len);
+	else
+		buffer_copy_bytes(&postimage_buf, len);
 	fputc('\n', stdout);
+
+	if (preimage) {
+		fseek(preimage, 0, SEEK_SET);
+		ftruncate(fileno(preimage), 0);
+	}
+
+	if (postimage) {
+		fseek(postimage, 0, SEEK_SET);
+		ftruncate(fileno(postimage), 0);
+	}
 }
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
index 634d9c6..265a271 100644
--- a/vcs-svn/fast_export.h
+++ b/vcs-svn/fast_export.h
@@ -3,13 +3,21 @@
 
 #include "line_buffer.h"
 
+#ifndef SHA1_HEX_LENGTH
+#define SHA1_HEX_LENGTH 40
+#endif
+
 void fast_export_delete(uint32_t depth, uint32_t *path);
 void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
 			uint32_t mark);
 void fast_export_commit(uint32_t revision, uint32_t author, char *log,
-			uint32_t uuid, uint32_t url, unsigned long timestamp);
+			uint32_t uuid, uint32_t url, unsigned long timestamp,
+			char tree_id[SHA1_HEX_LENGTH]);
 void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
 			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
 			struct line_buffer *input);
 
+void fast_export_parse_commit(char tree_id[SHA1_HEX_LENGTH]);
+void fast_export_save_blob(FILE* file);
+
 #endif
diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index b616bda..e77abb2 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -12,6 +12,9 @@
 
 #include "trp.h"
 
+/* git hash-object -t tree --stdin </dev/null */
+#define EMPTY_TREE_HASH "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
+
 struct repo_dirent {
 	uint32_t name_offset;
 	struct trp_node children;
@@ -25,6 +28,7 @@ struct repo_dir {
 
 struct repo_commit {
 	uint32_t root_dir_offset;
+	char tree_id[SHA1_HEX_LENGTH];
 };
 
 /* Memory pools for commit, dir and dirent */
@@ -308,7 +312,8 @@ void repo_diff(uint32_t r1, uint32_t r2)
 void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
 		 uint32_t url, unsigned long timestamp)
 {
-	fast_export_commit(revision, author, log, uuid, url, timestamp);
+	fast_export_commit(revision, author, log, uuid, url, timestamp,
+				commit_pointer(active_commit)->tree_id);
 	dent_commit();
 	dir_commit();
 	active_commit = commit_alloc(1);
@@ -316,6 +321,11 @@ void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
 		commit_pointer(active_commit - 1)->root_dir_offset;
 }
 
+char *repo_revision_tree(uint32_t revision)
+{
+	return commit_pointer(revision)->tree_id;
+}
+
 static void mark_init(void)
 {
 	uint32_t i;
@@ -334,6 +344,8 @@ void repo_init(void)
 		/* Create empty tree for commit 0. */
 		commit_alloc(1);
 		commit_pointer(0)->root_dir_offset = dir_alloc(1);
+		memcpy(commit_pointer(0)->tree_id, EMPTY_TREE_HASH,
+						SHA1_HEX_LENGTH);
 		dir_pointer(0)->entries.trp_root = ~0;
 		dir_commit();
 	}
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
index bd6a3f7..3b3d58c 100644
--- a/vcs-svn/repo_tree.h
+++ b/vcs-svn/repo_tree.h
@@ -22,6 +22,7 @@ void repo_delete(uint32_t *path);
 void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
 		 uint32_t url, long unsigned timestamp);
 void repo_diff(uint32_t r1, uint32_t r2);
+char *repo_revision_tree(uint32_t revision);
 void repo_init(void);
 void repo_reset(void);
 
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index e23aced..a8aa503 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -313,11 +313,10 @@ static int apply_one_window(struct line_buffer *delta, off_t *delta_len,
 }
 
 int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
-		   struct line_buffer *preimage, FILE *postimage)
+		   struct view *preimage_view, FILE *postimage)
 {
-	struct view preimage_view = {preimage, 0, STRBUF_INIT};
 	off_t out_offset = 0;
-	assert(delta && preimage && postimage);
+	assert(delta && preimage_view && postimage);
 
 	if (read_magic(delta, &delta_len))
 		goto fail;
@@ -326,9 +325,9 @@ int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
 		size_t pre_len;
 		if (read_offset(delta, &pre_off, &delta_len) ||
 		    read_length(delta, &pre_len, &delta_len) ||
-		    move_window(&preimage_view, pre_off, pre_len) ||
+		    move_window(preimage_view, pre_off, pre_len) ||
 		    apply_one_window(delta, &delta_len,
-				     &preimage_view, pre_len,
+				     preimage_view, pre_len,
 				     &out_offset, postimage))
 			goto fail;
 		if (delta_len && buffer_at_eof(delta)) {
@@ -337,9 +336,7 @@ int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
 			goto fail;
 		}
 	}
-	strbuf_release(&preimage_view.buf);
 	return 0;
  fail:
-	strbuf_release(&preimage_view.buf);
 	return -1;
 }
diff --git a/vcs-svn/svndiff.h b/vcs-svn/svndiff.h
index a986099..f75dec3 100644
--- a/vcs-svn/svndiff.h
+++ b/vcs-svn/svndiff.h
@@ -2,8 +2,9 @@
 #define SVNDIFF_H_
 
 #include "line_buffer.h"
+#include "sliding_window.h"
 
 extern int svndiff0_apply(struct line_buffer *delta, off_t delta_len,
-			  struct line_buffer *preimage, FILE *postimage);
+			  struct view *preimage_view, FILE *postimage);
 
 #endif
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 3431c22..f408458 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -30,6 +30,9 @@
 #define MD5_HEX_LENGTH 32
 #define SHA1_HEX_LENGTH 40
 
+/* git hash-object -t tree --stdin </dev/null */
+#define EMPTY_TREE_HASH "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
+
 /* Create memory pool for log messages */
 obj_pool_gen(log, char, 4096)
 
@@ -58,6 +61,7 @@ static struct {
 	uint32_t revision, author;
 	unsigned long timestamp;
 	char *log;
+	char tree_id[SHA1_HEX_LENGTH + 1];
 } rev_ctx;
 
 static struct {
@@ -103,6 +107,8 @@ static void reset_rev_ctx(uint32_t revision)
 	rev_ctx.timestamp = 0;
 	rev_ctx.log = NULL;
 	rev_ctx.author = ~0;
+	if (!revision)
+		memcpy(rev_ctx.tree_id, EMPTY_TREE_HASH, SHA1_HEX_LENGTH + 1);
 }
 
 static void reset_dump_ctx(uint32_t url)
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] Teach fast-import to print the id of each imported commit
  2010-10-12 13:50 ` [PATCH 1/7] Teach fast-import to print the id of each imported commit David Barr
@ 2010-10-12 14:42   ` Sverre Rabbelier
  2010-10-12 18:48     ` Jonathan Nieder
  2010-10-12 22:06   ` Jonathan Nieder
  1 sibling, 1 reply; 29+ messages in thread
From: Sverre Rabbelier @ 2010-10-12 14:42 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Jonathan Nieder, Ramkumar Ramachandra,
	Shawn O. Pearce

Heya,

On Tue, Oct 12, 2010 at 15:50, David Barr <david.barr@cordelta.com> wrote:
> +       } else if (!prefixcmp(feature, "report-fd=")) {
> +               option_report_fd(feature + strlen("report-fd="));

Note that adding it here means that we _do_ support in-stream
'report-fd' specification, which is fine by me since it's overridable
on the commandline, but there was some discussion earlier that we
_shouldn't_ support this.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] svn-fe: Use the --report-fd feature
  2010-10-12 13:50 ` [PATCH 7/7] svn-fe: Use the --report-fd feature David Barr
@ 2010-10-12 14:55   ` Sverre Rabbelier
  2010-10-12 23:03     ` Jonathan Nieder
  2010-10-12 23:59   ` Jonathan Nieder
  1 sibling, 1 reply; 29+ messages in thread
From: Sverre Rabbelier @ 2010-10-12 14:55 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Jonathan Nieder, Ramkumar Ramachandra

Heya,

On Tue, Oct 12, 2010 at 15:50, David Barr <david.barr@cordelta.com> wrote:
> On one hand, this makes the interface much uglier.

Can you quantify this? What does it make uglier, and why?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] Teach fast-import to print the id of each imported commit
  2010-10-12 14:42   ` Sverre Rabbelier
@ 2010-10-12 18:48     ` Jonathan Nieder
  2010-10-12 18:57       ` Sverre Rabbelier
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 18:48 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce

Sverre Rabbelier wrote:
> On Tue, Oct 12, 2010 at 15:50, David Barr <david.barr@cordelta.com> wrote:

>> +       } else if (!prefixcmp(feature, "report-fd=")) {
>> +               option_report_fd(feature + strlen("report-fd="));
>
> Note that adding it here means that we _do_ support in-stream
> 'report-fd' specification, which is fine by me since it's overridable
> on the commandline, but there was some discussion earlier that we
> _shouldn't_ support this.

Reference: http://thread.gmane.org/gmane.comp.version-control.git/156280

Thanks for the reminder.  I still think Sam is right fwiw.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] Teach fast-import to print the id of each imported commit
  2010-10-12 18:48     ` Jonathan Nieder
@ 2010-10-12 18:57       ` Sverre Rabbelier
  2010-10-12 19:07         ` Jonathan Nieder
  0 siblings, 1 reply; 29+ messages in thread
From: Sverre Rabbelier @ 2010-10-12 18:57 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce

Heya,

On Tue, Oct 12, 2010 at 20:48, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Thanks for the reminder.  I still think Sam is right fwiw.

Perhaps we can instead make '--report-fd' have a default value of
'stdout'? I don't see why we would want to _disallow_ the value from
being specified in stream (we allow import/export-marks in-stream
too), as long as they can be overruled by the commandline.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] Teach fast-import to print the id of each imported commit
  2010-10-12 18:57       ` Sverre Rabbelier
@ 2010-10-12 19:07         ` Jonathan Nieder
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 19:07 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce, Sam Vilain

Sverre Rabbelier wrote:

> Perhaps we can instead make '--report-fd' have a default value of
> 'stdout'?

Sounds sane.

> I don't see why we would want to _disallow_ the value from
> being specified in stream (we allow import/export-marks in-stream
> too), as long as they can be overruled by the commandline.

I do.  If the stream is not being piped to a process the frontend
started (maybe it's being sent over the wire to some fast-import
service instead), what would

	feature report-fd=7

even mean?  So we should discourage it, or better, forbid it now
while we have the chance.

By contrast, as you explained,

	feature relative-marks
	feature export-marks=MARKS1

is meaningful because without knowing anything about the underlying
filing system, a frontend can rely on a later

	feature relative-marks
	feature import-marks=MARKS1

picking up where it left off.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] Teach fast-import to print the id of each imported commit
  2010-10-12 13:50 ` [PATCH 1/7] Teach fast-import to print the id of each imported commit David Barr
  2010-10-12 14:42   ` Sverre Rabbelier
@ 2010-10-12 22:06   ` Jonathan Nieder
  2010-10-12 23:05     ` Sverre Rabbelier
  1 sibling, 1 reply; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 22:06 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce

David Barr wrote:

> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -92,6 +92,18 @@ OPTIONS
>  	--(no-)-relative-marks= with the --(import|export)-marks=
>  	options.
>  
> +--report-fd=<fd>::
> +	Print the 40-character object name for each commit to
> +	the specified file descriptor before writing it to the
> +	pack.

This aspect (the printing of commit ids) of the report-fd feature is
not used by the current svn-fe3 code.  I'd suggest dropping it or at
least making it optional; marks work pretty well for that purpose and
it is not worth the cost or complication of making the frontend
synchronize with fast-import with every commit.

> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -361,6 +361,9 @@ static uintmax_t next_mark;
>  static struct strbuf new_data = STRBUF_INIT;
>  static int seen_data_command;
>  
> +/* Where to report commits */
> +static int report_fd = -1;

This aspect is still useful (for use by the cat command). :)

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-12 13:50 ` [PATCH 2/7] fast-import: Let importers retrieve the objects being written David Barr
@ 2010-10-12 22:38   ` Jonathan Nieder
  2010-10-12 23:07     ` Sverre Rabbelier
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 22:38 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce

David Barr wrote:

> The output uses the same format as "git cat-file --batch".

This means:

	<sha1> SP <type> SP <size> LF
	<contents> LF

where <contents> are the raw content of the object in question.

If a person just wants to read the <sha1>, the <contents> are
wasted computation and I/O.  Does that matter?

Suppose some day fast-import gains a feature to respond to requests
out of order.  The response would need to include the name of the
object as requested for the frontend to make any sense of it.  Would
it be worth preparing the format for that now, so the same frontend
code can deal with the format produced by fast-import with and without
that feature?

The format of tags, trees, and commits (though not blobs) is specific
to git: fast-import backends for other version control systems would
not be able to respond in the same way.  So maybe it would make sense
to restrict cat requests to blobs for now.

> Objects can be specified by path within a tree as well, using a
>
>  cat TREE "PATH"
>
> syntax.  With this syntax, also, the tree can only be specified by
> :n marker or 40-digit tree id.

I would prefer

	cat "<path>"

within commit commands, which would spit out the content at that
path in the currently-staged version of the commit, and a separate

	C <tree> "<path>" "<path>"

to copy data from previous commits.  So cat <tree> "<path>" would be
replaced with

	C <tree> "<path>" "dest"
	cat "dest"

when using content from tree:path to prepare dest.  Of course, that
would be less flexible, but I find it more intuitive for svn-fe at
least, perhaps because it fits better with svn's object model.

In either regime, I hear there is some demand for allowing commits to
be used in place of trees.

At any rate, that is not needed for the current version of svn-fe,
is it?  So maybe it would make sense to strip down the patch to just
allow

	cat <blob>

Thanks for resending it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/7] fast-import: Allow cat command with empty path
  2010-10-12 13:50 ` [PATCH 3/7] fast-import: Allow cat command with empty path David Barr
@ 2010-10-12 22:47   ` Jonathan Nieder
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 22:47 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier

David Barr wrote:

> While at it, fix a typo in an error message: the cat command is used
> to examine paths within trees, not branches.

If we need the cat <tree> "<path>" functionality, probably this part
should be squashed with the previous one.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/7] fast-import: Allow cat requests at arbitrary points in stream
  2010-10-12 13:50 ` [PATCH 4/7] fast-import: Allow cat requests at arbitrary points in stream David Barr
@ 2010-10-12 22:50   ` Jonathan Nieder
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 22:50 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier

David Barr wrote:

> Rather than planning in advance and saving up a bunch of objects
> before a "commit" command, frontends might want to wait until the
> middle of a commit and request objects then.

Not needed for svn-fe3, is it?

But I do like it.  (Obviously, I'm biased.)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 5/7] vcs-svn: extend svndump to parse version 3 format
  2010-10-12 13:50 ` [PATCH 5/7] vcs-svn: extend svndump to parse version 3 format David Barr
@ 2010-10-12 22:56   ` Jonathan Nieder
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 22:56 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier

David Barr wrote:

> Added 1 new dump header, SVN-fs-dump-format-version.
> Added 6 new node headers:

Style: using the imperative mood, present tense would make this
fit better with other log entries.

> --- a/vcs-svn/svndump.c
> +++ b/vcs-svn/svndump.c
> @@ -27,6 +27,9 @@
>  #define LENGTH_UNKNOWN (~0)
>  #define DATE_RFC2822_LEN 31
>  
> +#define MD5_HEX_LENGTH 32
> +#define SHA1_HEX_LENGTH 40

I was surprised to find this is not in any existing header.  Looks
good to me.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/7] vcs-svn: implement prop-delta handling.
  2010-10-12 13:50 ` [PATCH 6/7] vcs-svn: implement prop-delta handling David Barr
@ 2010-10-12 22:59   ` Jonathan Nieder
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 22:59 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier

David Barr wrote:

> [Subject:  vcs-svn: implement prop-delta handling.]
>
> By testing against the Apache Software Foundation
> repository, some simple rules for decoding prop
> deltas were derived.
> 
> 'Node-action: replace' implies the empty prop set
> as the base for the delta.
> Otherwise, if a copyfrom source is given that node
> forms the basis for the delta.
> Lastly, if the destination path exists in the active
> revision it forms the basis.

These rules are used for text deltas, too, right?  Probably
worth mentioning the new (placeholder) fast_export_blob()
parameter in the commit message.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] svn-fe: Use the --report-fd feature
  2010-10-12 14:55   ` Sverre Rabbelier
@ 2010-10-12 23:03     ` Jonathan Nieder
  2010-10-12 23:11       ` Sverre Rabbelier
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 23:03 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: David Barr, Git Mailing List, Ramkumar Ramachandra

Sverre Rabbelier wrote:
> On Tue, Oct 12, 2010 at 15:50, David Barr <david.barr@cordelta.com> wrote:

>> On one hand, this makes the interface much uglier.
>
> Can you quantify this? What does it make uglier, and why?

| -svnadmin dump --incremental REPO | svn-fe [url] | git fast-import
| +mkfifo backchannel &&
| +svnadmin dump --incremental REPO |
| +	svn-fe [url] 3<backchannel |
| +	git fast-import --report-fd=3 3>backchannel

The caller has to take care of backflow of data.  Even if the
"cat" command were never used, at least 3>/dev/null would be
necessary to avoid fast-import waiting forever for the frontend
to read from it.

One way to work around this would be for svn-fe to launch fast-import
itself.  The main downside is that that would require a way to
configure how fast-import is to be launched (for use with other
vcs-fast-import backends and in even stranger scenarios).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] Teach fast-import to print the id of each imported commit
  2010-10-12 22:06   ` Jonathan Nieder
@ 2010-10-12 23:05     ` Sverre Rabbelier
  0 siblings, 0 replies; 29+ messages in thread
From: Sverre Rabbelier @ 2010-10-12 23:05 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce

Heya,

On Wed, Oct 13, 2010 at 00:06, Jonathan Nieder <jrnieder@gmail.com> wrote:
>> +/* Where to report commits */
>> +static int report_fd = -1;
>
> This aspect is still useful (for use by the cat command). :)

Perhaps this part should be folded in with the cat command then?
Otherwise it'd be impossible to test.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-12 22:38   ` Jonathan Nieder
@ 2010-10-12 23:07     ` Sverre Rabbelier
  2010-10-13  0:07       ` Jonathan Nieder
  0 siblings, 1 reply; 29+ messages in thread
From: Sverre Rabbelier @ 2010-10-12 23:07 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce

Heya,

On Wed, Oct 13, 2010 at 00:38, Jonathan Nieder <jrnieder@gmail.com> wrote:
> At any rate, that is not needed for the current version of svn-fe,
> is it?  So maybe it would make sense to strip down the patch to just
> allow
>
>        cat <blob>

Do we want some sort of type indicator to make it easier to extend? So
"cat blob <blob>"?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] svn-fe: Use the --report-fd feature
  2010-10-12 23:03     ` Jonathan Nieder
@ 2010-10-12 23:11       ` Sverre Rabbelier
  2010-10-12 23:36         ` Jonathan Nieder
  0 siblings, 1 reply; 29+ messages in thread
From: Sverre Rabbelier @ 2010-10-12 23:11 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: David Barr, Git Mailing List, Ramkumar Ramachandra

Heya,

On Wed, Oct 13, 2010 at 01:03, Jonathan Nieder <jrnieder@gmail.com> wrote:
> The caller has to take care of backflow of data.  Even if the
> "cat" command were never used, at least 3>/dev/null would be
> necessary to avoid fast-import waiting forever for the frontend
> to read from it.

Why is that? If we disable the auto-printing of sha's wouldn't it only
write to the fd iff the cat command is used?

> One way to work around this would be for svn-fe to launch fast-import
> itself.

What would the pipeline look like then? In particular for git-remote-svn?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] svn-fe: Use the --report-fd feature
  2010-10-12 23:11       ` Sverre Rabbelier
@ 2010-10-12 23:36         ` Jonathan Nieder
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 23:36 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: David Barr, Git Mailing List, Ramkumar Ramachandra

Sverre Rabbelier wrote:
> On Wed, Oct 13, 2010 at 01:03, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> The caller has to take care of backflow of data.  Even if the
>> "cat" command were never used, at least 3>/dev/null would be
>> necessary to avoid fast-import waiting forever for the frontend
>> to read from it.
>
> Why is that? If we disable the auto-printing of sha's

Right, if we disable the auto-printing of sha's then that aspect goes
away.

>> One way to work around this would be for svn-fe to launch fast-import
>> itself.
>
> What would the pipeline look like then? In particular for git-remote-svn?

Probably something like:

svnrdump dump uri -r <lower>:<upper> |
svn-fe --fastimport='svn-filter-root ...'

Doesn't the transport helper take care of launching fast-import
itself?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] svn-fe: Use the --report-fd feature
  2010-10-12 13:50 ` [PATCH 7/7] svn-fe: Use the --report-fd feature David Barr
  2010-10-12 14:55   ` Sverre Rabbelier
@ 2010-10-12 23:59   ` Jonathan Nieder
  1 sibling, 0 replies; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-12 23:59 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier

David Barr wrote:

> On one hand, this makes the interface much uglier.  But on the
> other hand, it makes it possible to retrieve blobs by name, a
> facility we will be using soon.

Stale log message.

> Helped-by: David Barr <david.barr@cordelta.com>

I think you wrote most of the patch at this point.

> --- a/t/t9010-svn-fe.sh
> +++ b/t/t9010-svn-fe.sh
> @@ -34,10 +34,10 @@ test_dump () {
>  		svnadmin load "$label-svn" < "$TEST_DIRECTORY/$dump" &&
>  		svn_cmd export "file://$PWD/$label-svn" "$label-svnco" &&
>  		git init "$label-git" &&
> -		test-svn-fe "$TEST_DIRECTORY/$dump" >"$label.fe" &&
>  		(
> -			cd "$label-git" &&
> -			git fast-import < ../"$label.fe"
> +			cd "$label-git" && mkfifo backflow

Missing && (not that it matters here, just good to get in the habit).

> --- a/test-svn-fe.c
> +++ b/test-svn-fe.c
> @@ -23,10 +23,11 @@ int main(int argc, char *argv[])
>  	if (argc == 5 && !strcmp(argv[1], "-d")) {
>  		struct line_buffer preimage = LINE_BUFFER_INIT;
>  		struct line_buffer delta = LINE_BUFFER_INIT;
> +		struct view preimage_view = {&preimage, 0, STRBUF_INIT};
>  		buffer_init(&preimage, argv[2]);
>  		buffer_init(&delta, argv[3]);
>  		if (svndiff0_apply(&delta, (off_t) strtoull(argv[4], NULL, 0),
> -				   &preimage, stdout))
> +				   &preimage_view, stdout))

This interface change is really neat.  Probably it should get its
own commit.

> --- a/vcs-svn/fast_export.c
> +++ b/vcs-svn/fast_export.c
> @@ -8,8 +8,10 @@
[...]
> +#define REPORT_FILENO 3
>  
>  static uint32_t first_commit_done;
>  
> @@ -30,10 +32,109 @@ void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
>  	putchar('\n');
>  }
>  
> +static void fast_export_read_bytes(size_t len, char buf[len])
> +{
> +	if (read_in_full(REPORT_FILENO, buf, len) != len)
> +		warning("early eof. Expecting %"PRIu64" bytes", (uint64_t) len);
> +}
> +

Would it make sense for functions like this to use the line_buffer
module?  If we set the O_NONBLOCK flag on the report fd, I think it
could work (with buffer_fdopen()), but I'm not sure how portable that
is.

But maybe it's easier to use fds directly.

> +static void fast_export_expect(const char *string)
> +{
> +	const char *p;
> +	for (p = string; *p; p++) {
> +		char buf[1];
> +		if (xread(REPORT_FILENO, buf, 1) <= 0) {

Might be simpler to read strlen(string) bytes at a time with
read_in_full() (my fault, I know).

[...]
> +static size_t fast_export_read_integer_at_eol(void)
> +{
> +	size_t result = 0;
> +	for (;;) {
> +		char buf[1];
> +		if (xread(REPORT_FILENO, buf, 1) <= 0) {

Could use fscanf() if O_NONBLOCK is usable.  Otherwise I fear
the read(1) is needed. :(

[...]
> +void fast_export_parse_commit(char tree_id[SHA1_HEX_LENGTH])
> +{
> +	size_t len;
> +
> +	fast_export_skip_bytes(SHA1_HEX_LENGTH);
> +	fast_export_expect(" commit ");
> +	len = fast_export_read_integer_at_eol();

I think you mentioned that you'd prefer for this to use

	fast_export_expect(commit_id);

?  Anyway, this functionality is not used in the current patch.

[...]
> +void fast_export_save_blob(FILE *out)
[...]
> +	if (!out) {
> +		warning("cannot open temporary: %s", strerror(errno));
> +		out = fopen("/dev/null", "w");
> +	}
> +	if (!out) {
> +		warning("cannot open /dev/null: %s", strerror(errno));
> +		return;
> +	}

I think the caller should take care of the error message.

[...]
> +	if (ferror(out))
> +		warning("cannot write temporary: %s", strerror(errno));

And this, too, probably (since only the caller knows if it's
temporary).

>  }
>  
> +FILE *preimage = NULL;
> +FILE *postimage = NULL;

static?  And the usual git style is to leave the "= NULL" implicit for
globals.

>  void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
>  			uint32_t delta, uint32_t srcMark, uint32_t srcMode,
>  			struct line_buffer *input)
>  {
> +	struct line_buffer preimage_buf = LINE_BUFFER_INIT;
> +	struct line_buffer postimage_buf = LINE_BUFFER_INIT;
> +	struct view preimage_view = {&preimage_buf, 0, STRBUF_INIT};
> +	long preimage_len = 0;
> +
> +	if (delta) {
> +		if (!preimage)
> +			preimage = tmpfile();

This is not closed until exit() takes care of it, right?

> +		if (srcMark) {
> +			printf("cat :%"PRIu32"\n", srcMark);
> +			fflush(stdout);
> +			if (srcMode == REPO_MODE_LNK)
> +				fwrite("link ", 1, 5, preimage);

strbuf_addstr(&preimage_buf.buf, "link ");

would allow the fast_export_save_blob() to use fd 3 directly,
perhaps.  If so,

> +			fast_export_save_blob(preimage);
> +		}
> +		preimage_len = ftell(preimage);

this would be read from the output of the "cat" command.

Currently we are not using the preimage_len at all as far as I
can tell.  It would not be hard to teach svndiff0_apply() to keep
track of it again.

> +		fseek(preimage, 0, SEEK_SET);
> +		preimage_buf.infile = preimage;
> +		if (!postimage)
> +			postimage = tmpfile();
> +		svndiff0_apply(input, len, &preimage_view, postimage);
> +		strbuf_release(&preimage_view.buf);
> +		len = ftell(postimage);

The postimage has to be a temporary file in the current design, since
we do not know the postimage length to pass to fast-import until we've
written the whole thing out.  The data <<delim form is not usable for
this because there is no forbidden string to use as a delimiter.
A separate pass to add up the postimage lengths from windows would
fail with deltas like those Ram first supplied for the test suite
(though have we checked that those come up in the wild?).

[...]
> +	if (!delta)
> +		buffer_copy_bytes(input, len);
> +	else
> +		buffer_copy_bytes(&postimage_buf, len);

Maintaining support for dumpfilev2.  Nice.

>  	fputc('\n', stdout);
> +
> +	if (preimage) {
> +		fseek(preimage, 0, SEEK_SET);
> +		ftruncate(fileno(preimage), 0);
> +	}
> +
> +	if (postimage) {
> +		fseek(postimage, 0, SEEK_SET);

I think postimage is seeked twice --- are both needed?

[...]
> --- a/vcs-svn/repo_tree.c
> +++ b/vcs-svn/repo_tree.c
> @@ -12,6 +12,9 @@
>  
>  #include "trp.h"
>  
> +/* git hash-object -t tree --stdin </dev/null */
> +#define EMPTY_TREE_HASH "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
> +
>  struct repo_dirent {
>  	uint32_t name_offset;
>  	struct trp_node children;
> @@ -25,6 +28,7 @@ struct repo_dir {
>  
>  struct repo_commit {
>  	uint32_t root_dir_offset;
> +	char tree_id[SHA1_HEX_LENGTH];
>  };

Not needed.

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-12 23:07     ` Sverre Rabbelier
@ 2010-10-13  0:07       ` Jonathan Nieder
  2010-10-13 13:10         ` Sverre Rabbelier
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-13  0:07 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce

Sverre Rabbelier wrote:
> On Wed, Oct 13, 2010 at 00:38, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> At any rate, that is not needed for the current version of svn-fe,
>> is it?  So maybe it would make sense to strip down the patch to just
>> allow
>>
>>        cat <blob>
>
> Do we want some sort of type indicator to make it easier to extend? So
> "cat blob <blob>"?

I'm not convinced it's needed.  cat-file does that to allow peeling
references:

	git cat-file tree v1.7.1
vs
	git cat-file tag v1.7.1

But the "cat" name might be too valuable namespace to reserve for only
this.  I have no strong preference either way.  Maybe the command
should be named "cat-blob".

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH/RFC] Add support for subversion dump format v3
  2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
                   ` (6 preceding siblings ...)
  2010-10-12 13:50 ` [PATCH 7/7] svn-fe: Use the --report-fd feature David Barr
@ 2010-10-13  2:24 ` Jonathan Nieder
  7 siblings, 0 replies; 29+ messages in thread
From: Jonathan Nieder @ 2010-10-13  2:24 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier

David Barr wrote:

> This series follows Jonathan Nieder's svn diff applier series.
>
> Patches 1 to 4 add the required infrastructure to fast-import.
> The primary feature is the addition of the cat-file command to
> fast-import. This allows access to objects written to the
> the current pack prior to a checkpoint and is critical to
> retrieving full-texts to drive the diff applier.
>
> Patch 5 adds the basic parsing necessary to process the v3 format.
>
> Patch 6 adds logic around decoding prop deltas.
>
> Patch 7 integrates svn-fe with svn-da to decode text deltas.
> It was primarily authored by Jonathan but inspired by Ram and
> completed by myself.
> This is probably the most sprawling of the series and likely the
> heaviest target of review.

Just wanted to say: thank you for working on this.  The code
(especially on the fast-import side) needs some pruning and the result
needs tests, but in the end, we are not far from an svn-fe that can
work with "svnrdump dump" output.

In case someone is feeling generous, here is a wishlist:

 - simplify the interface or add a wrapper script so callers do not
   have to use mkfifo

 - keep track of whether an error was encountered and exit nonzero
   in that case

 - rely on git to fetch old revisions (for copyfrom_rev etc) and stop
   keeping track of them in svn-fe.  This would simplify svn-fe a lot
   and pave the way for:

 - incremental imports (since no state has to persist)

And for the sake of tests:

 - a simple delta producer (the remote helper would need this to
   drive "svnrdump load", anyway)

 - clarification (sharpening) of the notes/svndiff and
   notes/dump-load-format.txt specs in Subversion

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-13  0:07       ` Jonathan Nieder
@ 2010-10-13 13:10         ` Sverre Rabbelier
  0 siblings, 0 replies; 29+ messages in thread
From: Sverre Rabbelier @ 2010-10-13 13:10 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce

Heya,

On Wed, Oct 13, 2010 at 02:07, Jonathan Nieder <jrnieder@gmail.com> wrote:
> But the "cat" name might be too valuable namespace to reserve for only
> this.  I have no strong preference either way.  Maybe the command
> should be named "cat-blob".

Let's do that, call it "cat-blob" and restrict it to blobs.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2010-10-13 13:12 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
2010-10-12 13:50 ` [PATCH 1/7] Teach fast-import to print the id of each imported commit David Barr
2010-10-12 14:42   ` Sverre Rabbelier
2010-10-12 18:48     ` Jonathan Nieder
2010-10-12 18:57       ` Sverre Rabbelier
2010-10-12 19:07         ` Jonathan Nieder
2010-10-12 22:06   ` Jonathan Nieder
2010-10-12 23:05     ` Sverre Rabbelier
2010-10-12 13:50 ` [PATCH 2/7] fast-import: Let importers retrieve the objects being written David Barr
2010-10-12 22:38   ` Jonathan Nieder
2010-10-12 23:07     ` Sverre Rabbelier
2010-10-13  0:07       ` Jonathan Nieder
2010-10-13 13:10         ` Sverre Rabbelier
2010-10-12 13:50 ` [PATCH 3/7] fast-import: Allow cat command with empty path David Barr
2010-10-12 22:47   ` Jonathan Nieder
2010-10-12 13:50 ` [PATCH 4/7] fast-import: Allow cat requests at arbitrary points in stream David Barr
2010-10-12 22:50   ` Jonathan Nieder
2010-10-12 13:50 ` [PATCH 5/7] vcs-svn: extend svndump to parse version 3 format David Barr
2010-10-12 22:56   ` Jonathan Nieder
2010-10-12 13:50 ` [PATCH 6/7] vcs-svn: implement prop-delta handling David Barr
2010-10-12 22:59   ` Jonathan Nieder
2010-10-12 13:50 ` [PATCH 7/7] svn-fe: Use the --report-fd feature David Barr
2010-10-12 14:55   ` Sverre Rabbelier
2010-10-12 23:03     ` Jonathan Nieder
2010-10-12 23:11       ` Sverre Rabbelier
2010-10-12 23:36         ` Jonathan Nieder
2010-10-12 23:59   ` Jonathan Nieder
2010-10-13  2:24 ` [PATCH/RFC] Add support for subversion dump format v3 Jonathan Nieder
  -- strict thread matches above, loose matches on Subject: below --
2010-10-12 13:40 David Barr
2010-10-12 13:40 ` [PATCH 1/7] Teach fast-import to print the id of each imported commit David Barr

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).