git@vger.kernel.org mailing list mirror (one of many)
* [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import
@ 2010-12-10 10:20 Jonathan Nieder
  2010-12-10 10:21 ` [PATCH 01/10] vcs-svn: use higher mark numbers for blobs Jonathan Nieder
                   ` (13 more replies)
  0 siblings, 14 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:20 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Hi,

Using David's "ls" command we can eliminate the in-memory repo_tree
and rely on the target repository for information about old revs.
This means all state that needs to persist between svn-fe runs is
either in the target repo or in the marks file, so in theory this
should allow incremental imports already (I haven't tried it).
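
To make the "ls" idea concrete, here is a minimal standalone sketch of
how such a request could be formatted.  It is not code from the series:
format_ls_request is a made-up name, and it assumes the commit for rN
was saved under mark :N and that the path needs no escaping inside the
double quotes.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format an "ls" request for a path as of an earlier revision, e.g.
 *     ls :5 "path/to/old/file"
 * Hypothetical helper: assumes rN lives under mark :N and that the
 * path contains nothing that needs quoting.
 * Returns the request length, or -1 if it does not fit in buf.
 */
static int format_ls_request(char *buf, size_t size,
			     uint32_t rev, const char *path)
{
	int n;

	assert(buf && path);
	n = snprintf(buf, size, "ls :%" PRIu32 " \"%s\"\n", rev, path);
	if (n < 0 || (size_t) n >= size)
		return -1;	/* truncated or failed */
	return n;
}
```

The caller would write the buffer to fast-import's input, flush, and
then read one response line back.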

Caveats:

Not split up as nicely as it ought to be.  In particular, the last
patch should be split up at least into one patch to modify git
fast-import, another for the vcs-svn lib.  Perhaps it could be split
finer than that.

There is a potential deadlock, marked with NEEDSWORK.

Error checking is not so great.  Error states tend to result in
deadlock rather than a nicer error.

In particular, we have no way to distinguish between a missing
(nonsense) path and an empty directory.

There's a little blind alley in the first few patches --- cat_mark and
the function to apply deltas to an old rev do not survive to the end
of the series.  I kept that because (1) I am lazy and (2) it serves as
a quick intro to use of the "ls" command.

Probably there are all sorts of bugs and unclear code.  I haven't
tested beyond running the test suite.

You can find the series alone at

	git://repo.or.cz/git/jrn.git refs/topics/db/vcs-svn-incremental

and merged with other topics at

	git://repo.or.cz/git/jrn.git vcs-svn-pu

Requires the db/text-delta and db/fast-import-blob-access topics,
available from the same place.

Any thoughts would be welcome.

Jonathan Nieder (10):
  vcs-svn: use higher mark numbers for blobs
  vcs-svn: save marks for imported commits
  vcs-svn: introduce cat_mark function to retrieve a marked blob
  vcs-svn: make apply_delta caller retrieve preimage
  vcs-svn: split off function to export result from delta application
  vcs-svn: do not rely on marks for old blobs
  vcs-svn: split off function to make 'ls' requests
  vcs-svn: prepare to eliminate repo_tree structure
  vcs-svn: simplifications for repo_modify_path et al
  vcs-svn: eliminate repo_tree structure

 cache.h               |    2 +
 fast-import.c         |   42 +++++--
 t/t9010-svn-fe.sh     |   55 +++++----
 vcs-svn/fast_export.c |  168 +++++++++++++++++++------
 vcs-svn/fast_export.h |   22 +++-
 vcs-svn/repo_tree.c   |  333 ++++---------------------------------------------
 vcs-svn/repo_tree.h   |    6 +-
 vcs-svn/string_pool.c |    2 +-
 vcs-svn/string_pool.h |    2 +-
 vcs-svn/svndump.c     |   89 +++++++++----
 10 files changed, 301 insertions(+), 420 deletions(-)

-- 
1.7.2.4


* [PATCH 01/10] vcs-svn: use higher mark numbers for blobs
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
@ 2010-12-10 10:21 ` Jonathan Nieder
  2010-12-10 10:22 ` [PATCH 02/10] vcs-svn: save marks for imported commits Jonathan Nieder
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:21 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Prepare to use mark :5 for the commit corresponding to r5 (and so on).

1 billion seems sufficiently high for blob marks to avoid conflicting
with rev marks, while still leaving room for 3 billion blobs.  Such
high mark numbers cause trouble with ancient fast-import versions, but
this topic cannot support git fast-import versions before 1.7.4 (which
introduces the cat-blob command) anyway.
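
The arithmetic behind that claim can be sketched as follows.  This is
only an illustration: BLOB_MARK_BASE and the helper names are made up,
and treating the base itself as reserved is a conservative assumption
of the sketch rather than something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; the patch itself just writes 1024 * 1024 * 1024. */
#define BLOB_MARK_BASE (UINT32_C(1024) * 1024 * 1024)

/* Revision marks stay below the base so they cannot meet blob marks. */
static int rev_mark_is_safe(uint32_t revision)
{
	return revision < BLOB_MARK_BASE;
}

/* r5 maps to mark :5, and so on. */
static uint32_t rev_to_mark(uint32_t revision)
{
	assert(rev_mark_is_safe(revision));
	return revision;
}

/* How many 32-bit mark values remain above the base for blobs. */
static uint32_t blob_marks_available(void)
{
	return UINT32_MAX - BLOB_MARK_BASE;
}
```

With a 32-bit mark, blob_marks_available() comes to 3221225471, which
is the "3 billion" above.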

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/repo_tree.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index eb55636..a4d8340 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -298,7 +298,7 @@ void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
 static void mark_init(void)
 {
 	uint32_t i;
-	mark = 0;
+	mark = 1024 * 1024 * 1024;
 	for (i = 0; i < dent_pool.size; i++)
 		if (!repo_dirent_is_dir(dent_pointer(i)) &&
 		    dent_pointer(i)->content_offset > mark)
-- 
1.7.2.4


* [PATCH 02/10] vcs-svn: save marks for imported commits
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
  2010-12-10 10:21 ` [PATCH 01/10] vcs-svn: use higher mark numbers for blobs Jonathan Nieder
@ 2010-12-10 10:22 ` Jonathan Nieder
  2011-03-06 11:15   ` Jonathan Nieder
  2010-12-10 10:23 ` [PATCH 03/10] vcs-svn: introduce cat_mark function to retrieve a marked blob Jonathan Nieder
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:22 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/fast_export.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index e6ebdb8..093ce1d 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -61,6 +61,7 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 		*gitsvnline = '\0';
 	}
 	printf("commit refs/heads/master\n");
+	printf("mark :%"PRIu32"\n", revision);
 	printf("committer %s <%s@%s> %ld +0000\n",
 		   ~author ? pool_fetch(author) : "nobody",
 		   ~author ? pool_fetch(author) : "nobody",
-- 
1.7.2.4


* [PATCH 03/10] vcs-svn: introduce cat_mark function to retrieve a marked blob
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
  2010-12-10 10:21 ` [PATCH 01/10] vcs-svn: use higher mark numbers for blobs Jonathan Nieder
  2010-12-10 10:22 ` [PATCH 02/10] vcs-svn: save marks for imported commits Jonathan Nieder
@ 2010-12-10 10:23 ` Jonathan Nieder
  2010-12-10 10:23 ` [PATCH 04/10] vcs-svn: make apply_delta caller retrieve preimage Jonathan Nieder
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:23 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
A blind alley.  But it demonstrates how this works.
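
For readers who have not seen a cat-blob exchange, here is a
standalone sketch of parsing the response header.  It assumes the
header has the form "<40-hex object name> SP blob SP <size>" with the
trailing newline already stripped; parse_cat_header is a made-up name
and is not the parse_cat_response_line used in this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse a cat-blob response header of the assumed form
 *     <40-hex object name> SP "blob" SP <size>
 * Returns 0 and stores the size on success, -1 for anything else
 * (for example a "<name> missing" reply).  The sscanf-based check is
 * lenient about the amount of whitespace between fields.
 */
static int parse_cat_header(const char *header, long long *size)
{
	char objname[41];
	long long n;
	int consumed = 0;

	assert(header && size);
	if (sscanf(header, "%40s blob %lld%n", objname, &n, &consumed) != 2)
		return -1;
	if (strlen(objname) != 40 || header[consumed] != '\0' || n < 0)
		return -1;
	*size = n;
	return 0;
}
```

After a successful parse the caller still has to read exactly that
many bytes of content plus the newline that follows it.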

 vcs-svn/fast_export.c |   23 +++++++++++++++--------
 1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 093ce1d..daac201 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -116,6 +116,19 @@ static const char *get_response_line(void)
 	return response_line.buf;
 }
 
+static off_t cat_mark(uint32_t mark)
+{
+	const char *response;
+	off_t length = length;
+
+	printf("cat-blob :%"PRIu32"\n", mark);
+	fflush(stdout);
+	response = get_response_line();
+	if (parse_cat_response_line(response, &length))
+		die("invalid cat-blob response: %s", response);
+	return length;
+}
+
 static long apply_delta(uint32_t mark, off_t len, struct line_buffer *input,
 			uint32_t old_mark, uint32_t old_mode)
 {
@@ -126,14 +139,8 @@ static long apply_delta(uint32_t mark, off_t len, struct line_buffer *input,
 
 	if (init_postimage() || !(out = buffer_tmpfile_rewind(&postimage)))
 		die("cannot open temporary file for blob retrieval");
-	if (old_mark) {
-		const char *response;
-		printf("cat-blob :%"PRIu32"\n", old_mark);
-		fflush(stdout);
-		response = get_response_line();
-		if (parse_cat_response_line(response, &preimage_len))
-			die("invalid cat-blob response: %s", response);
-	}
+	if (old_mark)
+		preimage_len = cat_mark(old_mark);
 	if (old_mode == REPO_MODE_LNK) {
 		strbuf_addstr(&preimage.buf, "link ");
 		preimage_len += strlen("link ");
-- 
1.7.2.4


* [PATCH 04/10] vcs-svn: make apply_delta caller retrieve preimage
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
                   ` (2 preceding siblings ...)
  2010-12-10 10:23 ` [PATCH 03/10] vcs-svn: introduce cat_mark function to retrieve a marked blob Jonathan Nieder
@ 2010-12-10 10:23 ` Jonathan Nieder
  2010-12-10 10:25 ` [PATCH 05/10] vcs-svn: split off function to export result from delta application Jonathan Nieder
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:23 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

The preimage argument to apply_delta is currently a mark, but some
callers might want to use a preimage named by sha1 or by revision
number and path instead.  Let the caller take care of that.

The preimage_len argument is the length of the preimage, which will
appear in the REPORT_FD stream followed by a newline; pass -1 to use
an empty preimage instead.  apply_delta is renamed to delta_apply so
callers that have not been updated can be detected at compile time.
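
The length convention can be sketched in isolation like this.  The
helper name adjusted_preimage_len is made up for the example; it only
mirrors the length bookkeeping and leaves out the "link " bytes that
the real function also pushes into the preimage buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/*
 * Number of preimage bytes the delta applier should account for.
 * A negative preimage_len means "no preimage in the stream" and is
 * passed through unchanged; otherwise a symlink gains the length of
 * its "link " prefix.
 */
static off_t adjusted_preimage_len(off_t preimage_len, int is_symlink)
{
	assert(preimage_len >= -1);
	if (preimage_len < 0)
		return -1;
	if (is_symlink)
		preimage_len += (off_t) strlen("link ");
	return preimage_len;
}
```

Only when the result is non-negative is there a remainder of the
preimage and a trailing newline left to consume from the stream.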

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
The special-cased behavior of -1 won't need to survive.

 vcs-svn/fast_export.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index daac201..4168184 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -129,25 +129,23 @@ static off_t cat_mark(uint32_t mark)
 	return length;
 }
 
-static long apply_delta(uint32_t mark, off_t len, struct line_buffer *input,
-			uint32_t old_mark, uint32_t old_mode)
+static long delta_apply(uint32_t mark, off_t len, struct line_buffer *input,
+			off_t preimage_len, uint32_t old_mode)
 {
 	long ret;
-	off_t preimage_len = 0;
 	struct view preimage = {REPORT_FILENO, 0, STRBUF_INIT};
 	FILE *out;
 
 	if (init_postimage() || !(out = buffer_tmpfile_rewind(&postimage)))
 		die("cannot open temporary file for blob retrieval");
-	if (old_mark)
-		preimage_len = cat_mark(old_mark);
 	if (old_mode == REPO_MODE_LNK) {
 		strbuf_addstr(&preimage.buf, "link ");
-		preimage_len += strlen("link ");
+		if (preimage_len >= 0)
+			preimage_len += strlen("link ");
 	}
 	if (svndiff0_apply(input, len, &preimage, out))
 		die("cannot apply delta");
-	if (old_mark) {
+	if (preimage_len >= 0) {
 		/* Read the remainder of preimage and trailing newline. */
 		if (move_window(&preimage, preimage_len, 1))
 			die("cannot seek to end of input");
@@ -180,7 +178,9 @@ void fast_export_blob_delta(uint32_t mode, uint32_t mark,
 	long postimage_len;
 	if (len > maximum_signed_value_of_type(off_t))
 		die("enormous delta");
-	postimage_len = apply_delta(mark, (off_t) len, input, old_mark, old_mode);
+	postimage_len = delta_apply(mark, (off_t) len, input,
+						old_mark ? cat_mark(old_mark) : -1,
+						old_mode);
 	if (mode == REPO_MODE_LNK) {
 		buffer_skip_bytes(&postimage, strlen("link "));
 		postimage_len -= strlen("link ");
-- 
1.7.2.4


* [PATCH 05/10] vcs-svn: split off function to export result from delta application
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
                   ` (3 preceding siblings ...)
  2010-12-10 10:23 ` [PATCH 04/10] vcs-svn: make apply_delta caller retrieve preimage Jonathan Nieder
@ 2010-12-10 10:25 ` Jonathan Nieder
  2010-12-10 10:26 ` [PATCH 06/10] vcs-svn: do not rely on marks for old blobs Jonathan Nieder
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:25 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

In this time of heavy experimentation, we will need multiple
fast_export_blob_delta variants.  Make this easier by factoring
out the common part.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Also not needed, except perhaps as code cleanup.

 vcs-svn/fast_export.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 4168184..960b252 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -159,6 +159,18 @@ static long delta_apply(uint32_t mark, off_t len, struct line_buffer *input,
 	return ret;
 }
 
+static void record_postimage(uint32_t mark, uint32_t mode,
+				long postimage_len)
+{
+	if (mode == REPO_MODE_LNK) {
+		buffer_skip_bytes(&postimage, strlen("link "));
+		postimage_len -= strlen("link ");
+	}
+	printf("blob\nmark :%"PRIu32"\ndata %ld\n", mark, postimage_len);
+	buffer_copy_bytes(&postimage, postimage_len);
+	fputc('\n', stdout);
+}
+
 void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len, struct line_buffer *input)
 {
 	if (mode == REPO_MODE_LNK) {
@@ -181,11 +193,5 @@ void fast_export_blob_delta(uint32_t mode, uint32_t mark,
 	postimage_len = delta_apply(mark, (off_t) len, input,
 						old_mark ? cat_mark(old_mark) : -1,
 						old_mode);
-	if (mode == REPO_MODE_LNK) {
-		buffer_skip_bytes(&postimage, strlen("link "));
-		postimage_len -= strlen("link ");
-	}
-	printf("blob\nmark :%"PRIu32"\ndata %ld\n", mark, postimage_len);
-	buffer_copy_bytes(&postimage, postimage_len);
-	fputc('\n', stdout);
+	record_postimage(mark, mode, postimage_len);
 }
-- 
1.7.2.4


* [PATCH 06/10] vcs-svn: do not rely on marks for old blobs
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
                   ` (4 preceding siblings ...)
  2010-12-10 10:25 ` [PATCH 05/10] vcs-svn: split off function to export result from delta application Jonathan Nieder
@ 2010-12-10 10:26 ` Jonathan Nieder
  2010-12-10 10:27 ` [PATCH 07/10] vcs-svn: split off function to make 'ls' requests Jonathan Nieder
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:26 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Retrieve old blobs by name and revision number from fast-import.
One step closer to bounded memory usage in svn-fe.
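
As a standalone illustration of what an 'ls' response looks like, here
is a sketch of a parser for the "mode SP type SP object_name TAB path"
layout.  It is more general than the parser in this patch, which only
skips a fixed "100644 blob " prefix; struct ls_entry and parse_ls_line
are made-up names, and the line is assumed to arrive without its
trailing newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct ls_entry {
	unsigned mode;		/* e.g. 0100644 */
	char type[8];		/* "blob", "tree" or "commit" */
	char objname[41];	/* 40 hex digits plus NUL */
	const char *path;	/* points into the parsed line */
};

/*
 * Split "mode SP type SP object_name TAB path" into its fields.
 * Returns 0 on success, -1 if the line does not have that shape.
 */
static int parse_ls_line(const char *line, struct ls_entry *out)
{
	const char *tab;
	int consumed = 0;

	assert(line && out);
	tab = strchr(line, '\t');
	if (!tab)
		return -1;
	if (sscanf(line, "%o %7s %40s%n",
		   &out->mode, out->type, out->objname, &consumed) != 3)
		return -1;
	if (line + consumed != tab)	/* fields must end at the tab */
		return -1;
	out->path = tab + 1;
	return 0;
}
```

The object name recovered this way is what gets passed on to cat-blob.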

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Superfluous except that it shows how to parse 'ls' responses.
A demo.

 vcs-svn/fast_export.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/fast_export.h |    3 ++
 vcs-svn/string_pool.c |    2 +-
 vcs-svn/string_pool.h |    2 +-
 vcs-svn/svndump.c     |    6 +++++
 5 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 960b252..cca9810 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -88,6 +88,21 @@ static int ends_with(const char *s, size_t len, const char *suffix)
 	return !memcmp(s + len - suffixlen, suffix, suffixlen);
 }
 
+static int parse_ls_response_line(const char *line, struct strbuf *objnam)
+{
+	const char *end = line + strlen(line);
+	const char *name, *tab;
+
+	if (end - line < strlen("100644 blob "))
+		return error("ls response too short: %s", line);
+	name = line + strlen("100644 blob ");
+	tab = memchr(name, '\t', end - name);
+	if (!tab)
+		return error("ls response does not contain tab: %s", line);
+	strbuf_add(objnam, name, tab - name);
+	return 0;
+}
+
 static int parse_cat_response_line(const char *header, off_t *len)
 {
 	size_t headerlen = strlen(header);
@@ -129,6 +144,31 @@ static off_t cat_mark(uint32_t mark)
 	return length;
 }
 
+static off_t cat_from_rev(uint32_t rev, const uint32_t *path)
+{
+	const char *response;
+	off_t length = length;
+	struct strbuf blob_name = STRBUF_INIT;
+
+	/* ls :5 "path/to/old/file" */
+	printf("ls :%"PRIu32" \"", rev);
+	pool_print_seq(REPO_MAX_PATH_DEPTH, path, '/', stdout);
+	printf("\"\n");
+	fflush(stdout);
+
+	response = get_response_line();
+	if (parse_ls_response_line(response, &blob_name))
+		die("invalid ls response: %s", response);
+
+	printf("cat-blob %s\n", blob_name.buf);
+	fflush(stdout);
+	response = get_response_line();
+	if (parse_cat_response_line(response, &length))
+		die("invalid cat-blob response: %s", response);
+	strbuf_release(&blob_name);
+	return length;
+}
+
 static long delta_apply(uint32_t mark, off_t len, struct line_buffer *input,
 			off_t preimage_len, uint32_t old_mode)
 {
@@ -195,3 +235,16 @@ void fast_export_blob_delta(uint32_t mode, uint32_t mark,
 						old_mode);
 	record_postimage(mark, mode, postimage_len);
 }
+
+void fast_export_blob_delta_rev(uint32_t mode, uint32_t mark, uint32_t old_mode,
+				uint32_t old_rev, const uint32_t *old_path,
+				uint32_t len, struct line_buffer *input)
+{
+	long postimage_len;
+	if (len > maximum_signed_value_of_type(off_t))
+		die("enormous delta");
+	postimage_len = delta_apply(mark, (off_t) len, input,
+					cat_from_rev(old_rev, old_path),
+					old_mode);
+	record_postimage(mark, mode, postimage_len);
+}
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
index 6f77c3b..487d3d4 100644
--- a/vcs-svn/fast_export.h
+++ b/vcs-svn/fast_export.h
@@ -13,5 +13,8 @@ void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
 void fast_export_blob_delta(uint32_t mode, uint32_t mark,
 			uint32_t old_mode, uint32_t old_mark,
 			uint32_t len, struct line_buffer *input);
+void fast_export_blob_delta_rev(uint32_t mode, uint32_t mark, uint32_t old_mode,
+			uint32_t old_rev, const uint32_t *old_path,
+			uint32_t len, struct line_buffer *input);
 
 #endif
diff --git a/vcs-svn/string_pool.c b/vcs-svn/string_pool.c
index f5b1da8..c08abac 100644
--- a/vcs-svn/string_pool.c
+++ b/vcs-svn/string_pool.c
@@ -65,7 +65,7 @@ uint32_t pool_tok_r(char *str, const char *delim, char **saveptr)
 	return token ? pool_intern(token) : ~0;
 }
 
-void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream)
+void pool_print_seq(uint32_t len, const uint32_t *seq, char delim, FILE *stream)
 {
 	uint32_t i;
 	for (i = 0; i < len && ~seq[i]; i++) {
diff --git a/vcs-svn/string_pool.h b/vcs-svn/string_pool.h
index 222fb66..3720cf8 100644
--- a/vcs-svn/string_pool.h
+++ b/vcs-svn/string_pool.h
@@ -4,7 +4,7 @@
 uint32_t pool_intern(const char *key);
 const char *pool_fetch(uint32_t entry);
 uint32_t pool_tok_r(char *str, const char *delim, char **saveptr);
-void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream);
+void pool_print_seq(uint32_t len, const uint32_t *seq, char delim, FILE *stream);
 uint32_t pool_tok_seq(uint32_t sz, uint32_t *seq, const char *delim, char *str);
 void pool_reset(void);
 
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index c6d6337..da968fa 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -259,6 +259,12 @@ static void handle_node(void)
 		fast_export_blob(node_ctx.type, mark, node_ctx.textLength, &input);
 		return;
 	}
+	if (node_ctx.srcRev) {
+		fast_export_blob_delta_rev(node_ctx.type, mark, old_mode,
+					node_ctx.srcRev, node_ctx.src,
+					node_ctx.textLength, &input);
+		return;
+	}
 	fast_export_blob_delta(node_ctx.type, mark, old_mode, old_mark,
 				node_ctx.textLength, &input);
 }
-- 
1.7.2.4


* [PATCH 07/10] vcs-svn: split off function to make 'ls' requests
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
                   ` (5 preceding siblings ...)
  2010-12-10 10:26 ` [PATCH 06/10] vcs-svn: do not rely on marks for old blobs Jonathan Nieder
@ 2010-12-10 10:27 ` Jonathan Nieder
  2010-12-10 10:28 ` [PATCH 08/10] vcs-svn: prepare to eliminate repo_tree structure Jonathan Nieder
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:27 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
ls_from_rev will survive; cat_from_rev will not.

 vcs-svn/fast_export.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index cca9810..6a4a689 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -80,6 +80,15 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 	printf("progress Imported commit %"PRIu32".\n\n", revision);
 }
 
+static void ls_from_rev(uint32_t rev, const uint32_t *path)
+{
+	/* ls :5 "path/to/old/file" */
+	printf("ls :%"PRIu32" \"", rev);
+	pool_print_seq(REPO_MAX_PATH_DEPTH, path, '/', stdout);
+	printf("\"\n");
+	fflush(stdout);
+}
+
 static int ends_with(const char *s, size_t len, const char *suffix)
 {
 	const size_t suffixlen = strlen(suffix);
@@ -150,12 +159,7 @@ static off_t cat_from_rev(uint32_t rev, const uint32_t *path)
 	off_t length = length;
 	struct strbuf blob_name = STRBUF_INIT;
 
-	/* ls :5 "path/to/old/file" */
-	printf("ls :%"PRIu32" \"", rev);
-	pool_print_seq(REPO_MAX_PATH_DEPTH, path, '/', stdout);
-	printf("\"\n");
-	fflush(stdout);
-
+	ls_from_rev(rev, path);
 	response = get_response_line();
 	if (parse_ls_response_line(response, &blob_name))
 		die("invalid ls response: %s", response);
-- 
1.7.2.4


* [PATCH 08/10] vcs-svn: prepare to eliminate repo_tree structure
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
                   ` (6 preceding siblings ...)
  2010-12-10 10:27 ` [PATCH 07/10] vcs-svn: split off function to make 'ls' requests Jonathan Nieder
@ 2010-12-10 10:28 ` Jonathan Nieder
  2011-03-06 12:52   ` [PATCH v2] " Jonathan Nieder
  2010-12-10 10:30 ` [PATCH 09/10] vcs-svn: simplifications for repo_modify_path et al Jonathan Nieder
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:28 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Currently svn-fe processes each commit in two stages: first decide
on the correct content for all paths and export the relevant blobs,
then export a commit with the result.

But we can keep less state and simplify svn-fe a great deal by
exporting the commit in one stage: use 'inline' blobs for
each path and remember nothing.  This way, the repo_tree structure
could be eliminated, and we would get support for incremental
imports 'for free'.
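
A rough sketch of what an 'inline' file change looks like on the
fast-import stream follows.  The function names are made up, the path
is assumed to need no quoting, and this is not how the series itself
writes the data (svn-fe copies from its input buffer instead).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write one file change with its content inline:
 *     M <mode> inline <path> LF
 *     data <len> LF
 *     <len bytes> LF
 * Hypothetical helper; returns 0 on success, -1 on a write error.
 */
static int emit_inline_change(FILE *out, unsigned mode, const char *path,
			      const void *data, size_t len)
{
	assert(out && path && (data || !len));
	if (fprintf(out, "M %06o inline %s\ndata %zu\n", mode, path, len) < 0)
		return -1;
	if (len && fwrite(data, 1, len, out) != len)
		return -1;
	return fputc('\n', out) == EOF ? -1 : 0;
}

/* Convenience wrapper for NUL-terminated text content. */
static int emit_inline_text(FILE *out, unsigned mode, const char *path,
			    const char *text)
{
	return emit_inline_change(out, mode, path, text, strlen(text));
}
```

Because the content travels with the path, nothing about the blob has
to be remembered once the line is written.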

Reorganize handle_node() along these lines.  This is just a code
cleanup; the functional change to repo_tree will come later.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/svndump.c |   32 +++++++++++++++++++++++---------
 1 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index da968fa..649a468 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -201,11 +201,12 @@ static void handle_node(void)
 	uint32_t mark = 0, old_mode, old_mark;
 	const uint32_t type = node_ctx.type;
 	const int have_props = node_ctx.propLength != LENGTH_UNKNOWN;
+	const int have_text = node_ctx.textLength != LENGTH_UNKNOWN;
 
-	if (node_ctx.textLength != LENGTH_UNKNOWN)
+	if (have_text)
 		mark = next_blob_mark();
 	if (node_ctx.action == NODEACT_DELETE) {
-		if (mark || have_props || node_ctx.srcRev)
+		if (have_text || have_props || node_ctx.srcRev)
 			die("invalid dump: deletion node has "
 				"copyfrom info, text, or properties");
 		return repo_delete(node_ctx.dst);
@@ -219,8 +220,13 @@ static void handle_node(void)
 		if (node_ctx.action == NODEACT_ADD)
 			node_ctx.action = NODEACT_CHANGE;
 	}
-	if (mark && type == REPO_MODE_DIR)
+	if (have_text && type == REPO_MODE_DIR)
 		die("invalid dump: directories cannot have text attached");
+
+	/*
+	 * Find old content (old_mark) and decide on the new content (mark)
+	 * and mode (node_ctx.type).
+	 */
 	if (node_ctx.action == NODEACT_CHANGE && !~*node_ctx.dst) {
 		if (type != REPO_MODE_DIR)
 			die("invalid dump: root of tree is not a regular file");
@@ -228,7 +234,9 @@ static void handle_node(void)
 	} else if (node_ctx.action == NODEACT_CHANGE) {
 		uint32_t mode;
 		old_mark = repo_read_path(node_ctx.dst);
-		mode = repo_modify_path(node_ctx.dst, 0, mark);
+		if (!have_text)
+			mark = old_mark;
+		mode = repo_modify_path(node_ctx.dst, 0, 0);
 		if (!mode)
 			die("invalid dump: path to be modified is missing");
 		if (mode == REPO_MODE_DIR && type != REPO_MODE_DIR)
@@ -237,23 +245,29 @@ static void handle_node(void)
 			die("invalid dump: cannot modify a file into a directory");
 		node_ctx.type = mode;
 	} else if (node_ctx.action == NODEACT_ADD) {
-		if (!mark && type != REPO_MODE_DIR)
+		if (!have_text && type != REPO_MODE_DIR)
 			die("invalid dump: adds node without text");
-		repo_add(node_ctx.dst, type, mark);
 		old_mark = 0;
 	} else {
 		die("invalid dump: Node-path block lacks Node-action");
 	}
+
+	/*
+	 * Adjust mode to reflect properties.
+	 */
 	old_mode = node_ctx.type;
 	if (have_props) {
 		if (!node_ctx.prop_delta)
 			node_ctx.type = type;
 		if (node_ctx.propLength)
 			read_props();
-		if (node_ctx.type != old_mode)
-			repo_modify_path(node_ctx.dst, node_ctx.type, mark);
 	}
-	if (!mark)
+
+	/*
+	 * Save the result.
+	 */
+	repo_add(node_ctx.dst, node_ctx.type, mark);
+	if (!have_text)
 		return;
 	if (!node_ctx.text_delta) {
 		fast_export_blob(node_ctx.type, mark, node_ctx.textLength, &input);
-- 
1.7.2.4


* [PATCH 09/10] vcs-svn: simplifications for repo_modify_path et al
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
                   ` (7 preceding siblings ...)
  2010-12-10 10:28 ` [PATCH 08/10] vcs-svn: prepare to eliminate repo_tree structure Jonathan Nieder
@ 2010-12-10 10:30 ` Jonathan Nieder
  2010-12-10 10:33 ` [PATCH 10/10] vcs-svn: eliminate repo_tree structure Jonathan Nieder
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:30 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Restrict the repo_tree API to the functions that are actually
needed.  That is:

 - decouple reading the mode and content of dirents from
   other operations
 - remove repo_modify_path.  It was only used to read the
   mode from dirents.
 - remove the ability to use repo_read_mode on a missing
   path.  The existing code only errored out in that case,
   anyway.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/repo_tree.c |   25 +++++++++----------------
 vcs-svn/repo_tree.h |    4 ++--
 vcs-svn/svndump.c   |    4 +---
 3 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index a4d8340..4d98185 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -166,7 +166,15 @@ uint32_t repo_read_path(uint32_t *path)
 	return content_offset;
 }
 
-uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
+uint32_t repo_read_mode(const uint32_t *path)
+{
+	struct repo_dirent *dent = repo_read_dirent(active_commit, path);
+	if (dent == NULL)
+		die("invalid dump: path to be modified is missing");
+	return dent->mode;
+}
+
+void repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
 {
 	uint32_t mode = 0, content_offset = 0;
 	struct repo_dirent *src_dent;
@@ -176,7 +184,6 @@ uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
 		content_offset = src_dent->content_offset;
 		repo_write_dirent(dst, mode, content_offset, 0);
 	}
-	return mode;
 }
 
 void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark)
@@ -184,20 +191,6 @@ void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark)
 	repo_write_dirent(path, mode, blob_mark, 0);
 }
 
-uint32_t repo_modify_path(uint32_t *path, uint32_t mode, uint32_t blob_mark)
-{
-	struct repo_dirent *src_dent;
-	src_dent = repo_read_dirent(active_commit, path);
-	if (!src_dent)
-		return 0;
-	if (!blob_mark)
-		blob_mark = src_dent->content_offset;
-	if (!mode)
-		mode = src_dent->mode;
-	repo_write_dirent(path, mode, blob_mark, 0);
-	return mode;
-}
-
 void repo_delete(uint32_t *path)
 {
 	repo_write_dirent(path, 0, 0, 1);
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
index 7070839..0499a19 100644
--- a/vcs-svn/repo_tree.h
+++ b/vcs-svn/repo_tree.h
@@ -12,10 +12,10 @@
 #define REPO_MAX_PATH_DEPTH 1000
 
 uint32_t next_blob_mark(void);
-uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst);
+void repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst);
 void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark);
-uint32_t repo_modify_path(uint32_t *path, uint32_t mode, uint32_t blob_mark);
 uint32_t repo_read_path(uint32_t *path);
+uint32_t repo_read_mode(const uint32_t *path);
 void repo_delete(uint32_t *path);
 void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
 		 uint32_t url, long unsigned timestamp);
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 649a468..31c6056 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -236,9 +236,7 @@ static void handle_node(void)
 		old_mark = repo_read_path(node_ctx.dst);
 		if (!have_text)
 			mark = old_mark;
-		mode = repo_modify_path(node_ctx.dst, 0, 0);
-		if (!mode)
-			die("invalid dump: path to be modified is missing");
+		mode = repo_read_mode(node_ctx.dst);
 		if (mode == REPO_MODE_DIR && type != REPO_MODE_DIR)
 			die("invalid dump: cannot modify a directory into a file");
 		if (mode != REPO_MODE_DIR && type == REPO_MODE_DIR)
-- 
1.7.2.4


* [PATCH 10/10] vcs-svn: eliminate repo_tree structure
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
                   ` (8 preceding siblings ...)
  2010-12-10 10:30 ` [PATCH 09/10] vcs-svn: simplifications for repo_modify_path et al Jonathan Nieder
@ 2010-12-10 10:33 ` Jonathan Nieder
       [not found] ` <C59168D0-B409-4A83-B96C-8CCD42D0B62F@cordelta.com>
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-10 10:33 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Rely on fast-import for information about previous revs.

This requires always setting up backward flow of information,
even for v2 dumps.  On the plus side, it simplifies the code
by quite a bit and opens the door to further simplifications.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
That's the end of the series.  Sorry for its rough state; just thought
it would be useful to get this out early so other tools can be built
on it.  No doubt it is terribly buggy and ugly.  Reports of both sorts
of problem would be greatly appreciated.

 cache.h               |    2 +
 fast-import.c         |   42 +++++--
 t/t9010-svn-fe.sh     |   55 +++++----
 vcs-svn/fast_export.c |  173 +++++++++++++++------------
 vcs-svn/fast_export.h |   25 +++--
 vcs-svn/repo_tree.c   |  310 +++----------------------------------------------
 vcs-svn/repo_tree.h   |    2 +-
 vcs-svn/svndump.c     |   73 +++++++-----
 8 files changed, 239 insertions(+), 443 deletions(-)

diff --git a/cache.h b/cache.h
index 33decd9..33e69eb 100644
--- a/cache.h
+++ b/cache.h
@@ -678,6 +678,8 @@ static inline void hashclr(unsigned char *hash)
 #define EMPTY_TREE_SHA1_BIN \
 	 "\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
 	 "\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"
+#define EMPTY_BLOB_SHA1_HEX \
+	"e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
 
 int git_mkstemp(char *path, size_t n, const char *template);
 
diff --git a/fast-import.c b/fast-import.c
index 670f4f5..e62f34d 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2862,12 +2862,20 @@ static struct object_entry *parse_treeish_dataref(const char **r)
 
 static void print_ls(int mode, unsigned char *sha1, char *path)
 {
-	enum object_type type;
 	struct strbuf line = STRBUF_INIT;
-	type = sha1_object_info(sha1, NULL);
+	const char *type;
+
+	/* See show_tree(). */
+	if (S_ISGITLINK(mode))
+		type = commit_type;
+	else if (S_ISDIR(mode))
+		type = tree_type;
+	else
+		type = blob_type;
+
 	/* mode SP type SP object_name TAB path LF */
-	strbuf_addf(&line, "%o %s %s\t%s\n",
-			mode, typename(type), sha1_to_hex(sha1), path);
+	strbuf_addf(&line, "%06o %s %s\t%s\n",
+			mode, type, sha1_to_hex(sha1), path);
 	cat_blob_write(line.buf, line.len);
 	strbuf_release(&line);
 }
@@ -2898,17 +2906,25 @@ static void parse_ls(struct branch *b)
 	if (*p)
 		die("Garbage after path: %s", command_buf.buf);
 	tree_content_get(root, uq.buf, &leaf);
-	if (!leaf.versions[1].mode)
-		die("Path %s not in branch", uq.buf);
+	if (!leaf.versions[1].mode) {
+		/*
+		 * NEEDSWORK.  Missing path?  Must be an empty directory!
+		 *
+		 * Should find a nicer way to report this --- e.g.
+		 *	die("Path %s not in branch", uq.buf);
+		 */
+		print_ls(S_IFDIR, (unsigned char *) EMPTY_TREE_SHA1_BIN, uq.buf);
+	} else {
+		/*
+		 * A directory in preparation would have a sha1 of zero
+		 * until it is saved.  Save, for simplicity.
+		 */
+		if (S_ISDIR(leaf.versions[1].mode))
+			store_tree(&leaf);
 
-	/*
-	 * A directory in preparation would have a sha1 of zero
-	 * until it is saved.  Save, for simplicity.
-	 */
-	if (S_ISDIR(leaf.versions[1].mode))
-		store_tree(&leaf);
+		print_ls(leaf.versions[1].mode, leaf.versions[1].sha1, uq.buf);
+	}
 
-	print_ls(leaf.versions[1].mode, leaf.versions[1].sha1, uq.buf);
 	strbuf_release(&uq);
 	if (!b || root != &b->branch_tree)
 		release_tree_entry(root);
diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
index 022e3e3..34bb9f6 100755
--- a/t/t9010-svn-fe.sh
+++ b/t/t9010-svn-fe.sh
@@ -53,8 +53,7 @@ test_expect_success 'empty dump' '
 test_expect_success 'v4 dumps not supported' '
 	reinit_git &&
 	echo "SVN-fs-dump-format-version: 4" >v4.dump &&
-	test_must_fail test-svn-fe v4.dump >stream &&
-	test_cmp empty stream
+	test_must_fail test-svn-fe v4.dump
 '
 
 test_expect_failure 'empty revision' '
@@ -301,8 +300,9 @@ test_expect_success 'action: add node without text' '
 	test_must_fail test-svn-fe textless.dump
 '
 
-test_expect_failure 'change file mode but keep old content' '
+test_expect_failure PIPE 'change file mode but keep old content' '
 	reinit_git &&
+	rm -f backflow &&
 	cat >expect <<-\EOF &&
 	OBJID
 	:120000 100644 OBJID OBJID T	greeting
@@ -364,8 +364,9 @@ test_expect_failure 'change file mode but keep old content' '
 
 	PROPS-END
 	EOF
-	test-svn-fe filemode.dump >stream &&
-	git fast-import <stream &&
+	mkfifo backflow &&
+	test-svn-fe filemode.dump 3<backflow |
+	git fast-import --cat-blob-fd=3 3>backflow &&
 	{
 		git rev-list HEAD |
 		git diff-tree --root --stdin |
@@ -378,8 +379,9 @@ test_expect_failure 'change file mode but keep old content' '
 	test_cmp hello actual.target
 '
 
-test_expect_success 'change file mode and reiterate content' '
+test_expect_success PIPE 'change file mode and reiterate content' '
 	reinit_git &&
+	rm -f backflow &&
 	cat >expect <<-\EOF &&
 	OBJID
 	:120000 100644 OBJID OBJID T	greeting
@@ -390,7 +392,7 @@ test_expect_success 'change file mode and reiterate content' '
 	EOF
 	echo "link hello" >expect.blob &&
 	echo hello >hello &&
-	cat >filemode.dump <<-\EOF &&
+	cat >filemode2.dump <<-\EOF &&
 	SVN-fs-dump-format-version: 3
 
 	Revision-number: 1
@@ -445,8 +447,9 @@ test_expect_success 'change file mode and reiterate content' '
 	PROPS-END
 	link hello
 	EOF
-	test-svn-fe filemode.dump >stream &&
-	git fast-import <stream &&
+	mkfifo backflow &&
+	test-svn-fe filemode2.dump 3<backflow |
+	git fast-import --cat-blob-fd=3 3>backflow &&
 	{
 		git rev-list HEAD |
 		git diff-tree --root --stdin |
@@ -522,12 +525,13 @@ test_expect_success PIPE 'deltas supported' '
 		cat delta
 	} >delta.dump &&
 	mkfifo backflow &&
-	test_must_fail test-svn-fe delta.dump 3<backflow |
+	test-svn-fe delta.dump 3<backflow |
 	git fast-import --cat-blob-fd=3 3>backflow
 '
 
-test_expect_success 'property deltas supported' '
+test_expect_success PIPE 'property deltas supported' '
 	reinit_git &&
+	rm -f backflow &&
 	cat >expect <<-\EOF &&
 	OBJID
 	:100755 100644 OBJID OBJID M	script.sh
@@ -582,8 +586,9 @@ test_expect_success 'property deltas supported' '
 		PROPS-END
 		EOF
 	} >propdelta.dump &&
-	test-svn-fe propdelta.dump >stream &&
-	git fast-import <stream &&
+	mkfifo backflow &&
+	test-svn-fe propdelta.dump 3<backflow |
+	git fast-import --cat-blob-fd=3 3>backflow &&
 	{
 		git rev-list HEAD |
 		git diff-tree --stdin |
@@ -592,8 +597,9 @@ test_expect_success 'property deltas supported' '
 	test_cmp expect actual
 '
 
-test_expect_success 'properties on /' '
+test_expect_success PIPE 'properties on /' '
 	reinit_git &&
+	rm -f backflow &&
 	cat <<-\EOF >expect &&
 	OBJID
 	OBJID
@@ -637,8 +643,9 @@ test_expect_success 'properties on /' '
 
 	PROPS-END
 	EOF
-	test-svn-fe changeroot.dump >stream &&
-	git fast-import <stream &&
+	mkfifo backflow &&
+	test-svn-fe changeroot.dump 3<backflow |
+	git fast-import --cat-blob-fd=3 3>backflow &&
 	{
 		git rev-list HEAD |
 		git diff-tree --root --always --stdin |
@@ -647,8 +654,9 @@ test_expect_success 'properties on /' '
 	test_cmp expect actual
 '
 
-test_expect_success 'deltas for typechange' '
+test_expect_success PIPE 'deltas for typechange' '
 	reinit_git &&
+	rm -f backflow &&
 	cat >expect <<-\EOF &&
 	OBJID
 	:120000 100644 OBJID OBJID T	test-file
@@ -723,8 +731,9 @@ test_expect_success 'deltas for typechange' '
 	PROPS-END
 	link testing 321
 	EOF
-	test-svn-fe deleteprop.dump >stream &&
-	git fast-import <stream &&
+	mkfifo backflow &&
+	test-svn-fe deleteprop.dump 3<backflow |
+	git fast-import --cat-blob-fd=3 3>backflow &&
 	{
 		git rev-list HEAD |
 		git diff-tree --root --stdin |
@@ -841,15 +850,17 @@ test_expect_success PIPE 'deltas need not consume the whole preimage' '
 	test_cmp expect.3 actual.3
 '
 
-test_expect_success 't9135/svn.dump' '
+test_expect_success PIPE 't9135/svn.dump' '
+	rm -f backflow &&
 	svnadmin create simple-svn &&
 	svnadmin load simple-svn <"$TEST_DIRECTORY/t9135/svn.dump" &&
 	svn_cmd export "file://$PWD/simple-svn" simple-svnco &&
 	git init simple-git &&
-	test-svn-fe "$TEST_DIRECTORY/t9135/svn.dump" >simple.fe &&
+	mkfifo backflow &&
+	test-svn-fe "$TEST_DIRECTORY/t9135/svn.dump" 3<backflow |
 	(
 		cd simple-git &&
-		git fast-import <../simple.fe
+		git fast-import --cat-blob-fd=3 3>../backflow
 	) &&
 	(
 		cd simple-svnco &&
diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 6a4a689..96f6023 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -37,17 +37,17 @@ void fast_export_delete(uint32_t depth, uint32_t *path)
 	putchar('\n');
 }
 
-void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
-			uint32_t mark)
+void fast_export_modify(uint32_t depth, const uint32_t *path, uint32_t mode,
+			const char *dataref)
 {
 	/* Mode must be 100644, 100755, 120000, or 160000. */
-	printf("M %06"PRIo32" :%"PRIu32" ", mode, mark);
+	printf("M %06"PRIo32" %s ", mode, dataref);
 	pool_print_seq(depth, path, '/', stdout);
 	putchar('\n');
 }
 
 static char gitsvnline[MAX_GITSVN_LINE_LEN];
-void fast_export_commit(uint32_t revision, uint32_t author, char *log,
+void fast_export_begin_commit(uint32_t revision, uint32_t author, char *log,
 			uint32_t uuid, uint32_t url,
 			unsigned long timestamp)
 {
@@ -74,10 +74,20 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 			printf("from refs/heads/master^0\n");
 		first_commit_done = 1;
 	}
-	repo_diff(revision - 1, revision);
-	fputc('\n', stdout);
+}
 
-	printf("progress Imported commit %"PRIu32".\n\n", revision);
+void fast_export_end_commit(uint32_t revision)
+{
+	printf("\nprogress Imported commit %"PRIu32".\n\n", revision);
+}
+
+static struct strbuf response_line = STRBUF_INIT;
+static const char *get_response_line(void)
+{
+	strbuf_reset(&response_line);
+	if (fd_read_line(&response_line, REPORT_FILENO))
+		return NULL;
+	return response_line.buf;
 }
 
 static void ls_from_rev(uint32_t rev, const uint32_t *path)
@@ -89,6 +99,56 @@ static void ls_from_rev(uint32_t rev, const uint32_t *path)
 	fflush(stdout);
 }
 
+static void parse_ls_response(const char *response, uint32_t *mode,
+					struct strbuf *dataref)
+{
+	const char *tab;
+	const char *response_end;
+
+	if (!response)
+		die("cannot read ls response");
+	response_end = response + strlen(response);
+
+	/* Mode. */
+	if (response_end - response < strlen("100644") ||
+	    response[strlen("100644")] != ' ')
+		die("invalid ls response: missing mode: %s", response);
+	*mode = 0;
+	for (; *response != ' '; response++) {
+		*mode *= 8;
+		*mode += (*response - '0');
+	}
+
+	/* ' blob ' or ' tree ' */
+	if (response_end - response < strlen(" blob "))
+		die("invalid ls response: missing type: %s", response);
+	response += strlen(" blob ");
+
+	/* Dataref. */
+	tab = memchr(response, '\t', response_end - response);
+	if (!tab)
+		die("invalid ls response: missing tab: %s", response);
+	strbuf_add(dataref, response, tab - response);
+}
+
+void fast_export_ls_rev(uint32_t rev, const uint32_t *path,
+				uint32_t *mode, struct strbuf *dataref)
+{
+	ls_from_rev(rev, path);
+	parse_ls_response(get_response_line(), mode, dataref);
+}
+
+/* Read directory entry from the active commit. */
+void fast_export_ls(const uint32_t *path,
+				uint32_t *mode, struct strbuf *dataref)
+{
+	printf("ls \"");
+	pool_print_seq(REPO_MAX_PATH_DEPTH, path, '/', stdout);
+	printf("\"\n");
+	fflush(stdout);
+	parse_ls_response(get_response_line(), mode, dataref);
+}
+
 static int ends_with(const char *s, size_t len, const char *suffix)
 {
 	const size_t suffixlen = strlen(suffix);
@@ -97,27 +157,15 @@ static int ends_with(const char *s, size_t len, const char *suffix)
 	return !memcmp(s + len - suffixlen, suffix, suffixlen);
 }
 
-static int parse_ls_response_line(const char *line, struct strbuf *objnam)
-{
-	const char *end = line + strlen(line);
-	const char *name, *tab;
-
-	if (end - line < strlen("100644 blob "))
-		return error("ls response too short: %s", line);
-	name = line + strlen("100644 blob ");
-	tab = memchr(name, '\t', end - name);
-	if (!tab)
-		return error("ls response does not contain tab: %s", line);
-	strbuf_add(objnam, name, tab - name);
-	return 0;
-}
-
 static int parse_cat_response_line(const char *header, off_t *len)
 {
-	size_t headerlen = strlen(header);
+	size_t headerlen;
 	const char *type;
 	const char *end;
 
+	if (!header)
+		return error("missing cat-blob response");
+	headerlen = strlen(header);
 	if (ends_with(header, headerlen, " missing"))
 		return error("cat-blob reports missing blob: %s", header);
 	type = memmem(header, headerlen, " blob ", strlen(" blob "));
@@ -131,49 +179,22 @@ static int parse_cat_response_line(const char *header, off_t *len)
 	return 0;
 }
 
-static struct strbuf response_line = STRBUF_INIT;
-static const char *get_response_line(void)
+static off_t cat_dataref(const char *dataref)
 {
-	strbuf_reset(&response_line);
-	if (fd_read_line(&response_line, REPORT_FILENO))
-		return NULL;
-	return response_line.buf;
-}
-
-static off_t cat_mark(uint32_t mark)
-{
-	const char *response;
 	off_t length = length;
-
-	printf("cat-blob :%"PRIu32"\n", mark);
-	fflush(stdout);
-	response = get_response_line();
-	if (parse_cat_response_line(response, &length))
-		die("invalid cat-blob response: %s", response);
-	return length;
-}
-
-static off_t cat_from_rev(uint32_t rev, const uint32_t *path)
-{
 	const char *response;
-	off_t length = length;
-	struct strbuf blob_name = STRBUF_INIT;
-
-	ls_from_rev(rev, path);
-	response = get_response_line();
-	if (parse_ls_response_line(response, &blob_name))
-		die("invalid ls response: %s", response);
 
-	printf("cat-blob %s\n", blob_name.buf);
+	if (!dataref)
+		die("BUG: null data reference");
+	printf("cat-blob %s\n", dataref);
 	fflush(stdout);
 	response = get_response_line();
 	if (parse_cat_response_line(response, &length))
-		die("invalid cat-blob response: %s", response);
-	strbuf_release(&blob_name);
+		die("invalid cat-blob response");
 	return length;
 }
 
-static long delta_apply(uint32_t mark, off_t len, struct line_buffer *input,
+static long delta_apply(off_t len, struct line_buffer *input,
 			off_t preimage_len, uint32_t old_mode)
 {
 	long ret;
@@ -203,52 +224,50 @@ static long delta_apply(uint32_t mark, off_t len, struct line_buffer *input,
 	return ret;
 }
 
-static void record_postimage(uint32_t mark, uint32_t mode,
-				long postimage_len)
+static void record_postimage(uint32_t mode, long postimage_len)
 {
 	if (mode == REPO_MODE_LNK) {
 		buffer_skip_bytes(&postimage, strlen("link "));
 		postimage_len -= strlen("link ");
 	}
-	printf("blob\nmark :%"PRIu32"\ndata %ld\n", mark, postimage_len);
+	printf("data %ld\n", postimage_len);
 	buffer_copy_bytes(&postimage, postimage_len);
 	fputc('\n', stdout);
 }
 
-void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len, struct line_buffer *input)
+void fast_export_data(uint32_t mode, uint32_t len, struct line_buffer *input)
 {
 	if (mode == REPO_MODE_LNK) {
 		/* svn symlink blobs start with "link " */
 		buffer_skip_bytes(input, 5);
 		len -= 5;
 	}
-	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
+	printf("data %"PRIu32"\n", len);
 	buffer_copy_bytes(input, len);
 	fputc('\n', stdout);
 }
 
-void fast_export_blob_delta(uint32_t mode, uint32_t mark,
-				uint32_t old_mode, uint32_t old_mark,
-				uint32_t len, struct line_buffer *input)
+void fast_export_empty_blob(void)
 {
-	long postimage_len;
-	if (len > maximum_signed_value_of_type(off_t))
-		die("enormous delta");
-	postimage_len = delta_apply(mark, (off_t) len, input,
-						old_mark ? cat_mark(old_mark) : -1,
-						old_mode);
-	record_postimage(mark, mode, postimage_len);
+	printf("blob\ndata 0\n\n");
 }
 
-void fast_export_blob_delta_rev(uint32_t mode, uint32_t mark, uint32_t old_mode,
-				uint32_t old_rev, const uint32_t *old_path,
+void fast_export_delta(const uint32_t *path, uint32_t mode, uint32_t old_mode,
+				const char *dataref,
 				uint32_t len, struct line_buffer *input)
 {
+	off_t preimage_len;
 	long postimage_len;
 	if (len > maximum_signed_value_of_type(off_t))
 		die("enormous delta");
-	postimage_len = delta_apply(mark, (off_t) len, input,
-					cat_from_rev(old_rev, old_path),
-					old_mode);
-	record_postimage(mark, mode, postimage_len);
+	preimage_len = cat_dataref(dataref);
+
+	/*
+	 * NEEDSWORK: Will deadlock with very long paths.
+	 */
+	fast_export_modify(REPO_MAX_PATH_DEPTH, path, mode, "inline");
+
+	postimage_len = delta_apply((off_t) len, input,
+						preimage_len, old_mode);
+	record_postimage(mode, postimage_len);
 }
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
index 487d3d4..8415d2c 100644
--- a/vcs-svn/fast_export.h
+++ b/vcs-svn/fast_export.h
@@ -3,18 +3,23 @@
 
 #include "line_buffer.h"
 
+/* Output routines */
 void fast_export_delete(uint32_t depth, uint32_t *path);
-void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
-			uint32_t mark);
-void fast_export_commit(uint32_t revision, uint32_t author, char *log,
+void fast_export_modify(uint32_t depth, const uint32_t *path, uint32_t mode,
+			const char *dataref);
+void fast_export_begin_commit(uint32_t revision, uint32_t author, char *log,
 			uint32_t uuid, uint32_t url, unsigned long timestamp);
-void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
-		      struct line_buffer *input);
-void fast_export_blob_delta(uint32_t mode, uint32_t mark,
-			uint32_t old_mode, uint32_t old_mark,
-			uint32_t len, struct line_buffer *input);
-void fast_export_blob_delta_rev(uint32_t mode, uint32_t mark, uint32_t old_mode,
-			uint32_t old_rev, const uint32_t *old_path,
+void fast_export_end_commit(uint32_t revision);
+void fast_export_empty_blob(void);
+void fast_export_data(uint32_t mode, uint32_t len, struct line_buffer *input);
+void fast_export_delta(const uint32_t *path, uint32_t mode, uint32_t old_mode,
+			const char *dataref,
 			uint32_t len, struct line_buffer *input);
 
+/* Input routines */
+void fast_export_ls_rev(uint32_t rev, const uint32_t *path,
+			uint32_t *mode_out, struct strbuf *dataref_out);
+void fast_export_ls(const uint32_t *path,
+			uint32_t *mode_out, struct strbuf *dataref_out);
+
 #endif
diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index 4d98185..1283089 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -4,321 +4,49 @@
  */
 
 #include "git-compat-util.h"
-
-#include "string_pool.h"
+#include "strbuf.h"
 #include "repo_tree.h"
-#include "obj_pool.h"
 #include "fast_export.h"
 
-#include "trp.h"
-
-struct repo_dirent {
-	uint32_t name_offset;
-	struct trp_node children;
-	uint32_t mode;
-	uint32_t content_offset;
-};
-
-struct repo_dir {
-	struct trp_root entries;
-};
-
-struct repo_commit {
-	uint32_t root_dir_offset;
-};
-
-/* Memory pools for commit, dir and dirent */
-obj_pool_gen(commit, struct repo_commit, 4096)
-obj_pool_gen(dir, struct repo_dir, 4096)
-obj_pool_gen(dent, struct repo_dirent, 4096)
-
-static uint32_t active_commit;
-static uint32_t mark;
-
-static int repo_dirent_name_cmp(const void *a, const void *b);
-
-/* Treap for directory entries */
-trp_gen(static, dent_, struct repo_dirent, children, dent, repo_dirent_name_cmp);
-
-uint32_t next_blob_mark(void)
-{
-	return mark++;
-}
-
-static struct repo_dir *repo_commit_root_dir(struct repo_commit *commit)
-{
-	return dir_pointer(commit->root_dir_offset);
-}
-
-static struct repo_dirent *repo_first_dirent(struct repo_dir *dir)
-{
-	return dent_first(&dir->entries);
-}
-
-static int repo_dirent_name_cmp(const void *a, const void *b)
-{
-	const struct repo_dirent *dent1 = a, *dent2 = b;
-	uint32_t a_offset = dent1->name_offset;
-	uint32_t b_offset = dent2->name_offset;
-	return (a_offset > b_offset) - (a_offset < b_offset);
-}
-
-static int repo_dirent_is_dir(struct repo_dirent *dent)
-{
-	return dent != NULL && dent->mode == REPO_MODE_DIR;
-}
-
-static struct repo_dir *repo_dir_from_dirent(struct repo_dirent *dent)
-{
-	if (!repo_dirent_is_dir(dent))
-		return NULL;
-	return dir_pointer(dent->content_offset);
-}
-
-static struct repo_dir *repo_clone_dir(struct repo_dir *orig_dir)
-{
-	uint32_t orig_o, new_o;
-	orig_o = dir_offset(orig_dir);
-	if (orig_o >= dir_pool.committed)
-		return orig_dir;
-	new_o = dir_alloc(1);
-	orig_dir = dir_pointer(orig_o);
-	*dir_pointer(new_o) = *orig_dir;
-	return dir_pointer(new_o);
-}
-
-static struct repo_dirent *repo_read_dirent(uint32_t revision, uint32_t *path)
+static struct strbuf dataref_buf = STRBUF_INIT;
+const char *repo_read_path(uint32_t *path)
 {
-	uint32_t name = 0;
-	struct repo_dirent *key = dent_pointer(dent_alloc(1));
-	struct repo_dir *dir = NULL;
-	struct repo_dirent *dent = NULL;
-	dir = repo_commit_root_dir(commit_pointer(revision));
-	while (~(name = *path++)) {
-		key->name_offset = name;
-		dent = dent_search(&dir->entries, key);
-		if (dent == NULL || !repo_dirent_is_dir(dent))
-			break;
-		dir = repo_dir_from_dirent(dent);
-	}
-	dent_free(1);
-	return dent;
-}
+	uint32_t unused;
 
-static void repo_write_dirent(uint32_t *path, uint32_t mode,
-			      uint32_t content_offset, uint32_t del)
-{
-	uint32_t name, revision, dir_o = ~0, parent_dir_o = ~0;
-	struct repo_dir *dir;
-	struct repo_dirent *key;
-	struct repo_dirent *dent = NULL;
-	revision = active_commit;
-	dir = repo_commit_root_dir(commit_pointer(revision));
-	dir = repo_clone_dir(dir);
-	commit_pointer(revision)->root_dir_offset = dir_offset(dir);
-	while (~(name = *path++)) {
-		parent_dir_o = dir_offset(dir);
-
-		key = dent_pointer(dent_alloc(1));
-		key->name_offset = name;
-
-		dent = dent_search(&dir->entries, key);
-		if (dent == NULL)
-			dent = key;
-		else
-			dent_free(1);
-
-		if (dent == key) {
-			dent->mode = REPO_MODE_DIR;
-			dent->content_offset = 0;
-			dent_insert(&dir->entries, dent);
-		}
-
-		if (dent_offset(dent) < dent_pool.committed) {
-			dir_o = repo_dirent_is_dir(dent) ?
-					dent->content_offset : ~0;
-			dent_remove(&dir->entries, dent);
-			dent = dent_pointer(dent_alloc(1));
-			dent->name_offset = name;
-			dent->mode = REPO_MODE_DIR;
-			dent->content_offset = dir_o;
-			dent_insert(&dir->entries, dent);
-		}
-
-		dir = repo_dir_from_dirent(dent);
-		dir = repo_clone_dir(dir);
-		dent->content_offset = dir_offset(dir);
-	}
-	if (dent == NULL)
-		return;
-	dent->mode = mode;
-	dent->content_offset = content_offset;
-	if (del && ~parent_dir_o)
-		dent_remove(&dir_pointer(parent_dir_o)->entries, dent);
-}
-
-uint32_t repo_read_path(uint32_t *path)
-{
-	uint32_t content_offset = 0;
-	struct repo_dirent *dent = repo_read_dirent(active_commit, path);
-	if (dent != NULL)
-		content_offset = dent->content_offset;
-	return content_offset;
+	strbuf_reset(&dataref_buf);
+	fast_export_ls(path, &unused, &dataref_buf);
+	return dataref_buf.buf;
 }
 
 uint32_t repo_read_mode(const uint32_t *path)
 {
-	struct repo_dirent *dent = repo_read_dirent(active_commit, path);
-	if (dent == NULL)
-		die("invalid dump: path to be modified is missing");
-	return dent->mode;
+	uint32_t result;
+	struct strbuf unused = STRBUF_INIT;
+
+	fast_export_ls(path, &result, &unused);
+	strbuf_release(&unused);
+	return result;
 }
 
 void repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
 {
-	uint32_t mode = 0, content_offset = 0;
-	struct repo_dirent *src_dent;
-	src_dent = repo_read_dirent(revision, src);
-	if (src_dent != NULL) {
-		mode = src_dent->mode;
-		content_offset = src_dent->content_offset;
-		repo_write_dirent(dst, mode, content_offset, 0);
-	}
-}
+	uint32_t mode;
+	struct strbuf data = STRBUF_INIT;
 
-void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark)
-{
-	repo_write_dirent(path, mode, blob_mark, 0);
+	fast_export_ls_rev(revision, src, &mode, &data);
+	fast_export_modify(REPO_MAX_PATH_DEPTH, dst, mode, data.buf);
+	strbuf_release(&data);
 }
 
 void repo_delete(uint32_t *path)
 {
-	repo_write_dirent(path, 0, 0, 1);
-}
-
-static void repo_git_add_r(uint32_t depth, uint32_t *path, struct repo_dir *dir);
-
-static void repo_git_add(uint32_t depth, uint32_t *path, struct repo_dirent *dent)
-{
-	if (repo_dirent_is_dir(dent))
-		repo_git_add_r(depth, path, repo_dir_from_dirent(dent));
-	else
-		fast_export_modify(depth, path,
-				   dent->mode, dent->content_offset);
-}
-
-static void repo_git_add_r(uint32_t depth, uint32_t *path, struct repo_dir *dir)
-{
-	struct repo_dirent *de = repo_first_dirent(dir);
-	while (de) {
-		path[depth] = de->name_offset;
-		repo_git_add(depth + 1, path, de);
-		de = dent_next(&dir->entries, de);
-	}
-}
-
-static void repo_diff_r(uint32_t depth, uint32_t *path, struct repo_dir *dir1,
-			struct repo_dir *dir2)
-{
-	struct repo_dirent *de1, *de2;
-	de1 = repo_first_dirent(dir1);
-	de2 = repo_first_dirent(dir2);
-
-	while (de1 && de2) {
-		if (de1->name_offset < de2->name_offset) {
-			path[depth] = de1->name_offset;
-			fast_export_delete(depth + 1, path);
-			de1 = dent_next(&dir1->entries, de1);
-			continue;
-		}
-		if (de1->name_offset > de2->name_offset) {
-			path[depth] = de2->name_offset;
-			repo_git_add(depth + 1, path, de2);
-			de2 = dent_next(&dir2->entries, de2);
-			continue;
-		}
-		path[depth] = de1->name_offset;
-
-		if (de1->mode == de2->mode &&
-		    de1->content_offset == de2->content_offset) {
-			; /* No change. */
-		} else if (repo_dirent_is_dir(de1) && repo_dirent_is_dir(de2)) {
-			repo_diff_r(depth + 1, path,
-				    repo_dir_from_dirent(de1),
-				    repo_dir_from_dirent(de2));
-		} else if (!repo_dirent_is_dir(de1) && !repo_dirent_is_dir(de2)) {
-			repo_git_add(depth + 1, path, de2);
-		} else {
-			fast_export_delete(depth + 1, path);
-			repo_git_add(depth + 1, path, de2);
-		}
-		de1 = dent_next(&dir1->entries, de1);
-		de2 = dent_next(&dir2->entries, de2);
-	}
-	while (de1) {
-		path[depth] = de1->name_offset;
-		fast_export_delete(depth + 1, path);
-		de1 = dent_next(&dir1->entries, de1);
-	}
-	while (de2) {
-		path[depth] = de2->name_offset;
-		repo_git_add(depth + 1, path, de2);
-		de2 = dent_next(&dir2->entries, de2);
-	}
-}
-
-static uint32_t path_stack[REPO_MAX_PATH_DEPTH];
-
-void repo_diff(uint32_t r1, uint32_t r2)
-{
-	repo_diff_r(0,
-		    path_stack,
-		    repo_commit_root_dir(commit_pointer(r1)),
-		    repo_commit_root_dir(commit_pointer(r2)));
-}
-
-void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
-		 uint32_t url, unsigned long timestamp)
-{
-	fast_export_commit(revision, author, log, uuid, url, timestamp);
-	dent_commit();
-	dir_commit();
-	active_commit = commit_alloc(1);
-	commit_pointer(active_commit)->root_dir_offset =
-		commit_pointer(active_commit - 1)->root_dir_offset;
-}
-
-static void mark_init(void)
-{
-	uint32_t i;
-	mark = 1024 * 1024 * 1024;
-	for (i = 0; i < dent_pool.size; i++)
-		if (!repo_dirent_is_dir(dent_pointer(i)) &&
-		    dent_pointer(i)->content_offset > mark)
-			mark = dent_pointer(i)->content_offset;
-	mark++;
+	fast_export_delete(REPO_MAX_PATH_DEPTH, path);
 }
 
 void repo_init(void)
 {
-	mark_init();
-	if (commit_pool.size == 0) {
-		/* Create empty tree for commit 0. */
-		commit_alloc(1);
-		commit_pointer(0)->root_dir_offset = dir_alloc(1);
-		dir_pointer(0)->entries.trp_root = ~0;
-		dir_commit();
-	}
-	/* Preallocate next commit, ready for changes. */
-	active_commit = commit_alloc(1);
-	commit_pointer(active_commit)->root_dir_offset =
-		commit_pointer(active_commit - 1)->root_dir_offset;
 }
 
 void repo_reset(void)
 {
-	pool_reset();
-	commit_reset();
-	dir_reset();
-	dent_reset();
 }
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
index 0499a19..559f99f 100644
--- a/vcs-svn/repo_tree.h
+++ b/vcs-svn/repo_tree.h
@@ -14,7 +14,7 @@
 uint32_t next_blob_mark(void);
 void repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst);
 void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark);
-uint32_t repo_read_path(uint32_t *path);
+const char *repo_read_path(uint32_t *path);
 uint32_t repo_read_mode(const uint32_t *path);
 void repo_delete(uint32_t *path);
 void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 31c6056..68a8435 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -20,9 +20,11 @@
 #define NODEACT_CHANGE 1
 #define NODEACT_UNKNOWN 0
 
-#define DUMP_CTX 0
-#define REV_CTX  1
-#define NODE_CTX 2
+/* States: */
+#define DUMP_CTX 0	/* dump metadata */
+#define REV_CTX  1	/* revision metadata */
+#define NODE_CTX 2	/* node metadata */
+#define INTERNODE_CTX 3	/* between nodes */
 
 #define LENGTH_UNKNOWN (~0)
 #define DATE_RFC2822_LEN 31
@@ -198,13 +200,12 @@ static void read_props(void)
 
 static void handle_node(void)
 {
-	uint32_t mark = 0, old_mode, old_mark;
+	uint32_t old_mode;
+	const char *old_data;
 	const uint32_t type = node_ctx.type;
 	const int have_props = node_ctx.propLength != LENGTH_UNKNOWN;
 	const int have_text = node_ctx.textLength != LENGTH_UNKNOWN;
 
-	if (have_text)
-		mark = next_blob_mark();
 	if (node_ctx.action == NODEACT_DELETE) {
 		if (have_text || have_props || node_ctx.srcRev)
 			die("invalid dump: deletion node has "
@@ -224,18 +225,15 @@ static void handle_node(void)
 		die("invalid dump: directories cannot have text attached");
 
 	/*
-	 * Find old content (old_mark) and decide on the new content (mark)
-	 * and mode (node_ctx.type).
+	 * Find old content (old_data) and decide on the new mode.
 	 */
 	if (node_ctx.action == NODEACT_CHANGE && !~*node_ctx.dst) {
 		if (type != REPO_MODE_DIR)
 			die("invalid dump: root of tree is not a regular file");
-		old_mark = 0;
+		old_data = NULL;
 	} else if (node_ctx.action == NODEACT_CHANGE) {
 		uint32_t mode;
-		old_mark = repo_read_path(node_ctx.dst);
-		if (!have_text)
-			mark = old_mark;
+		old_data = repo_read_path(node_ctx.dst);
 		mode = repo_read_mode(node_ctx.dst);
 		if (mode == REPO_MODE_DIR && type != REPO_MODE_DIR)
 			die("invalid dump: cannot modify a directory into a file");
@@ -243,9 +241,12 @@ static void handle_node(void)
 			die("invalid dump: cannot modify a file into a directory");
 		node_ctx.type = mode;
 	} else if (node_ctx.action == NODEACT_ADD) {
-		if (!have_text && type != REPO_MODE_DIR)
+		if (type == REPO_MODE_DIR)
+			old_data = NULL;
+		else if (have_text)
+			old_data = EMPTY_BLOB_SHA1_HEX;
+		else
 			die("invalid dump: adds node without text");
-		old_mark = 0;
 	} else {
 		die("invalid dump: Node-path block lacks Node-action");
 	}
@@ -264,28 +265,35 @@ static void handle_node(void)
 	/*
 	 * Save the result.
 	 */
-	repo_add(node_ctx.dst, node_ctx.type, mark);
-	if (!have_text)
+	if (type == REPO_MODE_DIR)	/* directories are not tracked. */
 		return;
+	if (!have_text) {
+		fast_export_modify(REPO_MAX_PATH_DEPTH, node_ctx.dst, node_ctx.type,
+					old_data);
+		return;
+	}
 	if (!node_ctx.text_delta) {
-		fast_export_blob(node_ctx.type, mark, node_ctx.textLength, &input);
+		fast_export_modify(REPO_MAX_PATH_DEPTH, node_ctx.dst, node_ctx.type,
+					"inline");
+		fast_export_data(node_ctx.type, node_ctx.textLength, &input);
 		return;
 	}
-	if (node_ctx.srcRev) {
-		fast_export_blob_delta_rev(node_ctx.type, mark, old_mode,
-					node_ctx.srcRev, node_ctx.src,
+	fast_export_delta(node_ctx.dst, node_ctx.type, old_mode, old_data,
 					node_ctx.textLength, &input);
+}
+
+static void begin_revision(void)
+{
+	if (!rev_ctx.revision)	/* revision 0 gets no git commit. */
 		return;
-	}
-	fast_export_blob_delta(node_ctx.type, mark, old_mode, old_mark,
-				node_ctx.textLength, &input);
+	fast_export_begin_commit(rev_ctx.revision, rev_ctx.author, rev_ctx.log,
+		dump_ctx.uuid, dump_ctx.url, rev_ctx.timestamp);
 }
 
-static void handle_revision(void)
+static void end_revision(void)
 {
 	if (rev_ctx.revision)
-		repo_commit(rev_ctx.revision, rev_ctx.author, rev_ctx.log,
-			dump_ctx.uuid, dump_ctx.url, rev_ctx.timestamp);
+		fast_export_end_commit(rev_ctx.revision);
 }
 
 void svndump_read(const char *url)
@@ -315,13 +323,17 @@ void svndump_read(const char *url)
 		} else if (key == keys.revision_number) {
 			if (active_ctx == NODE_CTX)
 				handle_node();
+			if (active_ctx == REV_CTX)
+				begin_revision();
 			if (active_ctx != DUMP_CTX)
-				handle_revision();
+				end_revision();
 			active_ctx = REV_CTX;
 			reset_rev_ctx(atoi(val));
 		} else if (key == keys.node_path) {
 			if (active_ctx == NODE_CTX)
 				handle_node();
+			if (active_ctx == REV_CTX)
+				begin_revision();
 			active_ctx = NODE_CTX;
 			reset_node_ctx(val);
 		} else if (key == keys.node_kind) {
@@ -363,7 +375,7 @@ void svndump_read(const char *url)
 				read_props();
 			} else if (active_ctx == NODE_CTX) {
 				handle_node();
-				active_ctx = REV_CTX;
+				active_ctx = INTERNODE_CTX;
 			} else {
 				fprintf(stderr, "Unexpected content length header: %"PRIu32"\n", len);
 				buffer_skip_bytes(&input, len);
@@ -372,8 +384,10 @@ void svndump_read(const char *url)
 	}
 	if (active_ctx == NODE_CTX)
 		handle_node();
+	if (active_ctx == REV_CTX)
+		begin_revision();
 	if (active_ctx != DUMP_CTX)
-		handle_revision();
+		end_revision();
 }
 
 int svndump_init(const char *filename)
@@ -385,6 +399,7 @@ int svndump_init(const char *filename)
 	reset_rev_ctx(0);
 	reset_node_ctx(NULL);
 	init_keys();
+	fast_export_empty_blob();
 	return 0;
 }
 
-- 
1.7.2.4

* [RFC/PATCH] fast-import: treat filemodify with empty tree as delete
       [not found]   ` <20101211184654.GA17464@burratino>
@ 2010-12-11 22:47     ` Jonathan Nieder
  2010-12-11 23:00     ` [PATCH db/vcs-svn-incremental] vcs-svn: avoid git-isms in fast-import stream Jonathan Nieder
  1 sibling, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-11 22:47 UTC (permalink / raw)
  To: David Michael Barr
  Cc: git, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Jonathan Nieder wrote:

> Maybe fast-import should make the check itself, to avoid
> writing trees with empty directories that would be hard to
> re-create with "git write-tree".

Like this, maybe?

-- 8< --
Subject: fast-import: treat filemodify with empty tree as delete

Traditionally, git trees do not contain entries for empty
subdirectories.  Generally speaking, subtrees are not created or
destroyed explicitly; instead, they automatically appear when needed
to hold regular files, symlinks, and submodules.

v1.7.3-rc0~75^2 (Teach fast-import to import subtrees named by tree
id, 2010-06-30) changed that, by allowing an empty subtree to be
included in a fast-import stream explicitly:

	M 040000 4b825dc642cb6eb9a060e54bf8d69288fbee4904 subdir

That was unintentional.  It is better, and more closely analogous to
"git read-tree --prefix", to treat such an input line as a request to
delete ("to empty") subdir.

Noticed-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Tests use the "ls" command from vcs-svn-pu.  The actual change
would apply cleanly to master or maint, though.
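
For illustration only, the check described above can be sketched in
isolation.  This is a rough stand-alone version, not the code in the
patch: modify_is_really_delete() and the MODE_* macros are made-up
names, and the object name is compared as a hex string rather than as
raw bytes.

```c
#include <assert.h>
#include <string.h>

/* Name of git's empty tree object, as quoted in the commit message. */
#define EMPTY_TREE_HEX "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

/* File-type bits of a tree entry mode; 040000 marks a directory. */
#define MODE_TYPE_MASK 0170000u
#define MODE_DIR       0040000u

/*
 * Hypothetical helper, not part of fast-import.c: returns 1 when an
 * "M <mode> <dataref> <path>" line names the empty tree at a
 * non-toplevel path and so should be handled like "D <path>".
 */
int modify_is_really_delete(unsigned int mode, const char *dataref_hex,
			    const char *path)
{
	assert(dataref_hex && path);
	if ((mode & MODE_TYPE_MASK) != MODE_DIR)
		return 0;	/* not a directory entry */
	if (strcmp(dataref_hex, EMPTY_TREE_HEX))
		return 0;	/* some non-empty tree */
	return *path != '\0';	/* the root itself may be empty */
}
```

The last test mirrors the "*p" condition in the patch: an empty path
means the root of the tree, which is left alone.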

 fast-import.c          |   10 ++++
 t/t9300-fast-import.sh |  107 ++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 109 insertions(+), 8 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index e62f34d..c774893 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2196,6 +2196,16 @@ static void file_change_m(struct branch *b)
 		p = uq.buf;
 	}
 
+	/*
+	 * Git does not track empty, non-toplevel directories.
+	 */
+	if (S_ISDIR(mode) &&
+	    !memcmp(sha1, (unsigned char *) EMPTY_TREE_SHA1_BIN, 20) &&
+	    *p) {
+		tree_content_remove(&b->branch_tree, p, NULL);
+		return;
+	}
+
 	if (S_ISGITLINK(mode)) {
 		if (inline_data)
 			die("Git links cannot be specified 'inline': %s",
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index b0e3bda..c17f704 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -25,6 +25,14 @@ echo "$@"'
 
 >empty
 
+test_expect_success 'setup: have pipes?' '
+	rm -f frob &&
+	if mkfifo frob
+	then
+		test_set_prereq PIPE
+	fi
+'
+
 ###
 ### series A
 ###
@@ -881,6 +889,97 @@ test_expect_success \
 	 git diff-tree -C --find-copies-harder -r N4^ N4 >actual &&
 	 compare_diff_raw expect actual'
 
+test_expect_success PIPE 'N: read and copy directory' '
+	cat >expect <<-\EOF
+	:100755 100755 f1fb5da718392694d0076d677d6d0e364c79b0bc f1fb5da718392694d0076d677d6d0e364c79b0bc C100	file2/newf	file3/newf
+	:100644 100644 7123f7f44e39be127c5eb701e5968176ee9d78b1 7123f7f44e39be127c5eb701e5968176ee9d78b1 C100	file2/oldf	file3/oldf
+	EOF
+	git update-ref -d refs/heads/N4 &&
+	rm -f backflow &&
+	mkfifo backflow &&
+	(
+		exec <backflow &&
+		cat <<-EOF &&
+		commit refs/heads/N4
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		copy by tree hash, part 2
+		COMMIT
+
+		from refs/heads/branch^0
+		ls "file2"
+		EOF
+		read mode type tree filename &&
+		echo "M 040000 $tree file3"
+	) |
+	git fast-import --cat-blob-fd=3 3>backflow &&
+	git diff-tree -C --find-copies-harder -r N4^ N4 >actual &&
+	compare_diff_raw expect actual
+'
+
+test_expect_success PIPE 'N: read and copy "empty" directory' '
+	cat <<-\EOF >expect &&
+	OBJNAME
+	:000000 100644 OBJNAME OBJNAME A	greeting
+	OBJNAME
+	:100644 000000 OBJNAME OBJNAME D	unrelated
+	OBJNAME
+	:000000 100644 OBJNAME OBJNAME A	unrelated
+	EOF
+	git update-ref -d refs/heads/copy-empty &&
+	rm -f backflow &&
+	mkfifo backflow &&
+	(
+		exec <backflow &&
+		cat <<-EOF &&
+		commit refs/heads/copy-empty
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		copy "empty" (missing) directory
+		COMMIT
+
+		M 100644 inline src/greeting
+		data <<BLOB
+		hello
+		BLOB
+		C src/greeting dst1/non-greeting
+		C src/greeting unrelated
+		# leave behind "empty" src directory
+		D src/greeting
+		ls "src"
+		EOF
+		read mode type tree filename &&
+		sed -e "s/X\$//" <<-EOF
+		M $mode $tree dst1
+		M $mode $tree dst2
+
+		commit refs/heads/copy-empty
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		copy empty directory to root
+		COMMIT
+
+		M $mode $tree X
+
+		commit refs/heads/copy-empty
+		committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+		data <<COMMIT
+		add another file
+		COMMIT
+
+		M 100644 inline greeting
+		data <<BLOB
+		hello
+		BLOB
+		EOF
+	) |
+	git fast-import --cat-blob-fd=3 3>backflow &&
+	git rev-list copy-empty |
+	git diff-tree -r --root --stdin |
+	sed "s/$_x40/OBJNAME/g" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success \
 	'N: copy root directory by tree hash' \
 	'cat >expect <<-\EOF &&
@@ -1773,14 +1872,6 @@ test_expect_success 'R: print two blobs to stdout' '
 	test_cmp expect actual
 '
 
-test_expect_success 'setup: have pipes?' '
-	rm -f frob &&
-	if mkfifo frob
-	then
-		test_set_prereq PIPE
-	fi
-'
-
 test_expect_success PIPE 'R: copy using cat-file' '
 	expect_id=$(git hash-object big) &&
 	expect_len=$(wc -c <big) &&
-- 
1.7.2.4

* [PATCH db/vcs-svn-incremental] vcs-svn: avoid git-isms in fast-import stream
       [not found]   ` <20101211184654.GA17464@burratino>
  2010-12-11 22:47     ` [RFC/PATCH] fast-import: treat filemodify with empty tree as delete Jonathan Nieder
@ 2010-12-11 23:00     ` Jonathan Nieder
  1 sibling, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-11 23:00 UTC (permalink / raw)
  To: David Michael Barr
  Cc: git, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Jonathan Nieder wrote:

> I am not totally happy with the result, since it adds another
> git-specific detail to svn-fe (the first was hardcoding the
> empty blob sha1 instead of using
> 
> 	blob
> 	mark :0
> 	data 0
> 
> ).

Maybe this would help?

-- 8< --
Subject: vcs-svn: avoid git-isms in fast-import stream

Current svn-fe is not likely to work without change with other
fast-import backends, but don't let that stop us from trying:

 - instead of suppressing copies of empty trees, let the backend
   decide what to do with them;

 - use a mark instead of hard-coding git's name for the empty blob.

However, we do not include commands in the stream for new empty
directories, since no syntax is documented for that yet.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/fast_export.c |    8 +-------
 vcs-svn/svndump.c     |    2 +-
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 75d674e..85166a6 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -40,12 +40,6 @@ void fast_export_delete(uint32_t depth, const uint32_t *path)
 void fast_export_modify(uint32_t depth, const uint32_t *path, uint32_t mode,
 			const char *dataref)
 {
-	/* Git does not track empty directories. */
-	if (S_ISDIR(mode) && !strcmp(dataref, EMPTY_TREE_SHA1_HEX)) {
-		fast_export_delete(depth, path);
-		return;
-	}
-
 	/* Mode must be 100644, 100755, 120000, or 160000. */
 	printf("M %06"PRIo32" %s ", mode, dataref);
 	pool_print_seq(depth, path, '/', stdout);
@@ -255,7 +249,7 @@ void fast_export_data(uint32_t mode, uint32_t len, struct line_buffer *input)
 
 void fast_export_empty_blob(void)
 {
-	printf("blob\ndata 0\n\n");
+	printf("blob\nmark :0\ndata 0\n\n");
 }
 
 void fast_export_delta(const uint32_t *path, uint32_t mode, uint32_t old_mode,
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 68a8435..e28e762 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -244,7 +244,7 @@ static void handle_node(void)
 		if (type == REPO_MODE_DIR)
 			old_data = NULL;
 		else if (have_text)
-			old_data = EMPTY_BLOB_SHA1_HEX;
+			old_data = ":0";
 		else
 			die("invalid dump: adds node without text");
 	} else {
-- 
1.7.2.4

* [PATCH 12/10] vcs-svn: quote paths correctly for ls command
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
                   ` (10 preceding siblings ...)
       [not found] ` <C59168D0-B409-4A83-B96C-8CCD42D0B62F@cordelta.com>
@ 2010-12-11 23:04 ` David Michael Barr
  2010-12-11 23:11   ` [PATCH db/vcs-svn-incremental] vcs-svn: quote all paths passed to fast-import Jonathan Nieder
  2010-12-12  9:32 ` [PATCH 13/10] vcs-svn: use mark from previous import for parent commit David Michael Barr
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
  13 siblings, 1 reply; 37+ messages in thread
From: David Michael Barr @ 2010-12-11 23:04 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

From: David Barr <david.barr@cordelta.com>
Date: Sun, 12 Dec 2010 03:59:31 +1100
Subject: [PATCH] vcs-svn: quote paths correctly for ls command

This bug was found while importing rev 601865 of ASF.

Signed-off-by: David Barr <david.barr@cordelta.com>
---
 vcs-svn/fast_export.c |    4 ++--
 vcs-svn/string_pool.c |   11 +++++++++++
 vcs-svn/string_pool.h |    1 +
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index c798f6d..d2397d8 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -100,7 +100,7 @@ static void ls_from_rev(uint32_t rev, const uint32_t *path)
 {
 	/* ls :5 "path/to/old/file" */
 	printf("ls :%"PRIu32" \"", rev);
-	pool_print_seq(REPO_MAX_PATH_DEPTH, path, '/', stdout);
+	pool_print_seq_q(REPO_MAX_PATH_DEPTH, path, '/', stdout);
 	printf("\"\n");
 	fflush(stdout);
 }
@@ -149,7 +149,7 @@ void fast_export_ls(const uint32_t *path,
 				uint32_t *mode, struct strbuf *dataref)
 {
 	printf("ls \"");
-	pool_print_seq(REPO_MAX_PATH_DEPTH, path, '/', stdout);
+	pool_print_seq_q(REPO_MAX_PATH_DEPTH, path, '/', stdout);
 	printf("\"\n");
 	fflush(stdout);
 	parse_ls_response(get_response_line(), mode, dataref);
diff --git a/vcs-svn/string_pool.c b/vcs-svn/string_pool.c
index c08abac..a03f5a4 100644
--- a/vcs-svn/string_pool.c
+++ b/vcs-svn/string_pool.c
@@ -4,6 +4,8 @@
  */
 
 #include "git-compat-util.h"
+#include "strbuf.h"
+#include "quote.h"
 #include "trp.h"
 #include "obj_pool.h"
 #include "string_pool.h"
@@ -75,6 +77,15 @@ void pool_print_seq(uint32_t len, const uint32_t *seq, char delim, FILE *stream)
 	}
 }
 
+void pool_print_seq_q(uint32_t len, const uint32_t *seq, char delim, FILE *stream)
+{
+	uint32_t i;
+	for (i = 0; i < len && ~seq[i]; i++) {
+		quote_c_style(pool_fetch(seq[i]), NULL, stream, 1);
+		if (i < len - 1 && ~seq[i + 1])
+			fputc(delim, stream);
+	}
+}
 uint32_t pool_tok_seq(uint32_t sz, uint32_t *seq, const char *delim, char *str)
 {
 	char *context = NULL;
diff --git a/vcs-svn/string_pool.h b/vcs-svn/string_pool.h
index 3720cf8..96e501d 100644
--- a/vcs-svn/string_pool.h
+++ b/vcs-svn/string_pool.h
@@ -5,6 +5,7 @@ uint32_t pool_intern(const char *key);
 const char *pool_fetch(uint32_t entry);
 uint32_t pool_tok_r(char *str, const char *delim, char **saveptr);
 void pool_print_seq(uint32_t len, const uint32_t *seq, char delim, FILE *stream);
+void pool_print_seq_q(uint32_t len, const uint32_t *seq, char delim, FILE *stream);
 uint32_t pool_tok_seq(uint32_t sz, uint32_t *seq, const char *delim, char *str);
 void pool_reset(void);
 
-- 
1.7.3.2.846.gf4b062

* [PATCH db/vcs-svn-incremental] vcs-svn: quote all paths passed to fast-import
  2010-12-11 23:04 ` [PATCH 12/10] vcs-svn: quote paths correctly for ls command David Michael Barr
@ 2010-12-11 23:11   ` Jonathan Nieder
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-11 23:11 UTC (permalink / raw)
  To: David Michael Barr
  Cc: git, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Filenames with linefeeds or double quotes need to be quoted if
fast-import is not to misinterpret them.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
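To make the problem concrete, here is a simplified sketch of the kind
of escaping involved.  It is only a stand-in for what quote_c_style()
does: escape_path() is a hypothetical name, and it handles just double
quote, backslash, linefeed and tab, nothing else.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified escaper (not git's quote_c_style()).
 * Writes the escaped form of name, without surrounding quotes, to
 * out.  Returns the escaped length, or (size_t)-1 if out is too
 * small to hold the result plus its terminating NUL.
 */
size_t escape_path(const char *name, char *out, size_t outsz)
{
	size_t n = 0;

	assert(name && out && outsz > 0);
	for (; *name; name++) {
		char esc = 0;

		switch (*name) {
		case '"':  esc = '"';  break;
		case '\\': esc = '\\'; break;
		case '\n': esc = 'n';  break;
		case '\t': esc = 't';  break;
		}
		if (n + (esc ? 2 : 1) >= outsz)
			return (size_t)-1;	/* no room left for the NUL */
		if (esc) {
			out[n++] = '\\';
			out[n++] = esc;
		} else {
			out[n++] = *name;
		}
	}
	out[n] = '\0';
	return n;
}
```

A caller would print a double quote, the escaped name, and a closing
double quote, the same shape as the printf calls in the diff below.
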
 vcs-svn/fast_export.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 7856ff2..2d2a6b2 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -31,19 +31,18 @@ static int init_postimage(void)
 
 void fast_export_delete(uint32_t depth, const uint32_t *path)
 {
-	putchar('D');
-	putchar(' ');
-	pool_print_seq(depth, path, '/', stdout);
-	putchar('\n');
+	printf("D \"");
+	pool_print_seq_q(depth, path, '/', stdout);
+	printf("\"\n");
 }
 
 void fast_export_modify(uint32_t depth, const uint32_t *path, uint32_t mode,
 			const char *dataref)
 {
 	/* Mode must be 100644, 100755, 120000, or 160000. */
-	printf("M %06"PRIo32" %s ", mode, dataref);
-	pool_print_seq(depth, path, '/', stdout);
-	putchar('\n');
+	printf("M %06"PRIo32" %s \"", mode, dataref);
+	pool_print_seq_q(depth, path, '/', stdout);
+	printf("\"\n");
 }
 
 static char gitsvnline[MAX_GITSVN_LINE_LEN];
-- 
1.7.2.4

* [PATCH 13/10] vcs-svn: use mark from previous import for parent commit
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
                   ` (11 preceding siblings ...)
  2010-12-11 23:04 ` [PATCH 12/10] vcs-svn: quote paths correctly for ls command David Michael Barr
@ 2010-12-12  9:32 ` David Michael Barr
  2010-12-12 17:06   ` Jonathan Nieder
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
  13 siblings, 1 reply; 37+ messages in thread
From: David Michael Barr @ 2010-12-12  9:32 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

From: David Barr <david.barr@cordelta.com>
Date: Sun, 12 Dec 2010 13:41:38 +1100
Subject: [PATCH] vcs-svn: use mark from previous import for parent commit

Signed-off-by: David Barr <david.barr@cordelta.com>
---
 vcs-svn/fast_export.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index d2397d8..6abd108 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -77,7 +77,7 @@ void fast_export_begin_commit(uint32_t revision, uint32_t author, char *log,
 		   log, gitsvnline);
 	if (!first_commit_done) {
 		if (revision > 1)
-			printf("from refs/heads/master^0\n");
+			printf("from :%"PRIu32"\n", revision - 1);
 		first_commit_done = 1;
 	}
 }
-- 
1.7.3.2.846.gf4b062

* Re: [PATCH 13/10] vcs-svn: use mark from previous import for parent commit
  2010-12-12  9:32 ` [PATCH 13/10] vcs-svn: use mark from previous import for parent commit David Michael Barr
@ 2010-12-12 17:06   ` Jonathan Nieder
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2010-12-12 17:06 UTC (permalink / raw)
  To: David Michael Barr
  Cc: git, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

David Michael Barr wrote:

> Subject: [PATCH] vcs-svn: use mark from previous import for parent commit
[...]
> +++ b/vcs-svn/fast_export.c
> @@ -77,7 +77,7 @@ void fast_export_begin_commit(uint32_t revision, uint32_t
>                    log, gitsvnline);
>         if (!first_commit_done) {
>                 if (revision > 1)
> -                       printf("from refs/heads/master^0\n");
> +                       printf("from :%"PRIu32"\n", revision - 1);

This deals more sanely with attempts to continue an import starting at
the wrong revision.

Example: if I try

	import () {
		rm -f backflow
		mkfifo backflow

		svn-fe 3<backflow |
		git fast-import --cat-blob-fd=3 \
			--relative-marks \
			${1+--import-marks=svnrevs} \
			--export-marks=svnrevs \
			3>backflow
	}

	svnrdump -r0:100 $url | import
	svnrdump -r100:200 $url | import continue

then svn-fe should correctly re-import r100 the second time, instead
of trying to apply the same deltas twice.  If I try

	svnrdump -r0:100 $url | import
	svnrdump -r102:200 $url | import continue

then the second command should error out.
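
The rule behind both cases can be sketched as a small function.  This
is only an illustration of the quoted hunk: format_first_parent() is a
made-up name, and it formats into a buffer instead of printing.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the patched hunk: formats the "from"
 * line for the first commit of a run.  Returns the number of
 * characters written, or 0 when no parent line is needed.
 */
int format_first_parent(uint32_t revision, char *buf, size_t bufsz)
{
	assert(buf && bufsz > 0);
	buf[0] = '\0';
	if (revision <= 1)
		return 0;	/* r1 starts the history */
	/* r100 builds on mark :99, r102 on mark :101, and so on. */
	return snprintf(buf, bufsz, "from :%" PRIu32 "\n", revision - 1);
}
```

In the second scenario the first commit would ask for mark :101, which
the earlier run never exported, so I would expect fast-import to
reject it rather than quietly build on the wrong parent.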

Thanks, queued.

* Re: [PATCH 02/10] vcs-svn: save marks for imported commits
  2010-12-10 10:22 ` [PATCH 02/10] vcs-svn: save marks for imported commits Jonathan Nieder
@ 2011-03-06 11:15   ` Jonathan Nieder
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 11:15 UTC (permalink / raw)
  To: git; +Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier

(pruned cc because reviving an old thread)
Hi,

Jonathan Nieder wrote:

> [Subject: vcs-svn: save marks for imported commits]
>
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

That's a lousy commit message.  A version committed later (that
eventually found its way to David's repository) explains:

	This way, a person can use

		svnadmin dump $path |
		svn-fe |
		git fast-import --relative-marks --export-marks=svn-revs

	to get a list of what commit corresponds to each svn revision (plus
	some irrelevant blob names) in .git/info/fast-import/svn-revs.

In other words, this is the first half of a two-way mapping between
svn and git commit names.

It should be possible to build a mapping the other way like so:

	awk '
	BEGIN {
		print "commit refs/notes/svn-id";
		printf "committer ";
		system("git var GIT_COMMITTER_IDENT");
		print "data <<EOT";
		print "Automatically generated commits-to-revs mapping.";
		print "EOT";
	}
	{
		num = 0 + substr($1, 2);
		commitname = $2;
		if (num < 1024 * 1024 * 1024) {
			print "N inline " commitname
			print "data <<EOT";
			print "r" num;
			print "EOT";
		}
	}
	' .git/info/fast-import/svn-revs |
	git fast-import
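
The filtering step in that script (keep marks below 1024 * 1024 * 1024,
skip the rest) could equally be written in C.  A rough sketch, assuming
each line of the marks file looks like ":<number> <40 hex digits>";
parse_rev_mark() and BLOB_MARK_BASE are names invented for this
example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Threshold used by the awk script above to tell commits from blobs. */
#define BLOB_MARK_BASE (UINT32_C(1024) * 1024 * 1024)

/*
 * Hypothetical helper: parses one marks-file line.  Returns 1 and
 * fills in *rev and name for a commit mark; returns 0 for a blob
 * mark or a line that does not parse.
 */
int parse_rev_mark(const char *line, uint32_t *rev, char name[41])
{
	uint32_t num;

	assert(line && rev && name);
	if (sscanf(line, ":%" SCNu32 " %40[0-9a-f]", &num, name) != 2)
		return 0;
	if (num >= BLOB_MARK_BASE)
		return 0;	/* one of the irrelevant blob names */
	*rev = num;
	return 1;
}
```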

* [PATCH v2] vcs-svn: prepare to eliminate repo_tree structure
  2010-12-10 10:28 ` [PATCH 08/10] vcs-svn: prepare to eliminate repo_tree structure Jonathan Nieder
@ 2011-03-06 12:52   ` Jonathan Nieder
  2011-03-06 20:41     ` David Barr
  0 siblings, 1 reply; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 12:52 UTC (permalink / raw)
  To: git; +Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier

Date: Fri, 10 Dec 2010 04:28:06 -0600

Currently svn-fe processes each commit in two stages: first decide on
the correct content for all paths and export the relevant blobs, then
export a commit with the result.

But we can keep less state and simplify svn-fe a great deal by
exporting the commit in one stage: use 'inline' blobs for each path
and remember nothing.  This way, the repo_tree structure could be
eliminated, and we would get support for incremental imports 'for
free'.

Reorganize handle_node() along these lines.  This is just a code
cleanup; the functional change to repo_tree will come later.

Backported by David Barr to apply without text delta support.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Hi,

Jonathan Nieder wrote:

[...]
> @@ -228,7 +234,9 @@ static void handle_node(void)
>  	} else if (node_ctx.action == NODEACT_CHANGE) {
>  		uint32_t mode;
>  		old_mark = repo_read_path(node_ctx.dst);
> -		mode = repo_modify_path(node_ctx.dst, 0, mark);
> +		if (!have_text)
> +			mark = old_mark;
> +		mode = repo_modify_path(node_ctx.dst, 0, 0);
[...]

Backported so we can merge it a little sooner (thanks!).  It's very
similar to the original patch except there is no old_mark --- the
previous text at a path is useless until we learn to patch it (which
is a topic for another day).

David, may I have your sign-off on this series (the
vcs-svn-incremental branch)?

 vcs-svn/svndump.c |   33 +++++++++++++++++++++++----------
 1 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index ee7c0bb..f07376f 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -201,13 +201,14 @@ static void handle_node(void)
 	uint32_t mark = 0;
 	const uint32_t type = node_ctx.type;
 	const int have_props = node_ctx.propLength != LENGTH_UNKNOWN;
+	const int have_text = node_ctx.textLength != LENGTH_UNKNOWN;
 
 	if (node_ctx.text_delta)
 		die("text deltas not supported");
-	if (node_ctx.textLength != LENGTH_UNKNOWN)
+	if (have_text)
 		mark = next_blob_mark();
 	if (node_ctx.action == NODEACT_DELETE) {
-		if (mark || have_props || node_ctx.srcRev)
+		if (have_text || have_props || node_ctx.srcRev)
 			die("invalid dump: deletion node has "
 				"copyfrom info, text, or properties");
 		return repo_delete(node_ctx.dst);
@@ -221,13 +222,20 @@ static void handle_node(void)
 		if (node_ctx.action == NODEACT_ADD)
 			node_ctx.action = NODEACT_CHANGE;
 	}
-	if (mark && type == REPO_MODE_DIR)
+	if (have_text && type == REPO_MODE_DIR)
 		die("invalid dump: directories cannot have text attached");
+
+	/*
+	 * Decide on the new content (mark) and mode (node_ctx.type).
+	 */
 	if (node_ctx.action == NODEACT_CHANGE && !~*node_ctx.dst) {
 		if (type != REPO_MODE_DIR)
 			die("invalid dump: root of tree is not a regular file");
 	} else if (node_ctx.action == NODEACT_CHANGE) {
-		uint32_t mode = repo_modify_path(node_ctx.dst, 0, mark);
+		uint32_t mode;
+		if (!have_text)
+			mark = repo_read_path(node_ctx.dst);
+		mode = repo_modify_path(node_ctx.dst, 0, 0);
 		if (!mode)
 			die("invalid dump: path to be modified is missing");
 		if (mode == REPO_MODE_DIR && type != REPO_MODE_DIR)
@@ -236,22 +244,27 @@ static void handle_node(void)
 			die("invalid dump: cannot modify a file into a directory");
 		node_ctx.type = mode;
 	} else if (node_ctx.action == NODEACT_ADD) {
-		if (!mark && type != REPO_MODE_DIR)
+		if (!have_text && type != REPO_MODE_DIR)
 			die("invalid dump: adds node without text");
-		repo_add(node_ctx.dst, type, mark);
 	} else {
 		die("invalid dump: Node-path block lacks Node-action");
 	}
+
+	/*
+	 * Adjust mode to reflect properties.
+	 */
 	if (have_props) {
-		const uint32_t old_mode = node_ctx.type;
 		if (!node_ctx.prop_delta)
 			node_ctx.type = type;
 		if (node_ctx.propLength)
 			read_props();
-		if (node_ctx.type != old_mode)
-			repo_modify_path(node_ctx.dst, node_ctx.type, mark);
 	}
-	if (mark)
+
+	/*
+	 * Save the result.
+	 */
+	repo_add(node_ctx.dst, node_ctx.type, mark);
+	if (have_text)
 		fast_export_blob(node_ctx.type, mark,
 				 node_ctx.textLength, &input);
 }
-- 
1.7.4.1.91.g15e19.dirty

* Re: [PATCH v2] vcs-svn: prepare to eliminate repo_tree structure
  2011-03-06 12:52   ` [PATCH v2] " Jonathan Nieder
@ 2011-03-06 20:41     ` David Barr
  0 siblings, 0 replies; 37+ messages in thread
From: David Barr @ 2011-03-06 20:41 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Ramkumar Ramachandra, Sverre Rabbelier

Hi,

> David, may I have your sign-off on this series (the
> vcs-svn-incremental branch)?

Absolutely, I've updated my branch with the correct sign-offs.

Please pull

 git://github.com/barrbrain/git.git vcs-svn-incremental

for the complete series of back-ported patches to support incremental
imports for vcs-svn.

--
David Barr.

* [PATCH v2 00/12] vcs-svn: incremental import
  2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
                   ` (12 preceding siblings ...)
  2010-12-12  9:32 ` [PATCH 13/10] vcs-svn: use mark from previous import for parent commit David Michael Barr
@ 2011-03-06 22:54 ` Jonathan Nieder
  2011-03-06 23:03   ` [PATCH 01/12] vcs-svn: use higher mark numbers for blobs Jonathan Nieder
                     ` (12 more replies)
  13 siblings, 13 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 22:54 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Hi again,

Jonathan Nieder wrote:

> Using David's "ls" command we can eliminate the in-memory repo_tree
> and rely on the target repository for information about old revs.

Here's a reroll.  Aside from the aspects already mentioned (which
avoid a dependency on the mostly orthogonal topic of support for text
deltas), the original patch #10 has been split into smaller, more
easily digestible pieces.

Most of the credit for this incarnation of the series belongs to
David, who heroically streamlined it and untangled it from other
topics.

Patch 1 changes the mark numbers for blobs to be ridiculously high,
to make room for memorable commit marks (:1 for r1, :2 for r2, etc).
Patch 2 brings those commit marks into existence, as mentioned before.

Patches 3-5 simplify the repo-tree API somewhat.  They are fairly
minimal; patches on top of this series offering further simplification
would be very welcome.

Patch 6 is a bit sneaky.  We want svn-fe's output to change from

	<import blob>
	<import blob>
	...
	<import blob>
	<import commit, using blobs>

to

	<commit header>
	M 100644 inline one/path
	<import blob>
	M 100644 inline another/path
	...
	<commit footer (progress update)>

since the latter allows svn-fe to maintain much less state.  But
that's a big change, so patch 6 introduces a stepping stone on the way
there:

	<comment that will become commit header>
	<import blob>
	...
	<import commit; this will become the commit footer>

That paves the way for patches 7-11, which teach svn-fe to rely
on the fast-import backend for information about previously
imported blobs, at long last.
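
To give a feel for the target format, one file change in the "inline"
style could be produced as sketched here.  This is not the series'
code: format_inline_modify() is a hypothetical helper, the content is
a NUL-terminated string for brevity (real blobs are arbitrary bytes),
and the path is assumed not to need quoting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats "M <mode> inline <path>" followed by
 * a "data <length>" block into buf.  Returns the formatted length,
 * or -1 if buf is too small.
 */
int format_inline_modify(char *buf, size_t bufsz, unsigned int mode,
			 const char *path, const char *content)
{
	int n;

	assert(buf && path && content);
	n = snprintf(buf, bufsz, "M %06o inline %s\ndata %zu\n%s\n",
		     mode, path, strlen(content), content);
	return (n < 0 || (size_t)n >= bufsz) ? -1 : n;
}
```

Nothing about the blob has to be remembered once the line and its
data have been written, which is the point of the exercise.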

The visible effects should be:

 - svn-fe _requires_ a backchannel from the fast-import
   backend now.  You can't do

	svn-fe <dump >stream &&
	fast-import <stream

   in two steps any more.

 - Given one dump that picks up where another left off, svn-fe
   can continue the import.  Use

	git fast-import --relative-marks \
		--import-marks-if-exists=svn-revs \
		--export-marks=svn-revs \
		--cat-blob-fd=3 3>backchannel

   for both imports.

I'm not happy about the loss of usability but I'm happy about the gain
in functionality.  A good next step might be to build a simple remote
helper to make this comfortable to use.

Thoughts?  Improvements?  Complaints?  Despite the deficiencies just
mentioned I'm tempted to push this out soon.  Feedback in either
direction would be welcome.

David Barr (3):
  vcs-svn: set up channel to read fast-import cat-blob response
  vcs-svn: quote paths correctly for ls command
  vcs-svn: use mark from previous import for parent commit

Jonathan Nieder (9):
  vcs-svn: use higher mark numbers for blobs
  vcs-svn: save marks for imported commits
  vcs-svn: introduce repo_read_path to check the content at a path
  vcs-svn: handle_node: use repo_read_path
  vcs-svn: simplify repo_modify_path and repo_copy
  vcs-svn: add a comment before each commit
  vcs-svn: allow input errors to be detected promptly
  vcs-svn: eliminate repo_tree structure
  vcs-svn: handle filenames with dq correctly

 contrib/svn-fe/svn-fe.txt |    6 +-
 t/t9010-svn-fe.sh         |  217 +++++++++++++++++++------
 vcs-svn/fast_export.c     |  145 +++++++++++++++--
 vcs-svn/fast_export.h     |   39 +++--
 vcs-svn/line_buffer.c     |    5 +
 vcs-svn/line_buffer.h     |    1 +
 vcs-svn/repo_tree.c       |  386 ++++++++-------------------------------------
 vcs-svn/repo_tree.h       |    5 +-
 vcs-svn/string_pool.c     |   13 ++-
 vcs-svn/string_pool.h     |    3 +-
 vcs-svn/svndump.c         |  106 +++++++++----
 11 files changed, 490 insertions(+), 436 deletions(-)
 rewrite vcs-svn/fast_export.h (75%)
 rewrite vcs-svn/repo_tree.c (96%)

* [PATCH 01/12] vcs-svn: use higher mark numbers for blobs
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
@ 2011-03-06 23:03   ` Jonathan Nieder
  2011-03-08 19:08     ` Junio C Hamano
  2011-03-06 23:04   ` [PATCH 02/12] vcs-svn: save marks for imported commits Jonathan Nieder
                     ` (11 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:03 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Date: Fri, 10 Dec 2010 04:21:35 -0600

Prepare to use mark :5 for the commit corresponding to r5 (and so on).

1 billion seems sufficiently high for blob marks to avoid conflicting
with rev marks, while still leaving room for 3 billion blobs.  Such
high mark numbers cause trouble with ancient fast-import versions, but
this topic cannot support git fast-import versions before 1.7.4 (which
introduces the cat-blob command) anyway.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Unchanged except for a tiny tweak in the wording of the change
description.
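
For anyone checking the arithmetic in the description, a small sketch
follows.  It assumes 32-bit mark numbers, as svn-fe uses; the function
names and BLOB_MARK_BASE are made up for the example and do not appear
in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* The value mark_init() starts from after this patch. */
#define BLOB_MARK_BASE (UINT32_C(1024) * 1024 * 1024)

/* Marks below the base are free to name svn revisions (:5 for r5). */
int mark_is_revision(uint32_t mark)
{
	return mark > 0 && mark < BLOB_MARK_BASE;
}

/* How many 32-bit marks remain above the base: roughly 3 billion. */
uint32_t blob_marks_available(void)
{
	return UINT32_MAX - BLOB_MARK_BASE;
}

/* Hypothetical allocator: hands out the next blob mark. */
uint32_t next_blob_mark(uint32_t *counter)
{
	assert(counter && *counter >= BLOB_MARK_BASE && *counter < UINT32_MAX);
	return ++*counter;
}
```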

 vcs-svn/repo_tree.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index 491f013..093c5ff 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -289,7 +289,7 @@ void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
 static void mark_init(void)
 {
 	uint32_t i;
-	mark = 0;
+	mark = 1024 * 1024 * 1024;
 	for (i = 0; i < dent_pool.size; i++)
 		if (!repo_dirent_is_dir(dent_pointer(i)) &&
 		    dent_pointer(i)->content_offset > mark)
-- 
1.7.4.1

* [PATCH 02/12] vcs-svn: save marks for imported commits
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
  2011-03-06 23:03   ` [PATCH 01/12] vcs-svn: use higher mark numbers for blobs Jonathan Nieder
@ 2011-03-06 23:04   ` Jonathan Nieder
  2011-03-06 23:07   ` [PATCH 03/12] vcs-svn: introduce repo_read_path to check the content at a path Jonathan Nieder
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:04 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Date: Thu, 9 Dec 2010 18:57:13 -0600

This way, a person can use

	svnadmin dump $path |
	svn-fe |
	git fast-import --relative-marks --export-marks=svn-revs

to get a list of what commit corresponds to each svn revision (plus
some irrelevant blob names) in .git/info/fast-import/svn-revs.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
From David's tree.

 vcs-svn/fast_export.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 260cf50..932824a 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -45,6 +45,7 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 		*gitsvnline = '\0';
 	}
 	printf("commit refs/heads/master\n");
+	printf("mark :%"PRIu32"\n", revision);
 	printf("committer %s <%s@%s> %ld +0000\n",
 		   ~author ? pool_fetch(author) : "nobody",
 		   ~author ? pool_fetch(author) : "nobody",
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 03/12] vcs-svn: introduce repo_read_path to check the content at a path
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
  2011-03-06 23:03   ` [PATCH 01/12] vcs-svn: use higher mark numbers for blobs Jonathan Nieder
  2011-03-06 23:04   ` [PATCH 02/12] vcs-svn: save marks for imported commits Jonathan Nieder
@ 2011-03-06 23:07   ` Jonathan Nieder
  2011-03-06 23:08   ` [PATCH 04/12] vcs-svn: handle_node: use repo_read_path Jonathan Nieder
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:07 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Date: Sat, 20 Nov 2010 13:25:28 -0600

The repo_tree structure remembers, for each path in each revision, a
mode (regular file, executable, symlink, or directory) and content
(blob mark or directory structure).  Maintaining a second copy of all
this information when it's already in the target repository is
wasteful.  The copy does not persist between svn-fe invocations and,
most importantly, there is no convenient way to transfer it from one
machine to another.  So it would be nice to get rid of it.

As a first step, let's change the repo_tree API to match fast-import's
read commands more closely.  Currently to read the mode for a path,
one uses

	repo_modify_path(path, new_mode, new_content);

which changes the mode and content as a side effect.  There is no
function to read the content at a path; add one.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/repo_tree.c |   12 +++++++++++-
 vcs-svn/repo_tree.h |    1 +
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index 093c5ff..23a9371 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -87,7 +87,8 @@ static struct repo_dir *repo_clone_dir(struct repo_dir *orig_dir)
 	return dir_pointer(new_o);
 }
 
-static struct repo_dirent *repo_read_dirent(uint32_t revision, uint32_t *path)
+static struct repo_dirent *repo_read_dirent(uint32_t revision,
+					    const uint32_t *path)
 {
 	uint32_t name = 0;
 	struct repo_dirent *key = dent_pointer(dent_alloc(1));
@@ -157,6 +158,15 @@ static void repo_write_dirent(uint32_t *path, uint32_t mode,
 		dent_remove(&dir_pointer(parent_dir_o)->entries, dent);
 }
 
+uint32_t repo_read_path(const uint32_t *path)
+{
+	uint32_t content_offset = 0;
+	struct repo_dirent *dent = repo_read_dirent(active_commit, path);
+	if (dent != NULL)
+		content_offset = dent->content_offset;
+	return content_offset;
+}
+
 uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
 {
 	uint32_t mode = 0, content_offset = 0;
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
index 68baeb5..3202bbe 100644
--- a/vcs-svn/repo_tree.h
+++ b/vcs-svn/repo_tree.h
@@ -15,6 +15,7 @@ uint32_t next_blob_mark(void);
 uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst);
 void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark);
 uint32_t repo_modify_path(uint32_t *path, uint32_t mode, uint32_t blob_mark);
+uint32_t repo_read_path(const uint32_t *path);
 void repo_delete(uint32_t *path);
 void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
 		 uint32_t url, long unsigned timestamp);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 04/12] vcs-svn: handle_node: use repo_read_path
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
                     ` (2 preceding siblings ...)
  2011-03-06 23:07   ` [PATCH 03/12] vcs-svn: introduce repo_read_path to check the content at a path Jonathan Nieder
@ 2011-03-06 23:08   ` Jonathan Nieder
  2011-03-06 23:09   ` [PATCH 05/12] vcs-svn: simplify repo_modify_path and repo_copy Jonathan Nieder
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:08 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Date: Fri, 10 Dec 2010 04:28:06 -0600

svn-fe processes each commit in two stages: first decide on the
correct content for all paths and export the relevant blobs, then
export a commit with the result.

We can keep less state and simplify svn-fe a great deal by exporting
the commit in one stage: use 'inline' blobs for each path and remember
nothing.  This way, the repo_tree structure could be eliminated, and
we would get support for incremental imports 'for free'.

Reorganize handle_node() along these lines.  This is just a code
cleanup; the functional changes to repo_tree and handle_revision
will come later.

[db: backported to apply without text delta support]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/svndump.c |   33 +++++++++++++++++++++++----------
 1 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index ee7c0bb..f07376f 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -201,13 +201,14 @@ static void handle_node(void)
 	uint32_t mark = 0;
 	const uint32_t type = node_ctx.type;
 	const int have_props = node_ctx.propLength != LENGTH_UNKNOWN;
+	const int have_text = node_ctx.textLength != LENGTH_UNKNOWN;
 
 	if (node_ctx.text_delta)
 		die("text deltas not supported");
-	if (node_ctx.textLength != LENGTH_UNKNOWN)
+	if (have_text)
 		mark = next_blob_mark();
 	if (node_ctx.action == NODEACT_DELETE) {
-		if (mark || have_props || node_ctx.srcRev)
+		if (have_text || have_props || node_ctx.srcRev)
 			die("invalid dump: deletion node has "
 				"copyfrom info, text, or properties");
 		return repo_delete(node_ctx.dst);
@@ -221,13 +222,20 @@ static void handle_node(void)
 		if (node_ctx.action == NODEACT_ADD)
 			node_ctx.action = NODEACT_CHANGE;
 	}
-	if (mark && type == REPO_MODE_DIR)
+	if (have_text && type == REPO_MODE_DIR)
 		die("invalid dump: directories cannot have text attached");
+
+	/*
+	 * Decide on the new content (mark) and mode (node_ctx.type).
+	 */
 	if (node_ctx.action == NODEACT_CHANGE && !~*node_ctx.dst) {
 		if (type != REPO_MODE_DIR)
 			die("invalid dump: root of tree is not a regular file");
 	} else if (node_ctx.action == NODEACT_CHANGE) {
-		uint32_t mode = repo_modify_path(node_ctx.dst, 0, mark);
+		uint32_t mode;
+		if (!have_text)
+			mark = repo_read_path(node_ctx.dst);
+		mode = repo_modify_path(node_ctx.dst, 0, 0);
 		if (!mode)
 			die("invalid dump: path to be modified is missing");
 		if (mode == REPO_MODE_DIR && type != REPO_MODE_DIR)
@@ -236,22 +244,27 @@ static void handle_node(void)
 			die("invalid dump: cannot modify a file into a directory");
 		node_ctx.type = mode;
 	} else if (node_ctx.action == NODEACT_ADD) {
-		if (!mark && type != REPO_MODE_DIR)
+		if (!have_text && type != REPO_MODE_DIR)
 			die("invalid dump: adds node without text");
-		repo_add(node_ctx.dst, type, mark);
 	} else {
 		die("invalid dump: Node-path block lacks Node-action");
 	}
+
+	/*
+	 * Adjust mode to reflect properties.
+	 */
 	if (have_props) {
-		const uint32_t old_mode = node_ctx.type;
 		if (!node_ctx.prop_delta)
 			node_ctx.type = type;
 		if (node_ctx.propLength)
 			read_props();
-		if (node_ctx.type != old_mode)
-			repo_modify_path(node_ctx.dst, node_ctx.type, mark);
 	}
-	if (mark)
+
+	/*
+	 * Save the result.
+	 */
+	repo_add(node_ctx.dst, node_ctx.type, mark);
+	if (have_text)
 		fast_export_blob(node_ctx.type, mark,
 				 node_ctx.textLength, &input);
 }
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 05/12] vcs-svn: simplify repo_modify_path and repo_copy
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
                     ` (3 preceding siblings ...)
  2011-03-06 23:08   ` [PATCH 04/12] vcs-svn: handle_node: use repo_read_path Jonathan Nieder
@ 2011-03-06 23:09   ` Jonathan Nieder
  2011-03-06 23:09   ` [PATCH 06/12] vcs-svn: add a comment before each commit Jonathan Nieder
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:09 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Date: Fri, 10 Dec 2010 00:53:54 -0600

Restrict the repo_tree API to functions that are actually needed.

 - decouple reading the mode and content of dirents from other
   operations.
 - remove repo_modify_path.  It is only used to read the mode from
   dirents.
 - remove the ability to use repo_read_mode on a missing path.  The
   existing code only errors out in that case, anyway.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/repo_tree.c |   27 ++++++++++-----------------
 vcs-svn/repo_tree.h |    4 ++--
 vcs-svn/svndump.c   |    4 +---
 3 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
index 23a9371..036a686 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -106,7 +106,7 @@ static struct repo_dirent *repo_read_dirent(uint32_t revision,
 	return dent;
 }
 
-static void repo_write_dirent(uint32_t *path, uint32_t mode,
+static void repo_write_dirent(const uint32_t *path, uint32_t mode,
 			      uint32_t content_offset, uint32_t del)
 {
 	uint32_t name, revision, dir_o = ~0, parent_dir_o = ~0;
@@ -167,7 +167,15 @@ uint32_t repo_read_path(const uint32_t *path)
 	return content_offset;
 }
 
-uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
+uint32_t repo_read_mode(const uint32_t *path)
+{
+	struct repo_dirent *dent = repo_read_dirent(active_commit, path);
+	if (dent == NULL)
+		die("invalid dump: path to be modified is missing");
+	return dent->mode;
+}
+
+void repo_copy(uint32_t revision, const uint32_t *src, const uint32_t *dst)
 {
 	uint32_t mode = 0, content_offset = 0;
 	struct repo_dirent *src_dent;
@@ -177,7 +185,6 @@ uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
 		content_offset = src_dent->content_offset;
 		repo_write_dirent(dst, mode, content_offset, 0);
 	}
-	return mode;
 }
 
 void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark)
@@ -185,20 +192,6 @@ void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark)
 	repo_write_dirent(path, mode, blob_mark, 0);
 }
 
-uint32_t repo_modify_path(uint32_t *path, uint32_t mode, uint32_t blob_mark)
-{
-	struct repo_dirent *src_dent;
-	src_dent = repo_read_dirent(active_commit, path);
-	if (!src_dent)
-		return 0;
-	if (!blob_mark)
-		blob_mark = src_dent->content_offset;
-	if (!mode)
-		mode = src_dent->mode;
-	repo_write_dirent(path, mode, blob_mark, 0);
-	return mode;
-}
-
 void repo_delete(uint32_t *path)
 {
 	repo_write_dirent(path, 0, 0, 1);
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
index 3202bbe..11d48c2 100644
--- a/vcs-svn/repo_tree.h
+++ b/vcs-svn/repo_tree.h
@@ -12,10 +12,10 @@
 #define REPO_MAX_PATH_DEPTH 1000
 
 uint32_t next_blob_mark(void);
-uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst);
+void repo_copy(uint32_t revision, const uint32_t *src, const uint32_t *dst);
 void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark);
-uint32_t repo_modify_path(uint32_t *path, uint32_t mode, uint32_t blob_mark);
 uint32_t repo_read_path(const uint32_t *path);
+uint32_t repo_read_mode(const uint32_t *path);
 void repo_delete(uint32_t *path);
 void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
 		 uint32_t url, long unsigned timestamp);
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index f07376f..e6d84ba 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -235,9 +235,7 @@ static void handle_node(void)
 		uint32_t mode;
 		if (!have_text)
 			mark = repo_read_path(node_ctx.dst);
-		mode = repo_modify_path(node_ctx.dst, 0, 0);
-		if (!mode)
-			die("invalid dump: path to be modified is missing");
+		mode = repo_read_mode(node_ctx.dst);
 		if (mode == REPO_MODE_DIR && type != REPO_MODE_DIR)
 			die("invalid dump: cannot modify a directory into a file");
 		if (mode != REPO_MODE_DIR && type == REPO_MODE_DIR)
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 06/12] vcs-svn: add a comment before each commit
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
                     ` (4 preceding siblings ...)
  2011-03-06 23:09   ` [PATCH 05/12] vcs-svn: simplify repo_modify_path and repo_copy Jonathan Nieder
@ 2011-03-06 23:09   ` Jonathan Nieder
  2011-03-06 23:10   ` [PATCH 07/12] vcs-svn: allow input errors to be detected promptly Jonathan Nieder
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:09 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Date: Tue, 4 Jan 2011 21:53:33 -0600

Current svn-fe produces output like this:

	blob
	mark :7382321
	data 5
	hello

	blob
	mark :7382322
	data 5
	Hello

	commit
	mark :3
[...]
	M 100644 :7382321 hello.c
	M 100644 :7382322 hello2.c

This means svn-fe has to keep track of the paths modified in each
commit and the corresponding marks, instead of dealing with each file
as it arrives in input and then forgetting about it.  A better
strategy would be to use inline blobs:

	commit
	mark :3
[...]
	M 100644 inline hello.c
	data 5
	hello
[...]

As a first step towards that, teach svn-fe to notice when the
collection of blobs for each commit starts and write a comment
("# commit 3.") there.
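
For illustration, the inline form described above could be produced by
a helper along these lines.  This is a sketch only:
format_inline_modify is a hypothetical name, the content is assumed to
be NUL-free text, and the path is assumed to need no quoting.  svn-fe
does not emit inline blobs at this point in the series:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format an "M <mode> inline <path>" command followed by its data block.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int format_inline_modify(char *buf, size_t size, unsigned mode,
				const char *path, const char *content)
{
	int n;

	assert(buf && path && content);
	n = snprintf(buf, size, "M %06o inline %s\ndata %lu\n%s\n",
		     mode, path, (unsigned long)strlen(content), content);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

Because the data follows the path directly, the caller needs neither a
mark for the blob nor any memory of it once the command is written.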

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/fast_export.c |    5 +++++
 vcs-svn/fast_export.h |    1 +
 vcs-svn/svndump.c     |   29 ++++++++++++++++++++++-------
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 932824a..5a105ad 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -30,6 +30,11 @@ void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
 	putchar('\n');
 }
 
+void fast_export_begin_commit(uint32_t revision)
+{
+	printf("# commit %"PRIu32".\n", revision);
+}
+
 static char gitsvnline[MAX_GITSVN_LINE_LEN];
 void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 			uint32_t uuid, uint32_t url,
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
index 054e7d5..aff8005 100644
--- a/vcs-svn/fast_export.h
+++ b/vcs-svn/fast_export.h
@@ -6,6 +6,7 @@
 void fast_export_delete(uint32_t depth, uint32_t *path);
 void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
 			uint32_t mark);
+void fast_export_begin_commit(uint32_t revision);
 void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 			uint32_t uuid, uint32_t url, unsigned long timestamp);
 void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index e6d84ba..a384996 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -20,9 +20,11 @@
 #define NODEACT_CHANGE 1
 #define NODEACT_UNKNOWN 0
 
-#define DUMP_CTX 0
-#define REV_CTX  1
-#define NODE_CTX 2
+/* States: */
+#define DUMP_CTX 0	/* dump metadata */
+#define REV_CTX  1	/* revision metadata */
+#define NODE_CTX 2	/* node metadata */
+#define INTERNODE_CTX 3	/* between nodes */
 
 #define LENGTH_UNKNOWN (~0)
 #define DATE_RFC2822_LEN 31
@@ -267,7 +269,14 @@ static void handle_node(void)
 				 node_ctx.textLength, &input);
 }
 
-static void handle_revision(void)
+static void begin_revision(void)
+{
+	if (!rev_ctx.revision)	/* revision 0 gets no git commit. */
+		return;
+	fast_export_begin_commit(rev_ctx.revision);
+}
+
+static void end_revision(void)
 {
 	if (rev_ctx.revision)
 		repo_commit(rev_ctx.revision, rev_ctx.author, rev_ctx.log,
@@ -301,13 +310,17 @@ void svndump_read(const char *url)
 		} else if (key == keys.revision_number) {
 			if (active_ctx == NODE_CTX)
 				handle_node();
+			if (active_ctx == REV_CTX)
+				begin_revision();
 			if (active_ctx != DUMP_CTX)
-				handle_revision();
+				end_revision();
 			active_ctx = REV_CTX;
 			reset_rev_ctx(atoi(val));
 		} else if (key == keys.node_path) {
 			if (active_ctx == NODE_CTX)
 				handle_node();
+			if (active_ctx == REV_CTX)
+				begin_revision();
 			active_ctx = NODE_CTX;
 			reset_node_ctx(val);
 		} else if (key == keys.node_kind) {
@@ -349,7 +362,7 @@ void svndump_read(const char *url)
 				read_props();
 			} else if (active_ctx == NODE_CTX) {
 				handle_node();
-				active_ctx = REV_CTX;
+				active_ctx = INTERNODE_CTX;
 			} else {
 				fprintf(stderr, "Unexpected content length header: %"PRIu32"\n", len);
 				buffer_skip_bytes(&input, len);
@@ -358,8 +371,10 @@ void svndump_read(const char *url)
 	}
 	if (active_ctx == NODE_CTX)
 		handle_node();
+	if (active_ctx == REV_CTX)
+		begin_revision();
 	if (active_ctx != DUMP_CTX)
-		handle_revision();
+		end_revision();
 }
 
 int svndump_init(const char *filename)
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 07/12] vcs-svn: allow input errors to be detected promptly
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
                     ` (5 preceding siblings ...)
  2011-03-06 23:09   ` [PATCH 06/12] vcs-svn: add a comment before each commit Jonathan Nieder
@ 2011-03-06 23:10   ` Jonathan Nieder
  2011-03-06 23:11   ` [PATCH 08/12] vcs-svn: set up channel to read fast-import cat-blob response Jonathan Nieder
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:10 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Date: Sun, 10 Oct 2010 21:51:21 -0500

The line_buffer library records input errors silently and does not
report them until buffer_deinit time; unfortunately, by that point
errno is usually no longer valid.  Expose the error flag so callers
can check for and report errors early for easy debugging.

	some_error_prone_operation(...);
	if (buffer_ferror(buf))
		return error("input error: %s", strerror(errno));
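
The same pattern in plain stdio terms, as a self-contained sketch
(read_line_checked is a hypothetical helper, not part of line_buffer).
A NULL return from fgets() is ambiguous, and ferror() tells end of
input apart from a read error while errno is still meaningful:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line into buf.
 * Returns 1 on success, 0 at end of input, -1 on a read error
 * (with a message printed right away, before errno can be clobbered).
 */
static int read_line_checked(FILE *in, char *buf, int size)
{
	assert(in && buf && size > 0);
	if (fgets(buf, size, in))
		return 1;
	if (ferror(in)) {
		fprintf(stderr, "input error: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}
```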

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/line_buffer.c |    5 +++++
 vcs-svn/line_buffer.h |    1 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index aedf105..eb8a6a7 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -59,6 +59,11 @@ long buffer_tmpfile_prepare_to_read(struct line_buffer *buf)
 	return pos;
 }
 
+int buffer_ferror(struct line_buffer *buf)
+{
+	return ferror(buf->infile);
+}
+
 int buffer_read_char(struct line_buffer *buf)
 {
 	return fgetc(buf->infile);
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 96ce966..3c9629e 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -21,6 +21,7 @@ int buffer_tmpfile_init(struct line_buffer *buf);
 FILE *buffer_tmpfile_rewind(struct line_buffer *buf);	/* prepare to write. */
 long buffer_tmpfile_prepare_to_read(struct line_buffer *buf);
 
+int buffer_ferror(struct line_buffer *buf);
 char *buffer_read_line(struct line_buffer *buf);
 char *buffer_read_string(struct line_buffer *buf, uint32_t len);
 int buffer_read_char(struct line_buffer *buf);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 08/12] vcs-svn: set up channel to read fast-import cat-blob response
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
                     ` (6 preceding siblings ...)
  2011-03-06 23:10   ` [PATCH 07/12] vcs-svn: allow input errors to be detected promptly Jonathan Nieder
@ 2011-03-06 23:11   ` Jonathan Nieder
  2011-03-06 23:12   ` [PATCH 09/12] vcs-svn: eliminate repo_tree structure Jonathan Nieder
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:11 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

From: David Barr <david.barr@cordelta.com>
Date: Sat, 5 Mar 2011 13:30:23 +1100

Set up some plumbing: teach the svndump lib to pass a file descriptor
number to the fast_export lib, representing where cat-blob/ls
responses can be read from, and add a get_response_line helper
function to the fast_export lib to read a line from that file.

Unfortunately this means that svn-fe needs file descriptor 3 to be
redirected from somewhere (preferably the cat-blob stream of a
fast-import backend); otherwise it will fail:

	$ svndump <path> | svn-fe
	fatal: cannot read from file descriptor 3: Bad file descriptor

For the moment, "svn-fe 3</dev/null" works as a workaround but it
will not work for very long.  A fast-import backend that can retrieve
old commits is needed in order to be able to fulfill svn
"Node-copyfrom-rev" requests that refer to revs from a previous run.

[jn: with new change description]
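
A minimal sketch of the descriptor handling described above, written
against plain POSIX calls rather than the line_buffer API.  The names
open_report_stream and read_response_line are hypothetical; the fcntl()
probe is one way to detect an unopened descriptor early, since fdopen()
is not required to notice it:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Wrap an inherited descriptor (3 in svn-fe's case) in a stdio stream.
 * Returns NULL with errno set if the descriptor is not open.
 */
static FILE *open_report_stream(int fd)
{
	if (fcntl(fd, F_GETFD) == -1)
		return NULL;	/* typically EBADF */
	return fdopen(fd, "r");
}

/*
 * Read one response line, stripping the trailing newline.
 * Returns buf, or NULL at end of stream or on error.
 */
static char *read_response_line(FILE *report, char *buf, int size)
{
	assert(report && buf && size > 0);
	if (!fgets(buf, size, report))
		return NULL;
	buf[strcspn(buf, "\n")] = '\0';
	return buf;
}
```

A caller would invoke open_report_stream(3) once at startup and report
the errno text on failure, which is where a message like "Bad file
descriptor" would come from when nothing is attached to descriptor 3.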

Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 contrib/svn-fe/svn-fe.txt |    6 ++-
 t/t9010-svn-fe.sh         |  118 +++++++++++++++++++++++++-------------------
 vcs-svn/fast_export.c     |   28 +++++++++++
 vcs-svn/fast_export.h     |    4 ++
 vcs-svn/svndump.c         |    5 ++
 5 files changed, 109 insertions(+), 52 deletions(-)

diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
index cd075b9..85f7b83 100644
--- a/contrib/svn-fe/svn-fe.txt
+++ b/contrib/svn-fe/svn-fe.txt
@@ -7,7 +7,11 @@ svn-fe - convert an SVN "dumpfile" to a fast-import stream
 
 SYNOPSIS
 --------
-svnadmin dump --incremental REPO | svn-fe [url] | git fast-import
+[verse]
+mkfifo backchannel &&
+svnadmin dump --incremental REPO |
+	svn-fe [url] 3<backchannel |
+	git fast-import --cat-blob-fd=3 3>backchannel
 
 DESCRIPTION
 -----------
diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
index 5a6a4b9..2ae5374 100755
--- a/t/t9010-svn-fe.sh
+++ b/t/t9010-svn-fe.sh
@@ -5,8 +5,26 @@ test_description='check svn dumpfile importer'
 . ./test-lib.sh
 
 reinit_git () {
+	if ! test_declared_prereq PIPE
+	then
+		echo >&4 "reinit_git: need to declare PIPE prerequisite"
+		return 127
+	fi
 	rm -fr .git &&
-	git init
+	rm -f stream backflow &&
+	git init &&
+	mkfifo stream backflow
+}
+
+try_dump () {
+	input=$1 &&
+	maybe_fail=${2:+test_$2} &&
+
+	{
+		$maybe_fail test-svn-fe "$input" >stream 3<backflow &
+	} &&
+	git fast-import --cat-blob-fd=3 <stream 3>backflow &&
+	wait $!
 }
 
 properties () {
@@ -35,21 +53,27 @@ text_no_props () {
 
 >empty
 
-test_expect_success 'empty dump' '
+test_expect_success 'setup: have pipes?' '
+	rm -f frob &&
+	if mkfifo frob
+	then
+		test_set_prereq PIPE
+	fi
+'
+
+test_expect_success PIPE 'empty dump' '
 	reinit_git &&
 	echo "SVN-fs-dump-format-version: 2" >input &&
-	test-svn-fe input >stream &&
-	git fast-import <stream
+	try_dump input
 '
 
-test_expect_success 'v4 dumps not supported' '
+test_expect_success PIPE 'v4 dumps not supported' '
 	reinit_git &&
 	echo "SVN-fs-dump-format-version: 4" >v4.dump &&
-	test_must_fail test-svn-fe v4.dump >stream &&
-	test_cmp empty stream
+	try_dump v4.dump must_fail
 '
 
-test_expect_failure 'empty revision' '
+test_expect_failure PIPE 'empty revision' '
 	reinit_git &&
 	printf "rev <nobody, nobody@local>: %s\n" "" "" >expect &&
 	cat >emptyrev.dump <<-\EOF &&
@@ -64,13 +88,12 @@ test_expect_failure 'empty revision' '
 	Content-length: 0
 
 	EOF
-	test-svn-fe emptyrev.dump >stream &&
-	git fast-import <stream &&
+	try_dump emptyrev.dump &&
 	git log -p --format="rev <%an, %ae>: %s" HEAD >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success 'empty properties' '
+test_expect_success PIPE 'empty properties' '
 	reinit_git &&
 	printf "rev <nobody, nobody@local>: %s\n" "" "" >expect &&
 	cat >emptyprop.dump <<-\EOF &&
@@ -88,13 +111,12 @@ test_expect_success 'empty properties' '
 
 	PROPS-END
 	EOF
-	test-svn-fe emptyprop.dump >stream &&
-	git fast-import <stream &&
+	try_dump emptyprop.dump &&
 	git log -p --format="rev <%an, %ae>: %s" HEAD >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success 'author name and commit message' '
+test_expect_success PIPE 'author name and commit message' '
 	reinit_git &&
 	echo "<author@example.com, author@example.com@local>" >expect.author &&
 	cat >message <<-\EOF &&
@@ -121,15 +143,14 @@ test_expect_success 'author name and commit message' '
 		echo &&
 		cat props
 	} >log.dump &&
-	test-svn-fe log.dump >stream &&
-	git fast-import <stream &&
+	try_dump log.dump &&
 	git log -p --format="%B" HEAD >actual.log &&
 	git log --format="<%an, %ae>" >actual.author &&
 	test_cmp message actual.log &&
 	test_cmp expect.author actual.author
 '
 
-test_expect_success 'unsupported properties are ignored' '
+test_expect_success PIPE 'unsupported properties are ignored' '
 	reinit_git &&
 	echo author >expect &&
 	cat >extraprop.dump <<-\EOF &&
@@ -149,13 +170,12 @@ test_expect_success 'unsupported properties are ignored' '
 	author
 	PROPS-END
 	EOF
-	test-svn-fe extraprop.dump >stream &&
-	git fast-import <stream &&
+	try_dump extraprop.dump &&
 	git log -p --format=%an HEAD >actual &&
 	test_cmp expect actual
 '
 
-test_expect_failure 'timestamp and empty file' '
+test_expect_failure PIPE 'timestamp and empty file' '
 	echo author@example.com >expect.author &&
 	echo 1999-01-01 >expect.date &&
 	echo file >expect.files &&
@@ -186,8 +206,7 @@ test_expect_failure 'timestamp and empty file' '
 
 		EOF
 	} >emptyfile.dump &&
-	test-svn-fe emptyfile.dump >stream &&
-	git fast-import <stream &&
+	try_dump emptyfile.dump &&
 	git log --format=%an HEAD >actual.author &&
 	git log --date=short --format=%ad HEAD >actual.date &&
 	git ls-tree -r --name-only HEAD >actual.files &&
@@ -198,7 +217,7 @@ test_expect_failure 'timestamp and empty file' '
 	test_cmp empty file
 '
 
-test_expect_success 'directory with files' '
+test_expect_success PIPE 'directory with files' '
 	reinit_git &&
 	printf "%s\n" directory/file1 directory/file2 >expect.files &&
 	echo hi >hi &&
@@ -242,8 +261,7 @@ test_expect_success 'directory with files' '
 		EOF
 		text_no_props hi
 	} >directory.dump &&
-	test-svn-fe directory.dump >stream &&
-	git fast-import <stream &&
+	try_dump directory.dump &&
 
 	git ls-tree -r --name-only HEAD >actual.files &&
 	git checkout HEAD directory &&
@@ -252,7 +270,8 @@ test_expect_success 'directory with files' '
 	test_cmp hi directory/file2
 '
 
-test_expect_success 'node without action' '
+test_expect_success PIPE 'node without action' '
+	reinit_git &&
 	cat >inaction.dump <<-\EOF &&
 	SVN-fs-dump-format-version: 3
 
@@ -269,10 +288,11 @@ test_expect_success 'node without action' '
 
 	PROPS-END
 	EOF
-	test_must_fail test-svn-fe inaction.dump
+	try_dump inaction.dump must_fail
 '
 
-test_expect_success 'action: add node without text' '
+test_expect_success PIPE 'action: add node without text' '
+	reinit_git &&
 	cat >textless.dump <<-\EOF &&
 	SVN-fs-dump-format-version: 3
 
@@ -290,10 +310,10 @@ test_expect_success 'action: add node without text' '
 
 	PROPS-END
 	EOF
-	test_must_fail test-svn-fe textless.dump
+	try_dump textless.dump must_fail
 '
 
-test_expect_failure 'change file mode but keep old content' '
+test_expect_failure PIPE 'change file mode but keep old content' '
 	reinit_git &&
 	cat >expect <<-\EOF &&
 	OBJID
@@ -356,8 +376,7 @@ test_expect_failure 'change file mode but keep old content' '
 
 	PROPS-END
 	EOF
-	test-svn-fe filemode.dump >stream &&
-	git fast-import <stream &&
+	try_dump filemode.dump &&
 	{
 		git rev-list HEAD |
 		git diff-tree --root --stdin |
@@ -370,7 +389,7 @@ test_expect_failure 'change file mode but keep old content' '
 	test_cmp hello actual.target
 '
 
-test_expect_success 'change file mode and reiterate content' '
+test_expect_success PIPE 'change file mode and reiterate content' '
 	reinit_git &&
 	cat >expect <<-\EOF &&
 	OBJID
@@ -382,7 +401,7 @@ test_expect_success 'change file mode and reiterate content' '
 	EOF
 	echo "link hello" >expect.blob &&
 	echo hello >hello &&
-	cat >filemode.dump <<-\EOF &&
+	cat >filemode2.dump <<-\EOF &&
 	SVN-fs-dump-format-version: 3
 
 	Revision-number: 1
@@ -437,8 +456,7 @@ test_expect_success 'change file mode and reiterate content' '
 	PROPS-END
 	link hello
 	EOF
-	test-svn-fe filemode.dump >stream &&
-	git fast-import <stream &&
+	try_dump filemode2.dump &&
 	{
 		git rev-list HEAD |
 		git diff-tree --root --stdin |
@@ -451,7 +469,8 @@ test_expect_success 'change file mode and reiterate content' '
 	test_cmp hello actual.target
 '
 
-test_expect_success 'deltas not supported' '
+test_expect_success PIPE 'deltas not supported' '
+	reinit_git &&
 	{
 		# (old) h + (inline) ello + (old) \n
 		printf "SVNQ%b%b%s" "Q\003\006\005\004" "\001Q\0204\001\002" "ello" |
@@ -511,10 +530,10 @@ test_expect_success 'deltas not supported' '
 		echo PROPS-END &&
 		cat delta
 	} >delta.dump &&
-	test_must_fail test-svn-fe delta.dump
+	test_must_fail try_dump delta.dump
 '
 
-test_expect_success 'property deltas supported' '
+test_expect_success PIPE 'property deltas supported' '
 	reinit_git &&
 	cat >expect <<-\EOF &&
 	OBJID
@@ -570,8 +589,7 @@ test_expect_success 'property deltas supported' '
 		PROPS-END
 		EOF
 	} >propdelta.dump &&
-	test-svn-fe propdelta.dump >stream &&
-	git fast-import <stream &&
+	try_dump propdelta.dump &&
 	{
 		git rev-list HEAD |
 		git diff-tree --stdin |
@@ -580,7 +598,7 @@ test_expect_success 'property deltas supported' '
 	test_cmp expect actual
 '
 
-test_expect_success 'properties on /' '
+test_expect_success PIPE 'properties on /' '
 	reinit_git &&
 	cat <<-\EOF >expect &&
 	OBJID
@@ -625,8 +643,7 @@ test_expect_success 'properties on /' '
 
 	PROPS-END
 	EOF
-	test-svn-fe changeroot.dump >stream &&
-	git fast-import <stream &&
+	try_dump changeroot.dump &&
 	{
 		git rev-list HEAD |
 		git diff-tree --root --always --stdin |
@@ -635,7 +652,7 @@ test_expect_success 'properties on /' '
 	test_cmp expect actual
 '
 
-test_expect_success 'deltas for typechange' '
+test_expect_success PIPE 'deltas for typechange' '
 	reinit_git &&
 	cat >expect <<-\EOF &&
 	OBJID
@@ -711,8 +728,7 @@ test_expect_success 'deltas for typechange' '
 	PROPS-END
 	link testing 321
 	EOF
-	test-svn-fe deleteprop.dump >stream &&
-	git fast-import <stream &&
+	try_dump deleteprop.dump &&
 	{
 		git rev-list HEAD |
 		git diff-tree --root --stdin |
@@ -736,12 +752,12 @@ test_expect_success 'set up svn repo' '
 	fi
 '
 
-test_expect_success SVNREPO 't9135/svn.dump' '
-	git init simple-git &&
-	test-svn-fe "$TEST_DIRECTORY/t9135/svn.dump" >simple.fe &&
+test_expect_success SVNREPO,PIPE 't9135/svn.dump' '
+	mkdir -p simple-git &&
 	(
 		cd simple-git &&
-		git fast-import <../simple.fe
+		reinit_git &&
+		try_dump "$TEST_DIRECTORY/t9135/svn.dump"
 	) &&
 	(
 		cd simple-svnco &&
diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 5a105ad..8786ed2 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -12,6 +12,24 @@
 #define MAX_GITSVN_LINE_LEN 4096
 
 static uint32_t first_commit_done;
+static struct line_buffer report_buffer = LINE_BUFFER_INIT;
+
+void fast_export_init(int fd)
+{
+	if (buffer_fdinit(&report_buffer, fd))
+		die_errno("cannot read from file descriptor %d", fd);
+}
+
+void fast_export_deinit(void)
+{
+	if (buffer_deinit(&report_buffer))
+		die_errno("error closing fast-import feedback stream");
+}
+
+void fast_export_reset(void)
+{
+	buffer_reset(&report_buffer);
+}
 
 void fast_export_delete(uint32_t depth, uint32_t *path)
 {
@@ -69,6 +87,16 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 	printf("progress Imported commit %"PRIu32".\n\n", revision);
 }
 
+static const char *get_response_line(void)
+{
+	const char *line = buffer_read_line(&report_buffer);
+	if (line)
+		return line;
+	if (buffer_ferror(&report_buffer))
+		die_errno("error reading from fast-import");
+	die("unexpected end of fast-import feedback");
+}
+
 void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len, struct line_buffer *input)
 {
 	if (mode == REPO_MODE_LNK) {
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
index aff8005..09b2033 100644
--- a/vcs-svn/fast_export.h
+++ b/vcs-svn/fast_export.h
@@ -3,6 +3,10 @@
 
 #include "line_buffer.h"
 
+void fast_export_init(int fd);
+void fast_export_deinit(void);
+void fast_export_reset(void);
+
 void fast_export_delete(uint32_t depth, uint32_t *path);
 void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
 			uint32_t mark);
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index a384996..3cc4135 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -14,6 +14,8 @@
 #include "obj_pool.h"
 #include "string_pool.h"
 
+#define REPORT_FILENO 3
+
 #define NODEACT_REPLACE 4
 #define NODEACT_DELETE 3
 #define NODEACT_ADD 2
@@ -382,6 +384,7 @@ int svndump_init(const char *filename)
 	if (buffer_init(&input, filename))
 		return error("cannot open %s: %s", filename, strerror(errno));
 	repo_init();
+	fast_export_init(REPORT_FILENO);
 	reset_dump_ctx(~0);
 	reset_rev_ctx(0);
 	reset_node_ctx(NULL);
@@ -392,6 +395,7 @@ int svndump_init(const char *filename)
 void svndump_deinit(void)
 {
 	log_reset();
+	fast_export_deinit();
 	repo_reset();
 	reset_dump_ctx(~0);
 	reset_rev_ctx(0);
@@ -405,6 +409,7 @@ void svndump_deinit(void)
 void svndump_reset(void)
 {
 	log_reset();
+	fast_export_reset();
 	buffer_reset(&input);
 	repo_reset();
 	reset_dump_ctx(~0);
-- 
1.7.4.1


* [PATCH 09/12] vcs-svn: eliminate repo_tree structure
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
                     ` (7 preceding siblings ...)
  2011-03-06 23:11   ` [PATCH 08/12] vcs-svn: set up channel to read fast-import cat-blob response Jonathan Nieder
@ 2011-03-06 23:12   ` Jonathan Nieder
  2011-03-06 23:12   ` [PATCH 10/12] vcs-svn: quote paths correctly for ls command Jonathan Nieder
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:12 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Date: Fri, 10 Dec 2010 04:00:55 -0600

Rely on fast-import for information about previous revs.

This requires always setting up backward flow of information, even for
v2 dumps.  On the plus side, it simplifies the code by quite a bit and
opens the door to further simplifications.

[db: adjusted to support the final version of the cat-blob patch]
[jn: avoid hard-coding git's name for the empty tree, for
 portability to other backends]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 vcs-svn/fast_export.c |  108 ++++++++++++--
 vcs-svn/fast_export.h |   44 +++---
 vcs-svn/repo_tree.c   |  389 ++++++++-----------------------------------------
 vcs-svn/repo_tree.h   |    2 +-
 vcs-svn/string_pool.c |    2 +-
 vcs-svn/string_pool.h |    2 +-
 vcs-svn/svndump.c     |   53 +++++--
 7 files changed, 222 insertions(+), 378 deletions(-)
 rewrite vcs-svn/fast_export.h (65%)
 rewrite vcs-svn/repo_tree.c (95%)
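
For readers unfamiliar with the "ls" command: fast-import answers each
request with a single line, either "missing SP <path>" or
"<mode> SP <type> SP <dataref> HT <path>".  Below is a minimal
stand-alone sketch of the kind of parsing parse_ls_response() does.
It is not the code from this patch: the name parse_ls_line, the
fixed-size output buffer, and the EINVAL/ERANGE returns are assumptions
made for illustration (the patch uses a strbuf and calls die() on
malformed input).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-alone variant of parse_ls_response().
 * Returns 0 on success; -1 with errno == ENOENT for a "missing"
 * response, EINVAL for a malformed line, ERANGE if the dataref
 * does not fit in the caller's buffer.
 */
static int parse_ls_line(const char *line, uint32_t *mode,
			 char *dataref, size_t dataref_size)
{
	const char *p = line;
	const char *tab;
	size_t len;

	assert(line && mode && dataref);

	if (!strncmp(p, "missing ", strlen("missing "))) {
		errno = ENOENT;
		return -1;
	}

	/* Octal mode (e.g. 100644), terminated by a single space. */
	if (*p < '0' || *p > '7') {
		errno = EINVAL;
		return -1;
	}
	*mode = 0;
	for (; *p >= '0' && *p <= '7'; p++)
		*mode = *mode * 8 + (uint32_t)(*p - '0');
	if (*p++ != ' ') {
		errno = EINVAL;
		return -1;
	}

	/* Object type: only blobs and trees are expected here. */
	if (strncmp(p, "blob ", 5) && strncmp(p, "tree ", 5)) {
		errno = EINVAL;
		return -1;
	}
	p += 5;

	/* Dataref: everything up to the tab that precedes the path. */
	tab = strchr(p, '\t');
	if (!tab) {
		errno = EINVAL;
		return -1;
	}
	len = (size_t)(tab - p);
	if (len + 1 > dataref_size) {
		errno = ERANGE;
		return -1;
	}
	memcpy(dataref, p, len);
	dataref[len] = '\0';
	return 0;
}
```

The ENOENT case is the one the callers care about: repo_read_path()
and repo_copy() use it to tell an absent path from a real error.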

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 8786ed2..a8ce5c6 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -8,6 +8,7 @@
 #include "line_buffer.h"
 #include "repo_tree.h"
 #include "string_pool.h"
+#include "strbuf.h"
 
 #define MAX_GITSVN_LINE_LEN 4096
 
@@ -31,7 +32,7 @@ void fast_export_reset(void)
 	buffer_reset(&report_buffer);
 }
 
-void fast_export_delete(uint32_t depth, uint32_t *path)
+void fast_export_delete(uint32_t depth, const uint32_t *path)
 {
 	putchar('D');
 	putchar(' ');
@@ -39,22 +40,27 @@ void fast_export_delete(uint32_t depth, uint32_t *path)
 	putchar('\n');
 }
 
-void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
-			uint32_t mark)
+static void fast_export_truncate(uint32_t depth, const uint32_t *path, uint32_t mode)
+{
+	fast_export_modify(depth, path, mode, "inline");
+	printf("data 0\n\n");
+}
+
+void fast_export_modify(uint32_t depth, const uint32_t *path, uint32_t mode,
+			const char *dataref)
 {
 	/* Mode must be 100644, 100755, 120000, or 160000. */
-	printf("M %06"PRIo32" :%"PRIu32" ", mode, mark);
+	if (!dataref) {
+		fast_export_truncate(depth, path, mode);
+		return;
+	}
+	printf("M %06"PRIo32" %s ", mode, dataref);
 	pool_print_seq(depth, path, '/', stdout);
 	putchar('\n');
 }
 
-void fast_export_begin_commit(uint32_t revision)
-{
-	printf("# commit %"PRIu32".\n", revision);
-}
-
 static char gitsvnline[MAX_GITSVN_LINE_LEN];
-void fast_export_commit(uint32_t revision, uint32_t author, char *log,
+void fast_export_begin_commit(uint32_t revision, uint32_t author, char *log,
 			uint32_t uuid, uint32_t url,
 			unsigned long timestamp)
 {
@@ -81,12 +87,31 @@ void fast_export_commit(uint32_t revision, uint32_t author, char *log,
 			printf("from refs/heads/master^0\n");
 		first_commit_done = 1;
 	}
-	repo_diff(revision - 1, revision);
-	fputc('\n', stdout);
+}
 
+void fast_export_end_commit(uint32_t revision)
+{
 	printf("progress Imported commit %"PRIu32".\n\n", revision);
 }
 
+static void ls_from_rev(uint32_t rev, uint32_t depth, const uint32_t *path)
+{
+	/* ls :5 path/to/old/file */
+	printf("ls :%"PRIu32" ", rev);
+	pool_print_seq(depth, path, '/', stdout);
+	putchar('\n');
+	fflush(stdout);
+}
+
+static void ls_from_active_commit(uint32_t depth, const uint32_t *path)
+{
+	/* ls "path/to/file" */
+	printf("ls \"");
+	pool_print_seq(depth, path, '/', stdout);
+	printf("\"\n");
+	fflush(stdout);
+}
+
 static const char *get_response_line(void)
 {
 	const char *line = buffer_read_line(&report_buffer);
@@ -97,14 +122,69 @@ static const char *get_response_line(void)
 	die("unexpected end of fast-import feedback");
 }
 
-void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len, struct line_buffer *input)
+void fast_export_data(uint32_t mode, uint32_t len, struct line_buffer *input)
 {
 	if (mode == REPO_MODE_LNK) {
 		/* svn symlink blobs start with "link " */
 		buffer_skip_bytes(input, 5);
 		len -= 5;
 	}
-	printf("blob\nmark :%"PRIu32"\ndata %"PRIu32"\n", mark, len);
+	printf("data %"PRIu32"\n", len);
 	buffer_copy_bytes(input, len);
 	fputc('\n', stdout);
 }
+
+static int parse_ls_response(const char *response, uint32_t *mode,
+					struct strbuf *dataref)
+{
+	const char *tab;
+	const char *response_end;
+
+	assert(response);
+	response_end = response + strlen(response);
+
+	if (*response == 'm') {	/* Missing. */
+		errno = ENOENT;
+		return -1;
+	}
+
+	/* Mode. */
+	if (response_end - response < strlen("100644") ||
+	    response[strlen("100644")] != ' ')
+		die("invalid ls response: missing mode: %s", response);
+	*mode = 0;
+	for (; *response != ' '; response++) {
+		char ch = *response;
+		if (ch < '0' || ch > '7')
+			die("invalid ls response: mode is not octal: %s", response);
+		*mode *= 8;
+		*mode += ch - '0';
+	}
+
+	/* ' blob ' or ' tree ' */
+	if (response_end - response < strlen(" blob ") ||
+	    (response[1] != 'b' && response[1] != 't'))
+		die("unexpected ls response: not a tree or blob: %s", response);
+	response += strlen(" blob ");
+
+	/* Dataref. */
+	tab = memchr(response, '\t', response_end - response);
+	if (!tab)
+		die("invalid ls response: missing tab: %s", response);
+	strbuf_add(dataref, response, tab - response);
+	return 0;
+}
+
+int fast_export_ls_rev(uint32_t rev, uint32_t depth, const uint32_t *path,
+				uint32_t *mode, struct strbuf *dataref)
+{
+	ls_from_rev(rev, depth, path);
+	return parse_ls_response(get_response_line(), mode, dataref);
+}
+
+int fast_export_ls(uint32_t depth, const uint32_t *path,
+				uint32_t *mode, struct strbuf *dataref)
+{
+	ls_from_active_commit(depth, path);
+	return parse_ls_response(get_response_line(), mode, dataref);
+}
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
dissimilarity index 65%
index 09b2033..633d219 100644
--- a/vcs-svn/fast_export.h
+++ b/vcs-svn/fast_export.h
@@ -1,19 +1,25 @@
-#ifndef FAST_EXPORT_H_
-#define FAST_EXPORT_H_
-
-#include "line_buffer.h"
-
-void fast_export_init(int fd);
-void fast_export_deinit(void);
-void fast_export_reset(void);
-
-void fast_export_delete(uint32_t depth, uint32_t *path);
-void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
-			uint32_t mark);
-void fast_export_begin_commit(uint32_t revision);
-void fast_export_commit(uint32_t revision, uint32_t author, char *log,
-			uint32_t uuid, uint32_t url, unsigned long timestamp);
-void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len,
-		      struct line_buffer *input);
-
-#endif
+#ifndef FAST_EXPORT_H_
+#define FAST_EXPORT_H_
+
+struct strbuf;
+struct line_buffer;
+
+void fast_export_init(int fd);
+void fast_export_deinit(void);
+void fast_export_reset(void);
+
+void fast_export_delete(uint32_t depth, const uint32_t *path);
+void fast_export_modify(uint32_t depth, const uint32_t *path,
+			uint32_t mode, const char *dataref);
+void fast_export_begin_commit(uint32_t revision, uint32_t author, char *log,
+			uint32_t uuid, uint32_t url, unsigned long timestamp);
+void fast_export_end_commit(uint32_t revision);
+void fast_export_data(uint32_t mode, uint32_t len, struct line_buffer *input);
+
+/* If there is no such file at that rev, returns -1, errno == ENOENT. */
+int fast_export_ls_rev(uint32_t rev, uint32_t depth, const uint32_t *path,
+			uint32_t *mode_out, struct strbuf *dataref_out);
+int fast_export_ls(uint32_t depth, const uint32_t *path,
+			uint32_t *mode_out, struct strbuf *dataref_out);
+
+#endif
diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
dissimilarity index 95%
index 036a686..e75f580 100644
--- a/vcs-svn/repo_tree.c
+++ b/vcs-svn/repo_tree.c
@@ -1,325 +1,64 @@
-/*
- * Licensed under a two-clause BSD-style license.
- * See LICENSE for details.
- */
-
-#include "git-compat-util.h"
-
-#include "string_pool.h"
-#include "repo_tree.h"
-#include "obj_pool.h"
-#include "fast_export.h"
-
-#include "trp.h"
-
-struct repo_dirent {
-	uint32_t name_offset;
-	struct trp_node children;
-	uint32_t mode;
-	uint32_t content_offset;
-};
-
-struct repo_dir {
-	struct trp_root entries;
-};
-
-struct repo_commit {
-	uint32_t root_dir_offset;
-};
-
-/* Memory pools for commit, dir and dirent */
-obj_pool_gen(commit, struct repo_commit, 4096)
-obj_pool_gen(dir, struct repo_dir, 4096)
-obj_pool_gen(dent, struct repo_dirent, 4096)
-
-static uint32_t active_commit;
-static uint32_t mark;
-
-static int repo_dirent_name_cmp(const void *a, const void *b);
-
-/* Treap for directory entries */
-trp_gen(static, dent_, struct repo_dirent, children, dent, repo_dirent_name_cmp);
-
-uint32_t next_blob_mark(void)
-{
-	return mark++;
-}
-
-static struct repo_dir *repo_commit_root_dir(struct repo_commit *commit)
-{
-	return dir_pointer(commit->root_dir_offset);
-}
-
-static struct repo_dirent *repo_first_dirent(struct repo_dir *dir)
-{
-	return dent_first(&dir->entries);
-}
-
-static int repo_dirent_name_cmp(const void *a, const void *b)
-{
-	const struct repo_dirent *dent1 = a, *dent2 = b;
-	uint32_t a_offset = dent1->name_offset;
-	uint32_t b_offset = dent2->name_offset;
-	return (a_offset > b_offset) - (a_offset < b_offset);
-}
-
-static int repo_dirent_is_dir(struct repo_dirent *dent)
-{
-	return dent != NULL && dent->mode == REPO_MODE_DIR;
-}
-
-static struct repo_dir *repo_dir_from_dirent(struct repo_dirent *dent)
-{
-	if (!repo_dirent_is_dir(dent))
-		return NULL;
-	return dir_pointer(dent->content_offset);
-}
-
-static struct repo_dir *repo_clone_dir(struct repo_dir *orig_dir)
-{
-	uint32_t orig_o, new_o;
-	orig_o = dir_offset(orig_dir);
-	if (orig_o >= dir_pool.committed)
-		return orig_dir;
-	new_o = dir_alloc(1);
-	orig_dir = dir_pointer(orig_o);
-	*dir_pointer(new_o) = *orig_dir;
-	return dir_pointer(new_o);
-}
-
-static struct repo_dirent *repo_read_dirent(uint32_t revision,
-					    const uint32_t *path)
-{
-	uint32_t name = 0;
-	struct repo_dirent *key = dent_pointer(dent_alloc(1));
-	struct repo_dir *dir = NULL;
-	struct repo_dirent *dent = NULL;
-	dir = repo_commit_root_dir(commit_pointer(revision));
-	while (~(name = *path++)) {
-		key->name_offset = name;
-		dent = dent_search(&dir->entries, key);
-		if (dent == NULL || !repo_dirent_is_dir(dent))
-			break;
-		dir = repo_dir_from_dirent(dent);
-	}
-	dent_free(1);
-	return dent;
-}
-
-static void repo_write_dirent(const uint32_t *path, uint32_t mode,
-			      uint32_t content_offset, uint32_t del)
-{
-	uint32_t name, revision, dir_o = ~0, parent_dir_o = ~0;
-	struct repo_dir *dir;
-	struct repo_dirent *key;
-	struct repo_dirent *dent = NULL;
-	revision = active_commit;
-	dir = repo_commit_root_dir(commit_pointer(revision));
-	dir = repo_clone_dir(dir);
-	commit_pointer(revision)->root_dir_offset = dir_offset(dir);
-	while (~(name = *path++)) {
-		parent_dir_o = dir_offset(dir);
-
-		key = dent_pointer(dent_alloc(1));
-		key->name_offset = name;
-
-		dent = dent_search(&dir->entries, key);
-		if (dent == NULL)
-			dent = key;
-		else
-			dent_free(1);
-
-		if (dent == key) {
-			dent->mode = REPO_MODE_DIR;
-			dent->content_offset = 0;
-			dent = dent_insert(&dir->entries, dent);
-		}
-
-		if (dent_offset(dent) < dent_pool.committed) {
-			dir_o = repo_dirent_is_dir(dent) ?
-					dent->content_offset : ~0;
-			dent_remove(&dir->entries, dent);
-			dent = dent_pointer(dent_alloc(1));
-			dent->name_offset = name;
-			dent->mode = REPO_MODE_DIR;
-			dent->content_offset = dir_o;
-			dent = dent_insert(&dir->entries, dent);
-		}
-
-		dir = repo_dir_from_dirent(dent);
-		dir = repo_clone_dir(dir);
-		dent->content_offset = dir_offset(dir);
-	}
-	if (dent == NULL)
-		return;
-	dent->mode = mode;
-	dent->content_offset = content_offset;
-	if (del && ~parent_dir_o)
-		dent_remove(&dir_pointer(parent_dir_o)->entries, dent);
-}
-
-uint32_t repo_read_path(const uint32_t *path)
-{
-	uint32_t content_offset = 0;
-	struct repo_dirent *dent = repo_read_dirent(active_commit, path);
-	if (dent != NULL)
-		content_offset = dent->content_offset;
-	return content_offset;
-}
-
-uint32_t repo_read_mode(const uint32_t *path)
-{
-	struct repo_dirent *dent = repo_read_dirent(active_commit, path);
-	if (dent == NULL)
-		die("invalid dump: path to be modified is missing");
-	return dent->mode;
-}
-
-void repo_copy(uint32_t revision, const uint32_t *src, const uint32_t *dst)
-{
-	uint32_t mode = 0, content_offset = 0;
-	struct repo_dirent *src_dent;
-	src_dent = repo_read_dirent(revision, src);
-	if (src_dent != NULL) {
-		mode = src_dent->mode;
-		content_offset = src_dent->content_offset;
-		repo_write_dirent(dst, mode, content_offset, 0);
-	}
-}
-
-void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark)
-{
-	repo_write_dirent(path, mode, blob_mark, 0);
-}
-
-void repo_delete(uint32_t *path)
-{
-	repo_write_dirent(path, 0, 0, 1);
-}
-
-static void repo_git_add_r(uint32_t depth, uint32_t *path, struct repo_dir *dir);
-
-static void repo_git_add(uint32_t depth, uint32_t *path, struct repo_dirent *dent)
-{
-	if (repo_dirent_is_dir(dent))
-		repo_git_add_r(depth, path, repo_dir_from_dirent(dent));
-	else
-		fast_export_modify(depth, path,
-				   dent->mode, dent->content_offset);
-}
-
-static void repo_git_add_r(uint32_t depth, uint32_t *path, struct repo_dir *dir)
-{
-	struct repo_dirent *de = repo_first_dirent(dir);
-	while (de) {
-		path[depth] = de->name_offset;
-		repo_git_add(depth + 1, path, de);
-		de = dent_next(&dir->entries, de);
-	}
-}
-
-static void repo_diff_r(uint32_t depth, uint32_t *path, struct repo_dir *dir1,
-			struct repo_dir *dir2)
-{
-	struct repo_dirent *de1, *de2;
-	de1 = repo_first_dirent(dir1);
-	de2 = repo_first_dirent(dir2);
-
-	while (de1 && de2) {
-		if (de1->name_offset < de2->name_offset) {
-			path[depth] = de1->name_offset;
-			fast_export_delete(depth + 1, path);
-			de1 = dent_next(&dir1->entries, de1);
-			continue;
-		}
-		if (de1->name_offset > de2->name_offset) {
-			path[depth] = de2->name_offset;
-			repo_git_add(depth + 1, path, de2);
-			de2 = dent_next(&dir2->entries, de2);
-			continue;
-		}
-		path[depth] = de1->name_offset;
-
-		if (de1->mode == de2->mode &&
-		    de1->content_offset == de2->content_offset) {
-			; /* No change. */
-		} else if (repo_dirent_is_dir(de1) && repo_dirent_is_dir(de2)) {
-			repo_diff_r(depth + 1, path,
-				    repo_dir_from_dirent(de1),
-				    repo_dir_from_dirent(de2));
-		} else if (!repo_dirent_is_dir(de1) && !repo_dirent_is_dir(de2)) {
-			repo_git_add(depth + 1, path, de2);
-		} else {
-			fast_export_delete(depth + 1, path);
-			repo_git_add(depth + 1, path, de2);
-		}
-		de1 = dent_next(&dir1->entries, de1);
-		de2 = dent_next(&dir2->entries, de2);
-	}
-	while (de1) {
-		path[depth] = de1->name_offset;
-		fast_export_delete(depth + 1, path);
-		de1 = dent_next(&dir1->entries, de1);
-	}
-	while (de2) {
-		path[depth] = de2->name_offset;
-		repo_git_add(depth + 1, path, de2);
-		de2 = dent_next(&dir2->entries, de2);
-	}
-}
-
-static uint32_t path_stack[REPO_MAX_PATH_DEPTH];
-
-void repo_diff(uint32_t r1, uint32_t r2)
-{
-	repo_diff_r(0,
-		    path_stack,
-		    repo_commit_root_dir(commit_pointer(r1)),
-		    repo_commit_root_dir(commit_pointer(r2)));
-}
-
-void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
-		 uint32_t url, unsigned long timestamp)
-{
-	fast_export_commit(revision, author, log, uuid, url, timestamp);
-	dent_commit();
-	dir_commit();
-	active_commit = commit_alloc(1);
-	commit_pointer(active_commit)->root_dir_offset =
-		commit_pointer(active_commit - 1)->root_dir_offset;
-}
-
-static void mark_init(void)
-{
-	uint32_t i;
-	mark = 1024 * 1024 * 1024;
-	for (i = 0; i < dent_pool.size; i++)
-		if (!repo_dirent_is_dir(dent_pointer(i)) &&
-		    dent_pointer(i)->content_offset > mark)
-			mark = dent_pointer(i)->content_offset;
-	mark++;
-}
-
-void repo_init(void)
-{
-	mark_init();
-	if (commit_pool.size == 0) {
-		/* Create empty tree for commit 0. */
-		commit_alloc(1);
-		commit_pointer(0)->root_dir_offset = dir_alloc(1);
-		dir_pointer(0)->entries.trp_root = ~0;
-		dir_commit();
-	}
-	/* Preallocate next commit, ready for changes. */
-	active_commit = commit_alloc(1);
-	commit_pointer(active_commit)->root_dir_offset =
-		commit_pointer(active_commit - 1)->root_dir_offset;
-}
-
-void repo_reset(void)
-{
-	pool_reset();
-	commit_reset();
-	dir_reset();
-	dent_reset();
-}
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "repo_tree.h"
+#include "fast_export.h"
+
+const char *repo_read_path(const uint32_t *path)
+{
+	int err;
+	uint32_t dummy;
+	static struct strbuf buf = STRBUF_INIT;
+
+	strbuf_reset(&buf);
+	err = fast_export_ls(REPO_MAX_PATH_DEPTH, path, &dummy, &buf);
+	if (err) {
+		if (errno != ENOENT)
+			die_errno("BUG: unexpected fast_export_ls error");
+		return NULL;
+	}
+	return buf.buf;
+}
+
+uint32_t repo_read_mode(const uint32_t *path)
+{
+	int err;
+	uint32_t result;
+	static struct strbuf dummy = STRBUF_INIT;
+
+	strbuf_reset(&dummy);
+	err = fast_export_ls(REPO_MAX_PATH_DEPTH, path, &result, &dummy);
+	if (err) {
+		if (errno != ENOENT)
+			die_errno("BUG: unexpected fast_export_ls error");
+		/* Treat missing paths as directories. */
+		return REPO_MODE_DIR;
+	}
+	return result;
+}
+
+void repo_copy(uint32_t revision, const uint32_t *src, const uint32_t *dst)
+{
+	int err;
+	uint32_t mode;
+	static struct strbuf data = STRBUF_INIT;
+
+	strbuf_reset(&data);
+	err = fast_export_ls_rev(revision, REPO_MAX_PATH_DEPTH, src, &mode, &data);
+	if (err) {
+		if (errno != ENOENT)
+			die_errno("BUG: unexpected fast_export_ls_rev error");
+		fast_export_delete(REPO_MAX_PATH_DEPTH, dst);
+		return;
+	}
+	fast_export_modify(REPO_MAX_PATH_DEPTH, dst, mode, data.buf);
+}
+
+void repo_delete(uint32_t *path)
+{
+	fast_export_delete(REPO_MAX_PATH_DEPTH, path);
+}
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
index 11d48c2..d690784 100644
--- a/vcs-svn/repo_tree.h
+++ b/vcs-svn/repo_tree.h
@@ -14,7 +14,7 @@
 uint32_t next_blob_mark(void);
 void repo_copy(uint32_t revision, const uint32_t *src, const uint32_t *dst);
 void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark);
-uint32_t repo_read_path(const uint32_t *path);
+const char *repo_read_path(const uint32_t *path);
 uint32_t repo_read_mode(const uint32_t *path);
 void repo_delete(uint32_t *path);
 void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
diff --git a/vcs-svn/string_pool.c b/vcs-svn/string_pool.c
index f5b1da8..c08abac 100644
--- a/vcs-svn/string_pool.c
+++ b/vcs-svn/string_pool.c
@@ -65,7 +65,7 @@ uint32_t pool_tok_r(char *str, const char *delim, char **saveptr)
 	return token ? pool_intern(token) : ~0;
 }
 
-void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream)
+void pool_print_seq(uint32_t len, const uint32_t *seq, char delim, FILE *stream)
 {
 	uint32_t i;
 	for (i = 0; i < len && ~seq[i]; i++) {
diff --git a/vcs-svn/string_pool.h b/vcs-svn/string_pool.h
index 222fb66..3720cf8 100644
--- a/vcs-svn/string_pool.h
+++ b/vcs-svn/string_pool.h
@@ -4,7 +4,7 @@
 uint32_t pool_intern(const char *key);
 const char *pool_fetch(uint32_t entry);
 uint32_t pool_tok_r(char *str, const char *delim, char **saveptr);
-void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream);
+void pool_print_seq(uint32_t len, const uint32_t *seq, char delim, FILE *stream);
 uint32_t pool_tok_seq(uint32_t sz, uint32_t *seq, const char *delim, char *str);
 void pool_reset(void);
 
diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
index 3cc4135..7ecb227 100644
--- a/vcs-svn/svndump.c
+++ b/vcs-svn/svndump.c
@@ -36,6 +36,8 @@ obj_pool_gen(log, char, 4096)
 
 static struct line_buffer input = LINE_BUFFER_INIT;
 
+#define REPORT_FILENO 3
+
 static char *log_copy(uint32_t length, const char *log)
 {
 	char *buffer;
@@ -202,15 +204,21 @@ static void read_props(void)
 
 static void handle_node(void)
 {
-	uint32_t mark = 0;
 	const uint32_t type = node_ctx.type;
 	const int have_props = node_ctx.propLength != LENGTH_UNKNOWN;
 	const int have_text = node_ctx.textLength != LENGTH_UNKNOWN;
+	/*
+	 * Old text for this node:
+	 *  NULL	- directory or bug
+	 *  empty_blob	- empty
+	 *  "<dataref>"	- data retrievable from fast-import
+	 */
+	static const char *const empty_blob = "::empty::";
+	const char *old_data = NULL;
 
 	if (node_ctx.text_delta)
 		die("text deltas not supported");
-	if (have_text)
-		mark = next_blob_mark();
+
 	if (node_ctx.action == NODEACT_DELETE) {
 		if (have_text || have_props || node_ctx.srcRev)
 			die("invalid dump: deletion node has "
@@ -230,15 +238,15 @@ static void handle_node(void)
 		die("invalid dump: directories cannot have text attached");
 
 	/*
-	 * Decide on the new content (mark) and mode (node_ctx.type).
+	 * Find old content (old_data) and decide on the new mode.
 	 */
 	if (node_ctx.action == NODEACT_CHANGE && !~*node_ctx.dst) {
 		if (type != REPO_MODE_DIR)
 			die("invalid dump: root of tree is not a regular file");
+		old_data = NULL;
 	} else if (node_ctx.action == NODEACT_CHANGE) {
 		uint32_t mode;
-		if (!have_text)
-			mark = repo_read_path(node_ctx.dst);
+		old_data = repo_read_path(node_ctx.dst);
 		mode = repo_read_mode(node_ctx.dst);
 		if (mode == REPO_MODE_DIR && type != REPO_MODE_DIR)
 			die("invalid dump: cannot modify a directory into a file");
@@ -246,7 +254,11 @@ static void handle_node(void)
 			die("invalid dump: cannot modify a file into a directory");
 		node_ctx.type = mode;
 	} else if (node_ctx.action == NODEACT_ADD) {
-		if (!have_text && type != REPO_MODE_DIR)
+		if (type == REPO_MODE_DIR)
+			old_data = NULL;
+		else if (have_text)
+			old_data = empty_blob;
+		else
 			die("invalid dump: adds node without text");
 	} else {
 		die("invalid dump: Node-path block lacks Node-action");
@@ -265,24 +277,34 @@ static void handle_node(void)
 	/*
 	 * Save the result.
 	 */
-	repo_add(node_ctx.dst, node_ctx.type, mark);
-	if (have_text)
-		fast_export_blob(node_ctx.type, mark,
-				 node_ctx.textLength, &input);
+	if (type == REPO_MODE_DIR)	/* directories are not tracked. */
+		return;
+	assert(old_data);
+	if (old_data == empty_blob)
+		/* For the fast_export_* functions, NULL means empty. */
+		old_data = NULL;
+	if (!have_text) {
+		fast_export_modify(REPO_MAX_PATH_DEPTH, node_ctx.dst,
+					node_ctx.type, old_data);
+		return;
+	}
+	fast_export_modify(REPO_MAX_PATH_DEPTH, node_ctx.dst,
+				node_ctx.type, "inline");
+	fast_export_data(node_ctx.type, node_ctx.textLength, &input);
 }
 
 static void begin_revision(void)
 {
 	if (!rev_ctx.revision)	/* revision 0 gets no git commit. */
 		return;
-	fast_export_begin_commit(rev_ctx.revision);
+	fast_export_begin_commit(rev_ctx.revision, rev_ctx.author, rev_ctx.log,
+		dump_ctx.uuid, dump_ctx.url, rev_ctx.timestamp);
 }
 
 static void end_revision(void)
 {
 	if (rev_ctx.revision)
-		repo_commit(rev_ctx.revision, rev_ctx.author, rev_ctx.log,
-			dump_ctx.uuid, dump_ctx.url, rev_ctx.timestamp);
+		fast_export_end_commit(rev_ctx.revision);
 }
 
 void svndump_read(const char *url)
@@ -383,7 +405,6 @@ int svndump_init(const char *filename)
 {
 	if (buffer_init(&input, filename))
 		return error("cannot open %s: %s", filename, strerror(errno));
-	repo_init();
 	fast_export_init(REPORT_FILENO);
 	reset_dump_ctx(~0);
 	reset_rev_ctx(0);
@@ -396,7 +417,6 @@ void svndump_deinit(void)
 {
 	log_reset();
 	fast_export_deinit();
-	repo_reset();
 	reset_dump_ctx(~0);
 	reset_rev_ctx(0);
 	reset_node_ctx(NULL);
@@ -411,7 +431,6 @@ void svndump_reset(void)
 	log_reset();
 	fast_export_reset();
 	buffer_reset(&input);
-	repo_reset();
 	reset_dump_ctx(~0);
 	reset_rev_ctx(0);
 	reset_node_ctx(NULL);
-- 
1.7.4.1


* [PATCH 10/12] vcs-svn: quote paths correctly for ls command
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
                     ` (8 preceding siblings ...)
  2011-03-06 23:12   ` [PATCH 09/12] vcs-svn: eliminate repo_tree structure Jonathan Nieder
@ 2011-03-06 23:12   ` Jonathan Nieder
  2011-03-06 23:13   ` [PATCH 11/12] vcs-svn: handle filenames with dq correctly Jonathan Nieder
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:12 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

From: David Barr <david.barr@cordelta.com>
Date: Sun, 12 Dec 2010 03:59:31 +1100

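Paths sent to fast-import inside a double-quoted "ls" argument need
C-style quoting.  Without it, a path containing a backslash (such as
branches/UpdateFOPto094\) is not passed through correctly.

This bug was found while importing rev 601865 of the ASF (Apache
Software Foundation) repository.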

[jn: with test]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t9010-svn-fe.sh     |   99 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/fast_export.c |    2 +-
 vcs-svn/string_pool.c |   11 +++++
 vcs-svn/string_pool.h |    1 +
 4 files changed, 112 insertions(+), 1 deletions(-)
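
To illustrate the quoting involved: the sketch below escapes one path
component the way a C string literal would, without the surrounding
double quotes (the caller prints those, as ls_from_active_commit()
does).  It is a simplified, hypothetical stand-in and not git's
quote_c_style(); the name quote_component, the output buffer, and the
limited set of escapes are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified C-style quoting of one path component.
 * Writes the escaped, NUL-terminated copy of name into out.
 * Returns the escaped length, or -1 if out is too small.
 */
static int quote_component(const char *name, char *out, size_t out_size)
{
	const unsigned char *p;
	size_t n = 0;

	assert(name && out);

	for (p = (const unsigned char *)name; *p; p++) {
		char esc[5];
		size_t len;

		if (*p == '"' || *p == '\\') {
			/* Backslash-escape the two special characters. */
			esc[0] = '\\';
			esc[1] = (char)*p;
			len = 2;
		} else if (*p == '\n' || *p == '\t') {
			esc[0] = '\\';
			esc[1] = (*p == '\n') ? 'n' : 't';
			len = 2;
		} else if (*p < 0x20 || *p == 0x7f) {
			/* Other control characters: three-digit octal. */
			snprintf(esc, sizeof(esc), "\\%03o", (unsigned)*p);
			len = 4;
		} else {
			esc[0] = (char)*p;
			len = 1;
		}
		if (n + len + 1 > out_size)
			return -1;
		memcpy(out + n, esc, len);
		n += len;
	}
	if (n + 1 > out_size)
		return -1;
	out[n] = '\0';
	return (int)n;
}
```

Applied to the component UpdateFOPto094\ from the new test, this
yields UpdateFOPto094\\ which is the form the expected ls-tree output
in the test uses.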

diff --git a/t/t9010-svn-fe.sh b/t/t9010-svn-fe.sh
index 2ae5374..720fd6b 100755
--- a/t/t9010-svn-fe.sh
+++ b/t/t9010-svn-fe.sh
@@ -270,6 +270,105 @@ test_expect_success PIPE 'directory with files' '
 	test_cmp hi directory/file2
 '
 
+test_expect_success PIPE 'branch name with backslash' '
+	reinit_git &&
+	sort <<-\EOF >expect.branch-files &&
+	trunk/file1
+	trunk/file2
+	"branches/UpdateFOPto094\\/file1"
+	"branches/UpdateFOPto094\\/file2"
+	EOF
+
+	echo hi >hi &&
+	echo hello >hello &&
+	{
+		properties \
+			svn:author author@example.com \
+			svn:date "1999-02-02T00:01:02.000000Z" \
+			svn:log "add directory with some files in it" &&
+		echo PROPS-END
+	} >props.setup &&
+	{
+		properties \
+			svn:author brancher@example.com \
+			svn:date "2007-12-06T21:38:34.000000Z" \
+			svn:log "Updating fop to .94 and adjust fo-stylesheets" &&
+		echo PROPS-END
+	} >props.branch &&
+	{
+		cat <<-EOF &&
+		SVN-fs-dump-format-version: 3
+
+		Revision-number: 1
+		EOF
+		echo Prop-content-length: $(wc -c <props.setup) &&
+		echo Content-length: $(wc -c <props.setup) &&
+		echo &&
+		cat props.setup &&
+		cat <<-\EOF &&
+
+		Node-path: trunk
+		Node-kind: dir
+		Node-action: add
+		Prop-content-length: 10
+		Content-length: 10
+
+		PROPS-END
+
+		Node-path: branches
+		Node-kind: dir
+		Node-action: add
+		Prop-content-length: 10
+		Content-length: 10
+
+		PROPS-END
+
+		Node-path: trunk/file1
+		Node-kind: file
+		Node-action: add
+		EOF
+		text_no_props hello &&
+		cat <<-\EOF &&
+		Node-path: trunk/file2
+		Node-kind: file
+		Node-action: add
+		EOF
+		text_no_props hi &&
+		cat <<-\EOF &&
+
+		Revision-number: 2
+		EOF
+		echo Prop-content-length: $(wc -c <props.branch) &&
+		echo Content-length: $(wc -c <props.branch) &&
+		echo &&
+		cat props.branch &&
+		cat <<-\EOF
+
+		Node-path: branches/UpdateFOPto094\
+		Node-kind: dir
+		Node-action: add
+		Node-copyfrom-rev: 1
+		Node-copyfrom-path: trunk
+
+		Node-kind: dir
+		Node-action: add
+		Prop-content-length: 34
+		Content-length: 34
+
+		K 13
+		svn:mergeinfo
+		V 0
+
+		PROPS-END
+		EOF
+	} >branch.dump &&
+	try_dump branch.dump &&
+
+	git ls-tree -r --name-only HEAD |
+	sort >actual.branch-files &&
+	test_cmp expect.branch-files actual.branch-files
+'
+
 test_expect_success PIPE 'node without action' '
 	reinit_git &&
 	cat >inaction.dump <<-\EOF &&
diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index a8ce5c6..4d57efa 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -107,7 +107,7 @@ static void ls_from_active_commit(uint32_t depth, const uint32_t *path)
 {
 	/* ls "path/to/file" */
 	printf("ls \"");
-	pool_print_seq(depth, path, '/', stdout);
+	pool_print_seq_q(depth, path, '/', stdout);
 	printf("\"\n");
 	fflush(stdout);
 }
diff --git a/vcs-svn/string_pool.c b/vcs-svn/string_pool.c
index c08abac..be43598 100644
--- a/vcs-svn/string_pool.c
+++ b/vcs-svn/string_pool.c
@@ -4,6 +4,7 @@
  */
 
 #include "git-compat-util.h"
+#include "quote.h"
 #include "trp.h"
 #include "obj_pool.h"
 #include "string_pool.h"
@@ -75,6 +76,16 @@ void pool_print_seq(uint32_t len, const uint32_t *seq, char delim, FILE *stream)
 	}
 }
 
+void pool_print_seq_q(uint32_t len, const uint32_t *seq, char delim, FILE *stream)
+{
+	uint32_t i;
+	for (i = 0; i < len && ~seq[i]; i++) {
+		quote_c_style(pool_fetch(seq[i]), NULL, stream, 1);
+		if (i < len - 1 && ~seq[i + 1])
+			fputc(delim, stream);
+	}
+}
+
 uint32_t pool_tok_seq(uint32_t sz, uint32_t *seq, const char *delim, char *str)
 {
 	char *context = NULL;
diff --git a/vcs-svn/string_pool.h b/vcs-svn/string_pool.h
index 3720cf8..96e501d 100644
--- a/vcs-svn/string_pool.h
+++ b/vcs-svn/string_pool.h
@@ -5,6 +5,7 @@ uint32_t pool_intern(const char *key);
 const char *pool_fetch(uint32_t entry);
 uint32_t pool_tok_r(char *str, const char *delim, char **saveptr);
 void pool_print_seq(uint32_t len, const uint32_t *seq, char delim, FILE *stream);
+void pool_print_seq_q(uint32_t len, const uint32_t *seq, char delim, FILE *stream);
 uint32_t pool_tok_seq(uint32_t sz, uint32_t *seq, const char *delim, char *str);
 void pool_reset(void);
 
-- 
1.7.4.1


* [PATCH 11/12] vcs-svn: handle filenames with dq correctly
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
                     ` (9 preceding siblings ...)
  2011-03-06 23:12   ` [PATCH 10/12] vcs-svn: quote paths correctly for ls command Jonathan Nieder
@ 2011-03-06 23:13   ` Jonathan Nieder
  2011-03-06 23:16   ` [PATCH 12/12] vcs-svn: use mark from previous import for parent commit Jonathan Nieder
  2011-03-07 12:24   ` [PATCH v2 00/12] vcs-svn: incremental import Sverre Rabbelier
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:13 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

Date: Sat, 11 Dec 2010 17:08:51 -0600

Quote paths passed to fast-import so filenames with double quotes are
not misinterpreted.

One might imagine this could help with filenames with newlines, too,
but svn does not allow those.

Helped-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
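Editorial illustration, not part of the patch: a minimal sketch of the
quoting idea described above.  quote_component() is a hypothetical,
simplified stand-in for quote_c_style(name, NULL, stream, 1); the real
function in quote.c handles more cases.  format_path_cmd() and the
buffer-based interface are assumptions made so the result can be
inspected; the patch itself writes straight to stdout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for quote_c_style() with no_dq set: append the
 * escaped form of one path component to out, without surrounding quotes.
 * Only '"', '\\', newline and tab get short escapes; other control
 * bytes are written as three-digit octal.
 */
static void quote_component(const char *name, char *out, size_t outsz)
{
	size_t n = strlen(out);
	const unsigned char *p;

	/* Keep room for the longest escape (4 bytes) plus the NUL. */
	for (p = (const unsigned char *)name; *p && n + 5 < outsz; p++) {
		switch (*p) {
		case '"':  n += snprintf(out + n, outsz - n, "\\\""); break;
		case '\\': n += snprintf(out + n, outsz - n, "\\\\"); break;
		case '\n': n += snprintf(out + n, outsz - n, "\\n"); break;
		case '\t': n += snprintf(out + n, outsz - n, "\\t"); break;
		default:
			if (*p < 0x20 || *p == 0x7f)
				n += snprintf(out + n, outsz - n, "\\%03o",
					      (unsigned int)*p);
			else
				n += snprintf(out + n, outsz - n, "%c", *p);
		}
	}
}

/* Build `<cmd> "<quoted path>"\n` from a path given as components. */
static void format_path_cmd(const char *cmd, const char **path, size_t depth,
			    char *out, size_t outsz)
{
	size_t i;

	assert(out && outsz > 0);
	snprintf(out, outsz, "%s \"", cmd);
	for (i = 0; i < depth; i++) {
		quote_component(path[i], out, outsz);
		if (i + 1 < depth)
			strncat(out, "/", outsz - strlen(out) - 1);
	}
	strncat(out, "\"\n", outsz - strlen(out) - 1);
}
```

Calling format_path_cmd("D", ...) with the components `dir` and
`say "hi".txt` yields a single line in which the inner quotes are
backslash-escaped, so fast-import can tell where the path ends.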
 vcs-svn/fast_export.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 4d57efa..9c03f3e 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -34,10 +34,9 @@ void fast_export_reset(void)
 
 void fast_export_delete(uint32_t depth, const uint32_t *path)
 {
-	putchar('D');
-	putchar(' ');
-	pool_print_seq(depth, path, '/', stdout);
-	putchar('\n');
+	printf("D \"");
+	pool_print_seq_q(depth, path, '/', stdout);
+	printf("\"\n");
 }
 
 static void fast_export_truncate(uint32_t depth, const uint32_t *path, uint32_t mode)
@@ -54,9 +53,9 @@ void fast_export_modify(uint32_t depth, const uint32_t *path, uint32_t mode,
 		fast_export_truncate(depth, path, mode);
 		return;
 	}
-	printf("M %06"PRIo32" %s ", mode, dataref);
-	pool_print_seq(depth, path, '/', stdout);
-	putchar('\n');
+	printf("M %06"PRIo32" %s \"", mode, dataref);
+	pool_print_seq_q(depth, path, '/', stdout);
+	printf("\"\n");
 }
 
 static char gitsvnline[MAX_GITSVN_LINE_LEN];
@@ -97,9 +96,9 @@ void fast_export_end_commit(uint32_t revision)
 static void ls_from_rev(uint32_t rev, uint32_t depth, const uint32_t *path)
 {
 	/* ls :5 path/to/old/file */
-	printf("ls :%"PRIu32" ", rev);
-	pool_print_seq(depth, path, '/', stdout);
-	putchar('\n');
+	printf("ls :%"PRIu32" \"", rev);
+	pool_print_seq_q(depth, path, '/', stdout);
+	printf("\"\n");
 	fflush(stdout);
 }
 
-- 
1.7.4.1


* [PATCH 12/12] vcs-svn: use mark from previous import for parent commit
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
                     ` (10 preceding siblings ...)
  2011-03-06 23:13   ` [PATCH 11/12] vcs-svn: handle filenames with dq correctly Jonathan Nieder
@ 2011-03-06 23:16   ` Jonathan Nieder
  2011-03-07 12:24   ` [PATCH v2 00/12] vcs-svn: incremental import Sverre Rabbelier
  12 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-06 23:16 UTC (permalink / raw)
  To: git
  Cc: David Barr, Ramkumar Ramachandra, Sverre Rabbelier, Sam Vilain,
	Stephen Bash, Tomas Carnecky

From: David Barr <david.barr@cordelta.com>
Date: Sun, 12 Dec 2010 13:41:38 +1100

With this patch, overlapping incremental imports work.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
That's the end of the series.

Bug reports, suggestions, improvements, welcome, of course.  The
documentation in contrib/svn-fe/svn-fe.txt is probably out of date
now.

Have fun :)
Jonathan
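
Editorial illustration, not part of the patch: a small sketch of the
parent line this change produces for the first commit of a run.
format_parent_line() is a hypothetical helper; the patch prints
directly with printf().  It assumes, as the series intends, that the
commit for rN carries mark :N and that marks from the previous run were
loaded with --import-marks, so that :N-1 resolves.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the "from" line for the first commit emitted in this run.
 * Returns the number of characters written, or 0 when the revision
 * has no parent (r1 starts the history).
 */
static int format_parent_line(uint32_t revision, char *out, size_t outsz)
{
	assert(out && outsz > 0);
	out[0] = '\0';
	if (revision <= 1)
		return 0;
	/* Refer to the previous revision by its mark, not by branch tip. */
	return snprintf(out, outsz, "from :%" PRIu32 "\n", revision - 1);
}
```

Naming the parent by mark rather than by refs/heads/master^0 means a
dump that overlaps revisions already imported still attaches to the
right commit, whatever the branch currently points at.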

 vcs-svn/fast_export.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
index 9c03f3e..f19db9a 100644
--- a/vcs-svn/fast_export.c
+++ b/vcs-svn/fast_export.c
@@ -83,7 +83,7 @@ void fast_export_begin_commit(uint32_t revision, uint32_t author, char *log,
 		   log, gitsvnline);
 	if (!first_commit_done) {
 		if (revision > 1)
-			printf("from refs/heads/master^0\n");
+			printf("from :%"PRIu32"\n", revision - 1);
 		first_commit_done = 1;
 	}
 }
-- 
1.7.4.1


* Re: [PATCH v2 00/12] vcs-svn: incremental import
  2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
                     ` (11 preceding siblings ...)
  2011-03-06 23:16   ` [PATCH 12/12] vcs-svn: use mark from previous import for parent commit Jonathan Nieder
@ 2011-03-07 12:24   ` Sverre Rabbelier
  2011-03-07 21:23     ` Jonathan Nieder
  12 siblings, 1 reply; 37+ messages in thread
From: Sverre Rabbelier @ 2011-03-07 12:24 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, David Barr, Ramkumar Ramachandra, Sam Vilain, Stephen Bash,
	Tomas Carnecky

Heya,

On Sun, Mar 6, 2011 at 23:54, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Patch 1 changes the mark numbers for blobs to be ridiculously high,
> to make room for memorable commit marks (:1 for r1, :2 for r2, etc).

How high is ridiculously high? There are repositories with a lot of
commits, will they have enough room? Ditto for repositories with a lot
of blobs, will _those_ have enough room? What about the git
fast-import side, what about this statement:

"Marks are stored in a sparse array, using 1 pointer (4 bytes or 8
bytes, depending on pointer size) per mark. Although the array is
sparse, frontends are still strongly encouraged to use marks between 1
and n, where n is the total number of marks required for this import."

-- 
Cheers,

Sverre Rabbelier


* Re: [PATCH v2 00/12] vcs-svn: incremental import
  2011-03-07 12:24   ` [PATCH v2 00/12] vcs-svn: incremental import Sverre Rabbelier
@ 2011-03-07 21:23     ` Jonathan Nieder
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-07 21:23 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: git, David Barr, Ramkumar Ramachandra, Sam Vilain, Stephen Bash,
	Tomas Carnecky

Sverre Rabbelier wrote:
> On Sun, Mar 6, 2011 at 23:54, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> Patch 1 changes the mark numbers for blobs to be ridiculously high,
>> to make room for memorable commit marks (:1 for r1, :2 for r2, etc).
>
> How high is ridiculously high?

2^30.  Later in the series (vcs-svn: eliminate repo_tree structure)
the blob marks are eliminated altogether.

> "Marks are stored in a sparse array, using 1 pointer (4 bytes or 8
> bytes, depending on pointer size) per mark. Although the array is
> sparse, frontends are still strongly encouraged to use marks between 1
> and n, where n is the total number of marks required for this import."

I assume that is mostly meant to make it easier to write alternate
backends.
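
Editorial illustration: the arithmetic behind the 2^30 split, as a
rough sketch.  The names BLOB_MARK_OFFSET, rev_mark(), blob_mark() and
blob_mark_capacity() are made up for this example, and it assumes marks
are held in 32-bit unsigned integers as elsewhere in vcs-svn.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed name; the value 2^30 is the one given in the reply above. */
#define BLOB_MARK_OFFSET (UINT32_C(1) << 30)

/* The commit for rN uses mark :N, valid while N is below the blob range. */
static uint32_t rev_mark(uint32_t revision)
{
	assert(revision > 0 && revision < BLOB_MARK_OFFSET);
	return revision;
}

/* The blob with the given index (counting from 0) gets an upper-range mark. */
static uint32_t blob_mark(uint32_t index)
{
	assert(index <= UINT32_MAX - BLOB_MARK_OFFSET);
	return BLOB_MARK_OFFSET + index;
}

/* Number of distinct blob marks left above the offset: 2^32 - 2^30. */
static uint64_t blob_mark_capacity(void)
{
	return (uint64_t)UINT32_MAX - BLOB_MARK_OFFSET + 1;
}
```

Under these assumptions there is room for just over a billion
revisions and about 3.2 billion blobs, which matches the "3 billion
blobs" figure in the patch description.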


* Re: [PATCH 01/12] vcs-svn: use higher mark numbers for blobs
  2011-03-06 23:03   ` [PATCH 01/12] vcs-svn: use higher mark numbers for blobs Jonathan Nieder
@ 2011-03-08 19:08     ` Junio C Hamano
  2011-03-09  6:55       ` Jonathan Nieder
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2011-03-08 19:08 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, David Barr, Ramkumar Ramachandra, Sverre Rabbelier,
	Sam Vilain, Stephen Bash, Tomas Carnecky

Jonathan Nieder <jrnieder@gmail.com> writes:

> Date: Fri, 10 Dec 2010 04:21:35 -0600
>
> Prepare to use mark :5 for the commit corresponding to r5 (and so on).
>
> 1 billion seems sufficiently high for blob marks to avoid conflicting
> with rev marks, while still leaving room for 3 billion blobs.  Such
> high mark numbers cause trouble with ancient fast-import versions, but
> this topic cannot support git fast-import versions before 1.7.4 (which
> introduces the cat-blob command) anyway.

Hmm, 1G+3G split?  Will we have HIGHMEM option someday? ;-)

How confident are you that you will never need more than two classes later
and you will never need to split the larger space again?

If you are not, and if the topic is to introduce incompatible output,
would it be wiser to be even more forward looking and introduce different
classes of marks with a backward incompatible syntax, perhaps like using
":\d+" for anything, and using ":[a-zA-Z0-9]+:\d+" for some application
specific "class" of objects that is specified by the [a-zA-Z0-9]+ part?
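
Editorial illustration: a rough sketch of what parsing the two mark
forms suggested above could look like.  Nothing like this exists in
fast-import; struct classed_mark, parse_number() and parse_mark() are
hypothetical names, and the 31-character limit on the class name is an
arbitrary choice for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

struct classed_mark {
	char class_name[32];	/* empty for a plain ":\d+" mark */
	uint64_t number;
};

/* Parse a string made only of decimal digits; returns 0 on success. */
static int parse_number(const char *s, uint64_t *out)
{
	uint64_t n = 0;

	if (!isdigit((unsigned char)*s))
		return -1;
	for (; isdigit((unsigned char)*s); s++) {
		unsigned int d = (unsigned int)(*s - '0');
		if (n > (UINT64_MAX - d) / 10)
			return -1;	/* would overflow */
		n = n * 10 + d;
	}
	if (*s)
		return -1;	/* trailing garbage */
	*out = n;
	return 0;
}

/* Accept ":<digits>" or ":<alnum class>:<digits>"; returns 0 on success. */
static int parse_mark(const char *s, struct classed_mark *m)
{
	const char *sep;
	size_t len, i;

	assert(s && m);
	if (*s++ != ':')
		return -1;
	m->class_name[0] = '\0';
	sep = strchr(s, ':');
	if (!sep)
		return parse_number(s, &m->number);
	len = (size_t)(sep - s);
	if (!len || len >= sizeof(m->class_name))
		return -1;
	for (i = 0; i < len; i++)
		if (!isalnum((unsigned char)s[i]))
			return -1;
	memcpy(m->class_name, s, len);
	m->class_name[len] = '\0';
	return parse_number(sep + 1, &m->number);
}
```

With this shape, ":42" parses as an unclassed mark and ":blob:7" as
mark 7 in class "blob", so each class gets its own number space
instead of sharing one split range.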


* Re: [PATCH 01/12] vcs-svn: use higher mark numbers for blobs
  2011-03-08 19:08     ` Junio C Hamano
@ 2011-03-09  6:55       ` Jonathan Nieder
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2011-03-09  6:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, David Barr, Ramkumar Ramachandra, Sverre Rabbelier,
	Sam Vilain, Stephen Bash, Tomas Carnecky

Hi,

Junio C Hamano wrote:

> Hmm, 1G+3G split?  Will we have HIGHMEM option someday? ;-)
>
> How confident are you that you will never need more than two classes later
> and you will never need to split the larger space again?
>
> If you are not, and if the topic is to introduce incompatible output,
> would it be wiser to be even more forward looking and introduce different
> classes of marks with a backward incompatible syntax, perhaps like using
> ":\d+" for anything, and using ":[a-zA-Z0-9]+:\d+" for some application
> specific "class" of objects that is specified by the [a-zA-Z0-9]+ part?

That sounds very sensible (and I'd be happy to see something like
that).

In this particular case a later patch ("vcs-svn: eliminate repo_tree
structure") gets rid of the blob marks so the split is temporary.
Perhaps a paragraph added to the change description would clear it up.

	A later patch will eliminate the blob marks altogether.

For the "vcs-svn: eliminate repo_tree" patch:

	Rely on fast-import for information about previous revs.

	This requires always setting up backward flow of information,
	even for v2 dumps.  On the plus side:

	 - No more need to include blobs in the marks table.
	 - Given one dump that picks up where another left off, svn-fe
	   can continue the import.  Use

		git fast-import --relative-marks \
			--export-marks=svn-revs \
			--cat-blob-fd=3 3>backchannel

	   for the first import and

		git fast-import --relative-marks \
			--import-marks=svn-revs \
			--export-marks=svn-revs \
			--cat-blob-fd=3 3>backchannel

	   for later ones.
	 - It simplifies the code by quite a bit and opens the door
	   to further simplifications.

Thanks for some clarity.
Jonathan


Thread overview: 37+ messages
2010-12-10 10:20 [RFC/PATCH 00/10] vcs-svn: prepare for (implement?) incremental import Jonathan Nieder
2010-12-10 10:21 ` [PATCH 01/10] vcs-svn: use higher mark numbers for blobs Jonathan Nieder
2010-12-10 10:22 ` [PATCH 02/10] vcs-svn: save marks for imported commits Jonathan Nieder
2011-03-06 11:15   ` Jonathan Nieder
2010-12-10 10:23 ` [PATCH 03/10] vcs-svn: introduce cat_mark function to retrieve a marked blob Jonathan Nieder
2010-12-10 10:23 ` [PATCH 04/10] vcs-svn: make apply_delta caller retrieve preimage Jonathan Nieder
2010-12-10 10:25 ` [PATCH 05/10] vcs-svn: split off function to export result from delta application Jonathan Nieder
2010-12-10 10:26 ` [PATCH 06/10] vcs-svn: do not rely on marks for old blobs Jonathan Nieder
2010-12-10 10:27 ` [PATCH 07/10] vcs-svn: split off function to make 'ls' requests Jonathan Nieder
2010-12-10 10:28 ` [PATCH 08/10] vcs-svn: prepare to eliminate repo_tree structure Jonathan Nieder
2011-03-06 12:52   ` [PATCH v2] " Jonathan Nieder
2011-03-06 20:41     ` David Barr
2010-12-10 10:30 ` [PATCH 09/10] vcs-svn: simplifications for repo_modify_path et al Jonathan Nieder
2010-12-10 10:33 ` [PATCH 10/10] vcs-svn: eliminate repo_tree structure Jonathan Nieder
     [not found] ` <C59168D0-B409-4A83-B96C-8CCD42D0B62F@cordelta.com>
     [not found]   ` <20101211184654.GA17464@burratino>
2010-12-11 22:47     ` [RFC/PATCH] fast-import: treat filemodify with empty tree as delete Jonathan Nieder
2010-12-11 23:00     ` [PATCH db/vcs-svn-incremental] vcs-svn: avoid git-isms in fast-import stream Jonathan Nieder
2010-12-11 23:04 ` [PATCH 12/10] vcs-svn: quote paths correctly for ls command David Michael Barr
2010-12-11 23:11   ` [PATCH db/vcs-svn-incremental] vcs-svn: quote all paths passed to fast-import Jonathan Nieder
2010-12-12  9:32 ` [PATCH 13/10] vcs-svn: use mark from previous import for parent commit David Michael Barr
2010-12-12 17:06   ` Jonathan Nieder
2011-03-06 22:54 ` [PATCH v2 00/12] vcs-svn: incremental import Jonathan Nieder
2011-03-06 23:03   ` [PATCH 01/12] vcs-svn: use higher mark numbers for blobs Jonathan Nieder
2011-03-08 19:08     ` Junio C Hamano
2011-03-09  6:55       ` Jonathan Nieder
2011-03-06 23:04   ` [PATCH 02/12] vcs-svn: save marks for imported commits Jonathan Nieder
2011-03-06 23:07   ` [PATCH 03/12] vcs-svn: introduce repo_read_path to check the content at a path Jonathan Nieder
2011-03-06 23:08   ` [PATCH 04/12] vcs-svn: handle_node: use repo_read_path Jonathan Nieder
2011-03-06 23:09   ` [PATCH 05/12] vcs-svn: simplify repo_modify_path and repo_copy Jonathan Nieder
2011-03-06 23:09   ` [PATCH 06/12] vcs-svn: add a comment before each commit Jonathan Nieder
2011-03-06 23:10   ` [PATCH 07/12] vcs-svn: allow input errors to be detected promptly Jonathan Nieder
2011-03-06 23:11   ` [PATCH 08/12] vcs-svn: set up channel to read fast-import cat-blob response Jonathan Nieder
2011-03-06 23:12   ` [PATCH 09/12] vcs-svn: eliminate repo_tree structure Jonathan Nieder
2011-03-06 23:12   ` [PATCH 10/12] vcs-svn: quote paths correctly for ls command Jonathan Nieder
2011-03-06 23:13   ` [PATCH 11/12] vcs-svn: handle filenames with dq correctly Jonathan Nieder
2011-03-06 23:16   ` [PATCH 12/12] vcs-svn: use mark from previous import for parent commit Jonathan Nieder
2011-03-07 12:24   ` [PATCH v2 00/12] vcs-svn: incremental import Sverre Rabbelier
2011-03-07 21:23     ` Jonathan Nieder
