git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH v2 0/5] Towards a Git-to-SVN bridge
@ 2011-01-19  5:44 Ramkumar Ramachandra
  2011-01-19  5:44 ` [PATCH 1/5] date: Expose the time_to_tm function Ramkumar Ramachandra
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Ramkumar Ramachandra @ 2011-01-19  5:44 UTC (permalink / raw)
  To: Git List; +Cc: Jonathan Nieder, David Barr, Sverre Rabbelier

Hi,

There are two major changes this time around:
1. I've managed to inline the blobs in the fast-export stream. This
completely eliminates the need for persisting blobs.
2. I've also managed to produce relevant directory nodes, keeping
directory state in a new dir_cache interface. I followed Jonathan's
suggestion to keep it as a string_list for the moment. This is highly
inefficient and should improve soon- perhaps ressurect the treap we
had? Or perhaps find a simple trie implementation somewhere?
Suggestions are welcome.

Unfortunately, I've still not had the time to clean up- there's a
probably a lot of cruft, and everything "just works" for now. Next
time, I'll also try to write some tests, so we don't have to test it
by hand.

Thanks for reading.

Ramkumar Ramachandra (5):
  date: Expose the time_to_tm function
  vcs-svn: Start working on the dumpfile producer
  Build an svn-fi target in contrib/svn-fe
  fast-export: Introduce --inline-blobs
  vcs-svn: Add dir_cache for svnload

 Documentation/git-fast-export.txt |    5 +
 Makefile                          |    4 +-
 builtin/fast-export.c             |   23 +++-
 cache.h                           |    1 +
 contrib/svn-fe/Makefile           |   23 +++-
 contrib/svn-fe/svn-fi.c           |   16 ++
 contrib/svn-fe/svn-fi.txt         |   28 ++++
 date.c                            |    2 +-
 vcs-svn/dir_cache.c               |   40 +++++
 vcs-svn/dir_cache.h               |   12 ++
 vcs-svn/dump_export.c             |  141 ++++++++++++++++++
 vcs-svn/dump_export.h             |   33 +++++
 vcs-svn/svnload.c                 |  286 +++++++++++++++++++++++++++++++++++++
 vcs-svn/svnload.h                 |   10 ++
 14 files changed, 617 insertions(+), 7 deletions(-)
 create mode 100644 contrib/svn-fe/svn-fi.c
 create mode 100644 contrib/svn-fe/svn-fi.txt
 create mode 100644 vcs-svn/dir_cache.c
 create mode 100644 vcs-svn/dir_cache.h
 create mode 100644 vcs-svn/dump_export.c
 create mode 100644 vcs-svn/dump_export.h
 create mode 100644 vcs-svn/svnload.c
 create mode 100644 vcs-svn/svnload.h

-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] date: Expose the time_to_tm function
  2011-01-19  5:44 [RFC PATCH v2 0/5] Towards a Git-to-SVN bridge Ramkumar Ramachandra
@ 2011-01-19  5:44 ` Ramkumar Ramachandra
  2011-01-19  5:44 ` [PATCH 2/5] vcs-svn: Start working on the dumpfile producer Ramkumar Ramachandra
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Ramkumar Ramachandra @ 2011-01-19  5:44 UTC (permalink / raw)
  To: Git List; +Cc: Jonathan Nieder, David Barr, Sverre Rabbelier

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 cache.h |    1 +
 date.c  |    2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/cache.h b/cache.h
index d83d68c..95fea31 100644
--- a/cache.h
+++ b/cache.h
@@ -816,6 +816,7 @@ enum date_mode {
 	DATE_RAW
 };
 
+struct tm *time_to_tm(unsigned long time, int tz);
 const char *show_date(unsigned long time, int timezone, enum date_mode mode);
 const char *show_date_relative(unsigned long time, int tz,
 			       const struct timeval *now,
diff --git a/date.c b/date.c
index 00f9eb5..e601a50 100644
--- a/date.c
+++ b/date.c
@@ -54,7 +54,7 @@ static time_t gm_time_t(unsigned long time, int tz)
  * thing, which means that tz -0100 is passed in as the integer -100,
  * even though it means "sixty minutes off"
  */
-static struct tm *time_to_tm(unsigned long time, int tz)
+struct tm *time_to_tm(unsigned long time, int tz)
 {
 	time_t t = gm_time_t(time, tz);
 	return gmtime(&t);
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] vcs-svn: Start working on the dumpfile producer
  2011-01-19  5:44 [RFC PATCH v2 0/5] Towards a Git-to-SVN bridge Ramkumar Ramachandra
  2011-01-19  5:44 ` [PATCH 1/5] date: Expose the time_to_tm function Ramkumar Ramachandra
@ 2011-01-19  5:44 ` Ramkumar Ramachandra
  2011-01-22  0:30   ` Junio C Hamano
  2011-01-19  5:44 ` [PATCH 3/5] Build an svn-fi target in contrib/svn-fe Ramkumar Ramachandra
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Ramkumar Ramachandra @ 2011-01-19  5:44 UTC (permalink / raw)
  To: Git List; +Cc: Jonathan Nieder, David Barr, Sverre Rabbelier

Start off with some broad design sketches. Compile succeeds, but
parser is incorrect. Include a Makefile rule to build it into
vcs-svn/lib.a.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Makefile              |    2 +-
 vcs-svn/dump_export.c |   73 ++++++++++++
 vcs-svn/svnload.c     |  294 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 368 insertions(+), 1 deletions(-)
 create mode 100644 vcs-svn/dump_export.c
 create mode 100644 vcs-svn/svnload.c

diff --git a/Makefile b/Makefile
index 1345c38..40f6691 100644
--- a/Makefile
+++ b/Makefile
@@ -1834,7 +1834,7 @@ ifndef NO_CURL
 endif
 XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS = vcs-svn/line_buffer.o \
+VCSSVN_OBJS = vcs-svn/line_buffer.o vcs-svn/svnload.o vcs-svn/dump_export.o \
 	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/sliding_window.o \
 	vcs-svn/svndiff.o vcs-svn/svndump.o
 VCSSVN_TEST_OBJS = test-obj-pool.o \
diff --git a/vcs-svn/dump_export.c b/vcs-svn/dump_export.c
new file mode 100644
index 0000000..04ede06
--- /dev/null
+++ b/vcs-svn/dump_export.c
@@ -0,0 +1,73 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "line_buffer.h"
+#include "dump_export.h"
+
+void dump_export_begin_rev(int revision, const char *revprops,
+			int prop_len) {
+	printf("Revision-number: %d\n", revision);
+	printf("Prop-content-length: %d\n", prop_len);
+	printf("Content-length: %d\n\n", prop_len);
+	printf("%s\n", revprops);
+}
+
+void dump_export_node(const char *path, enum node_kind kind,
+		enum node_action action, unsigned long text_len,
+		unsigned long copyfrom_rev, const char *copyfrom_path) {
+	printf("Node-path: %s\n", path);
+	printf("Node-kind: ");
+	switch (action) {
+	case NODE_KIND_NORMAL:
+		printf("file\n");
+		break;
+	case NODE_KIND_EXECUTABLE:
+		printf("file\n");
+		break;
+	case NODE_KIND_SYMLINK:
+		printf("file\n");
+		break;
+	case NODE_KIND_GITLINK:
+		printf("file\n");
+		break;
+	case NODE_KIND_SUBDIR:
+		die("Unsupported: subdirectory");
+	default:
+		break;
+	}
+	printf("Node-action: ");
+	switch (action) {
+	case NODE_ACTION_CHANGE:
+		printf("change\n");
+		break;
+	case NODE_ACTION_ADD:
+		printf("add\n");
+		break;
+	case NODE_ACTION_REPLACE:
+		printf("replace\n");
+		break;
+	case NODE_ACTION_DELETE:
+		printf("delete\n");
+		break;
+	default:
+		break;
+	}
+	if (copyfrom_rev != SVN_INVALID_REV) {
+		printf("Node-copyfrom-rev: %lu\n", copyfrom_rev);
+		printf("Node-copyfrom-path: %s\n", copyfrom_path);
+	}
+	printf("Prop-delta: false\n");
+	printf("Prop-content-length: 10\n"); /* Constant 10 for "PROPS-END" */
+	printf("Text-delta: false\n");
+	printf("Text-content-length: %lu\n", text_len);
+	printf("Content-length: %lu\n\n", text_len + 10);
+	printf("PROPS-END\n\n");
+}
+
+void dump_export_text(struct line_buffer *data, off_t len) {
+	buffer_copy_bytes(data, len);
+}
diff --git a/vcs-svn/svnload.c b/vcs-svn/svnload.c
new file mode 100644
index 0000000..7043ae7
--- /dev/null
+++ b/vcs-svn/svnload.c
@@ -0,0 +1,294 @@
+/*
+ * Produce a dumpfile v3 from a fast-import stream.
+ * Load the dump into the SVN repository with:
+ * svnrdump load <URL> <dumpfile
+ *
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "cache.h"
+#include "git-compat-util.h"
+#include "line_buffer.h"
+#include "dump_export.h"
+#include "strbuf.h"
+
+#define SVN_DATE_FORMAT "%Y-%m-%dT%H:%M:%S.000000Z"
+#define SVN_DATE_LEN 28
+#define LENGTH_UNKNOWN (~0)
+
+static struct line_buffer input = LINE_BUFFER_INIT;
+static struct strbuf blobs[100];
+	
+static struct {
+	unsigned long prop_len, text_len, copyfrom_rev, mark;
+	int text_delta, prop_delta; /* Boolean */
+	enum node_action action;
+	enum node_kind kind;
+	struct strbuf copyfrom_path, path;
+} node_ctx;
+
+static struct {
+	int rev, text_len;
+	struct strbuf props, log;
+	struct strbuf svn_author, author, committer;
+	struct strbuf author_date, committer_date;
+	struct strbuf author_email, committer_email;
+} rev_ctx;
+
+static enum {
+	UNKNOWN_CTX,
+	COMMIT_CTX,
+	BLOB_CTX
+} active_ctx;
+
+static void reset_rev_ctx(int revision)
+{
+	rev_ctx.rev = revision;
+	strbuf_reset(&rev_ctx.props);
+	strbuf_reset(&rev_ctx.log);
+	strbuf_reset(&rev_ctx.svn_author);
+	strbuf_reset(&rev_ctx.author);
+	strbuf_reset(&rev_ctx.committer);
+	strbuf_reset(&rev_ctx.author_date);
+	strbuf_reset(&rev_ctx.committer_date);
+	strbuf_reset(&rev_ctx.author_email);
+	strbuf_reset(&rev_ctx.committer_email);
+}
+
+static void reset_node_ctx(void)
+{
+	node_ctx.prop_len = LENGTH_UNKNOWN;
+	node_ctx.text_len = LENGTH_UNKNOWN;
+	node_ctx.mark = 0;
+	node_ctx.copyfrom_rev = 0;
+	node_ctx.text_delta = -1;
+	node_ctx.prop_delta = -1;
+	strbuf_reset(&node_ctx.copyfrom_path);
+	strbuf_reset(&node_ctx.path);
+}
+
+static void populate_props(struct strbuf *props, const char *author,
+			const char *log, const char *date) {
+	strbuf_reset(props);	
+	strbuf_addf(props, "K\nsvn:author\nV\n%s\n", author);
+	strbuf_addf(props, "K\nsvn:log\nV\n%s", log);
+	strbuf_addf(props, "K\nsvn:date\nV\n%s\n", date);
+	strbuf_add(props, "PROPS-END\n", 10);
+}
+
+static void parse_author_line(char *val, struct strbuf *name,
+			struct strbuf *email, struct strbuf *date) {
+	char *t, *tz_off;
+	char time_buf[SVN_DATE_LEN];
+	const struct tm *tm_time;
+
+	/* Simon Hausmann <shausman@trolltech.com> 1170199019 +0100 */
+	strbuf_reset(name);
+	strbuf_reset(email);
+	strbuf_reset(date);
+	tz_off = strrchr(val, ' ');
+	*tz_off++ = '\0';
+	t = strrchr(val, ' ');
+	*(t - 1) = '\0'; /* Ignore '>' from email */
+	t ++;
+	tm_time = time_to_tm(strtoul(t, NULL, 10), atoi(tz_off));
+	strftime(time_buf, SVN_DATE_LEN, SVN_DATE_FORMAT, tm_time);
+	strbuf_add(date, time_buf, SVN_DATE_LEN);
+	t = strchr(val, '<');
+	*(t - 1) = '\0'; /* Ignore ' <' from email */
+	t ++;
+	strbuf_add(email, t, strlen(t));
+	strbuf_add(name, val, strlen(val));
+}
+
+void svnload_read(void) {
+	char *t, *val;
+	int mode_incr;
+	struct strbuf *to_dump;
+
+	while ((t = buffer_read_line(&input))) {
+		val = strchr(t, ' ');
+		if (!val) {
+			if (!memcmp(t, "blob", 4))
+				active_ctx = BLOB_CTX;
+			else if (!memcmp(t, "deleteall", 9))
+				;
+			continue;
+		}
+		*val++ = '\0';
+
+		/* strlen(key) */
+		switch (val - t - 1) {
+		case 1:
+			if (!memcmp(t, "D", 1)) {
+				node_ctx.action = NODE_ACTION_DELETE;
+			}
+			else if (!memcmp(t, "C", 1)) {
+				node_ctx.action = NODE_ACTION_ADD;
+			}
+			else if (!memcmp(t, "R", 1)) {
+				node_ctx.action = NODE_ACTION_REPLACE;
+			}
+			else if (!memcmp(t, "M", 1)) {
+				node_ctx.action = NODE_ACTION_CHANGE;
+				mode_incr = 7;
+				if (!memcmp(val, "100644", 6))
+					node_ctx.kind = NODE_KIND_NORMAL;
+				else if (!memcmp(val, "100755", 6))
+					node_ctx.kind = NODE_KIND_EXECUTABLE;
+				else if (!memcmp(val, "120000", 6))
+					node_ctx.kind = NODE_KIND_SYMLINK;
+				else if (!memcmp(val, "160000", 6))
+					node_ctx.kind = NODE_KIND_GITLINK;
+				else if (!memcmp(val, "040000", 6))
+					node_ctx.kind = NODE_KIND_SUBDIR;
+				else {
+					if (!memcmp(val, "755", 3))
+						node_ctx.kind = NODE_KIND_EXECUTABLE;
+					else if(!memcmp(val, "644", 3))
+						node_ctx.kind = NODE_KIND_NORMAL;
+					else
+						die("Unrecognized mode: %s", val);
+					mode_incr = 4;
+				}
+				val += mode_incr;
+				t = strchr(val, ' ');
+				*t++ = '\0';
+				strbuf_reset(&node_ctx.path);
+				strbuf_add(&node_ctx.path, t, strlen(t));
+				if (!memcmp(val + 1, "inline", 6))
+					die("Unsupported dataref: inline");
+				else if (*val == ':')
+					to_dump = &blobs[strtoul(val + 1, NULL, 10)];
+				else
+					die("Unsupported dataref: sha1");
+				dump_export_node(node_ctx.path.buf, node_ctx.kind,
+						node_ctx.action, to_dump->len,
+						0, NULL);
+				printf("%s", to_dump->buf);
+			}
+			break;
+		case 3:
+			if (!memcmp(t, "tag", 3))
+				continue;
+			break;
+		case 4:
+			if (!memcmp(t, "mark", 4))
+				switch(active_ctx) {
+				case COMMIT_CTX:
+					/* What do we do with commit marks? */
+					continue;
+				case BLOB_CTX:
+					node_ctx.mark = strtoul(val + 1, NULL, 10);
+					break;
+				default:
+					break;
+				}
+			else if (!memcmp(t, "from", 4))
+				continue;
+			else if (!memcmp(t, "data", 4)) {
+				switch (active_ctx) {
+				case COMMIT_CTX:
+					strbuf_reset(&rev_ctx.log);
+					buffer_read_binary(&input,
+							&rev_ctx.log,
+							strtoul(val, NULL, 10));
+					populate_props(&rev_ctx.props,
+						rev_ctx.svn_author.buf,
+						rev_ctx.log.buf,
+						rev_ctx.author_date.buf);
+					dump_export_begin_rev(rev_ctx.rev,
+							rev_ctx.props.buf,
+							rev_ctx.props.len);
+					break;
+				case BLOB_CTX:
+					node_ctx.text_len = strtoul(val, NULL, 10);
+					buffer_read_binary(&input,
+							&blobs[node_ctx.mark],
+							node_ctx.text_len);
+					break;
+				default:
+					break;
+				}
+			}
+			break;
+		case 5:
+			if (!memcmp(t, "reset", 5))
+				continue;
+			if (!memcmp(t, "merge", 5))
+				continue;
+			break;
+		case 6:
+			if (!memcmp(t, "author", 6)) {
+				parse_author_line(val, &rev_ctx.author,
+						&rev_ctx.author_email,
+						&rev_ctx.author_date);
+				/* Build svn_author */
+				t = strchr(rev_ctx.author_email.buf, '@');
+				strbuf_reset(&rev_ctx.svn_author);
+				strbuf_add(&rev_ctx.svn_author,
+					rev_ctx.author_email.buf,
+					t - rev_ctx.author_email.buf);
+
+			}
+			else if (!memcmp(t, "commit", 6)) {
+				rev_ctx.rev ++;
+				active_ctx = COMMIT_CTX;
+			}
+			break;
+		case 9:
+			if (!memcmp(t, "committer", 9))
+				parse_author_line(val, &rev_ctx.committer,
+						&rev_ctx.committer_email,
+						&rev_ctx.committer_date);
+			break;
+		default:
+			break;
+		}
+	}
+}
+
+int svnload_init(const char *filename)
+{
+	int i;
+	if (buffer_init(&input, filename))
+		return error("cannot open %s: %s", filename, strerror(errno));
+	active_ctx = UNKNOWN_CTX;
+	strbuf_init(&rev_ctx.props, MAX_GITSVN_LINE_LEN);
+	strbuf_init(&rev_ctx.log, MAX_GITSVN_LINE_LEN);
+	strbuf_init(&rev_ctx.author, MAX_GITSVN_LINE_LEN);
+	strbuf_init(&rev_ctx.committer, MAX_GITSVN_LINE_LEN);
+	strbuf_init(&rev_ctx.author_date, MAX_GITSVN_LINE_LEN);
+	strbuf_init(&rev_ctx.committer_date, MAX_GITSVN_LINE_LEN);
+	strbuf_init(&rev_ctx.author_email, MAX_GITSVN_LINE_LEN);
+	strbuf_init(&rev_ctx.committer_email, MAX_GITSVN_LINE_LEN);
+	strbuf_init(&node_ctx.path, MAX_GITSVN_LINE_LEN);
+	strbuf_init(&node_ctx.copyfrom_path, MAX_GITSVN_LINE_LEN);
+	for (i = 0; i < 100; i ++)
+		strbuf_init(&blobs[i], 10000);
+	return 0;
+}
+
+void svnload_deinit(void)
+{
+	int i;
+	reset_rev_ctx(0);
+	reset_node_ctx();
+	strbuf_release(&rev_ctx.props);
+	strbuf_release(&rev_ctx.log);
+	strbuf_release(&rev_ctx.author);
+	strbuf_release(&rev_ctx.committer);
+	strbuf_release(&rev_ctx.author_date);
+	strbuf_release(&rev_ctx.committer_date);
+	strbuf_release(&rev_ctx.author_email);
+	strbuf_release(&rev_ctx.committer_email);
+	strbuf_release(&node_ctx.path);
+	strbuf_release(&node_ctx.copyfrom_path);
+	for (i = 0; i < 100; i ++)
+		strbuf_release(&blobs[i]);
+	if (buffer_deinit(&input))
+		fprintf(stderr, "Input error\n");
+	if (ferror(stdout))
+		fprintf(stderr, "Output error\n");
+}
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] Build an svn-fi target in contrib/svn-fe
  2011-01-19  5:44 [RFC PATCH v2 0/5] Towards a Git-to-SVN bridge Ramkumar Ramachandra
  2011-01-19  5:44 ` [PATCH 1/5] date: Expose the time_to_tm function Ramkumar Ramachandra
  2011-01-19  5:44 ` [PATCH 2/5] vcs-svn: Start working on the dumpfile producer Ramkumar Ramachandra
@ 2011-01-19  5:44 ` Ramkumar Ramachandra
  2011-01-19  5:44 ` [PATCH 4/5] fast-export: Introduce --inline-blobs Ramkumar Ramachandra
  2011-01-19  5:44 ` [PATCH 5/5] vcs-svn: Add dir_cache for svnload Ramkumar Ramachandra
  4 siblings, 0 replies; 18+ messages in thread
From: Ramkumar Ramachandra @ 2011-01-19  5:44 UTC (permalink / raw)
  To: Git List; +Cc: Jonathan Nieder, David Barr, Sverre Rabbelier

Build an svn-fi target for testing the dumpfile producer in vcs-svn/.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 contrib/svn-fe/Makefile   |   23 +++++++++++++++++++++--
 contrib/svn-fe/svn-fi.c   |   16 ++++++++++++++++
 contrib/svn-fe/svn-fi.txt |   28 ++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 2 deletions(-)
 create mode 100644 contrib/svn-fe/svn-fi.c
 create mode 100644 contrib/svn-fe/svn-fi.txt

diff --git a/contrib/svn-fe/Makefile b/contrib/svn-fe/Makefile
index 360d8da..555a8ff 100644
--- a/contrib/svn-fe/Makefile
+++ b/contrib/svn-fe/Makefile
@@ -37,7 +37,7 @@ svn-fe$X: svn-fe.o $(VCSSVN_LIB) $(GIT_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ svn-fe.o \
 		$(ALL_LDFLAGS) $(LIBS)
 
-svn-fe.o: svn-fe.c ../../vcs-svn/svndump.h
+svn-fe.o: svn-fe.c ../../vcs-svn/svnload.h
 	$(QUIET_CC)$(CC) -I../../vcs-svn -o $*.o -c $(ALL_CFLAGS) $<
 
 svn-fe.html: svn-fe.txt
@@ -51,6 +51,24 @@ svn-fe.1: svn-fe.txt
 		../contrib/svn-fe/$@
 	$(MV) ../../Documentation/svn-fe.1 .
 
+svn-fi$X: svn-fi.o $(VCSSVN_LIB) $(GIT_LIB)
+	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ svn-fi.o \
+		$(ALL_LDFLAGS) $(LIBS)
+
+svn-fi.o: svn-fi.c ../../vcs-svn/svnload.h
+	$(QUIET_CC)$(CC) -I../../vcs-svn -o $*.o -c $(ALL_CFLAGS) $<
+
+svn-fi.html: svn-fi.txt
+	$(QUIET_SUBDIR0)../../Documentation $(QUIET_SUBDIR1) \
+		MAN_TXT=../contrib/svn-fe/svn-fi.txt \
+		../contrib/svn-fe/$@
+
+svn-fi.1: svn-fi.txt
+	$(QUIET_SUBDIR0)../../Documentation $(QUIET_SUBDIR1) \
+		MAN_TXT=../contrib/svn-fe/svn-fi.txt \
+		../contrib/svn-fe/$@
+	$(MV) ../../Documentation/svn-fi.1 .
+
 ../../vcs-svn/lib.a: FORCE
 	$(QUIET_SUBDIR0)../.. $(QUIET_SUBDIR1) vcs-svn/lib.a
 
@@ -58,6 +76,7 @@ svn-fe.1: svn-fe.txt
 	$(QUIET_SUBDIR0)../.. $(QUIET_SUBDIR1) libgit.a
 
 clean:
-	$(RM) svn-fe$X svn-fe.o svn-fe.html svn-fe.xml svn-fe.1
+	$(RM) svn-fe$X svn-fe.o svn-fe.html svn-fe.xml svn-fe.1 \
+	svn-fi$X svn-fi.o svn-fi.html svn-fi.xml svn-fi.1
 
 .PHONY: all clean FORCE
diff --git a/contrib/svn-fe/svn-fi.c b/contrib/svn-fe/svn-fi.c
new file mode 100644
index 0000000..81347b0
--- /dev/null
+++ b/contrib/svn-fe/svn-fi.c
@@ -0,0 +1,16 @@
+/*
+ * This file is in the public domain.
+ * You may freely use, modify, distribute, and relicense it.
+ */
+
+#include <stdlib.h>
+#include "svnload.h"
+
+int main(int argc, char **argv)
+{
+	if (svnload_init(NULL))
+		return 1;
+	svnload_read();
+	svnload_deinit();
+	return 0;
+}
diff --git a/contrib/svn-fe/svn-fi.txt b/contrib/svn-fe/svn-fi.txt
new file mode 100644
index 0000000..996a175
--- /dev/null
+++ b/contrib/svn-fe/svn-fi.txt
@@ -0,0 +1,28 @@
+svn-fe(1)
+=========
+
+NAME
+----
+svn-fi - convert fast-import stream to an SVN "dumpfile"
+
+SYNOPSIS
+--------
+[verse]
+svn-fi
+
+DESCRIPTION
+-----------
+
+Converts a git-fast-import(1) stream into a Subversion dumpfile.
+
+INPUT FORMAT
+-------------
+The fast-import format is documented by the git-fast-import(1)
+manual page.
+
+OUTPUT FORMAT
+------------
+Subversion's repository dump format is documented in full in
+`notes/dump-load-format.txt` from the Subversion source tree.
+Files in this format can be generated using the 'svnadmin dump' or
+'svk admin dump' command.
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-19  5:44 [RFC PATCH v2 0/5] Towards a Git-to-SVN bridge Ramkumar Ramachandra
                   ` (2 preceding siblings ...)
  2011-01-19  5:44 ` [PATCH 3/5] Build an svn-fi target in contrib/svn-fe Ramkumar Ramachandra
@ 2011-01-19  5:44 ` Ramkumar Ramachandra
  2011-01-19 19:50   ` Junio C Hamano
  2011-01-22  0:30   ` Junio C Hamano
  2011-01-19  5:44 ` [PATCH 5/5] vcs-svn: Add dir_cache for svnload Ramkumar Ramachandra
  4 siblings, 2 replies; 18+ messages in thread
From: Ramkumar Ramachandra @ 2011-01-19  5:44 UTC (permalink / raw)
  To: Git List; +Cc: Jonathan Nieder, David Barr, Sverre Rabbelier

Introduce a new command-line option --inline-blobs that always inlines
blobs instead of referring to them via marks or their original SHA-1
hash.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Documentation/git-fast-export.txt |    5 +++++
 builtin/fast-export.c             |   23 +++++++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index e05b686..cd48b4b 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -90,6 +90,11 @@ marks the same across runs.
 	resulting stream can only be used by a repository which
 	already contains the necessary objects.
 
+--inline-blobs::
+	Inline all blobs, instead of referring to them via marks or
+	their original SHA-1 hash.  This is useful to parsers, as they
+	don't need to persist blobs.
+
 --full-tree::
 	This option will cause fast-export to issue a "deleteall"
 	directive for each commit followed by a full list of all files
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index c8fd46b..202a3b9 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -27,6 +27,7 @@ static enum { ABORT, VERBATIM, WARN, STRIP } signed_tag_mode = ABORT;
 static enum { ERROR, DROP, REWRITE } tag_of_filtered_mode = ABORT;
 static int fake_missing_tagger;
 static int no_data;
+static int inline_blobs;
 static int full_tree;
 
 static int parse_opt_signed_tag_mode(const struct option *opt,
@@ -118,7 +119,7 @@ static void handle_object(const unsigned char *sha1)
 	char *buf;
 	struct object *object;
 
-	if (no_data)
+	if (no_data || inline_blobs)
 		return;
 
 	if (is_null_sha1(sha1))
@@ -218,7 +219,23 @@ static void show_filemodify(struct diff_queue_struct *q,
 			if (no_data || S_ISGITLINK(spec->mode))
 				printf("M %06o %s %s\n", spec->mode,
 				       sha1_to_hex(spec->sha1), spec->path);
-			else {
+			else if (inline_blobs) {
+				unsigned long size;
+				enum object_type type;
+				char *buf;
+
+				buf = read_sha1_file(spec->sha1, &type, &size);
+				if (!buf)
+					die ("Could not read blob %s",
+						sha1_to_hex(spec->sha1));
+				printf("M %06o inline %s\n", spec->mode, spec->path);
+				printf("data %lu\n", size);
+				if (size && fwrite(buf, size, 1, stdout) != 1)
+					die_errno ("Could not write blob '%s'",
+						sha1_to_hex(spec->sha1));
+				printf("\n");
+
+			} else {
 				struct object *object = lookup_object(spec->sha1);
 				printf("M %06o :%d %s\n", spec->mode,
 				       get_object_mark(object), spec->path);
@@ -627,6 +644,8 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 			     "Fake a tagger when tags lack one"),
 		OPT_BOOLEAN(0, "full-tree", &full_tree,
 			     "Output full tree for each commit"),
+		OPT_BOOLEAN(0, "inline-blobs", &inline_blobs,
+			     "Output all blobs inline"),
 		{ OPTION_NEGBIT, 0, "data", &no_data, NULL,
 			"Skip output of blob data",
 			PARSE_OPT_NOARG | PARSE_OPT_NEGHELP, NULL, 1 },
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] vcs-svn: Add dir_cache for svnload
  2011-01-19  5:44 [RFC PATCH v2 0/5] Towards a Git-to-SVN bridge Ramkumar Ramachandra
                   ` (3 preceding siblings ...)
  2011-01-19  5:44 ` [PATCH 4/5] fast-export: Introduce --inline-blobs Ramkumar Ramachandra
@ 2011-01-19  5:44 ` Ramkumar Ramachandra
  4 siblings, 0 replies; 18+ messages in thread
From: Ramkumar Ramachandra @ 2011-01-19  5:44 UTC (permalink / raw)
  To: Git List; +Cc: Jonathan Nieder, David Barr, Sverre Rabbelier

The logic for creating directories recursively is now
implemented. Unfortunately, dir_cache is currently implemented
inefficiently using string_list- should probably use a treap or trie
in future.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 Makefile              |    2 +-
 vcs-svn/dir_cache.c   |   40 +++++++++++++++
 vcs-svn/dir_cache.h   |   12 +++++
 vcs-svn/dump_export.c |  104 ++++++++++++++++++++++++++++++++-------
 vcs-svn/dump_export.h |   33 ++++++++++++
 vcs-svn/svnload.c     |  130 +++++++++++++++++++++++--------------------------
 vcs-svn/svnload.h     |   10 ++++
 7 files changed, 243 insertions(+), 88 deletions(-)
 create mode 100644 vcs-svn/dir_cache.c
 create mode 100644 vcs-svn/dir_cache.h
 create mode 100644 vcs-svn/dump_export.h
 create mode 100644 vcs-svn/svnload.h

diff --git a/Makefile b/Makefile
index 40f6691..d9c2442 100644
--- a/Makefile
+++ b/Makefile
@@ -1836,7 +1836,7 @@ XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
 	xdiff/xmerge.o xdiff/xpatience.o
 VCSSVN_OBJS = vcs-svn/line_buffer.o vcs-svn/svnload.o vcs-svn/dump_export.o \
 	vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/sliding_window.o \
-	vcs-svn/svndiff.o vcs-svn/svndump.o
+	vcs-svn/svndiff.o vcs-svn/svndump.o vcs-svn/dir_cache.o
 VCSSVN_TEST_OBJS = test-obj-pool.o \
 	test-line-buffer.o test-treap.o test-svn-fe.o
 OBJECTS := $(GIT_OBJS) $(XDIFF_OBJS) $(VCSSVN_OBJS)
diff --git a/vcs-svn/dir_cache.c b/vcs-svn/dir_cache.c
new file mode 100644
index 0000000..9a608ce
--- /dev/null
+++ b/vcs-svn/dir_cache.c
@@ -0,0 +1,40 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "string-list.h"
+#include "line_buffer.h"
+#include "dump_export.h"
+
+static struct string_list dirents = STRING_LIST_INIT_DUP;
+static struct string_list_item *dir = NULL;
+
+void dir_cache_add(const char *path, enum node_kind kind) {
+	dir = string_list_insert(&dirents, path);
+	dir->util = malloc(sizeof(enum node_kind));
+	*((enum node_kind *)(dir->util)) = kind;
+}
+
+void dir_cache_remove(const char *path) {
+	dir = string_list_lookup(&dirents, path);
+	if (dir)
+		*((enum node_kind *)(dir->util)) = NODE_KIND_UNKNOWN;
+}
+
+enum node_kind dir_cache_lookup(const char *path) {
+	dir = string_list_lookup(&dirents, path);
+	if (dir)
+		return *((enum node_kind *)(dir->util));
+	else
+		return NODE_KIND_UNKNOWN;
+}
+
+void dir_cache_init() {
+	return;
+}
+
+void dir_cache_deinit() {
+	string_list_clear(&dirents, 1);
+}
diff --git a/vcs-svn/dir_cache.h b/vcs-svn/dir_cache.h
new file mode 100644
index 0000000..43c3797
--- /dev/null
+++ b/vcs-svn/dir_cache.h
@@ -0,0 +1,12 @@
+#ifndef DIR_CACHE_H_
+#define DIR_CACHE_H_
+
+#include "dump_export.h"
+
+void dir_cache_add(const char *path, enum node_kind kind);
+void dir_cache_remove(const char *path);
+enum node_kind dir_cache_lookup(const char *path);
+void dir_cache_init();
+void dir_cache_deinit();
+
+#endif
diff --git a/vcs-svn/dump_export.c b/vcs-svn/dump_export.c
index 04ede06..6f2a9d7 100644
--- a/vcs-svn/dump_export.c
+++ b/vcs-svn/dump_export.c
@@ -7,6 +7,9 @@
 #include "strbuf.h"
 #include "line_buffer.h"
 #include "dump_export.h"
+#include "dir_cache.h"
+
+static struct strbuf props;
 
 void dump_export_begin_rev(int revision, const char *revprops,
 			int prop_len) {
@@ -19,39 +22,47 @@ void dump_export_begin_rev(int revision, const char *revprops,
 void dump_export_node(const char *path, enum node_kind kind,
 		enum node_action action, unsigned long text_len,
 		unsigned long copyfrom_rev, const char *copyfrom_path) {
+	int dump_props = 1; /* Boolean */
+	strbuf_reset(&props);
 	printf("Node-path: %s\n", path);
-	printf("Node-kind: ");
-	switch (action) {
+	switch (kind) {
 	case NODE_KIND_NORMAL:
-		printf("file\n");
+		printf("Node-kind: file\n");
 		break;
 	case NODE_KIND_EXECUTABLE:
-		printf("file\n");
+		printf("Node-kind: file\n");
+		strbuf_addf(&props, "K\nsvn:executable\nV\n1\n");
 		break;
 	case NODE_KIND_SYMLINK:
-		printf("file\n");
+		printf("Node-kind: file\n");
+		strbuf_addf(&props, "K\nsvn:special\nV\n1\n");
 		break;
 	case NODE_KIND_GITLINK:
-		printf("file\n");
+		printf("Node-kind: file\n");
+		break;
+	case NODE_KIND_DIR:
+		printf("Node-kind: dir\n");
 		break;
 	case NODE_KIND_SUBDIR:
 		die("Unsupported: subdirectory");
 	default:
-		break;
+		break; /* When a node is being removed, nothing is printed */
 	}
-	printf("Node-action: ");
+	strbuf_add(&props, "PROPS-END\n", 10);
+
 	switch (action) {
 	case NODE_ACTION_CHANGE:
-		printf("change\n");
+		printf("Node-action: change\n");
 		break;
 	case NODE_ACTION_ADD:
-		printf("add\n");
+		printf("Node-action: add\n");
 		break;
 	case NODE_ACTION_REPLACE:
-		printf("replace\n");
+		printf("Node-action: replace\n");
 		break;
 	case NODE_ACTION_DELETE:
-		printf("delete\n");
+		printf("Node-action: delete\n");
+		dump_props = 0;
 		break;
 	default:
 		break;
@@ -60,14 +71,71 @@ void dump_export_node(const char *path, enum node_kind kind,
 		printf("Node-copyfrom-rev: %lu\n", copyfrom_rev);
 		printf("Node-copyfrom-path: %s\n", copyfrom_path);
 	}
-	printf("Prop-delta: false\n");
-	printf("Prop-content-length: 10\n"); /* Constant 10 for "PROPS-END" */
-	printf("Text-delta: false\n");
-	printf("Text-content-length: %lu\n", text_len);
-	printf("Content-length: %lu\n\n", text_len + 10);
-	printf("PROPS-END\n\n");
+	if (dump_props) {
+		printf("Prop-delta: false\n");
+		printf("Prop-content-length: %lu\n", props.len);
+	}
+	if (text_len) {
+		printf("Text-delta: false\n");		
+		printf("Text-content-length: %lu\n", text_len);
+	}
+	if (text_len || dump_props) {
+		printf("Content-length: %lu\n\n", text_len + props.len);
+		printf("%s", props.buf);
+	}
+	if (!text_len)
+		printf("\n");
+}
+
+void dump_export_node_r(const char *path, enum node_kind kind,
+			enum node_action action, unsigned long text_len,
+			unsigned long copyfrom_rev, const char *copyfrom_path) {
+	char *start, *t;
+	start = (char *) path;
+
+	while((t = strchr(start, '/'))) {
+			*t = '\0';
+			if (dir_cache_lookup(path) == NODE_KIND_UNKNOWN) {
+				dir_cache_add(path, NODE_KIND_NORMAL);
+				dump_export_node(path, NODE_KIND_DIR,
+						NODE_ACTION_ADD, 0,
+						SVN_INVALID_REV, NULL);
+			}
+			*t = '/';   /* Change it back */
+			start = t + 1;
+	}
+	switch (dir_cache_lookup(path)) {
+	case NODE_KIND_UNKNOWN:
+		action = NODE_ACTION_ADD;
+		break;
+	case NODE_KIND_SYMLINK:
+		dir_cache_remove(path);
+		dump_export_node(path, NODE_KIND_UNKNOWN,
+				NODE_ACTION_DELETE,
+				0, SVN_INVALID_REV, NULL);
+		break;
+	case NODE_KIND_DIR:
+		die("File was previously a directory?");
+		break;
+	case NODE_KIND_SUBDIR:
+		die("Subdirectories unsupported");
+		break;
+	default:
+		action = NODE_ACTION_CHANGE;
+		break;
+	}
+	dir_cache_add(path, kind);
+	dump_export_node(path, kind, action, text_len, copyfrom_rev, copyfrom_path);
 }
 
 void dump_export_text(struct line_buffer *data, off_t len) {
 	buffer_copy_bytes(data, len);
 }
+
+void dump_export_init() {
+	strbuf_init(&props, MAX_GITSVN_LINE_LEN);
+}
+
+void dump_export_deinit() {
+	strbuf_release(&props);
+}
diff --git a/vcs-svn/dump_export.h b/vcs-svn/dump_export.h
new file mode 100644
index 0000000..e9f51a3
--- /dev/null
+++ b/vcs-svn/dump_export.h
@@ -0,0 +1,33 @@
+#ifndef DUMP_EXPORT_H_
+#define DUMP_EXPORT_H_
+
+#define MAX_GITSVN_LINE_LEN 4096
+#define SVN_INVALID_REV 0
+
+enum node_action {
+	NODE_ACTION_UNKNOWN,
+	NODE_ACTION_CHANGE,
+	NODE_ACTION_ADD,
+	NODE_ACTION_DELETE,
+	NODE_ACTION_REPLACE
+};
+
+enum node_kind {
+	NODE_KIND_UNKNOWN,       /* Missing node */
+	NODE_KIND_NORMAL,
+	NODE_KIND_EXECUTABLE,
+	NODE_KIND_SYMLINK,
+	NODE_KIND_GITLINK,
+	NODE_KIND_DIR,           /* SVN-specific */
+	NODE_KIND_SUBDIR
+};
+
+void dump_export_begin_rev(int revision, const char *revprops, int prop_len);
+void dump_export_text(struct line_buffer *data, off_t len);
+void dump_export_node_r(const char *path, enum node_kind kind,
+			enum node_action action, unsigned long text_len,
+			unsigned long copyfrom_rev, const char *copyfrom_path);
+void dump_export_init();
+void dump_export_deinit();
+
+#endif
diff --git a/vcs-svn/svnload.c b/vcs-svn/svnload.c
index 7043ae7..19c4689 100644
--- a/vcs-svn/svnload.c
+++ b/vcs-svn/svnload.c
@@ -11,17 +11,16 @@
 #include "git-compat-util.h"
 #include "line_buffer.h"
 #include "dump_export.h"
-#include "strbuf.h"
+#include "dir_cache.h"
 
 #define SVN_DATE_FORMAT "%Y-%m-%dT%H:%M:%S.000000Z"
 #define SVN_DATE_LEN 28
 #define LENGTH_UNKNOWN (~0)
 
 static struct line_buffer input = LINE_BUFFER_INIT;
-static struct strbuf blobs[100];
-	
+
 static struct {
-	unsigned long prop_len, text_len, copyfrom_rev, mark;
+	unsigned long prop_len, text_len, copyfrom_rev;
 	int text_delta, prop_delta; /* Boolean */
 	enum node_action action;
 	enum node_kind kind;
@@ -60,15 +59,14 @@ static void reset_node_ctx(void)
 {
 	node_ctx.prop_len = LENGTH_UNKNOWN;
 	node_ctx.text_len = LENGTH_UNKNOWN;
-	node_ctx.mark = 0;
-	node_ctx.copyfrom_rev = 0;
+	node_ctx.copyfrom_rev = SVN_INVALID_REV;
 	node_ctx.text_delta = -1;
 	node_ctx.prop_delta = -1;
 	strbuf_reset(&node_ctx.copyfrom_path);
 	strbuf_reset(&node_ctx.path);
 }
 
-static void populate_props(struct strbuf *props, const char *author,
+static void populate_revprops(struct strbuf *props, const char *author,
 			const char *log, const char *date) {
 	strbuf_reset(props);	
 	strbuf_addf(props, "K\nsvn:author\nV\n%s\n", author);
@@ -83,7 +81,7 @@ static void parse_author_line(char *val, struct strbuf *name,
 	char time_buf[SVN_DATE_LEN];
 	const struct tm *tm_time;
 
-	/* Simon Hausmann <shausman@trolltech.com> 1170199019 +0100 */
+	/* Author Name <author@email.com> 1170199019 +0530 */
 	strbuf_reset(name);
 	strbuf_reset(email);
 	strbuf_reset(date);
@@ -104,46 +102,42 @@ static void parse_author_line(char *val, struct strbuf *name,
 
 void svnload_read(void) {
 	char *t, *val;
-	int mode_incr;
-	struct strbuf *to_dump;
+	int len, mode_incr;
 
 	while ((t = buffer_read_line(&input))) {
 		val = strchr(t, ' ');
-		if (!val) {
-			if (!memcmp(t, "blob", 4))
-				active_ctx = BLOB_CTX;
-			else if (!memcmp(t, "deleteall", 9))
-				;
-			continue;
+		if (!val)
+			len = strlen(t);
+		else {
+			*val++ = '\0';
+			len = val - t - 1;
 		}
-		*val++ = '\0';
 
-		/* strlen(key) */
-		switch (val - t - 1) {
+		switch (len) {
 		case 1:
 			if (!memcmp(t, "D", 1)) {
 				node_ctx.action = NODE_ACTION_DELETE;
-			}
-			else if (!memcmp(t, "C", 1)) {
+			} else if (!memcmp(t, "C", 1)) {
 				node_ctx.action = NODE_ACTION_ADD;
-			}
-			else if (!memcmp(t, "R", 1)) {
+			} else if (!memcmp(t, "R", 1)) {
 				node_ctx.action = NODE_ACTION_REPLACE;
-			}
-			else if (!memcmp(t, "M", 1)) {
+			} else if (!memcmp(t, "M", 1)) {
 				node_ctx.action = NODE_ACTION_CHANGE;
-				mode_incr = 7;
-				if (!memcmp(val, "100644", 6))
-					node_ctx.kind = NODE_KIND_NORMAL;
-				else if (!memcmp(val, "100755", 6))
-					node_ctx.kind = NODE_KIND_EXECUTABLE;
-				else if (!memcmp(val, "120000", 6))
-					node_ctx.kind = NODE_KIND_SYMLINK;
-				else if (!memcmp(val, "160000", 6))
-					node_ctx.kind = NODE_KIND_GITLINK;
-				else if (!memcmp(val, "040000", 6))
-					node_ctx.kind = NODE_KIND_SUBDIR;
-				else {
+				if (strlen(val) >= 6) {
+					if (!memcmp(val, "100644", 6))
+						node_ctx.kind = NODE_KIND_NORMAL;
+					else if (!memcmp(val, "100755", 6))
+						node_ctx.kind = NODE_KIND_EXECUTABLE;
+					else if (!memcmp(val, "120000", 6))
+						node_ctx.kind = NODE_KIND_SYMLINK;
+					else if (!memcmp(val, "160000", 6))
+						node_ctx.kind = NODE_KIND_GITLINK;
+					else if (!memcmp(val, "040000", 6))
+						node_ctx.kind = NODE_KIND_SUBDIR;
+					else
+						die("Unrecognized mode: %s", val);
+					mode_incr = 7;
+				} else if (strlen(val) >= 3) {
 					if (!memcmp(val, "755", 3))
 						node_ctx.kind = NODE_KIND_EXECUTABLE;
 					else if(!memcmp(val, "644", 3))
@@ -151,40 +145,35 @@ void svnload_read(void) {
 					else
 						die("Unrecognized mode: %s", val);
 					mode_incr = 4;
-				}
+				} else
+					die ("Malformed filemodify: Missing mode");
 				val += mode_incr;
 				t = strchr(val, ' ');
+				if (!t)
+					die("Malformed filemodify: Missing dataref");
 				*t++ = '\0';
 				strbuf_reset(&node_ctx.path);
 				strbuf_add(&node_ctx.path, t, strlen(t));
-				if (!memcmp(val + 1, "inline", 6))
-					die("Unsupported dataref: inline");
+				if (!strncmp(val, "inline", 6))
+					active_ctx = BLOB_CTX;
 				else if (*val == ':')
-					to_dump = &blobs[strtoul(val + 1, NULL, 10)];
+					die("Unsupported dataref: marks");
 				else
 					die("Unsupported dataref: sha1");
-				dump_export_node(node_ctx.path.buf, node_ctx.kind,
-						node_ctx.action, to_dump->len,
-						0, NULL);
-				printf("%s", to_dump->buf);
 			}
 			break;
+		case 2:
+			if (!memcmp(t, "ls", 2))
+				die("ls not supported");
 		case 3:
 			if (!memcmp(t, "tag", 3))
 				continue;
 			break;
 		case 4:
-			if (!memcmp(t, "mark", 4))
-				switch(active_ctx) {
-				case COMMIT_CTX:
-					/* What do we do with commit marks? */
-					continue;
-				case BLOB_CTX:
-					node_ctx.mark = strtoul(val + 1, NULL, 10);
-					break;
-				default:
-					break;
-				}
+			if (!memcmp(t, "blob", 4))
+				die("Please inline blobs in the fast-import stream");
+			else if (!memcmp(t, "mark", 4))
+				continue;
 			else if (!memcmp(t, "from", 4))
 				continue;
 			else if (!memcmp(t, "data", 4)) {
@@ -194,7 +183,7 @@ void svnload_read(void) {
 					buffer_read_binary(&input,
 							&rev_ctx.log,
 							strtoul(val, NULL, 10));
-					populate_props(&rev_ctx.props,
+					populate_revprops(&rev_ctx.props,
 						rev_ctx.svn_author.buf,
 						rev_ctx.log.buf,
 						rev_ctx.author_date.buf);
@@ -204,9 +193,10 @@ void svnload_read(void) {
 					break;
 				case BLOB_CTX:
 					node_ctx.text_len = strtoul(val, NULL, 10);
-					buffer_read_binary(&input,
-							&blobs[node_ctx.mark],
-							node_ctx.text_len);
+					dump_export_node_r(node_ctx.path.buf, node_ctx.kind,
+							node_ctx.action, node_ctx.text_len,
+							SVN_INVALID_REV, NULL);
+					buffer_copy_bytes(&input, node_ctx.text_len);
 					break;
 				default:
 					break;
@@ -231,14 +221,18 @@ void svnload_read(void) {
 					rev_ctx.author_email.buf,
 					t - rev_ctx.author_email.buf);
 
-			}
-			else if (!memcmp(t, "commit", 6)) {
+			} else if (!memcmp(t, "commit", 6)) {
 				rev_ctx.rev ++;
 				active_ctx = COMMIT_CTX;
 			}
 			break;
+		case 8:
+			if (!memcmp(t, "cat-blob", 8))
+				die("cat-blob unsupported");
 		case 9:
-			if (!memcmp(t, "committer", 9))
+			if (!memcmp(t, "deleteall", 9))
+				continue;
+			else if (!memcmp(t, "committer", 9))
 				parse_author_line(val, &rev_ctx.committer,
 						&rev_ctx.committer_email,
 						&rev_ctx.committer_date);
@@ -251,7 +245,6 @@ void svnload_read(void) {
 
 int svnload_init(const char *filename)
 {
-	int i;
 	if (buffer_init(&input, filename))
 		return error("cannot open %s: %s", filename, strerror(errno));
 	active_ctx = UNKNOWN_CTX;
@@ -265,14 +258,13 @@ int svnload_init(const char *filename)
 	strbuf_init(&rev_ctx.committer_email, MAX_GITSVN_LINE_LEN);
 	strbuf_init(&node_ctx.path, MAX_GITSVN_LINE_LEN);
 	strbuf_init(&node_ctx.copyfrom_path, MAX_GITSVN_LINE_LEN);
-	for (i = 0; i < 100; i ++)
-		strbuf_init(&blobs[i], 10000);
+	dump_export_init();
+	dir_cache_init();
 	return 0;
 }
 
 void svnload_deinit(void)
 {
-	int i;
 	reset_rev_ctx(0);
 	reset_node_ctx();
 	strbuf_release(&rev_ctx.props);
@@ -285,8 +277,8 @@ void svnload_deinit(void)
 	strbuf_release(&rev_ctx.committer_email);
 	strbuf_release(&node_ctx.path);
 	strbuf_release(&node_ctx.copyfrom_path);
-	for (i = 0; i < 100; i ++)
-		strbuf_release(&blobs[i]);
+	dump_export_deinit();
+	dir_cache_deinit();
 	if (buffer_deinit(&input))
 		fprintf(stderr, "Input error\n");
 	if (ferror(stdout))
diff --git a/vcs-svn/svnload.h b/vcs-svn/svnload.h
new file mode 100644
index 0000000..0c8fe8b
--- /dev/null
+++ b/vcs-svn/svnload.h
@@ -0,0 +1,10 @@
+#ifndef SVNLOAD_H_
+#define SVNLOAD_H_
+
+#define SVN_INVALID_REV -1
+
+int svnload_init(const char *filename);
+void svnload_deinit(void);
+void svnload_read(void);
+
+#endif
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-19  5:44 ` [PATCH 4/5] fast-export: Introduce --inline-blobs Ramkumar Ramachandra
@ 2011-01-19 19:50   ` Junio C Hamano
  2011-01-19 21:48     ` Jonathan Nieder
  2011-01-20  5:41     ` Jonathan Nieder
  2011-01-22  0:30   ` Junio C Hamano
  1 sibling, 2 replies; 18+ messages in thread
From: Junio C Hamano @ 2011-01-19 19:50 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git List, Jonathan Nieder, David Barr, Sverre Rabbelier

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Introduce a new command-line option --inline-blobs that always inlines
> blobs instead of referring to them via marks or their original SHA-1
> hash.
>
> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
> ---
>  Documentation/git-fast-export.txt |    5 +++++
>  builtin/fast-export.c             |   23 +++++++++++++++++++++--
>  2 files changed, 26 insertions(+), 2 deletions(-)

Hmm, this smells somewhat fishy.

Wasn't G-F-I designed to be a common stream format for other SCMs to
generate streams, so that importers and exporters can be written once for
each SCM to interoperate?

This patch will allow you to write an importer that can only take a stream
with inlined blobs without any references to previous occurrences, but if
the exporter for an SCM that you are trying to interoperate with does not
support --inline-blobs, you are screwed.

What is the problem you are really trying to solve?  If it is "it is
cumbersome to keep track of blob references", wouldn't it be nicer to
instead make it easier for importers to support referenced blobs?

Just thinking aloud, but is it possible to write a filter that converts an
arbitrary G-F-I stream with referenced blobs into a G-F-I stream without
referenced blobs by inlining all the blobs?  Then other people do not have
to implement --inline-blobs to their own exporter.

If that is not possible, should/can we do something to at least allow
people to check if an existing stream is compatible with an importer that
cannot take referenced blobs without actually trying to run import (and
see it fail)?  Do we need a way to encourage people to add --inline-blobs
support to their exporters?  I suspect this series leads to make G-F-I
less useful by fragmenting the compatible subset of stream formats without
such effort, no?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-19 19:50   ` Junio C Hamano
@ 2011-01-19 21:48     ` Jonathan Nieder
  2011-01-20  4:50       ` Ramkumar Ramachandra
  2011-01-20  5:41     ` Jonathan Nieder
  1 sibling, 1 reply; 18+ messages in thread
From: Jonathan Nieder @ 2011-01-19 21:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ramkumar Ramachandra, Git List, David Barr, Sverre Rabbelier

Junio C Hamano wrote:
> Ramkumar Ramachandra <artagnon@gmail.com> writes:

>> Introduce a new command-line option --inline-blobs that always inlines
>> blobs instead of referring to them via marks or their original SHA-1
>> hash.
[...]
> Hmm, this smells somewhat fishy.
>
> Wasn't G-F-I designed to be a common stream format for other SCMs to
> generate streams, so that importers and exporters can be written once for
> each SCM to interoperate?

Here is one way to sell it:

	With the inline blobs feature, fast-import backends have to
	maintain less state.  Using it should speed up exporting.

	This is made optional because ...

I haven't thought through whether it ought to be optional or measured
the effect on import performance.

A separate question is what an svn fast-import backend should do with
all those blobs that are not ready to be written to dump.  As a hack
while prototyping, one can rely on the "current" fast-export output,
even though that is not flexible or futureproof.  Longer term, the
folllowing sounds very interesting

> Just thinking aloud, but is it possible to write a filter that converts an
> arbitrary G-F-I stream with referenced blobs into a G-F-I stream without
> referenced blobs by inlining all the blobs?

to avoid complexity in the svn fast-import backend itself.
(Complicating detail: such a filter would presumably take responsibility
for --export-marks, so it might want a way to retrieve commit marks
from its downstream.)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-19 21:48     ` Jonathan Nieder
@ 2011-01-20  4:50       ` Ramkumar Ramachandra
  2011-01-20  5:48         ` Jonathan Nieder
  2011-01-20 13:53         ` Drew Northup
  0 siblings, 2 replies; 18+ messages in thread
From: Ramkumar Ramachandra @ 2011-01-20  4:50 UTC (permalink / raw)
  To: Jonathan Nieder, Junio C Hamano; +Cc: Git List, David Barr, Sverre Rabbelier

Hi,

Jonathan Nieder writes:
> Junio C Hamano wrote:
> > Ramkumar Ramachandra <artagnon@gmail.com> writes:
> 
> >> Introduce a new command-line option --inline-blobs that always inlines
> >> blobs instead of referring to them via marks or their original SHA-1
> >> hash.
> [...]
> > Hmm, this smells somewhat fishy.
> >
> > Wasn't G-F-I designed to be a common stream format for other SCMs to
> > generate streams, so that importers and exporters can be written once for
> > each SCM to interoperate?
> 
> Here is one way to sell it:
> 
> 	With the inline blobs feature, fast-import backends have to
> 	maintain less state.  Using it should speed up exporting.
> 
> 	This is made optional because ...
> 
> I haven't thought through whether it ought to be optional or measured
> the effect on import performance.

It simplifies other fast-import backends greatly, because persisting
blobs can be complicated and expensive. I was thinking of making
svn-fe support both inlined blobs, and blobs referenced by marks. When
it's possible to be cheap by optionally having inlined blobs, why not
optionally have them? The filter we develop later can be used for
older fast-import streams that don't have inlined blobs.

On a related note, does it make sense to version our fast-import
stream format? It's certainly going to keep evolving with time, and we
need backward compatibility.

> A separate question is what an svn fast-import backend should do with
> all those blobs that are not ready to be written to dump.  As a hack
> while prototyping, one can rely on the "current" fast-export output,
> even though that is not flexible or futureproof.  Longer term, the
> folllowing sounds very interesting

Good point. The functionality to persist blobs that are refenced by
marks probably shouldn't be in svn-fe at all.

> > Just thinking aloud, but is it possible to write a filter that converts an
> > arbitrary G-F-I stream with referenced blobs into a G-F-I stream without
> > referenced blobs by inlining all the blobs?
> 
> to avoid complexity in the svn fast-import backend itself.
> (Complicating detail: such a filter would presumably take responsibility
> for --export-marks, so it might want a way to retrieve commit marks
> from its downstream.)

This filter will need to persist every blob for the entire lifetime of
the program. We can't possibly do it in-memory, so we have to find
some way to persist them on-disk and retrieve them very
quickly. Jonathan suggested using something like ToyoCabinet earlier-
I'll start working and see what I come up with.

-- Ram

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-19 19:50   ` Junio C Hamano
  2011-01-19 21:48     ` Jonathan Nieder
@ 2011-01-20  5:41     ` Jonathan Nieder
  1 sibling, 0 replies; 18+ messages in thread
From: Jonathan Nieder @ 2011-01-20  5:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ramkumar Ramachandra, Git List, David Barr, Sverre Rabbelier

Junio C Hamano wrote:

> Just thinking aloud, but is it possible to write a filter that converts an
> arbitrary G-F-I stream with referenced blobs into a G-F-I stream without
> referenced blobs by inlining all the blobs?

A few details to watch out for:

- A mark, as in

	M 100644 :1 path/to/file

  can refer to a blob from a previous import.  A mark can even refer
  to a manually prepared marks file.

- The syntax

	M 100644 0409ac9fd3f1ea36680189e07116e58b2630ccad path/to/file

  refers to a blob that might not have been mentioned elsewhere in the
  stream.  This is the variant used by "git fast-export --no-data" to
  avoid transferring blob data.  (In general, non-git backends would
  presumably use something other than git blob IDs if they use this
  feature.  A filter of the kind we are describing would probably pass
  them through.)

  These datarefs can be acquired out of band (probably not a big deal)
  or by using the "ls" command to copy from a previous revision:

	> ls :3 "path/to/other/file"
	100644 blob 0409ac9fd3f1ea36680189e07116e58b2630ccad	git.c
	> M 100644 0409ac9fd3f1ea36680189e07116e58b2630ccad path/to/file

- The cat-blob command ("cat-blob :1") allows frontends to request
  the content of a previously imported blob (presumably in order to
  apply a delta to it).

So while something like the filter you describe seems possible, it
cannot be as simple as

	mkfifo replies &&
	fast-export-frontend 3<replies |
	inline-blobs |
	fast-import-backend --cat-blob-fd=3 <args> 3>replies

for general <frontend> and <backend>.  The frontend might try to cat
blobs by mark number or to pick off where it left off in a previous
run using a marks file.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-20  4:50       ` Ramkumar Ramachandra
@ 2011-01-20  5:48         ` Jonathan Nieder
  2011-01-20  6:28           ` Ramkumar Ramachandra
  2011-01-20 13:53         ` Drew Northup
  1 sibling, 1 reply; 18+ messages in thread
From: Jonathan Nieder @ 2011-01-20  5:48 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Junio C Hamano, Git List, David Barr, Sverre Rabbelier

Ramkumar Ramachandra wrote:

> The functionality to persist blobs that are refenced by
> marks probably shouldn't be in svn-fe at all.

Do you mean svn-fi?

> This filter will need to persist every blob for the entire lifetime of
> the program.

Depending on the interface, couldn't it be possible to rely on svn for
the content of blobs that have already been exported?  If so, one
would only need a place to stash (1) a mapping from mark numbers to
(svn rev, path) pairs and (2) the full text of blobs that have not
been exported as part of a rev yet.

Cheers,
Jonathan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-20  5:48         ` Jonathan Nieder
@ 2011-01-20  6:28           ` Ramkumar Ramachandra
  0 siblings, 0 replies; 18+ messages in thread
From: Ramkumar Ramachandra @ 2011-01-20  6:28 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, Git List, David Barr, Sverre Rabbelier

Hi Jonathan,

Jonathan Nieder writes:
> Ramkumar Ramachandra wrote:
> 
> > The functionality to persist blobs that are refenced by
> > marks probably shouldn't be in svn-fe at all.
> 
> Do you mean svn-fi?

Yeah. Sorry about the typo. Just to make it clear:

svn-fi should only ever support inlined blobs. For older streams,
there's an extra overhead- it'll have to be chained along with a
helper program that transforms the stream to inline all the blobs.

> > This filter will need to persist every blob for the entire lifetime of
> > the program.
> 
> Depending on the interface, couldn't it be possible to rely on svn for
> the content of blobs that have already been exported?  If so, one
> would only need a place to stash (1) a mapping from mark numbers to
> (svn rev, path) pairs and (2) the full text of blobs that have not
> been exported as part of a rev yet.

Oh yes. We discussed this on IRC :) I'm just afraid that it won't be
fast enough- my idea is to essentially use `svnrdump dump` to replay
the blobs in a certain (path, revision); let me know if you think
there's a quicker way.

-- Ram

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-20  4:50       ` Ramkumar Ramachandra
  2011-01-20  5:48         ` Jonathan Nieder
@ 2011-01-20 13:53         ` Drew Northup
  2011-01-22  9:24           ` Ramkumar Ramachandra
  1 sibling, 1 reply; 18+ messages in thread
From: Drew Northup @ 2011-01-20 13:53 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Jonathan Nieder, Junio C Hamano, Git List, David Barr,
	Sverre Rabbelier


On Thu, 2011-01-20 at 10:20 +0530, Ramkumar Ramachandra wrote:
> Hi,
> 
> Jonathan Nieder writes:
> > Junio C Hamano wrote:
> > > Ramkumar Ramachandra <artagnon@gmail.com> writes:
> > > Just thinking aloud, but is it possible to write a filter that converts an
> > > arbitrary G-F-I stream with referenced blobs into a G-F-I stream without
> > > referenced blobs by inlining all the blobs?
> > 
> > to avoid complexity in the svn fast-import backend itself.
> > (Complicating detail: such a filter would presumably take responsibility
> > for --export-marks, so it might want a way to retrieve commit marks
> > from its downstream.)
> 
> This filter will need to persist every blob for the entire lifetime of
> the program. We can't possibly do it in-memory, so we have to find
> some way to persist them on-disk and retrieve them very
> quickly. Jonathan suggested using something like ToyoCabinet earlier-
> I'll start working and see what I come up with.

Is it worth including the extra dependency? Most systems that I'm in
frequent contact with already have some lightweight BDB implementation
already. I don't currently know of any with TokyoCabinet (or
KyotoCabinet for that matter) already in place. Besides, if all you're
doing is persisting blobs that you're likely to write out to disk
eventually anyway you might as well just do so once you have them and
keep an "index" (not to be confused with the Git Index, just lacking a
better word right now) of what you have in some standard in-memory
format (a heap?). From there you can build each commit into the Git
Index in the proper order once you have the required parts for
each--perhaps even re-using the blobs you've already dumped to disk
(mv'ing them or something).

Granted, there's a good likelihood that I'm missing something here, but
I don't see the point of adding external complexity (beyond what you're
currently stuck dealing with).

-- 
-Drew Northup
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] vcs-svn: Start working on the dumpfile producer
  2011-01-19  5:44 ` [PATCH 2/5] vcs-svn: Start working on the dumpfile producer Ramkumar Ramachandra
@ 2011-01-22  0:30   ` Junio C Hamano
  2011-01-22  9:45     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2011-01-22  0:30 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git List, Jonathan Nieder, David Barr, Sverre Rabbelier

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Start off with some broad design sketches. Compile succeeds, but
> parser is incorrect. Include a Makefile rule to build it into
> vcs-svn/lib.a.
>
> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>

I initially thought I should refrain from doing a usual picky review for
contrib/ material, but realized this does not go to contrib/, so...

> diff --git a/vcs-svn/dump_export.c b/vcs-svn/dump_export.c
> new file mode 100644
> index 0000000..04ede06
> --- /dev/null
> +++ b/vcs-svn/dump_export.c
> @@ -0,0 +1,73 @@
> +/*
> + * Licensed under a two-clause BSD-style license.
> + * See LICENSE for details.
> + */
> +
> +#include "git-compat-util.h"
> +#include "strbuf.h"
> +#include "line_buffer.h"
> +#include "dump_export.h"
> +
> +void dump_export_begin_rev(int revision, const char *revprops,
> +			int prop_len) {

Style. "{" at the beginning of the next line (same everywhere).

> +void dump_export_node(const char *path, enum node_kind kind,
> +		enum node_action action, unsigned long text_len,
> +		unsigned long copyfrom_rev, const char *copyfrom_path) {
> +	printf("Node-path: %s\n", path);
> +	printf("Node-kind: ");
> +	switch (action) {
> +	case NODE_KIND_NORMAL:
> +		printf("file\n");
> +		break;
> +	case NODE_KIND_EXECUTABLE:
> +		printf("file\n");
> +		break;
> +	case NODE_KIND_SYMLINK:
> +		printf("file\n");
> +		break;
> +	case NODE_KIND_GITLINK:
> +		printf("file\n");
> +		break;
> +	case NODE_KIND_SUBDIR:
> +		die("Unsupported: subdirectory");
> +	default:
> +		break;
> +	}

Hmph, is it expected that the repertoire of node-kind stay the same, or
will it be extended over time?  It might make sense to change the above
switch a more table-driven one.  The same for node-action that follows
this part of the code.

> diff --git a/vcs-svn/svnload.c b/vcs-svn/svnload.c
> new file mode 100644
> index 0000000..7043ae7
> --- /dev/null
> +++ b/vcs-svn/svnload.c
> @@ -0,0 +1,294 @@
> ...
> +#define SVN_DATE_FORMAT "%Y-%m-%dT%H:%M:%S.000000Z"
> +#define SVN_DATE_LEN 28
> +#define LENGTH_UNKNOWN (~0)
> +
> +static struct line_buffer input = LINE_BUFFER_INIT;
> +static struct strbuf blobs[100];
> +	

Is there a rationale for "100", or is it just a random number that can be
tweakable at the source level?  Would we want to have a symbolic constant
for it?

The "blank" line has a trailing whitespace.

> +static struct {
> +	unsigned long prop_len, text_len, copyfrom_rev, mark;
> +	int text_delta, prop_delta; /* Boolean */
> +	enum node_action action;
> +	enum node_kind kind;
> +	struct strbuf copyfrom_path, path;
> +} node_ctx;
> + ...
> +static void reset_node_ctx(void)
> +{
> +	node_ctx.prop_len = LENGTH_UNKNOWN;
> +	node_ctx.text_len = LENGTH_UNKNOWN;
> +	node_ctx.mark = 0;
> +	node_ctx.copyfrom_rev = 0;
> +	node_ctx.text_delta = -1;
> +	node_ctx.prop_delta = -1;

Does your "Boolean" type take values 0 or -1?  Or is it perhaps a tristate
false=0, true=1, unknown=-1?  If so, the comment on the type above needs
to be fixed.

> +	strbuf_reset(&node_ctx.copyfrom_path);
> +	strbuf_reset(&node_ctx.path);
> +}
> +
> +static void populate_props(struct strbuf *props, const char *author,
> +			const char *log, const char *date) {

Style on "{" (look everywhere).

> +	strbuf_reset(props);	

Trailing whitespace.

> +static void parse_author_line(char *val, struct strbuf *name,
> +			struct strbuf *email, struct strbuf *date) {
> +	char *t, *tz_off;
> +	char time_buf[SVN_DATE_LEN];
> +	const struct tm *tm_time;
> +
> +	/* Simon Hausmann <shausman@trolltech.com> 1170199019 +0100 */
> +	strbuf_reset(name);
> +	strbuf_reset(email);
> +	strbuf_reset(date);
> +	tz_off = strrchr(val, ' ');
> +	*tz_off++ = '\0';
> +	t = strrchr(val, ' ');
> +	*(t - 1) = '\0'; /* Ignore '>' from email */
> +	t ++;

Unwanted SP before "++" (look everywhere).

> +	tm_time = time_to_tm(strtoul(t, NULL, 10), atoi(tz_off));

Has your caller already verified tz_off is a reasonable integer?

> +void svnload_read(void) {
> ...
> +		switch (val - t - 1) {
> +		case 1:
> +			if (!memcmp(t, "D", 1)) {
> +				node_ctx.action = NODE_ACTION_DELETE;
> +			}
> +			else if (!memcmp(t, "C", 1)) {

Style: "} else if (...) {".

> +				node_ctx.action = NODE_ACTION_ADD;
> +			}
> +			else if (!memcmp(t, "R", 1)) {
> +				node_ctx.action = NODE_ACTION_REPLACE;
> +			}
> +			else if (!memcmp(t, "M", 1)) {
> +				node_ctx.action = NODE_ACTION_CHANGE;
> +				mode_incr = 7;
> +				if (!memcmp(val, "100644", 6))
> +					node_ctx.kind = NODE_KIND_NORMAL;
> +				else if (!memcmp(val, "100755", 6))
> +					node_ctx.kind = NODE_KIND_EXECUTABLE;
> +				else if (!memcmp(val, "120000", 6))
> +					node_ctx.kind = NODE_KIND_SYMLINK;
> +				else if (!memcmp(val, "160000", 6))
> +					node_ctx.kind = NODE_KIND_GITLINK;
> +				else if (!memcmp(val, "040000", 6))

Is <mode> guaranteed to be a 6-digit octal, padded with 0 to the left?

The manual page of git-fast-import seems to hint that is the case, but I
am double checking, as the leading zero is an error in the tree object.

> +					node_ctx.kind = NODE_KIND_SUBDIR;
> +				else {
> +					if (!memcmp(val, "755", 3))
> +						node_ctx.kind = NODE_KIND_EXECUTABLE;
> +					else if(!memcmp(val, "644", 3))

Style; missing SP after "if/switch" (look everywhere).

> +						node_ctx.kind = NODE_KIND_NORMAL;
> +					else
> +						die("Unrecognized mode: %s", val);
> +					mode_incr = 4;
> +				}
> +				val += mode_incr;

Hmm, this whole thing is ugly.

Perhaps a table-lookup, or at least a helper function that translates
"val" to node-kind (while advancing val as the side effect) may make it
easier to read?

> +				t = strchr(val, ' ');
> +				*t++ = '\0';
> +				strbuf_reset(&node_ctx.path);
> +				strbuf_add(&node_ctx.path, t, strlen(t));
> +				if (!memcmp(val + 1, "inline", 6))
> +					die("Unsupported dataref: inline");
> +				else if (*val == ':')
> +					to_dump = &blobs[strtoul(val + 1, NULL, 10)];

Has anybody so far verified the decimal number at (val+1) is a reasonable
value, or am I seeing a future CVE here?

Do we care what follows the len of digits on this line?  Does a line in G-F-I
stream allow trailing garbage (this question is not limited to this part
of the code)?
> +int svnload_init(const char *filename)
> +{
> +	int i;
> +	if (buffer_init(&input, filename))
> +		return error("cannot open %s: %s", filename, strerror(errno));
> +	active_ctx = UNKNOWN_CTX;
> +	strbuf_init(&rev_ctx.props, MAX_GITSVN_LINE_LEN);
> +	...
> +	strbuf_init(&node_ctx.copyfrom_path, MAX_GITSVN_LINE_LEN);
> +	for (i = 0; i < 100; i ++)
> +		strbuf_init(&blobs[i], 10000);
> +	return 0;
> +}
> +
> +void svnload_deinit(void)
> +{
> +	int i;
> +	reset_rev_ctx(0);
> +	reset_node_ctx();
> +	strbuf_release(&rev_ctx.props);
> +	...
> +	strbuf_release(&node_ctx.copyfrom_path);
> +	for (i = 0; i < 100; i ++)
> +		strbuf_release(&blobs[i]);
> +	if (buffer_deinit(&input))
> +		fprintf(stderr, "Input error\n");
> +	if (ferror(stdout))
> +		fprintf(stderr, "Output error\n");
> +}

Whoever needs to add an element to rev_ctx must consistently touch reset,
init and deinit.  Would it help him/her if you moved the code so that
these functions are close together from the beginning?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-19  5:44 ` [PATCH 4/5] fast-export: Introduce --inline-blobs Ramkumar Ramachandra
  2011-01-19 19:50   ` Junio C Hamano
@ 2011-01-22  0:30   ` Junio C Hamano
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2011-01-22  0:30 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git List, Jonathan Nieder, David Barr, Sverre Rabbelier

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> +				buf = read_sha1_file(spec->sha1, &type, &size);
> +				if (!buf)
> +					die ("Could not read blob %s",
> +						sha1_to_hex(spec->sha1));

Style; no SP after "die", as it is a function, not syntactic elements like
if/switch/while (look everywhere).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-20 13:53         ` Drew Northup
@ 2011-01-22  9:24           ` Ramkumar Ramachandra
  2011-01-22 19:18             ` Jonathan Nieder
  0 siblings, 1 reply; 18+ messages in thread
From: Ramkumar Ramachandra @ 2011-01-22  9:24 UTC (permalink / raw)
  To: Drew Northup
  Cc: Jonathan Nieder, Junio C Hamano, Git List, David Barr,
	Sverre Rabbelier

Hi Drew,

Drew Northup writes:
> On Thu, 2011-01-20 at 10:20 +0530, Ramkumar Ramachandra wrote:
> > Hi,
> > Jonathan Nieder writes:
> > > Junio C Hamano wrote:
> > > > Ramkumar Ramachandra <artagnon@gmail.com> writes:
> > > > Just thinking aloud, but is it possible to write a filter that converts an
> > > > arbitrary G-F-I stream with referenced blobs into a G-F-I stream without
> > > > referenced blobs by inlining all the blobs?
> > > 
> > > to avoid complexity in the svn fast-import backend itself.
> > > (Complicating detail: such a filter would presumably take responsibility
> > > for --export-marks, so it might want a way to retrieve commit marks
> > > from its downstream.)
> > 
> > This filter will need to persist every blob for the entire lifetime of
> > the program. We can't possibly do it in-memory, so we have to find
> > some way to persist them on-disk and retrieve them very
> > quickly. Jonathan suggested using something like ToyoCabinet earlier-
> > I'll start working and see what I come up with.
> 
> Is it worth including the extra dependency? Most systems that I'm in
> frequent contact with already have some lightweight BDB implementation
> already. I don't currently know of any with TokyoCabinet (or
> KyotoCabinet for that matter) already in place. Besides, if all you're
> doing is persisting blobs that you're likely to write out to disk
> eventually anyway you might as well just do so once you have them and
> keep an "index" (not to be confused with the Git Index, just lacking a
> better word right now) of what you have in some standard in-memory
> format (a heap?). From there you can build each commit into the Git
> Index in the proper order once you have the required parts for
> each--perhaps even re-using the blobs you've already dumped to disk
> (mv'ing them or something).

Agreed. I wouldn't like to introduce an extra dependency either. I was
talking about using it for prototyping- if the final version includes
an extra dependency, it's unlikely to get merged into git.git :) The
final design will probably use an in-memory B+ tree, but I haven't
thought about that hard enough.

-- Ram

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] vcs-svn: Start working on the dumpfile producer
  2011-01-22  0:30   ` Junio C Hamano
@ 2011-01-22  9:45     ` Ramkumar Ramachandra
  0 siblings, 0 replies; 18+ messages in thread
From: Ramkumar Ramachandra @ 2011-01-22  9:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Jonathan Nieder, David Barr, Sverre Rabbelier

Hi Junio,

Junio C Hamano writes:
> Ramkumar Ramachandra <artagnon@gmail.com> writes:
> > Start off with some broad design sketches. Compile succeeds, but
> > parser is incorrect. Include a Makefile rule to build it into
> > vcs-svn/lib.a.
> >
> > Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
> 
> I initially thought I should refrain from doing a usual picky review for
> contrib/ material, but realized this does not go to contrib/, so...

Thanks for the review. A new iteration including tests is almost
ready- your review came in at the perfect time :)

> > diff --git a/vcs-svn/dump_export.c b/vcs-svn/dump_export.c
> > new file mode 100644
> > index 0000000..04ede06
> > --- /dev/null
> > +++ b/vcs-svn/dump_export.c
> > ...
> > +void dump_export_begin_rev(int revision, const char *revprops,
> > +			int prop_len) {
> 
> Style. "{" at the beginning of the next line (same everywhere).

Fixed.

> > +void dump_export_node(const char *path, enum node_kind kind,
> > +		enum node_action action, unsigned long text_len,
> > +		unsigned long copyfrom_rev, const char *copyfrom_path) {
> > +	printf("Node-path: %s\n", path);
> > +	printf("Node-kind: ");
> > +	switch (action) {
> > +	case NODE_KIND_NORMAL:
> > +		printf("file\n");
> > +		break;
> > +	case NODE_KIND_EXECUTABLE:
> > +		printf("file\n");
> > +		break;
> > +	case NODE_KIND_SYMLINK:
> > +		printf("file\n");
> > +		break;
> > +	case NODE_KIND_GITLINK:
> > +		printf("file\n");
> > +		break;
> > +	case NODE_KIND_SUBDIR:
> > +		die("Unsupported: subdirectory");
> > +	default:
> > +		break;
> > +	}
> 
> Hmph, is it expected that the repertoire of node-kind stay the same, or
> will it be extended over time?  It might make sense to change the above
> switch a more table-driven one.  The same for node-action that follows
> this part of the code.

I needed more than a lookup table, because I wanted the flexibility to
execute arbitrary statements in each case. It's extended in the latest
version (will come up on the list soon).

> > diff --git a/vcs-svn/svnload.c b/vcs-svn/svnload.c
> > new file mode 100644
> > index 0000000..7043ae7
> > --- /dev/null
> > +++ b/vcs-svn/svnload.c
> > @@ -0,0 +1,294 @@
> > ...
> > +static struct strbuf blobs[100];
> > +	
> 
> Is there a rationale for "100", or is it just a random number that can be
> tweakable at the source level?  Would we want to have a symbolic constant
> for it?
> 
> The "blank" line has a trailing whitespace.

Fixed. I used this to illustrate the problem with persisting blobs- in
this version, I persist 100 blobs; this is totally arbitrary and
inextensible. Since the latest version leverages the --inline-blobs
feature of fast export, this is totally unecessary.

> > +static struct {
> > +	unsigned long prop_len, text_len, copyfrom_rev, mark;
> > +	int text_delta, prop_delta; /* Boolean */
> > ...
> > +static void reset_node_ctx(void)
> > ...
> > +	node_ctx.text_delta = -1;
> > +	node_ctx.prop_delta = -1;
> 
> Does your "Boolean" type take values 0 or -1?  Or is it perhaps a tristate
> false=0, true=1, unknown=-1?  If so, the comment on the type above needs
> to be fixed.

Fixed. Thanks for pointing this out.

> > +	strbuf_reset(&node_ctx.copyfrom_path);
> > +	strbuf_reset(&node_ctx.path);
> > +}
> > +
> > +static void populate_props(struct strbuf *props, const char *author,
> > +			const char *log, const char *date) {
> 
> Style on "{" (look everywhere).

Fixed.

> > +	strbuf_reset(props);	
> 
> Trailing whitespace.

Fixed.

> > ...
> > +	t ++;
> 
> Unwanted SP before "++" (look everywhere).

Fixed.

> > +	tm_time = time_to_tm(strtoul(t, NULL, 10), atoi(tz_off));
> 
> Has your caller already verified tz_off is a reasonable integer?

Fixed.

> > ...
> > +			if (!memcmp(t, "D", 1)) {
> > +				node_ctx.action = NODE_ACTION_DELETE;
> > +			}
> > +			else if (!memcmp(t, "C", 1)) {
> 
> Style: "} else if (...) {".

Fixed.

> > ...
> > +					node_ctx.kind = NODE_KIND_GITLINK;
> > +				else if (!memcmp(val, "040000", 6))
> 
> Is <mode> guaranteed to be a 6-digit octal, padded with 0 to the left?
> 
> The manual page of git-fast-import seems to hint that is the case, but I
> am double checking, as the leading zero is an error in the tree object.

The documentation and commit messages seem to hint that this is the
case. I'm digging to see if there's anything wrong with this.

> > +					node_ctx.kind = NODE_KIND_SUBDIR;
> > +				else {
> > +					if (!memcmp(val, "755", 3))
> > +						node_ctx.kind = NODE_KIND_EXECUTABLE;
> > +					else if(!memcmp(val, "644", 3))
> 
> Style; missing SP after "if/switch" (look everywhere).

Fixed.

> > +						node_ctx.kind = NODE_KIND_NORMAL;
> > +					else
> > +						die("Unrecognized mode: %s", val);
> > +					mode_incr = 4;
> > +				}
> > +				val += mode_incr;
> 
> Hmm, this whole thing is ugly.
> 
> Perhaps a table-lookup, or at least a helper function that translates
> "val" to node-kind (while advancing val as the side effect) may make it
> easier to read?

Fixed. I used a helper function.

> > ...
> > +					to_dump = &blobs[strtoul(val + 1, NULL, 10)];
> 
> Has anybody so far verified the decimal number at (val+1) is a reasonable
> value, or am I seeing a future CVE here?

Fixed.

> Do we care what follows the len of digits on this line?  Does a line in G-F-I
> stream allow trailing garbage (this question is not limited to this part
> of the code)?

Hm, I haven't thought about this hard enough. Currently, both svn-fi
and svn-fe ignore any trailing garbage/ commands they don't
understand. Are there some issues that we haven't anticipated?

> > +int svnload_init(const char *filename)
> > ...
> > +void svnload_deinit(void)
>
> Whoever needs to add an element to rev_ctx must consistently touch reset,
> init and deinit.  Would it help him/her if you moved the code so that
> these functions are close together from the beginning?

Fixed.

-- Ram

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] fast-export: Introduce --inline-blobs
  2011-01-22  9:24           ` Ramkumar Ramachandra
@ 2011-01-22 19:18             ` Jonathan Nieder
  0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Nieder @ 2011-01-22 19:18 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Drew Northup, Junio C Hamano, Git List, David Barr,
	Sverre Rabbelier

Ramkumar Ramachandra wrote:

> Agreed. I wouldn't like to introduce an extra dependency either. I was
> talking about using it for prototyping- if the final version includes
> an extra dependency, it's unlikely to get merged into git.git :) The
> final design will probably use an in-memory B+ tree, but I haven't
> thought about that hard enough.

Immediate reaction:

Please no.  There is a value to simplicity.

As you mention, the final form is a way off, so as long as people are
careful not to get locked into bad implementation decisions, I think
it is okay.  I have refrained from nitpicking the implementation so
far because the design and interface are not obvious yet.

In this particular case, I am dreaming that we will discover a hidden
"mkdir -p" node-action in the dumpfile format so the list of
directories will not be needed. ;-)

Re Junio's critique:

is it possible to use

1) a table with callbacks?  See the source code to unifdef for
   inspiration.

2) separate code paths for different input states?  fast-import.c
   does this.

3) separate "parsing" and "acting" code?  That can open the door to a
   little paralellism, though not necessarily enough to matter.

	parser                              actor

	read a chunk
	determine the first "thing to do"
	                                    pick up the first "thing to do"
	                                    do it
	determine the next "thing to do"
	                                    pick up the next "thing to do"
	                                    do it

   There is potential parallelism because the parser can keep
   chugging along if the actor is blocked, say, writing its
   output to the network.

   Syntactically: the actor function (write_dump) calls the parser
   function (next_command) to ask what to do next.  If wanted, a later
   refactoring could make that parser function just grab an action off
   of a queue, while the parser proper runs in the background.
  
   And of course if the "thing to do" data structure is simple enough,
   this can also make the code easier to read.

I mention these ideas because some of them (especially #2) could make
the prototyping a lot easier as well as resulting in code that is
easier to review.

Re error handling:

Writing robust code (e.g., checking for errors) is also a lot easier
when done from the start.  The svn-fe error handling is known to be a
problem (see BUGS in contrib/svn-fe/svn-fe.txt).  So yes, I also
consider avoiding segfaults and deadlocks and catching parse errors to
be worthwhile things.

Thanks.
Jonathan

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2011-01-22 19:18 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-19  5:44 [RFC PATCH v2 0/5] Towards a Git-to-SVN bridge Ramkumar Ramachandra
2011-01-19  5:44 ` [PATCH 1/5] date: Expose the time_to_tm function Ramkumar Ramachandra
2011-01-19  5:44 ` [PATCH 2/5] vcs-svn: Start working on the dumpfile producer Ramkumar Ramachandra
2011-01-22  0:30   ` Junio C Hamano
2011-01-22  9:45     ` Ramkumar Ramachandra
2011-01-19  5:44 ` [PATCH 3/5] Build an svn-fi target in contrib/svn-fe Ramkumar Ramachandra
2011-01-19  5:44 ` [PATCH 4/5] fast-export: Introduce --inline-blobs Ramkumar Ramachandra
2011-01-19 19:50   ` Junio C Hamano
2011-01-19 21:48     ` Jonathan Nieder
2011-01-20  4:50       ` Ramkumar Ramachandra
2011-01-20  5:48         ` Jonathan Nieder
2011-01-20  6:28           ` Ramkumar Ramachandra
2011-01-20 13:53         ` Drew Northup
2011-01-22  9:24           ` Ramkumar Ramachandra
2011-01-22 19:18             ` Jonathan Nieder
2011-01-20  5:41     ` Jonathan Nieder
2011-01-22  0:30   ` Junio C Hamano
2011-01-19  5:44 ` [PATCH 5/5] vcs-svn: Add dir_cache for svnload Ramkumar Ramachandra

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).