git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Git List <git@vger.kernel.org>,
	Jonathan Nieder <jrnieder@gmail.com>,
	David Barr <david.barr@cordelta.com>,
	Sverre Rabbelier <srabbelier@gmail.com>
Subject: Re: [PATCH 2/5] vcs-svn: Start working on the dumpfile producer
Date: Fri, 21 Jan 2011 16:30:18 -0800	[thread overview]
Message-ID: <7v39olok4l.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: 1295415899-1192-3-git-send-email-artagnon@gmail.com

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Start off with some broad design sketches. Compile succeeds, but
> parser is incorrect. Include a Makefile rule to build it into
> vcs-svn/lib.a.
>
> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>

I initially thought I should refrain from doing a usual picky review for
contrib/ material, but realized this does not go to contrib/, so...

> diff --git a/vcs-svn/dump_export.c b/vcs-svn/dump_export.c
> new file mode 100644
> index 0000000..04ede06
> --- /dev/null
> +++ b/vcs-svn/dump_export.c
> @@ -0,0 +1,73 @@
> +/*
> + * Licensed under a two-clause BSD-style license.
> + * See LICENSE for details.
> + */
> +
> +#include "git-compat-util.h"
> +#include "strbuf.h"
> +#include "line_buffer.h"
> +#include "dump_export.h"
> +
> +void dump_export_begin_rev(int revision, const char *revprops,
> +			int prop_len) {

Style. "{" at the beginning of the next line (same everywhere).

> +void dump_export_node(const char *path, enum node_kind kind,
> +		enum node_action action, unsigned long text_len,
> +		unsigned long copyfrom_rev, const char *copyfrom_path) {
> +	printf("Node-path: %s\n", path);
> +	printf("Node-kind: ");
> +	switch (action) {
> +	case NODE_KIND_NORMAL:
> +		printf("file\n");
> +		break;
> +	case NODE_KIND_EXECUTABLE:
> +		printf("file\n");
> +		break;
> +	case NODE_KIND_SYMLINK:
> +		printf("file\n");
> +		break;
> +	case NODE_KIND_GITLINK:
> +		printf("file\n");
> +		break;
> +	case NODE_KIND_SUBDIR:
> +		die("Unsupported: subdirectory");
> +	default:
> +		break;
> +	}

Hmph, is it expected that the repertoire of node-kind stay the same, or
will it be extended over time?  It might make sense to change the above
switch a more table-driven one.  The same for node-action that follows
this part of the code.

> diff --git a/vcs-svn/svnload.c b/vcs-svn/svnload.c
> new file mode 100644
> index 0000000..7043ae7
> --- /dev/null
> +++ b/vcs-svn/svnload.c
> @@ -0,0 +1,294 @@
> ...
> +#define SVN_DATE_FORMAT "%Y-%m-%dT%H:%M:%S.000000Z"
> +#define SVN_DATE_LEN 28
> +#define LENGTH_UNKNOWN (~0)
> +
> +static struct line_buffer input = LINE_BUFFER_INIT;
> +static struct strbuf blobs[100];
> +	

Is there a rationale for "100", or is it just a random number that can be
tweakable at the source level?  Would we want to have a symbolic constant
for it?

The "blank" line has a trailing whitespace.

> +static struct {
> +	unsigned long prop_len, text_len, copyfrom_rev, mark;
> +	int text_delta, prop_delta; /* Boolean */
> +	enum node_action action;
> +	enum node_kind kind;
> +	struct strbuf copyfrom_path, path;
> +} node_ctx;
> + ...
> +static void reset_node_ctx(void)
> +{
> +	node_ctx.prop_len = LENGTH_UNKNOWN;
> +	node_ctx.text_len = LENGTH_UNKNOWN;
> +	node_ctx.mark = 0;
> +	node_ctx.copyfrom_rev = 0;
> +	node_ctx.text_delta = -1;
> +	node_ctx.prop_delta = -1;

Does your "Boolean" type take values 0 or -1?  Or is it perhaps a tristate
false=0, true=1, unknown=-1?  If so, the comment on the type above needs
to be fixed.

> +	strbuf_reset(&node_ctx.copyfrom_path);
> +	strbuf_reset(&node_ctx.path);
> +}
> +
> +static void populate_props(struct strbuf *props, const char *author,
> +			const char *log, const char *date) {

Style on "{" (look everywhere).

> +	strbuf_reset(props);	

Trailing whitespace.

> +static void parse_author_line(char *val, struct strbuf *name,
> +			struct strbuf *email, struct strbuf *date) {
> +	char *t, *tz_off;
> +	char time_buf[SVN_DATE_LEN];
> +	const struct tm *tm_time;
> +
> +	/* Simon Hausmann <shausman@trolltech.com> 1170199019 +0100 */
> +	strbuf_reset(name);
> +	strbuf_reset(email);
> +	strbuf_reset(date);
> +	tz_off = strrchr(val, ' ');
> +	*tz_off++ = '\0';
> +	t = strrchr(val, ' ');
> +	*(t - 1) = '\0'; /* Ignore '>' from email */
> +	t ++;

Unwanted SP before "++" (look everywhere).

> +	tm_time = time_to_tm(strtoul(t, NULL, 10), atoi(tz_off));

Has your caller already verified tz_off is a reasonable integer?

> +void svnload_read(void) {
> ...
> +		switch (val - t - 1) {
> +		case 1:
> +			if (!memcmp(t, "D", 1)) {
> +				node_ctx.action = NODE_ACTION_DELETE;
> +			}
> +			else if (!memcmp(t, "C", 1)) {

Style: "} else if (...) {".

> +				node_ctx.action = NODE_ACTION_ADD;
> +			}
> +			else if (!memcmp(t, "R", 1)) {
> +				node_ctx.action = NODE_ACTION_REPLACE;
> +			}
> +			else if (!memcmp(t, "M", 1)) {
> +				node_ctx.action = NODE_ACTION_CHANGE;
> +				mode_incr = 7;
> +				if (!memcmp(val, "100644", 6))
> +					node_ctx.kind = NODE_KIND_NORMAL;
> +				else if (!memcmp(val, "100755", 6))
> +					node_ctx.kind = NODE_KIND_EXECUTABLE;
> +				else if (!memcmp(val, "120000", 6))
> +					node_ctx.kind = NODE_KIND_SYMLINK;
> +				else if (!memcmp(val, "160000", 6))
> +					node_ctx.kind = NODE_KIND_GITLINK;
> +				else if (!memcmp(val, "040000", 6))

Is <mode> guaranteed to be a 6-digit octal, padded with 0 to the left?

The manual page of git-fast-import seems to hint that is the case, but I
am double checking, as the leading zero is an error in the tree object.

> +					node_ctx.kind = NODE_KIND_SUBDIR;
> +				else {
> +					if (!memcmp(val, "755", 3))
> +						node_ctx.kind = NODE_KIND_EXECUTABLE;
> +					else if(!memcmp(val, "644", 3))

Style; missing SP after "if/switch" (look everywhere).

> +						node_ctx.kind = NODE_KIND_NORMAL;
> +					else
> +						die("Unrecognized mode: %s", val);
> +					mode_incr = 4;
> +				}
> +				val += mode_incr;

Hmm, this whole thing is ugly.

Perhaps a table-lookup, or at least a helper function that translates
"val" to node-kind (while advancing val as the side effect) may make it
easier to read?

> +				t = strchr(val, ' ');
> +				*t++ = '\0';
> +				strbuf_reset(&node_ctx.path);
> +				strbuf_add(&node_ctx.path, t, strlen(t));
> +				if (!memcmp(val + 1, "inline", 6))
> +					die("Unsupported dataref: inline");
> +				else if (*val == ':')
> +					to_dump = &blobs[strtoul(val + 1, NULL, 10)];

Has anybody so far verified the decimal number at (val+1) is a reasonable
value, or am I seeing a future CVE here?

Do we care what follows the len of digits on this line?  Does a line in G-F-I
stream allow trailing garbage (this question is not limited to this part
of the code)?
> +int svnload_init(const char *filename)
> +{
> +	int i;
> +	if (buffer_init(&input, filename))
> +		return error("cannot open %s: %s", filename, strerror(errno));
> +	active_ctx = UNKNOWN_CTX;
> +	strbuf_init(&rev_ctx.props, MAX_GITSVN_LINE_LEN);
> +	...
> +	strbuf_init(&node_ctx.copyfrom_path, MAX_GITSVN_LINE_LEN);
> +	for (i = 0; i < 100; i ++)
> +		strbuf_init(&blobs[i], 10000);
> +	return 0;
> +}
> +
> +void svnload_deinit(void)
> +{
> +	int i;
> +	reset_rev_ctx(0);
> +	reset_node_ctx();
> +	strbuf_release(&rev_ctx.props);
> +	...
> +	strbuf_release(&node_ctx.copyfrom_path);
> +	for (i = 0; i < 100; i ++)
> +		strbuf_release(&blobs[i]);
> +	if (buffer_deinit(&input))
> +		fprintf(stderr, "Input error\n");
> +	if (ferror(stdout))
> +		fprintf(stderr, "Output error\n");
> +}

Whoever needs to add an element to rev_ctx must consistently touch reset,
init and deinit.  Would it help him/her if you moved the code so that
these functions are close together from the beginning?

  reply	other threads:[~2011-01-22  0:30 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-19  5:44 [RFC PATCH v2 0/5] Towards a Git-to-SVN bridge Ramkumar Ramachandra
2011-01-19  5:44 ` [PATCH 1/5] date: Expose the time_to_tm function Ramkumar Ramachandra
2011-01-19  5:44 ` [PATCH 2/5] vcs-svn: Start working on the dumpfile producer Ramkumar Ramachandra
2011-01-22  0:30   ` Junio C Hamano [this message]
2011-01-22  9:45     ` Ramkumar Ramachandra
2011-01-19  5:44 ` [PATCH 3/5] Build an svn-fi target in contrib/svn-fe Ramkumar Ramachandra
2011-01-19  5:44 ` [PATCH 4/5] fast-export: Introduce --inline-blobs Ramkumar Ramachandra
2011-01-19 19:50   ` Junio C Hamano
2011-01-19 21:48     ` Jonathan Nieder
2011-01-20  4:50       ` Ramkumar Ramachandra
2011-01-20  5:48         ` Jonathan Nieder
2011-01-20  6:28           ` Ramkumar Ramachandra
2011-01-20 13:53         ` Drew Northup
2011-01-22  9:24           ` Ramkumar Ramachandra
2011-01-22 19:18             ` Jonathan Nieder
2011-01-20  5:41     ` Jonathan Nieder
2011-01-22  0:30   ` Junio C Hamano
2011-01-19  5:44 ` [PATCH 5/5] vcs-svn: Add dir_cache for svnload Ramkumar Ramachandra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7v39olok4l.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=artagnon@gmail.com \
    --cc=david.barr@cordelta.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=srabbelier@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).