git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC/PATCH 1/2] bundle: allowing to read from an unseekable fd
@ 2011-10-13 22:32 Junio C Hamano
  2011-10-13 22:35 ` [PATCH 2/2] bundle: add parse_bundle_header() helper function Junio C Hamano
  2011-10-14 12:31 ` [RFC/PATCH 1/2] bundle: allowing to read from an unseekable fd Phil Hord
  0 siblings, 2 replies; 5+ messages in thread
From: Junio C Hamano @ 2011-10-13 22:32 UTC (permalink / raw
  To: Shawn O. Pearce; +Cc: git

The current code opens a given file with fopen(), reads it until the end
of the header and runs ftell(), and reopens the same file with open() and
seeks to skip the header. This structure makes it hard to retarget the
code to read from input that is not seekable, such as a network socket.

This patch by itself does not reach that goal yet, but I think it is a
right step in that direction.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * It would be nice if we can avoid byte-by-byte reading from the file
   descriptor by over-reading into the strbuf and pass the remainder to
   the caller of read_bundle_header(), but I suspect that it would require
   us to carry the "here is the remainder from the previous read" buffer
   around throughout the transport layer. Parsing of the header wouldn't
   be performance critical compared to the computation cost of actually
   reading the rest of the bundle, hopefully, so...

 bundle.c |   99 ++++++++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 64 insertions(+), 35 deletions(-)

diff --git a/bundle.c b/bundle.c
index f48fd7d..3aa715c 100644
--- a/bundle.c
+++ b/bundle.c
@@ -23,49 +23,78 @@ static void add_to_ref_list(const unsigned char *sha1, const char *name,
 	list->nr++;
 }
 
-/* returns an fd */
+/* Eventually this should go to strbuf.[ch] */
+static int strbuf_readline_fd(struct strbuf *sb, int fd)
+{
+	strbuf_reset(sb);
+
+	while (1) {
+		char ch;
+		ssize_t len = xread(fd, &ch, 1);
+		if (len < 0)
+			return -1;
+		strbuf_addch(sb, ch);
+		if (ch == '\n')
+			break;
+	}
+	return 0;
+}
+
 int read_bundle_header(const char *path, struct bundle_header *header)
 {
-	char buffer[1024];
-	int fd;
-	long fpos;
-	FILE *ffd = fopen(path, "rb");
+	struct strbuf buf = STRBUF_INIT;
+	int fd = open(path, O_RDONLY);
+	int status = 0;
 
-	if (!ffd)
+	if (fd < 0)
 		return error("could not open '%s'", path);
-	if (!fgets(buffer, sizeof(buffer), ffd) ||
-			strcmp(buffer, bundle_signature)) {
-		fclose(ffd);
-		return error("'%s' does not look like a v2 bundle file", path);
+
+	/* The bundle header begins with the signature */
+	if (strbuf_readline_fd(&buf, fd) ||
+	    strcmp(buf.buf, bundle_signature)) {
+		error("'%s' does not look like a v2 bundle file", path);
+		status = -1;
+		goto abort;
 	}
-	while (fgets(buffer, sizeof(buffer), ffd)
-			&& buffer[0] != '\n') {
-		int is_prereq = buffer[0] == '-';
-		int offset = is_prereq ? 1 : 0;
-		int len = strlen(buffer);
+
+	/* The bundle header ends with an empty line */
+	while (!strbuf_readline_fd(&buf, fd) &&
+	       buf.len && buf.buf[0] != '\n') {
 		unsigned char sha1[20];
-		struct ref_list *list = is_prereq ? &header->prerequisites
-			: &header->references;
-		char delim;
-
-		if (len && buffer[len - 1] == '\n')
-			buffer[len - 1] = '\0';
-		if (get_sha1_hex(buffer + offset, sha1)) {
-			warning("unrecognized header: %s", buffer);
-			continue;
+		int is_prereq = 0;
+
+		if (*buf.buf == '-') {
+			is_prereq = 1;
+			strbuf_remove(&buf, 0, 1);
+		}
+		strbuf_rtrim(&buf);
+
+		/*
+		 * Tip lines have object name, SP, and refname.
+		 * Prerequisites have object name that is optionally
+		 * followed by SP and subject line.
+		 */
+		if (get_sha1_hex(buf.buf, sha1) ||
+		    (40 <= buf.len && !isspace(buf.buf[40])) ||
+		    (!is_prereq && buf.len <= 40)) {
+			error("unrecognized header: %s%s (%d)",
+			      (is_prereq ? "-" : ""), buf.buf, (int)buf.len);
+			status = -1;
+			break;
+		} else {
+			if (is_prereq)
+				add_to_ref_list(sha1, "", &header->prerequisites);
+			else
+				add_to_ref_list(sha1, buf.buf + 41, &header->references);
 		}
-		delim = buffer[40 + offset];
-		if (!isspace(delim) && (delim != '\0' || !is_prereq))
-			die ("invalid header: %s", buffer);
-		add_to_ref_list(sha1, isspace(delim) ?
-				buffer + 41 + offset : "", list);
 	}
-	fpos = ftell(ffd);
-	fclose(ffd);
-	fd = open(path, O_RDONLY);
-	if (fd < 0)
-		return error("could not open '%s'", path);
-	lseek(fd, fpos, SEEK_SET);
+
+ abort:
+	if (status) {
+		close(fd);
+		fd = -1;
+	}
+	strbuf_release(&buf);
 	return fd;
 }
 
-- 
1.7.7.289.gd0d4bb

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] bundle: add parse_bundle_header() helper function
  2011-10-13 22:32 [RFC/PATCH 1/2] bundle: allowing to read from an unseekable fd Junio C Hamano
@ 2011-10-13 22:35 ` Junio C Hamano
  2011-10-14 12:59   ` Phil Hord
  2011-10-14 12:31 ` [RFC/PATCH 1/2] bundle: allowing to read from an unseekable fd Phil Hord
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2011-10-13 22:35 UTC (permalink / raw
  To: git; +Cc: Shawn O. Pearce, Phil Hord

Move most of the code from read_bundle_header() to parse_bundle_header()
that takes a file descriptor that is already opened for reading, and make
the former responsible only for opening the file and noticing errors.

As a logical consequence of this, is_bundle() helper function can be
implemented as a non-complaining variant of read_bundle_header() that
does not return an open file descriptor, and can be used to tighten
the check used to decide the use of bundle transport in transport.c

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * It generally is a bad practice to base a non-RFC patch on an RFC one,
   but in any case here is how I would do the is_bundle() helper.

 bundle.c    |   39 +++++++++++++++++++++++++++++++--------
 bundle.h    |    1 +
 transport.c |    2 +-
 3 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/bundle.c b/bundle.c
index 3aa715c..105b005 100644
--- a/bundle.c
+++ b/bundle.c
@@ -40,19 +40,18 @@ static int strbuf_readline_fd(struct strbuf *sb, int fd)
 	return 0;
 }
 
-int read_bundle_header(const char *path, struct bundle_header *header)
+static int parse_bundle_header(int fd, struct bundle_header *header,
+			       const char *report_path)
 {
 	struct strbuf buf = STRBUF_INIT;
-	int fd = open(path, O_RDONLY);
 	int status = 0;
 
-	if (fd < 0)
-		return error("could not open '%s'", path);
-
 	/* The bundle header begins with the signature */
 	if (strbuf_readline_fd(&buf, fd) ||
 	    strcmp(buf.buf, bundle_signature)) {
-		error("'%s' does not look like a v2 bundle file", path);
+		if (report_path)
+			error("'%s' does not look like a v2 bundle file",
+			      report_path);
 		status = -1;
 		goto abort;
 	}
@@ -77,8 +76,9 @@ int read_bundle_header(const char *path, struct bundle_header *header)
 		if (get_sha1_hex(buf.buf, sha1) ||
 		    (40 <= buf.len && !isspace(buf.buf[40])) ||
 		    (!is_prereq && buf.len <= 40)) {
-			error("unrecognized header: %s%s (%d)",
-			      (is_prereq ? "-" : ""), buf.buf, (int)buf.len);
+			if (report_path)
+				error("unrecognized header: %s%s (%d)",
+				      (is_prereq ? "-" : ""), buf.buf, (int)buf.len);
 			status = -1;
 			break;
 		} else {
@@ -98,6 +98,29 @@ int read_bundle_header(const char *path, struct bundle_header *header)
 	return fd;
 }
 
+int read_bundle_header(const char *path, struct bundle_header *header)
+{
+	int fd = open(path, O_RDONLY);
+
+	if (fd < 0)
+		return error("could not open '%s'", path);
+	return parse_bundle_header(fd, header, path);
+}
+
+int is_bundle(const char *path, int quiet)
+{
+	struct bundle_header header;
+	int fd = open(path, O_RDONLY);
+
+	if (fd < 0)
+		return 0;
+	memset(&header, 0, sizeof(header));
+	fd = parse_bundle_header(fd, &header, quiet ? NULL : path);
+	if (fd >= 0)
+		close(fd);
+	return (fd >= 0);
+}
+
 static int list_refs(struct ref_list *r, int argc, const char **argv)
 {
 	int i;
diff --git a/bundle.h b/bundle.h
index e2aedd6..c6228e2 100644
--- a/bundle.h
+++ b/bundle.h
@@ -14,6 +14,7 @@ struct bundle_header {
 	struct ref_list references;
 };
 
+int is_bundle(const char *path, int quiet);
 int read_bundle_header(const char *path, struct bundle_header *header);
 int create_bundle(struct bundle_header *header, const char *path,
 		int argc, const char **argv);
diff --git a/transport.c b/transport.c
index c9c8056..84d6480 100644
--- a/transport.c
+++ b/transport.c
@@ -913,7 +913,7 @@ struct transport *transport_get(struct remote *remote, const char *url)
 		ret->fetch = fetch_objs_via_rsync;
 		ret->push = rsync_transport_push;
 		ret->smart_options = NULL;
-	} else if (is_local(url) && is_file(url)) {
+	} else if (is_local(url) && is_file(url) && is_bundle(url, 1)) {
 		struct bundle_transport_data *data = xcalloc(1, sizeof(*data));
 		ret->data = data;
 		ret->get_refs_list = get_refs_from_bundle;
-- 
1.7.7.289.gd0d4bb

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC/PATCH 1/2] bundle: allowing to read from an unseekable fd
  2011-10-13 22:32 [RFC/PATCH 1/2] bundle: allowing to read from an unseekable fd Junio C Hamano
  2011-10-13 22:35 ` [PATCH 2/2] bundle: add parse_bundle_header() helper function Junio C Hamano
@ 2011-10-14 12:31 ` Phil Hord
  1 sibling, 0 replies; 5+ messages in thread
From: Phil Hord @ 2011-10-14 12:31 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Shawn O. Pearce, git

On Thu, Oct 13, 2011 at 6:32 PM, Junio C Hamano <gitster@pobox.com> wrote:
> The current code opens a given file with fopen(), reads it until the end
> of the header and runs ftell(), and reopens the same file with open() and
> seeks to skip the header. This structure makes it hard to retarget the
> code to read from input that is not seekable, such as a network socket.
>
> This patch by itself does not reach that goal yet, but I think it is a
> right step in that direction.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>
>  * It would be nice if we can avoid byte-by-byte reading from the file
>   descriptor by over-reading into the strbuf and pass the remainder to
>   the caller of read_bundle_header(), but I suspect that it would require
>   us to carry the "here is the remainder from the previous read" buffer
>   around throughout the transport layer. Parsing of the header wouldn't
>   be performance critical compared to the computation cost of actually
>   reading the rest of the bundle, hopefully, so...
>
>  bundle.c |   99 ++++++++++++++++++++++++++++++++++++++++----------------------
>  1 files changed, 64 insertions(+), 35 deletions(-)
>
> diff --git a/bundle.c b/bundle.c
> index f48fd7d..3aa715c 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -23,49 +23,78 @@ static void add_to_ref_list(const unsigned char *sha1, const char *name,
>        list->nr++;
>  }
>
> -/* returns an fd */
> +/* Eventually this should go to strbuf.[ch] */
> +static int strbuf_readline_fd(struct strbuf *sb, int fd)

A size limiter would be useful here.  Since it's readline, maybe the
limit can even be hardcoded.

Without a limit, calling this with something stupid like "/dev/zero"
will consume all memory and never return.


Phil

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] bundle: add parse_bundle_header() helper function
  2011-10-13 22:35 ` [PATCH 2/2] bundle: add parse_bundle_header() helper function Junio C Hamano
@ 2011-10-14 12:59   ` Phil Hord
  2011-10-14 18:14     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Phil Hord @ 2011-10-14 12:59 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, Shawn O. Pearce, Phil Hord

On Thu, Oct 13, 2011 at 6:35 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Move most of the code from read_bundle_header() to parse_bundle_header()
> that takes a file descriptor that is already opened for reading, and make
> the former responsible only for opening the file and noticing errors.
...

>  * It generally is a bad practice to base a non-RFC patch on an RFC one,
>   but in any case here is how I would do the is_bundle() helper.

I didn't label it RFC, but I did pose a question in the message
section.  Is there a rule for what marks something as RFC (if not the
RFC label in the subject line)?

Your implementation looks fine to me.

Phil

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] bundle: add parse_bundle_header() helper function
  2011-10-14 12:59   ` Phil Hord
@ 2011-10-14 18:14     ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2011-10-14 18:14 UTC (permalink / raw
  To: Phil Hord; +Cc: Junio C Hamano, git, Shawn O. Pearce, Phil Hord

Phil Hord <phil.hord@gmail.com> writes:

> On Thu, Oct 13, 2011 at 6:35 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Move most of the code from read_bundle_header() to parse_bundle_header()
>> that takes a file descriptor that is already opened for reading, and make
>> the former responsible only for opening the file and noticing errors.
> ...
>
>>  * It generally is a bad practice to base a non-RFC patch on an RFC one,
>>   but in any case here is how I would do the is_bundle() helper.
>
> I didn't label it RFC,...

I was referring to the fact that _my_ 2/2 you are responding to was meant
to be applied on top of _my_ 1/2 that was marked as RFC.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-10-14 18:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-13 22:32 [RFC/PATCH 1/2] bundle: allowing to read from an unseekable fd Junio C Hamano
2011-10-13 22:35 ` [PATCH 2/2] bundle: add parse_bundle_header() helper function Junio C Hamano
2011-10-14 12:59   ` Phil Hord
2011-10-14 18:14     ` Junio C Hamano
2011-10-14 12:31 ` [RFC/PATCH 1/2] bundle: allowing to read from an unseekable fd Phil Hord

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).