git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: larsxschneider@gmail.com
Cc: git@vger.kernel.org, peff@peff.net, sbeller@google.com,
	Johannes.Schindelin@gmx.de, jnareb@gmail.com, mlbright@gmail.com
Subject: Re: [PATCH v6 06/13] pkt-line: add functions to read/write flush terminated packet streams
Date: Thu, 25 Aug 2016 15:27:16 -0700
Message-ID: <xmqqzio0jxh7.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20160825110752.31581-7-larsxschneider@gmail.com> (larsxschneider@gmail.com's message of "Thu, 25 Aug 2016 13:07:45 +0200")

larsxschneider@gmail.com writes:

> From: Lars Schneider <larsxschneider@gmail.com>
>
> packet_write_stream_with_flush_from_fd() and
> packet_write_stream_with_flush_from_buf() write a stream of packets. All
> content packets use the maximal packet size except for the last one.
> After the last content packet a `flush` control packet is written.
> packet_read_till_flush() reads arbitrary sized packets until it detects
> a `flush` packet.

These are awkwardly named.  Without reading the implementation, I
couldn't guess from the names what the input is: I can tell that one
reads from an fd and the other from a <mem,len> buffer, but not
whether that input is already in packetized form or is just a raw
data stream to be copied through to its end.  I _think_ you read a
raw stream of data through the end (either EOF or length limit) and
write it out packetized, and use the flush packet to mark the end of
the stream.  In my mind, that is "writing a packetized stream".  The
words "packetizing" and "stream" imply that the stream could consist
of more data than what would fit in a single packet, which in turn
implies that there needs to be a way to mark the end of one data
item, so with_flush does not necessarily have to be in their names.

The counterpart would be "reading a packetized stream".
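
To make "writing a packetized stream" concrete, here is a minimal,
self-contained sketch.  It is not the code from the patch:
sketch_packetize(), its max_payload parameter and the in-memory
output buffer are assumptions made purely for illustration.  It
relies only on the pkt-line framing, i.e. a 4-hex-digit length that
counts the header itself, with "0000" as the flush packet.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors LARGE_PACKET_MAX from the quoted pkt-line.h. */
#define SKETCH_PACKET_MAX 65520

/*
 * Hypothetical helper (not git's API): split src[0..len) into pkt-line
 * packets written to out[0..cap), then terminate the stream with a
 * flush packet.  max_payload is a parameter only so that tiny inputs
 * can exercise the chunking.
 * Returns the number of bytes written, or -1 if out is too small.
 */
long sketch_packetize(const char *src, size_t len, size_t max_payload,
		      char *out, size_t cap)
{
	size_t done = 0, used = 0;
	char header[5];

	assert(max_payload > 0 && max_payload <= SKETCH_PACKET_MAX - 4);

	while (done < len) {
		size_t chunk = len - done;

		if (chunk > max_payload)
			chunk = max_payload;
		if (cap - used < chunk + 4)
			return -1;
		/* The length field counts its own four bytes. */
		snprintf(header, sizeof(header), "%04x", (unsigned)(chunk + 4));
		memcpy(out + used, header, 4);
		memcpy(out + used + 4, src + done, chunk);
		used += chunk + 4;
		done += chunk;
	}

	/* The flush packet marks the end of the stream. */
	if (cap - used < 4)
		return -1;
	memcpy(out + used, "0000", 4);
	return (long)(used + 4);
}
```

With max_payload set to 3, the five bytes "hello" come out as two
content packets followed by the flush, which is why the flush does
not need to be spelled out in the function name.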

> +int packet_write_stream_with_flush_from_fd(int fd_in, int fd_out)
> +{

For this one especially, I am tempted to suggest
"copy-to-packetized-stream", as it reads a stream from one fd and
then copies it out while packetizing.

> +	int err = 0;
> +	ssize_t bytes_to_write;
> +
> +	while (!err) {
> +		bytes_to_write = xread(fd_in, packet_write_buffer, sizeof(packet_write_buffer) - 4);
> +		if (bytes_to_write < 0)
> +			return COPY_READ_ERROR;
> +		if (bytes_to_write == 0)
> +			break;
> +		if (bytes_to_write > sizeof(packet_write_buffer) - 4)
> +			return COPY_WRITE_ERROR;

... and you seem to agree with me by using COPY here.

> +		err = packet_write_gently(fd_out, packet_write_buffer, bytes_to_write);
> +	}
> +	if (!err)
> +		err = packet_flush_gently(fd_out);
> +	return err;
> +}
> +
> +int packet_write_stream_with_flush_from_buf(const char *src_in, size_t len, int fd_out)
> +{
> +	int err = 0;
> +	size_t bytes_written = 0;
> +	size_t bytes_to_write;
> +
> +	while (!err) {
> +		if ((len - bytes_written) > sizeof(packet_write_buffer) - 4)
> +			bytes_to_write = sizeof(packet_write_buffer) - 4;
> +		else
> +			bytes_to_write = len - bytes_written;
> +		if (bytes_to_write == 0)
> +			break;

The lack of COPY_WRITE_ERROR puzzled me briefly here.  If you are
assuming that your math at the beginning of this loop is correct and
bytes_to_write will never exceed the write-buffer size, I think you
should be able to (and it would be better to) assume that the math
you do to tell xread() up to how many bytes it is allowed to read at
once is also correct, losing the COPY_WRITE_ERROR check in the other
function.  You can choose to play safer and do a check in this
function, too.  Either way, we would want to be consistent.

> +		err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
> +		bytes_written += bytes_to_write;
> +	}
> +	if (!err)
> +		err = packet_flush_gently(fd_out);
> +	return err;
> +}

> +ssize_t packet_read_till_flush(int fd_in, struct strbuf *sb_out)
> +{
> +	int len, ret;
> +	int options = PACKET_READ_GENTLE_ON_EOF;
> +	char linelen[4];
> +
> +	size_t oldlen = sb_out->len;
> +	size_t oldalloc = sb_out->alloc;
> +
> +	for (;;) {
> +		/* Read packet header */
> +		ret = get_packet_data(fd_in, NULL, NULL, linelen, 4, options);
> +		if (ret < 0)
> +			goto done;
> +		len = packet_length(linelen);
> +		if (len < 0)
> +			die("protocol error: bad line length character: %.4s", linelen);
> +		if (!len) {
> +			/* Found a flush packet - Done! */
> +			packet_trace("0000", 4, 0);
> +			break;
> +		}
> +		len -= 4;
> +
> +		/* Read packet content */
> +		strbuf_grow(sb_out, len);
> +		ret = get_packet_data(fd_in, NULL, NULL, sb_out->buf + sb_out->len, len, options);
> +		if (ret < 0)
> +			goto done;
> +		if (ret != len) {
> +			error("protocol error: incomplete read (expected %d, got %d)", len, ret);
> +			goto done;
> +		}
> +
> +		packet_trace(sb_out->buf + sb_out->len, len, 0);

All of the above seems to pretty much duplicate the logic in
packet_read(), except that this caller does not need the options
handling it has.  Is optimizing that out the reason why you
open-coded it here?

Or is it because you cannot tell if you got a truly empty packet or
you got a flush from outside packet_read(), and you wanted to make
sure that you won't be fooled by a normal packet with 0-length
payload?

If the latter is the reason, it may be a viable alternative to
update packet_read() to take PACKET_READ_IGNORE_EMPTY_PACKET, i.e. a
new bit in its options parameter, so that a normal packet with
0-length payload is simply ignored there (i.e. even without
returning, packet_read() would repeat from the beginning when it got
such a packet).  That way, the above would become

	strbuf_grow(); /* enough to hold max-packet-len more bytes */
	len = packet_read();
	if (!len)
		/* we cannot get 0 unless we see flush */
		break;

which may be a lot cleaner?
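
To illustrate the idea in a self-contained way, here is a sketch that
reads packets from an in-memory buffer.  Everything in it is an
assumption made for illustration: sketch_packet_read() is a stand-in
for packet_read() and does not share its signature, and
SKETCH_IGNORE_EMPTY_PACKET plays the role of the proposed
PACKET_READ_IGNORE_EMPTY_PACKET bit, which does not exist yet.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_IGNORE_EMPTY_PACKET 1 /* hypothetical option bit */

static int sketch_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Read one packet from *src into dst, advancing *src and *src_len.
 * Returns the payload length, 0 for a flush packet, or -1 on
 * malformed or truncated input.  Without the option bit, an empty
 * packet ("0004") also returns 0 and is indistinguishable from a
 * flush; with it, empty packets are skipped without returning.
 */
int sketch_packet_read(const char **src, size_t *src_len,
		       char *dst, size_t dst_size, int options)
{
	assert(src && *src && src_len && dst);

	for (;;) {
		int i, len = 0;

		if (*src_len < 4)
			return -1;
		for (i = 0; i < 4; i++) {
			int v = sketch_hexval((*src)[i]);
			if (v < 0)
				return -1;
			len = len * 16 + v;
		}
		*src += 4;
		*src_len -= 4;

		if (!len)
			return 0; /* flush packet */
		if (len < 4)
			return -1;
		len -= 4;
		if ((size_t)len > *src_len || (size_t)len > dst_size)
			return -1;
		if (!len && (options & SKETCH_IGNORE_EMPTY_PACKET))
			continue; /* repeat from the beginning */

		memcpy(dst, *src, len);
		*src += len;
		*src_len -= len;
		return len;
	}
}

/* Collect payloads into out until a flush packet; returns total bytes or -1. */
long sketch_read_till_flush(const char *src, size_t src_len,
			    char *out, size_t cap)
{
	size_t used = 0;

	for (;;) {
		int len = sketch_packet_read(&src, &src_len, out + used,
					     cap - used,
					     SKETCH_IGNORE_EMPTY_PACKET);
		if (len < 0)
			return -1;
		if (!len)
			break; /* we cannot get 0 unless we see flush */
		used += len;
	}
	return (long)used;
}
```

With the option set, the caller's loop shrinks to the three-step shape
shown above, and the header parsing lives in only one place.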

> +		sb_out->len += len;
> +	}
> +
> +done:
> +	if (ret < 0) {
> +		if (oldalloc == 0)
> +			strbuf_release(sb_out);
> +		else
> +			strbuf_setlen(sb_out, oldlen);
> +		return ret;  /* unexpected EOF */
> +	}
> +	return sb_out->len - oldlen;
> +}
> diff --git a/pkt-line.h b/pkt-line.h
> index 3fa0899..9616117 100644
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -25,6 +25,8 @@ void packet_buf_flush(struct strbuf *buf);
>  void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
>  int packet_flush_gently(int fd);
>  int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
> +int packet_write_stream_with_flush_from_fd(int fd_in, int fd_out);
> +int packet_write_stream_with_flush_from_buf(const char *src_in, size_t len, int fd_out);
>  
>  /*
>   * Read a packetized line into the buffer, which must be at least size bytes
> @@ -77,6 +79,11 @@ char *packet_read_line(int fd, int *size);
>   */
>  char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
>  
> +/*
> + * Reads a stream of variable sized packets until a flush packet is detected.
> + */
> +ssize_t packet_read_till_flush(int fd_in, struct strbuf *sb_out);
> +
>  #define DEFAULT_PACKET_MAX 1000
>  #define LARGE_PACKET_MAX 65520
>  extern char packet_buffer[LARGE_PACKET_MAX];
