From: Jeff King <peff@peff.net>
To: larsxschneider@gmail.com
Cc: git@vger.kernel.org, gitster@pobox.com, jnareb@gmail.com,
tboegi@web.de, mlbright@gmail.com,
remi.galan-alfonso@ensimag.grenoble-inp.fr, pclouds@gmail.com,
e@80x24.org, ramsay@ramsayjones.plus.com
Subject: Re: [PATCH v2 5/5] convert: add filter.<driver>.process option
Date: Tue, 26 Jul 2016 21:32:51 -0400 [thread overview]
Message-ID: <20160727013251.GA12159@sigill.intra.peff.net> (raw)
In-Reply-To: <20160727000605.49982-6-larsxschneider@gmail.com>
On Wed, Jul 27, 2016 at 02:06:05AM +0200, larsxschneider@gmail.com wrote:
> +static off_t multi_packet_read(struct strbuf *sb, const int fd, const size_t size)
> +{
> + off_t bytes_read;
> + off_t total_bytes_read = 0;
I haven't looked carefully at the whole patch yet, but there seems to be
some type issues here. off_t is a good type for storing the whole size
of a file (which may be larger than the amount of memory we can
allocate). But size_t is the right size for an in-memory object.
This function takes a size_t size, which makes sense if it is meant to
read everything into a strbuf.
So I think our total_bytes_read would probably want to be a size_t here,
too, because it cannot possibly grow larger than that (and that is
enforced by the loop below). Otherwise you get weirdness like "sb->buf +
total_bytes_ref" possibly overflowing memory.
> + strbuf_grow(sb, size + 1); // we need one extra byte for the packet flush
What happens if size is the maximum for size_t here (i.e., 4GB-1 on a
32-bit system)?
> + do {
> + bytes_read = packet_read(
> + fd, NULL, NULL,
> + sb->buf + total_bytes_read, sb->len - total_bytes_read - 1,
> + PACKET_READ_GENTLE_ON_EOF
> + );
packet_read() actually returns an int, and may return "-1" on EOF (and
int is fine because we know that we are constrained to 16-bit values
by the pkt-line definition). You read it into an "off_t". I _think_ that
is OK, because I believe POSIX says off_t must be signed. But probably
"int" is the more correct type here.
> + total_bytes_read += bytes_read;
If you do get "-1", I think you need to detect it here before adjusting
total_bytes_read.
> + while (
> + bytes_read > 0 && // the last packet was no flush
> + sb->len - total_bytes_read - 1 > 0 // we still have space left in the buffer
> + );
And I'm not sure if you need to distinguish between "0" and "-1" when
checking byte_read here.
> + strbuf_setlen(sb, total_bytes_read);
Passing an off_t to something expecting a size_t, which can involve
truncation (though I think in practice you really are limited to
size_t).
> +static int multi_packet_write(const char *src, size_t len, const int in, const int out)
> +{
> + int ret = 1;
> + char header[4];
> + char buffer[8192];
> + off_t bytes_to_write;
> + while (ret) {
> + if (in >= 0) {
> + bytes_to_write = xread(in, buffer, sizeof(buffer));
Likewise here, xread() is returning ssize_t. Again, OK if we can assume
off_t is signed, but it probably makes sense to use the correct type (we
also know it cannot be larger than 8K, of course).
Why 8K? The pkt-line format naturally restricts us to just under 64K, so
why not take advantage of that and minimize the framing overhead for
large data?
> + if (bytes_to_write < 0)
> + ret &= 0;
I think "&= 0" is unusual for our codebase? Would just writing "= 0" be
more clear?
We do sometimes do "ret |= something()" but that is in cases where
"ret" is zero for success, and non-zero (usually -1) otherwise. Perhaps
your function's error-reporting is inverted from our usual style?
> + set_packet_header(header, bytes_to_write + 4);
> + ret &= write_in_full(out, &header, sizeof(header)) == sizeof(header);
> + ret &= write_in_full(out, src, bytes_to_write) == bytes_to_write;
> + }
If you look at format_packet(), it pulls a slight trick: we have a
buffer 4 bytes larger than we need, format into "buf + 4", and then
write the final size at the beginning. That lets us write() it all in
one go.
At first I thought this function was simply reinventing packet_write(),
but I guess you are trying to avoid the extra copy of the data (once
into the buffer from xread, and then again via format_packet just to add
the extra bytes at the beginning).
I agree with what Junio said elsewhere, that there may be a way to make
the pkt-line code handle this zero-copy situation better. Perhaps
something like:
struct pktline {
/* first 4 bytes are reserved for length header */
char buf[LARGE_PACKET_MAX];
};
#define PKTLINE_DATA_START(pkt) ((pkt)->buf + 4)
#define PKTLINE_DATA_LEN (LARGE_PACKET_MAX - 4)
...
struct pktline pkt;
ssize_t len = xread(fd, PKTLINE_DATA_START(&pkt), PKTLINE_DATA_LEN);
packet_send(&pkt, len);
Then packet_send() knows that the first 4 bytes are reserved for it. I
suspect that the strbuf used by format_packet() could get away with
using such a "struct pktline" too, though in practice I doubt there's
any real efficiency to be gained (we generally reuse the same strbuf
over and over, so it will grow once to 64K and get reused).
> + ret &= write_in_full(out, "0000", 4) == 4;
packet_flush() ?
I know the packet functions are keen on write_or_die() versus
write_in_full(). That is perhaps something that should be fixed.
This was just supposed to be a short note about off_t before eating
dinner (oops), so I didn't read past here.
-Peff
next prev parent reply other threads:[~2016-07-27 1:33 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-22 15:48 [PATCH v1 0/3] Git filter protocol larsxschneider
2016-07-22 15:48 ` [PATCH v1 1/3] convert: quote filter names in error messages larsxschneider
2016-07-22 15:48 ` [PATCH v1 2/3] convert: modernize tests larsxschneider
2016-07-26 15:18 ` Remi Galan Alfonso
2016-07-26 20:40 ` Junio C Hamano
2016-07-22 15:49 ` [PATCH v1 3/3] convert: add filter.<driver>.useProtocol option larsxschneider
2016-07-22 22:32 ` Torsten Bögershausen
2016-07-24 12:09 ` Lars Schneider
2016-07-22 23:19 ` Ramsay Jones
2016-07-22 23:28 ` Ramsay Jones
2016-07-24 17:16 ` Lars Schneider
2016-07-24 22:36 ` Ramsay Jones
2016-07-24 23:22 ` Jakub Narębski
2016-07-25 20:32 ` Lars Schneider
2016-07-26 10:58 ` Jakub Narębski
2016-07-25 20:24 ` Lars Schneider
2016-07-23 0:11 ` Jakub Narębski
2016-07-23 7:27 ` Eric Wong
2016-07-26 20:00 ` Jeff King
2016-07-24 18:36 ` Lars Schneider
2016-07-24 20:14 ` Jakub Narębski
2016-07-24 21:30 ` Jakub Narębski
2016-07-25 20:16 ` Lars Schneider
2016-07-26 12:24 ` Jakub Narębski
2016-07-25 20:09 ` Lars Schneider
2016-07-26 14:18 ` Jakub Narębski
2016-07-23 8:14 ` Eric Wong
2016-07-24 19:11 ` Lars Schneider
2016-07-25 7:27 ` Eric Wong
2016-07-25 15:48 ` Duy Nguyen
2016-07-22 21:39 ` [PATCH v1 0/3] Git filter protocol Junio C Hamano
2016-07-24 11:24 ` Lars Schneider
2016-07-26 20:11 ` Jeff King
2016-07-27 0:06 ` [PATCH v2 0/5] " larsxschneider
2016-07-27 0:06 ` [PATCH v2 1/5] convert: quote filter names in error messages larsxschneider
2016-07-27 20:01 ` Jakub Narębski
2016-07-28 8:23 ` Lars Schneider
2016-07-27 0:06 ` [PATCH v2 2/5] convert: modernize tests larsxschneider
2016-07-27 0:06 ` [PATCH v2 3/5] pkt-line: extract and use `set_packet_header` function larsxschneider
2016-07-27 0:20 ` Junio C Hamano
2016-07-27 9:13 ` Lars Schneider
2016-07-27 16:31 ` Junio C Hamano
2016-07-27 0:06 ` [PATCH v2 4/5] convert: generate large test files only once larsxschneider
2016-07-27 2:35 ` Torsten Bögershausen
2016-07-27 13:32 ` Jeff King
2016-07-27 16:50 ` Lars Schneider
2016-07-27 0:06 ` [PATCH v2 5/5] convert: add filter.<driver>.process option larsxschneider
2016-07-27 1:32 ` Jeff King [this message]
2016-07-27 17:31 ` Lars Schneider
2016-07-27 18:11 ` Jeff King
2016-07-28 12:10 ` Lars Schneider
2016-07-28 13:35 ` Jeff King
2016-07-27 9:41 ` Eric Wong
2016-07-29 10:38 ` Lars Schneider
2016-07-29 11:24 ` Jakub Narębski
2016-07-29 11:31 ` Lars Schneider
2016-08-05 18:55 ` Eric Wong
2016-08-05 23:26 ` Lars Schneider
2016-08-05 23:38 ` Eric Wong
2016-07-27 23:31 ` Jakub Narębski
2016-07-29 8:04 ` Lars Schneider
2016-07-29 17:35 ` Junio C Hamano
2016-07-29 23:11 ` Jakub Narębski
2016-07-29 23:44 ` Lars Schneider
2016-07-30 9:32 ` Jakub Narębski
2016-07-28 10:32 ` Torsten Bögershausen
2016-07-27 19:08 ` [PATCH v2 0/5] Git filter protocol Jakub Narębski
2016-07-28 7:16 ` Lars Schneider
2016-07-28 10:42 ` Jakub Narębski
2016-07-28 13:29 ` Jeff King
2016-07-29 7:40 ` Jakub Narębski
2016-07-29 8:14 ` Lars Schneider
2016-07-29 15:57 ` Jeff King
2016-07-29 16:20 ` Lars Schneider
2016-07-29 16:50 ` Jeff King
2016-07-29 17:43 ` Lars Schneider
2016-07-29 18:27 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160727013251.GA12159@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=e@80x24.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jnareb@gmail.com \
--cc=larsxschneider@gmail.com \
--cc=mlbright@gmail.com \
--cc=pclouds@gmail.com \
--cc=ramsay@ramsayjones.plus.com \
--cc=remi.galan-alfonso@ensimag.grenoble-inp.fr \
--cc=tboegi@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).