From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 5/5] remote-curl: use post_rpc() for protocol v2 also
Date: Thu, 21 Feb 2019 08:46:10 -0500
Message-ID: <20190221134609.GA21406@sigill.intra.peff.net>
In-Reply-To: <25ea75eb435ed8fed759b30a4c362a68818a8905.1550170980.git.jonathantanmy@google.com>

On Thu, Feb 14, 2019 at 11:06:39AM -0800, Jonathan Tan wrote:
> diff --git a/remote-curl.c b/remote-curl.c
> index 32c133f636..13836e4c28 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -504,6 +504,18 @@ struct rpc_state {
>  	int any_written;
>  	unsigned gzip_request : 1;
>  	unsigned initial_buffer : 1;
> +
> +	/*
> +	 * Whenever a pkt-line is read into buf, append the 4 characters
> +	 * denoting its length before appending the payload.
> +	 */
> +	unsigned write_line_lengths : 1;

Hmm, so we read a packet, and then we "append its length" before
appending the contents. But that would always be the length we just
read, right? I wonder if it would be simpler to just call this option
something like "proxy_packets" or "full_packets", teach the packet code
to give us the full packets, and then just treat that whole buffer as a
unit. I dunno. There might be some gotchas in practice, and it's not
like it's that much simpler. Just a thought.

> +	/*
> +	 * rpc_out uses this to keep track of whether it should continue
> +	 * reading to populate the current request. Initialize to 0.
> +	 */
> +	unsigned stop_reading : 1;

OK, so we need this because the v2 proxying will require us to stop
reading but keep the channel open? Kind of awkward, but I don't see a
way around it.

> +static int rpc_read_from_out(struct rpc_state *rpc, int options,
> +			     size_t *appended,
> +			     enum packet_read_status *status) {
> +	size_t left;
> +	char *buf;
> +	int pktlen_raw;
> +
> +	if (rpc->write_line_lengths) {
> +		left = rpc->alloc - rpc->len - 4;
> +		buf = rpc->buf + rpc->len + 4;
> +	} else {
> +		left = rpc->alloc - rpc->len;
> +		buf = rpc->buf + rpc->len;
> +	}

OK, so we push the packets 4 bytes further into the buffer in that case,
leaving room for the header. Makes sense.

>  	if (left < LARGE_PACKET_MAX)
>  		return 0;
>
> -	*appended = packet_read(rpc->out, NULL, NULL, buf, left, 0);
> -	rpc->len += *appended;
> +	*status = packet_read_with_status(rpc->out, NULL, NULL, buf,
> +			left, &pktlen_raw, options);
> +	if (*status != PACKET_READ_EOF) {
> +		*appended = pktlen_raw + (rpc->write_line_lengths ? 4 : 0);
> +		rpc->len += *appended;
> +	}
> +
> +	if (rpc->write_line_lengths) {
> +		switch (*status) {
> +		case PACKET_READ_EOF:
> +			if (!(options & PACKET_READ_GENTLE_ON_EOF))
> +				die("shouldn't have EOF when not gentle on EOF");
> +			break;
> +		case PACKET_READ_NORMAL:
> +			set_packet_header(buf - 4, *appended);
> +			break;
> +		case PACKET_READ_DELIM:
> +			memcpy(buf - 4, "0001", 4);
> +			break;
> +		case PACKET_READ_FLUSH:
> +			memcpy(buf - 4, "0000", 4);
> +			break;
> +		}
> +	}

And here we fill it in. Makes sense. It's a little awkward that we have
to re-translate PACKET_READ_DELIM, etc., back into their headers.
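
For illustration, here is a minimal standalone sketch of the
reserve-then-fill approach. The helper names fill_pkt_header() and
append_with_header() are hypothetical; the first only stands in for
set_packet_header() and is not the real function. It assumes the
pkt-line convention that the 4-hex-digit length counts the header
itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for set_packet_header(): write "size" as four
 * lowercase hex digits at "hdr". The result is not NUL-terminated.
 */
static void fill_pkt_header(char *hdr, unsigned size)
{
	static const char hex[] = "0123456789abcdef";

	assert(size <= 0xffff);
	hdr[0] = hex[(size >> 12) & 0xf];
	hdr[1] = hex[(size >> 8) & 0xf];
	hdr[2] = hex[(size >> 4) & 0xf];
	hdr[3] = hex[size & 0xf];
}

/*
 * Copy one payload into buf starting 4 bytes past offset "len", then
 * go back and fill in the header in the reserved space. Returns the
 * number of bytes appended (header plus payload). The caller must
 * make sure buf has room for payload_len + 4 more bytes.
 */
static size_t append_with_header(char *buf, size_t len,
				 const char *payload, size_t payload_len)
{
	char *dst = buf + len + 4;

	memcpy(dst, payload, payload_len);
	fill_pkt_header(dst - 4, (unsigned)(payload_len + 4));
	return payload_len + 4;
}
```

Appending the 6-byte payload "hello\n" this way yields the 10 bytes
"000ahello\n". Flush and delim packets carry no payload, which is why
the patch has to write their fixed "0000" and "0001" headers by hand.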

> @@ -531,15 +580,32 @@ static size_t rpc_out(void *ptr, size_t eltsize,
>  	size_t max = eltsize * nmemb;
>  	struct rpc_state *rpc = buffer_;
>  	size_t avail = rpc->len - rpc->pos;
> +	enum packet_read_status status;
>
>  	if (!avail) {
>  		rpc->initial_buffer = 0;
>  		rpc->len = 0;
> -		if (!rpc_read_from_out(rpc, &avail))
> -			BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");
> -		if (!avail)
> -			return 0;
>  		rpc->pos = 0;
> +		if (!rpc->stop_reading) {
> +			if (!rpc_read_from_out(rpc, 0, &avail, &status))
> +				BUG("The entire rpc->buf should be larger than LARGE_PACKET_MAX");

Do we actually need it to be LARGE_PACKET_MAX+4 here? I guess not,
because LARGE_PACKET_DATA_MAX is the "-4" version. So I think this BUG()
was perhaps already wrong?
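
To spell out the arithmetic: pkt-line.h defines LARGE_PACKET_MAX as
65520 and LARGE_PACKET_DATA_MAX as LARGE_PACKET_MAX - 4. Below is a
rough sketch of the space check as I read it from the hunk above. The
constants are copied here for self-containment, and room_for_packet()
is a hypothetical helper, not something in remote-curl.c.

```c
#include <assert.h>
#include <stddef.h>

#define LARGE_PACKET_MAX 65520
#define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)

/*
 * Hypothetical helper mirroring the "left < LARGE_PACKET_MAX" check:
 * return 1 if a buffer of "alloc" bytes holding "len" bytes passes
 * the check, reserving 4 bytes for the header when
 * write_line_lengths is set.
 */
static int room_for_packet(size_t alloc, size_t len, int write_line_lengths)
{
	size_t reserved = write_line_lengths ? 4 : 0;

	if (alloc < len + reserved)
		return 0;
	return alloc - len - reserved >= LARGE_PACKET_MAX;
}
```

With write_line_lengths set, an empty buffer only passes this check
at LARGE_PACKET_MAX + 4 bytes, even though the largest payload plus
its header is LARGE_PACKET_DATA_MAX + 4, i.e. exactly
LARGE_PACKET_MAX.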

> +			if (status == PACKET_READ_FLUSH)
> +				/*
> +				 * We are done reading for this request, but we
> +				 * still need to send this line out (if
> +				 * rpc->write_line_lengths is true) so do not
> +				 * return yet.
> +				 */
> +				rpc->stop_reading = 1;
> +		}
> +	}
> +	if (!avail && rpc->stop_reading) {
> +		/*
> +		 * "return 0" will notify Curl that this RPC request is done,
> +		 * so reset stop_reading back to 0 for the next request.
> +		 */
> +		rpc->stop_reading = 0;
> +		return 0;

OK, and here's where we handle the stop_reading thing. It is indeed
awkward, but I think your comments make it clear what's going on.

If we get stop_reading, do we care about "avail"? I.e., shouldn't we be
able to return non-zero to say "we got the whole input, this is not a
too-large request"?
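
To make the control flow concrete, here is a toy model of the
stop_reading handling. It is only a sketch: struct demo_rpc,
demo_refill() and demo_rpc_out() are hypothetical, the input is an
in-memory list of already-framed packets rather than a pipe, and the
only thing borrowed from curl is the read-callback contract that
returning 0 ends the request body.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down stand-in for struct rpc_state. */
struct demo_rpc {
	const char **packets;	/* NULL-terminated; "" marks a flush */
	size_t next;		/* index of the next packet to read */
	char buf[64];
	size_t len, pos;
	unsigned stop_reading : 1;
};

/* Refill buf with the next packet; a flush sets stop_reading. */
static void demo_refill(struct demo_rpc *rpc)
{
	const char *p = rpc->packets[rpc->next];

	rpc->len = rpc->pos = 0;
	if (!p)
		return;
	rpc->next++;
	if (!*p) {
		/* The flush itself still has to be sent out. */
		memcpy(rpc->buf, "0000", 4);
		rpc->len = 4;
		rpc->stop_reading = 1;
		return;
	}
	rpc->len = strlen(p);
	assert(rpc->len <= sizeof(rpc->buf));
	memcpy(rpc->buf, p, rpc->len);
}

/*
 * Copy up to max bytes into ptr and return the count; return 0 to
 * signal the end of the current request body.
 */
static size_t demo_rpc_out(char *ptr, size_t max, struct demo_rpc *rpc)
{
	size_t avail = rpc->len - rpc->pos;

	if (!avail && !rpc->stop_reading) {
		demo_refill(rpc);
		avail = rpc->len;
	}
	if (!avail) {
		/* Done with this request; rearm for the next one. */
		rpc->stop_reading = 0;
		return 0;
	}
	if (max < avail)
		avail = max;
	memcpy(ptr, rpc->buf + rpc->pos, avail);
	rpc->pos += avail;
	return avail;
}
```

Fed the packets "0009want\n", flush, "0009done\n", flush, successive
calls return 9, 4 and then 0, and the call after that starts on the
second request. The flush is handed to the caller before the 0 is
returned, which is the "do not return yet" case in the patch's
comment.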

> +test_expect_success 'clone big repository with http:// using protocol v2' '
> +	test_when_finished "rm -f log" &&
> +
> +	git init "$HTTPD_DOCUMENT_ROOT_PATH/big" &&
> +	# Ensure that the list of wants is greater than http.postbuffer below
> +	for i in $(seq 1 1500)
> +	do
> +		test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/big" "commit$i"
> +	done &&

As Junio noted, this should be test_seq. But I think it would be nice to
avoid looping on test_commit here at all. It kicks off at least 3
processes; multiplying that by 1500 is going to be slow.

Making a big input is often much faster by generating a fast-import
stream (which can often be done entirely in-shell). There's some prior
art in t3302, t5551, t5608, and others.

-Peff