git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Lars Schneider <larsxschneider@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	"Jeff King" <peff@peff.net>, "Junio C Hamano" <gitster@pobox.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Jakub Narębski" <jnareb@gmail.com>,
	"Martin-Louis Bright" <mlbright@gmail.com>
Subject: Re: [PATCH v6 06/13] pkt-line: add functions to read/write flush terminated packet streams
Date: Thu, 25 Aug 2016 21:33:34 +0200	[thread overview]
Message-ID: <269094FB-81F3-4220-BE7A-F957EDF6F808@gmail.com> (raw)
In-Reply-To: <CAGZ79kaVg40H-LeDtFfDYqDFDDbr+um3ZYj8oAaqGu+q6k5e+A@mail.gmail.com>


> On 25 Aug 2016, at 20:46, Stefan Beller <sbeller@google.com> wrote:
> 
>> On Thu, Aug 25, 2016 at 4:07 AM,  <larsxschneider@gmail.com> wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> packet_write_stream_with_flush_from_fd() and
>> packet_write_stream_with_flush_from_buf() write a stream of packets. All
>> content packets use the maximal packet size except for the last one.
>> After the last content packet a `flush` control packet is written.
>> 
>> packet_read_till_flush() reads arbitrary sized packets until it detects
>> a `flush` packet.
> 
> So the API provided by these read/write functions is intended
> to move a huge chunks of data. And as it puts the data on the wire one
> packet after the other without the possibility to intervene and e.g. send
> a side channel progress bar update, I would question the design of this.
> If I understand correctly this will be specifically  used for large
> files locally,
> so e.g. a file of 5 GB (such as a virtual machine tracked in Git), would
> require about 80k packets.

Peff suggested this approach arguing that the overhead is neglectable:
http://public-inbox.org/git/20160720134916.GB19359@sigill.intra.peff.net/


> Instead of having many packets of max length and then a remainder,
> I would suggest to invent larger packets for this use case. Then we can
> just send one packet instead.
> 
> Currently a packet consists of 4 bytes indicating the length in hex
> and then the payload of length-4 bytes. As the length is in hex
> the characters in the first 4 bytes are [0-9a-f], we can easily add another
> meaning for the length, e.g.:
> 
>  A packet starts with the overall length and then the payload.
>  If the first character of the length is 'v' the length is encoded as a
>  variable length quantity[1]. The high bit of the char indicates if
>  the next char is still part of the length field. The length must not exceed
>  LLONG_MAX (which results in a payload of 9223 Petabyte, so
>  enough for the foreseeable future).

Eventually I would like to resurrect Joey's cleanFromFile/smudgeToFile idea:

http://public-inbox.org/git/1468277112-9909-3-git-send-email-joeyh@joeyh.name/

Then we would not need to transfer that much data over the pipes. However, I wonder if the large amount of packets would actually be a problem. Honestly, I would prefer to not change Git's packet format in this already large series ;-)


>  [1] A variable-length quantity (VLQ) is a universal code that uses
>  an arbitrary number of bytes to represent an arbitrarily large integer.
>  https://en.wikipedia.org/wiki/Variable-length_quantity
> 
> The neat thing about the packet system is we can dedicate packets
> to different channels (such as the side channels), but with the provided
> API here this makes it impossible to later add in these side channel
> as it is a pure streaming API now. So let's remove the complication
> of having to send multiple packets and just go with one large packet
> instead.

I tried to design the protocol as flexible as possible for the future with a version negotiation and a capabilities list. Therefore, I would think it should be possible to implement these ideas in the future if they are required.


> --
>    I understand that my proposal would require writing code again,
>    but it has also some long term advantages in the networking stack
>    of Git: There are some worries that a capabilities line in fetch/push
>    might overflow in the far future, when there are lots of capabilities.
> 
>    Also a few days ago there was a proposal to add all symbolic refs
>    to a capabilities line, which Peff shot down as "the packet may be
>    too small".
> 
>    There is an incredible hack that allows transporting refs > 64kB IIRC.
> 
>    All these things could go away with the variable length encoded
>    packets. But to make them go away in the future we would need
>    to start with these variable length packets today. ;)
> 
> Just food for thought.

Thanks for thinking it through that thoroughly! I understand your point of view and I am curious what others thing.

Cheers,
Lars


  reply	other threads:[~2016-08-25 23:31 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-25 11:07 [PATCH v6 00/13] Git filter protocol larsxschneider
2016-08-25 11:07 ` [PATCH v6 01/13] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
2016-08-25 11:07 ` [PATCH v6 02/13] pkt-line: extract set_packet_header() larsxschneider
2016-08-25 11:07 ` [PATCH v6 03/13] pkt-line: add packet_write_fmt_gently() larsxschneider
2016-08-25 18:12   ` Stefan Beller
2016-08-25 18:47     ` Lars Schneider
2016-08-25 21:41   ` Junio C Hamano
2016-08-26  9:17     ` Lars Schneider
2016-08-26 17:10       ` Junio C Hamano
2016-08-26 17:23         ` Jeff King
2016-08-25 11:07 ` [PATCH v6 04/13] pkt-line: add packet_flush_gently() larsxschneider
2016-08-25 11:07 ` [PATCH v6 05/13] pkt-line: add packet_write_gently() larsxschneider
2016-08-25 21:50   ` Junio C Hamano
2016-08-26  9:40     ` Lars Schneider
2016-08-26 17:15       ` Junio C Hamano
2016-08-29  9:40         ` Lars Schneider
2016-08-25 11:07 ` [PATCH v6 06/13] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-25 18:46   ` Stefan Beller
2016-08-25 19:33     ` Lars Schneider [this message]
2016-08-25 22:31     ` Junio C Hamano
2016-08-26  0:55       ` Jacob Keller
2016-08-26 17:02         ` Stefan Beller
2016-08-26 17:21           ` Jeff King
2016-08-26 17:17         ` Junio C Hamano
2016-08-25 22:27   ` Junio C Hamano
2016-08-26 10:13     ` Lars Schneider
2016-08-26 17:21       ` Junio C Hamano
2016-08-29  9:43         ` Lars Schneider
2016-08-25 11:07 ` [PATCH v6 07/13] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-25 18:59   ` Stefan Beller
2016-08-25 19:35     ` Lars Schneider
2016-08-26 19:44       ` Junio C Hamano
2016-08-25 11:07 ` [PATCH v6 08/13] convert: quote filter names in error messages larsxschneider
2016-08-26 19:45   ` Junio C Hamano
2016-08-25 11:07 ` [PATCH v6 09/13] convert: modernize tests larsxschneider
2016-08-26 20:03   ` Junio C Hamano
2016-08-29 10:09     ` Lars Schneider
2016-08-25 11:07 ` [PATCH v6 10/13] convert: generate large test files only once larsxschneider
2016-08-25 19:17   ` Stefan Beller
2016-08-25 19:54     ` Lars Schneider
2016-08-29 17:52       ` Junio C Hamano
2016-08-30 11:47         ` Lars Schneider
2016-08-30 16:55           ` Junio C Hamano
2016-08-29 17:46   ` Junio C Hamano
2016-08-30 11:41     ` Lars Schneider
2016-08-30 16:37       ` Jeff King
2016-08-25 11:07 ` [PATCH v6 11/13] convert: make apply_filter() adhere to standard Git error handling larsxschneider
2016-08-25 11:07 ` [PATCH v6 12/13] convert: add filter.<driver>.process option larsxschneider
2016-08-29 22:21   ` Junio C Hamano
2016-08-30 16:27     ` Lars Schneider
2016-08-30 18:59       ` Junio C Hamano
2016-08-30 20:38         ` Lars Schneider
2016-08-30 22:23           ` Junio C Hamano
2016-08-31  4:57             ` Torsten Bögershausen
2016-08-31 13:14               ` Jakub Narębski
2016-08-30 20:46         ` Jakub Narębski
2016-09-05 19:47           ` Lars Schneider
2016-08-25 11:07 ` [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes larsxschneider
2016-08-29 18:05   ` Junio C Hamano
2016-08-29 19:03     ` Lars Schneider
2016-08-29 19:45       ` Junio C Hamano
2016-08-30 12:32         ` Lars Schneider
2016-08-30 14:54           ` Torsten Bögershausen
2016-09-01 17:15             ` Junio C Hamano
2016-08-29 15:39 ` [PATCH v6 00/13] Git filter protocol Lars Schneider
2016-08-29 18:09   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=269094FB-81F3-4220-BE7A-F957EDF6F808@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).