From: Lars Schneider <larsxschneider@gmail.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: git <git@vger.kernel.org>, "Jeff King" <peff@peff.net>,
"Junio C Hamano" <gitster@pobox.com>,
"Stefan Beller" <sbeller@google.com>,
"Martin-Louis Bright" <mlbright@gmail.com>,
"Torsten Bögershausen" <tboegi@web.de>,
"Ramsay Jones" <ramsay@ramsayjones.plus.com>
Subject: Re: [PATCH v8 11/11] convert: add filter.<driver>.process option
Date: Fri, 30 Sep 2016 20:56:12 +0200 [thread overview]
Message-ID: <4DE57A65-1020-4F17-81F2-9F319834BB2D@gmail.com> (raw)
In-Reply-To: <91589466-439e-7200-7256-b9288beae685@gmail.com>
> On 27 Sep 2016, at 00:41, Jakub Narębski <jnareb@gmail.com> wrote:
>
> Part first of the review of 11/11.
>
> W dniu 20.09.2016 o 21:02, larsxschneider@gmail.com pisze:
>> From: Lars Schneider <larsxschneider@gmail.com>
>>
>> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
>> index 7aff940..946dcad 100644
>> --- a/Documentation/gitattributes.txt
>> +++ b/Documentation/gitattributes.txt
>> @@ -293,7 +293,13 @@ checkout, when the `smudge` command is specified, the command is
>> fed the blob object from its standard input, and its standard
>> output is used to update the worktree file. Similarly, the
>> `clean` command is used to convert the contents of worktree file
>> -upon checkin.
>> +upon checkin. By default these commands process only a single
>> +blob and terminate. If a long running `process` filter is used
> ^^^^
>
> Should we use this terminology here? I have not read the preceding
> part of documentation, so I don't know if it talks about "blobs" or
> if it uses "files" and/or "file contents".
I used that because it was used in the paragraph above already.
>> +Long Running Filter Process
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +If the filter command (a string value) is defined via
>> +`filter.<driver>.process` then Git can process all blobs with a
>> +single filter invocation for the entire life of a single Git
>> +command. This is achieved by using a packet format (pkt-line,
>> +see technical/protocol-common.txt) based protocol over standard
>> +input and standard output as follows. All packets are considered
>> +text and therefore are terminated by an LF. Exceptions are the
>> +"*CONTENT" packets and the flush packet.
>
> I guess that reasoning here is that all but CONTENT packets are
> metadata, and thus to aid debuggability of the protocol are "text",
> as considered by pkt-line.
>
> Perhaps a bit more readable would be the following (but current is
> just fine; I am nitpicking):
>
> All packets, except for the "{star}CONTENT" packets and the "0000"
> flush packer, are considered text and therefore are terminated by
> a LF.
OK, I use that!
> I think it might be a good idea to describe what flush packet is
> somewhere in this document; on the other hand referring (especially
> if hyperlinked) to pkt-line technical documentation might be good
> enough / better. I'm unsure, but I tend on the side that referring
> to technical documentation is better.
I have this line in the first paragraph of the Long Running Filter process:
"packet format (pkt-line, see technical/protocol-common.txt) based protocol"
>
>> +to read a welcome response message ("git-filter-server") and exactly
>> +one protocol version number from the previously sent list. All further
>
> I guess that is to provide forward-compatibility, isn't it? Also,
> "Git expects..." probably means filter process MUST send, in the
> RFC2119 (https://tools.ietf.org/html/rfc2119) meaning.
True. I feel "expects" reads better but I am happy to change it if
you feel strong about it.
>> +
>> +After the version negotiation Git sends a list of supported capabilities
>> +and a flush packet.
>
> Is it that Git SHOULD send list of ALL supported capabilities, or is
> it that Git SHOULD NOT send capabilities it does not support, and that
> it MAY send only those capabilities it needs (so for example if command
> uses only `smudge`, it may not send `clean`, so that filter driver doesn't
> need to initialize data it would not need).
"After the version negotiation Git sends a list of all capabilities that
it supports and a flush packet."
Better?
> I wonder why it is "<capability>=true", and not "capability=<capability>".
> Is there a case where we would want to send "<capability>=false". Or
> is it to allow configurable / value based capabilities? Isn't it going
> a bit too far: is there even a hind of an idea for parametrize-able
> capability? YAGNI is a thing...
Peff suggested that format and I think it is OK:
http://public-inbox.org/git/20160803224619.bwtbvmslhuicx2qi@sigill.intra.peff.net/
> A few new capabilities that we might want to support in the near future
> is "size", "stream", which are options describing how to communicate,
> and "cleanFromFile", "smudgeToFile", which are new types of operations...
> but neither needs any parameter.
>
> I guess that adding new capabilities doesn't require having to come up
> with the new version of the protocol, isn't it.
Correct.
>> +packet: git< git-filter-server
>> +packet: git< version=2
>> +packet: git> clean=true
>> +packet: git> smudge=true
>> +packet: git> not-yet-invented=true
>
> Hmmm... should we hint at the use of kebab-case versus snake_case
> or camelCase for new capabilities?
I personally prefer kebab-case but I think that is a discussion for
future contributions ;-)
>> +------------------------
>> +packet: git> command=smudge
>> +packet: git> pathname=path/testfile.dat
>> +packet: git> 0000
>> +packet: git> CONTENT
>> +packet: git> 0000
>> +------------------------
>
> I think it is important to mention that (at least with current
> `filter.<driver>.process` implementation, that is absent future
> "stream" capability / option) the filter process needs to read
> *whole contents* at once, *before* writing anything. Otherwise
> it can lead to deadlock.
>
> This is especially important in that it is different (!) from the
> current behavior of `clean` and `smudge` filters, which can
> stream their response because Git invokes them async.
I added this:
" Please note, that the filter
must not send any response before it received the content and the
final flush packet. "
>> +
>> +If the filter experiences an error during processing, then it can
>> +send the status "error" after the content was (partially or
>> +completely) sent. Depending on the `filter.<driver>.required` flag
>> +Git will interpret that as error but it will not stop or restart the
>> +filter process.
>> +------------------------
>> +packet: git< status=success
>> +packet: git< 0000
>> +packet: git< HALF_WRITTEN_ERRONEOUS_CONTENT
>> +packet: git< 0000
>> +packet: git< status=error
>> +packet: git< 0000
>> +------------------------
>
> Good. A question is if the filter process can send "status=abort"
> after partial contents, or does it need to wait for the next command?
I added:
"expected to respond with an "abort" status at any point in
the protocol."
>> +
>> +After the filter has processed a blob it is expected to wait for
>> +the next "key=value" list containing a command. Git will close
>> +the command pipe on exit. The filter is expected to detect EOF
>> +and exit gracefully on its own.
>
> Good to have it documented.
>
> Anyway, as it is Git command that spawns the filter driver process,
> assuming that the filter process doesn't daemonize itself, wouldn't
> the operating system reap it after its parent process, that is the
> git command it invoked, dies? So detecting EOF is good, but not
> strictly necessary for simple filter that do not need to free
> its resources, or can leave freeing resources to the operating
> system? But I may be wrong here.
The filter process runs independent of Git.
>> +
>> +
>> Interaction between checkin/checkout attributes
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
>> new file mode 100755
>> index 0000000..c13a631
>> --- /dev/null
>> +++ b/contrib/long-running-filter/example.pl
>
> To repeat myself, I think it would serve better as a separate patch.
OK
>> + die "invalid packet size '$bytes_read' field";
>
> This would read "invalid packet size '000' field", for example.
> Perhaps the following would be (slightly) better:
>
> + die "invalid packet size field: '$bytes_read'";
OK
>> + }
>> + elsif ( $pkt_size > 4 ) {
>
> Isn't a packet of $pkt_size == 4 a valid packet, a keep-alive
> one? Or is it forbidden?
"Implementations SHOULD NOT send an empty pkt-line ("0004")."
Source: Documentation/technical/protocol-common.txt
>> + die "invalid packet ($content_size expected; $bytes_read read)";
>
> This error message would read "invalid packet (12 expected; 10 read)";
> I think it would be better to rephrase it as
>
> + die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
OK
>> + die "invalid packet size";
>
> I'm not sure if it is worth it (especially for the demo script),
> but perhaps we could show what this invalid size was?
>
> + die "invalid packet size value '$pkt_size'";
OK
>> +sub packet_txt_read {
>> + my ( $res, $buf ) = packet_bin_read();
>> + unless ( $buf =~ /\n$/ ) {
>
> Wouldn't
>
> + unless ( $buf =~ s/\n$// ) {
>
> or (less so)
>
> + unless ( $buf =~ s/\n$\z// ) {
>
> be more idiomatic (and not require use of 'substr')? Remember,
> the s/// substitution quote-like operator returns number of
> substitutions in the scalar context.
OK.
>> + die "A non-binary line SHOULD BE terminated by an LF.";
>
> This is SHOULD be, not MUST be, so perhaps 'warn' would be enough.
> Not that Git should send us such line.
Actually it MUST per protocol definition. I'll change it to MUST.
>> + my ($packet) = @_;
>
> This is equivalent to
>
> + my $packet = shift;
>
> which, I think, is more common for single-parameter subroutines.
>
> Also, this is $data (or $buf), not $packet.
OK
> Perhaps some comment that main begins here?
>
>> +( packet_txt_read() eq ( 0, "git-filter-client" ) ) || die "bad initialize";
>> +( packet_txt_read() eq ( 0, "version=2" ) ) || die "bad version";
>> +( packet_bin_read() eq ( 1, "" ) ) || die "bad version end";
>
> Actually, it is overly strict. It should not fail if there
> are other "version=3", "version=4" etc. lines.
True, but I think for an example this is OK. I'll add a note
to the file header.
>> +
>> +while (1) {
>> + my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
>> + my ($pathname) = packet_txt_read() =~ /^pathname=([^=]+)$/;
>
> Do we require this order? If it is, is that explained in the
> documentation?
Git sends that order right now but the filter should not rely
on that order.
>> + packet_flush(); # empty list!
>
> This is less "empty list!", and more keeping "status=success" unchanged.
OK
OK means, I agree and I added your suggestion to v9.
Thanks a lot for your review and the comments!
Cheers,
Lars
next prev parent reply other threads:[~2016-09-30 18:56 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-20 19:02 [PATCH v8 00/11] Git filter protocol larsxschneider
2016-09-20 19:02 ` [PATCH v8 01/11] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
2016-09-24 21:14 ` Jakub Narębski
2016-09-26 18:49 ` Lars Schneider
2016-09-28 23:15 ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 02/11] pkt-line: extract set_packet_header() larsxschneider
2016-09-24 21:22 ` Jakub Narębski
2016-09-26 18:53 ` Lars Schneider
2016-09-20 19:02 ` [PATCH v8 03/11] run-command: move check_pipe() from write_or_die to run_command larsxschneider
2016-09-24 22:12 ` Jakub Narębski
2016-09-26 16:13 ` Lars Schneider
2016-09-26 16:21 ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 04/11] pkt-line: add packet_write_fmt_gently() larsxschneider
2016-09-24 22:27 ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 05/11] pkt-line: add packet_flush_gently() larsxschneider
2016-09-24 22:56 ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 06/11] pkt-line: add packet_write_gently() larsxschneider
2016-09-25 11:26 ` Jakub Narębski
2016-09-26 19:21 ` Lars Schneider
2016-09-27 8:39 ` Jeff King
2016-09-27 19:33 ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-09-25 13:46 ` Jakub Narębski
2016-09-26 20:23 ` Lars Schneider
2016-09-27 8:14 ` Lars Schneider
2016-09-27 9:00 ` Jeff King
2016-09-27 12:10 ` Lars Schneider
2016-09-27 12:13 ` Jeff King
2016-09-20 19:02 ` [PATCH v8 08/11] convert: quote filter names in error messages larsxschneider
2016-09-25 14:03 ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 09/11] convert: modernize tests larsxschneider
2016-09-25 14:43 ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 10/11] convert: make apply_filter() adhere to standard Git error handling larsxschneider
2016-09-25 14:47 ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 11/11] convert: add filter.<driver>.process option larsxschneider
2016-09-26 22:41 ` Jakub Narębski
2016-09-30 18:56 ` Lars Schneider [this message]
2016-10-04 20:50 ` Jakub Narębski
2016-10-06 13:16 ` Lars Schneider
2016-09-27 15:37 ` Jakub Narębski
2016-09-30 19:38 ` Lars Schneider
2016-10-04 21:00 ` Jakub Narębski
2016-10-06 21:27 ` Lars Schneider
2016-09-28 23:14 ` Jakub Narębski
2016-10-01 15:34 ` Lars Schneider
2016-10-04 21:34 ` Jakub Narębski
2016-09-28 21:49 ` [PATCH v8 00/11] Git filter protocol Junio C Hamano
2016-09-29 10:28 ` Lars Schneider
2016-09-29 11:57 ` Torsten Bögershausen
2016-09-29 16:57 ` Junio C Hamano
2016-09-29 17:57 ` Lars Schneider
2016-09-29 18:18 ` Torsten Bögershausen
2016-09-29 18:38 ` Johannes Sixt
2016-09-29 21:27 ` Junio C Hamano
2016-10-01 18:59 ` Lars Schneider
2016-10-01 20:48 ` Jakub Narębski
2016-10-03 17:13 ` Lars Schneider
2016-10-04 19:04 ` Jakub Narębski
2016-10-06 13:13 ` Lars Schneider
2016-10-06 16:01 ` Jeff King
2016-10-06 17:17 ` Junio C Hamano
2016-10-03 17:02 ` Junio C Hamano
2016-10-03 17:35 ` Lars Schneider
2016-10-04 12:11 ` Jeff King
2016-10-04 16:47 ` Junio C Hamano
2016-09-29 18:02 ` Jeff King
2016-09-29 21:19 ` Junio C Hamano
2016-09-29 20:50 ` Lars Schneider
2016-09-29 21:12 ` Junio C Hamano
2016-09-29 20:59 ` Jakub Narębski
2016-09-29 21:17 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4DE57A65-1020-4F17-81F2-9F319834BB2D@gmail.com \
--to=larsxschneider@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jnareb@gmail.com \
--cc=mlbright@gmail.com \
--cc=peff@peff.net \
--cc=ramsay@ramsayjones.plus.com \
--cc=sbeller@google.com \
--cc=tboegi@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).