git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Lars Schneider <larsxschneider@gmail.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: git <git@vger.kernel.org>, "Jeff King" <peff@peff.net>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Stefan Beller" <sbeller@google.com>,
	"Martin-Louis Bright" <mlbright@gmail.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Ramsay Jones" <ramsay@ramsayjones.plus.com>
Subject: Re: [PATCH v8 11/11] convert: add filter.<driver>.process option
Date: Fri, 30 Sep 2016 20:56:12 +0200	[thread overview]
Message-ID: <4DE57A65-1020-4F17-81F2-9F319834BB2D@gmail.com> (raw)
In-Reply-To: <91589466-439e-7200-7256-b9288beae685@gmail.com>


> On 27 Sep 2016, at 00:41, Jakub Narębski <jnareb@gmail.com> wrote:
> 
> Part first of the review of 11/11.
> 
> W dniu 20.09.2016 o 21:02, larsxschneider@gmail.com pisze:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
>> index 7aff940..946dcad 100644
>> --- a/Documentation/gitattributes.txt
>> +++ b/Documentation/gitattributes.txt
>> @@ -293,7 +293,13 @@ checkout, when the `smudge` command is specified, the command is
>> fed the blob object from its standard input, and its standard
>> output is used to update the worktree file.  Similarly, the
>> `clean` command is used to convert the contents of worktree file
>> -upon checkin.
>> +upon checkin. By default these commands process only a single
>> +blob and terminate.  If a long running `process` filter is used
>   ^^^^
> 
> Should we use this terminology here?  I have not read the preceding
> part of documentation, so I don't know if it talks about "blobs" or
> if it uses "files" and/or "file contents".

I used that because it was used in the paragraph above already.


>> +Long Running Filter Process
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +If the filter command (a string value) is defined via
>> +`filter.<driver>.process` then Git can process all blobs with a
>> +single filter invocation for the entire life of a single Git
>> +command. This is achieved by using a packet format (pkt-line,
>> +see technical/protocol-common.txt) based protocol over standard
>> +input and standard output as follows. All packets are considered
>> +text and therefore are terminated by an LF. Exceptions are the
>> +"*CONTENT" packets and the flush packet.
> 
> I guess that reasoning here is that all but CONTENT packets are
> metadata, and thus to aid debuggability of the protocol are "text",
> as considered by pkt-line.
> 
> Perhaps a bit more readable would be the following (but current is
> just fine; I am nitpicking):
> 
>  All packets, except for the "{star}CONTENT" packets and the "0000"
>  flush packer, are considered text and therefore are terminated by
>  a LF.

OK, I use that!


> I think it might be a good idea to describe what flush packet is
> somewhere in this document; on the other hand referring (especially
> if hyperlinked) to pkt-line technical documentation might be good
> enough / better.  I'm unsure, but I tend on the side that referring
> to technical documentation is better.

I have this line in the first paragraph of the Long Running Filter process:
"packet format (pkt-line, see technical/protocol-common.txt) based protocol"

> 
>> +to read a welcome response message ("git-filter-server") and exactly
>> +one protocol version number from the previously sent list. All further
> 
> I guess that is to provide forward-compatibility, isn't it?  Also,
> "Git expects..." probably means filter process MUST send, in the
> RFC2119 (https://tools.ietf.org/html/rfc2119) meaning.

True. I feel "expects" reads better but I am happy to change it if
you feel strong about it.


>> +
>> +After the version negotiation Git sends a list of supported capabilities
>> +and a flush packet.
> 
> Is it that Git SHOULD send list of ALL supported capabilities, or is
> it that Git SHOULD NOT send capabilities it does not support, and that
> it MAY send only those capabilities it needs (so for example if command
> uses only `smudge`, it may not send `clean`, so that filter driver doesn't
> need to initialize data it would not need).

"After the version negotiation Git sends a list of all capabilities that
it supports and a flush packet."

Better?


> I wonder why it is "<capability>=true", and not "capability=<capability>".
> Is there a case where we would want to send "<capability>=false".  Or
> is it to allow configurable / value based capabilities?  Isn't it going
> a bit too far: is there even a hind of an idea for parametrize-able
> capability? YAGNI is a thing...

Peff suggested that format and I think it is OK:
http://public-inbox.org/git/20160803224619.bwtbvmslhuicx2qi@sigill.intra.peff.net/


> A few new capabilities that we might want to support in the near future
> is "size", "stream", which are options describing how to communicate,
> and "cleanFromFile", "smudgeToFile", which are new types of operations...
> but neither needs any parameter.
> 
> I guess that adding new capabilities doesn't require having to come up
> with the new version of the protocol, isn't it.

Correct.


>> +packet:          git< git-filter-server
>> +packet:          git< version=2
>> +packet:          git> clean=true
>> +packet:          git> smudge=true
>> +packet:          git> not-yet-invented=true
> 
> Hmmm... should we hint at the use of kebab-case versus snake_case
> or camelCase for new capabilities?

I personally prefer kebab-case but I think that is a discussion for
future contributions ;-)


>> +------------------------
>> +packet:          git> command=smudge
>> +packet:          git> pathname=path/testfile.dat
>> +packet:          git> 0000
>> +packet:          git> CONTENT
>> +packet:          git> 0000
>> +------------------------
> 
> I think it is important to mention that (at least with current
> `filter.<driver>.process` implementation, that is absent future
> "stream" capability / option) the filter process needs to read
> *whole contents* at once, *before* writing anything.  Otherwise
> it can lead to deadlock.
> 
> This is especially important in that it is different (!) from the
> current behavior of `clean` and `smudge` filters, which can
> stream their response because Git invokes them async.

I added this:
" Please note, that the filter
must not send any response before it received the content and the
final flush packet. "


>> +
>> +If the filter experiences an error during processing, then it can
>> +send the status "error" after the content was (partially or
>> +completely) sent. Depending on the `filter.<driver>.required` flag
>> +Git will interpret that as error but it will not stop or restart the
>> +filter process.
>> +------------------------
>> +packet:          git< status=success
>> +packet:          git< 0000
>> +packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
>> +packet:          git< 0000
>> +packet:          git< status=error
>> +packet:          git< 0000
>> +------------------------
> 
> Good.  A question is if the filter process can send "status=abort"
> after partial contents, or does it need to wait for the next command?

I added:
"expected to respond with an "abort" status at any point in
the protocol."


>> +
>> +After the filter has processed a blob it is expected to wait for
>> +the next "key=value" list containing a command. Git will close
>> +the command pipe on exit. The filter is expected to detect EOF
>> +and exit gracefully on its own.
> 
> Good to have it documented.  
> 
> Anyway, as it is Git command that spawns the filter driver process,
> assuming that the filter process doesn't daemonize itself, wouldn't
> the operating system reap it after its parent process, that is the
> git command it invoked, dies? So detecting EOF is good, but not
> strictly necessary for simple filter that do not need to free
> its resources, or can leave freeing resources to the operating
> system? But I may be wrong here.

The filter process runs independent of Git.


>> +
>> +
>> Interaction between checkin/checkout attributes
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> 
>> diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
>> new file mode 100755
>> index 0000000..c13a631
>> --- /dev/null
>> +++ b/contrib/long-running-filter/example.pl
> 
> To repeat myself, I think it would serve better as a separate patch.

OK


>> +        die "invalid packet size '$bytes_read' field";
> 
> This would read "invalid packet size '000' field", for example.
> Perhaps the following would be (slightly) better:
> 
>  +        die "invalid packet size field: '$bytes_read'";

OK


>> +    }
>> +    elsif ( $pkt_size > 4 ) {
> 
> Isn't a packet of $pkt_size == 4 a valid packet, a keep-alive
> one?  Or is it forbidden?

"Implementations SHOULD NOT send an empty pkt-line ("0004")."
Source: Documentation/technical/protocol-common.txt


>> +            die "invalid packet ($content_size expected; $bytes_read read)";
> 
> This error message would read "invalid packet (12 expected; 10 read)";
> I think it would be better to rephrase it as
> 
>  +            die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";

OK


>> +        die "invalid packet size";
> 
> I'm not sure if it is worth it (especially for the demo script),
> but perhaps we could show what this invalid size was?
> 
>  +        die "invalid packet size value '$pkt_size'";

OK


>> +sub packet_txt_read {
>> +    my ( $res, $buf ) = packet_bin_read();
>> +    unless ( $buf =~ /\n$/ ) {
> 
> Wouldn't
> 
>  +    unless ( $buf =~ s/\n$// ) {
> 
> or (less so)
> 
>  +    unless ( $buf =~ s/\n$\z// ) {
> 
> be more idiomatic (and not require use of 'substr')?  Remember,
> the s/// substitution quote-like operator returns number of
> substitutions in the scalar context.

OK.


>> +        die "A non-binary line SHOULD BE terminated by an LF.";
> 
> This is SHOULD be, not MUST be, so perhaps 'warn' would be enough.
> Not that Git should send us such line.

Actually it MUST per protocol definition. I'll change it to MUST.


>> +    my ($packet) = @_;
> 
> This is equivalent to
> 
>  +    my $packet = shift;
> 
> which, I think, is more common for single-parameter subroutines.
> 
> Also, this is $data (or $buf), not $packet.

OK


> Perhaps some comment that main begins here?
> 
>> +( packet_txt_read() eq ( 0, "git-filter-client" ) ) || die "bad initialize";
>> +( packet_txt_read() eq ( 0, "version=2" ) )         || die "bad version";
>> +( packet_bin_read() eq ( 1, "" ) )                  || die "bad version end";
> 
> Actually, it is overly strict.  It should not fail if there
> are other "version=3", "version=4" etc. lines.

True, but I think for an example this is OK. I'll add a note
to the file header.


>> +
>> +while (1) {
>> +    my ($command)  = packet_txt_read() =~ /^command=([^=]+)$/;
>> +    my ($pathname) = packet_txt_read() =~ /^pathname=([^=]+)$/;
> 
> Do we require this order?  If it is, is that explained in the
> documentation?

Git sends that order right now but the filter should not rely
on that order.


>> +    packet_flush();    # empty list!
> 
> This is less "empty list!", and more keeping "status=success" unchanged.

OK


OK means, I agree and I added your suggestion to v9.
Thanks a lot for your review and the comments!

Cheers,
Lars

  reply	other threads:[~2016-09-30 18:56 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-20 19:02 [PATCH v8 00/11] Git filter protocol larsxschneider
2016-09-20 19:02 ` [PATCH v8 01/11] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
2016-09-24 21:14   ` Jakub Narębski
2016-09-26 18:49     ` Lars Schneider
2016-09-28 23:15       ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 02/11] pkt-line: extract set_packet_header() larsxschneider
2016-09-24 21:22   ` Jakub Narębski
2016-09-26 18:53     ` Lars Schneider
2016-09-20 19:02 ` [PATCH v8 03/11] run-command: move check_pipe() from write_or_die to run_command larsxschneider
2016-09-24 22:12   ` Jakub Narębski
2016-09-26 16:13     ` Lars Schneider
2016-09-26 16:21       ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 04/11] pkt-line: add packet_write_fmt_gently() larsxschneider
2016-09-24 22:27   ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 05/11] pkt-line: add packet_flush_gently() larsxschneider
2016-09-24 22:56   ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 06/11] pkt-line: add packet_write_gently() larsxschneider
2016-09-25 11:26   ` Jakub Narębski
2016-09-26 19:21     ` Lars Schneider
2016-09-27  8:39       ` Jeff King
2016-09-27 19:33         ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-09-25 13:46   ` Jakub Narębski
2016-09-26 20:23     ` Lars Schneider
2016-09-27  8:14       ` Lars Schneider
2016-09-27  9:00         ` Jeff King
2016-09-27 12:10           ` Lars Schneider
2016-09-27 12:13             ` Jeff King
2016-09-20 19:02 ` [PATCH v8 08/11] convert: quote filter names in error messages larsxschneider
2016-09-25 14:03   ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 09/11] convert: modernize tests larsxschneider
2016-09-25 14:43   ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 10/11] convert: make apply_filter() adhere to standard Git error handling larsxschneider
2016-09-25 14:47   ` Jakub Narębski
2016-09-20 19:02 ` [PATCH v8 11/11] convert: add filter.<driver>.process option larsxschneider
2016-09-26 22:41   ` Jakub Narębski
2016-09-30 18:56     ` Lars Schneider [this message]
2016-10-04 20:50       ` Jakub Narębski
2016-10-06 13:16         ` Lars Schneider
2016-09-27 15:37   ` Jakub Narębski
2016-09-30 19:38     ` Lars Schneider
2016-10-04 21:00       ` Jakub Narębski
2016-10-06 21:27         ` Lars Schneider
2016-09-28 23:14   ` Jakub Narębski
2016-10-01 15:34     ` Lars Schneider
2016-10-04 21:34       ` Jakub Narębski
2016-09-28 21:49 ` [PATCH v8 00/11] Git filter protocol Junio C Hamano
2016-09-29 10:28   ` Lars Schneider
2016-09-29 11:57     ` Torsten Bögershausen
2016-09-29 16:57       ` Junio C Hamano
2016-09-29 17:57         ` Lars Schneider
2016-09-29 18:18           ` Torsten Bögershausen
2016-09-29 18:38             ` Johannes Sixt
2016-09-29 21:27           ` Junio C Hamano
2016-10-01 18:59             ` Lars Schneider
2016-10-01 20:48               ` Jakub Narębski
2016-10-03 17:13                 ` Lars Schneider
2016-10-04 19:04                   ` Jakub Narębski
2016-10-06 13:13                     ` Lars Schneider
2016-10-06 16:01                       ` Jeff King
2016-10-06 17:17                         ` Junio C Hamano
2016-10-03 17:02               ` Junio C Hamano
2016-10-03 17:35                 ` Lars Schneider
2016-10-04 12:11                 ` Jeff King
2016-10-04 16:47                   ` Junio C Hamano
2016-09-29 18:02         ` Jeff King
2016-09-29 21:19           ` Junio C Hamano
2016-09-29 20:50         ` Lars Schneider
2016-09-29 21:12           ` Junio C Hamano
2016-09-29 20:59       ` Jakub Narębski
2016-09-29 21:17         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4DE57A65-1020-4F17-81F2-9F319834BB2D@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sbeller@google.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).