git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Lars Schneider <larsxschneider@gmail.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Martin-Louis Bright" <mlbright@gmail.com>,
	"Eric Wong" <e@80x24.org>, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option
Date: Sun, 31 Jul 2016 21:49:04 +0200	[thread overview]
Message-ID: <6765D972-876A-4F94-A170-468002498296@gmail.com> (raw)
In-Reply-To: <69988611-06ec-048d-12e7-7b87882ddc6a@gmail.com>


> On 31 Jul 2016, at 11:42, Jakub Narębski <jnareb@gmail.com> wrote:
> 
> [Excuse me replying to myself, but there are a few things I forgot,
> or realized only later]

No worries :)

> 
> W dniu 31.07.2016 o 00:05, Jakub Narębski pisze:
>> W dniu 30.07.2016 o 01:38, larsxschneider@gmail.com pisze:
>>> From: Lars Schneider <larsxschneider@gmail.com>
>>> 
>>> Git's clean/smudge mechanism invokes an external filter process for every
>>> single blob that is affected by a filter. If Git filters a lot of blobs
>>> then the startup time of the external filter processes can become a
>>> significant part of the overall Git execution time.
>>> 
>>> This patch adds the filter.<driver>.process string option which, if used,
>>> keeps the external filter process running and processes all blobs with
>>> the following packet format (pkt-line) based protocol over standard input
>>> and standard output.
>> 
>> I think it would be nice to have here at least summary of the benchmarks
>> you did in https://github.com/github/git-lfs/pull/1382
> 
> Note that this feature is especially useful if startup time is long,
> that is if you are using an operating system with costly fork / new process
> startup time like MS Windows (which you have mentioned), or writing
> filter in a programming language with large startup time like Java
> or Python (the latter may have changed since).
> 
>  https://gnustavo.wordpress.com/2012/06/28/programming-languages-start-up-times/

OK, I will add this. Is it OK to add the link to the commit message?
(since I don't know how long the link will be available).


> [...]
>> I was thinking about having possible responses to receiving file
>> contents (or starting receiving in the streaming case) to be:
>> 
>>  packet:          git< ok size=7\n    (or "ok 7\n", if size is known)
>> 
>> or
>> 
>>  packet:          git< ok\n           (if filter does not know size upfront)
>> 
>> or
>> 
>>  packet:          git< fail <msg>\n   (or just "fail" + packet with msg)
>> 
>> The last would be when filter knows upfront that it cannot perform
>> the operation.  Though sending an empty file with non-"success" final
>> would work as well.
> 
> [...]
> 
>>> In case the filter cannot process the content, it is expected
>>> to respond with the result content size 0 (only if "stream" is
>>> not defined) and a "reject" packet.
>>> ------------------------
>>> packet:          git< size=0\n    (omitted with capability "stream")
>>> packet:          git< reject\n
>>> ------------------------
>> 
>> This is *wrong* idea!  Empty file, with size=0, can be a perfectly
>> legitimate response.  
> 
> Actually, I think I have misunderstood your intent.  If you want to have
> simpler protocol, with only one place to signal errors, that is after
> sending a response, then proper way of signaling the error condition
> would be to send empty file and then "reject" instead of "success":
> 
>   packet:          git< size=0\n    (omitted with capability "stream")
>   packet:          git< 0000        (we need this flush packet)
>   packet:          git< reject\n
> 
> Otherwise in the case without size upfront (capability "stream")
> file with contents "reject" would be mistaken for the "reject" packet.
> 
> See below for proposal with two places to signal errors: before sending
> first byte, and after.

Right now the protocol is implemented covering the following cases:

## CASE 1 - no stream success

packet:          git< size=57\n
packet:          git< SMUDGED_CONTENT
packet:          git< 0000
packet:          git< success\n


## CASE 2 - no stream success but 0 byte response

packet:          git< size=0\n
packet:          git< success\n


## CASE 3 - no stream filter; filter doesn't want to process the file

packet:          git< size=0\n
packet:          git< reject\n


## CASE 4 - no stream filter; filter error

packet:          git< size=57\n
packet:          git< SMUDGED_CONTENT
packet:          git< 0000
packet:          git< error\n

CASE 4 is not explicitly checked. If a final message is neither
"success" nor "reject" then it is interpreted as error. If that
happens then Git will shutdown and restart the filter process
if there is another file to filter. 

Alternatively a filter process can shutdown itself, too, to signal
an error.

The corresponding stream filter look like this:

## CASE 1 - stream success

packet:          git< SMUDGED_CONTENT
packet:          git< 0000
packet:          git< success\n


## CASE 2 - stream success but 0 byte response

packet:          git< 0000
packet:          git< success\n


## CASE 3 - stream filter; filter doesn't want to process the file

packet:          git< 0000
packet:          git< reject\n


## CASE 4 - stream filter; filter error

packet:          git< SMUDGED_CONTENT
packet:          git< 0000
packet:          git< error\n

--

I just realized that the size 0 case is a bit inconsistent
in the no stream case as it has no flush packet. Maybe I 
should indeed remove the flush packet in the no stream case
completely?!

Do the cases above make sense to you?

Regarding error handling. I would prefer it if the filter prints
all errors to STDERR by itself. I think that is the safest
option to communicate errors to the users because if the communication
got into a bad state then Git might not be able to read the errors
properly.

See Peff's response on the topic, too:
http://public-inbox.org/git/20160729165018.GA6553%40sigill.intra.peff.net/


> NOTE: there is a bit of mixed and possibly confusing notation, that
> is 0000 is flush packet, not packet with 0000 as content.  Perhaps
> write pkt-line in full?

I am not sure I understand what you mean (maybe it's too late for me...).
Can you try to rephrase or give an example?

Thank you,
Lars



> 
> 
> [...]
>>> ---
>>> Documentation/gitattributes.txt |  84 ++++++++-
>>> convert.c                       | 400 +++++++++++++++++++++++++++++++++++++--
>>> t/t0021-conversion.sh           | 405 ++++++++++++++++++++++++++++++++++++++++
>>> t/t0021/rot13-filter.pl         | 177 ++++++++++++++++++
>>> 4 files changed, 1053 insertions(+), 13 deletions(-)
>>> create mode 100755 t/t0021/rot13-filter.pl
> 
> Wouldn't it be better for easier review to split it into separate patches?
> Perhaps at least the new test...
> 
> [...]
>> I would assume that we have two error conditions.  
>> 
>> First situation is when the filter knows upfront (after receiving name
>> and size of file, and after receiving contents for not-streaming filters)
>> that it cannot process the file (like e.g. LFS filter with artifactory
>> replica/shard being a bit behind master, and not including contents of
>> the file being filtered).
>> 
>> My proposal is to reply with "fail" _in place of_ size of reply:
>> 
>>   packet:         git< fail\n       (any case: size known or not, stream or not)
>> 
>> It could be "reject", or "error" instead of "fail".
>> 
>> 
>> Another situation is if filter encounters error during output,
>> either with streaming filter (or non-stream, but not storing whole
>> input upfront) realizing in the middle of output that there is something
>> wrong with input (e.g. converting between encoding, and encountering
>> character that cannot be represented in output encoding), or e.g. filter
>> process being killed, or network connection dropping with LFS filter, etc.
>> The filter has send some packets with output already.  In this case
>> filter should flush, and send "reject" or "error" packet.
>> 
>>   <error condition>
>>   packet:         git< "0000"       (flush packet)
>>   packet:         git< reject\n
>> 
>> Should there be a place for an error message, or would standard error
>> (stderr) be used for this?
> 


  reply	other threads:[~2016-07-31 19:49 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20160727000605.49982-1-larsxschneider%40gmail.com/>
2016-07-29 23:37 ` [PATCH v3 00/10] Git filter protocol larsxschneider
2016-07-29 23:37   ` [PATCH v3 01/10] pkt-line: extract set_packet_header() larsxschneider
2016-07-30 10:30     ` Jakub Narębski
2016-08-01 11:33       ` Lars Schneider
2016-08-03 20:05         ` Jakub Narębski
2016-08-05 11:52           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 02/10] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-07-30 10:49     ` Jakub Narębski
2016-08-01 12:00       ` Lars Schneider
2016-08-03 20:12         ` Jakub Narębski
2016-08-05 12:02           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 03/10] pkt-line: add packet_flush_gentle() larsxschneider
2016-07-30 12:04     ` Jakub Narębski
2016-08-01 12:28       ` Lars Schneider
2016-07-31 20:36     ` Torstem Bögershausen
2016-07-31 21:45       ` Lars Schneider
2016-08-02 19:56         ` Torsten Bögershausen
2016-08-05  9:59           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 04/10] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-07-30 12:29     ` Jakub Narębski
2016-08-01 12:18       ` Lars Schneider
2016-08-03 20:15         ` Jakub Narębski
2016-07-29 23:37   ` [PATCH v3 05/10] pack-protocol: fix maximum pkt-line size larsxschneider
2016-07-30 13:58     ` Jakub Narębski
2016-08-01 12:23       ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 06/10] run-command: add clean_on_exit_handler larsxschneider
2016-07-30  9:50     ` Johannes Sixt
2016-08-01 11:14       ` Lars Schneider
2016-08-02  5:53         ` Johannes Sixt
2016-08-02  7:41           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 07/10] convert: quote filter names in error messages larsxschneider
2016-07-29 23:37   ` [PATCH v3 08/10] convert: modernize tests larsxschneider
2016-07-29 23:38   ` [PATCH v3 09/10] convert: generate large test files only once larsxschneider
2016-07-29 23:38   ` [PATCH v3 10/10] convert: add filter.<driver>.process option larsxschneider
2016-07-30 22:05     ` Jakub Narębski
2016-07-31  9:42       ` Jakub Narębski
2016-07-31 19:49         ` Lars Schneider [this message]
2016-07-31 22:59           ` Jakub Narębski
2016-08-01 13:32       ` Lars Schneider
2016-08-03 18:30         ` Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option) Jakub Narębski
2016-08-05 10:32           ` Lars Schneider
2016-08-06 18:24           ` Lars Schneider
2016-08-03 22:47         ` [PATCH v3 10/10] convert: add filter.<driver>.process option Jakub Narębski
2016-07-31 22:19     ` Jakub Narębski
2016-08-01 17:55       ` Lars Schneider
2016-08-04  0:42         ` Jakub Narębski
2016-08-03 13:10       ` Lars Schneider
2016-08-04 10:18         ` Jakub Narębski
2016-08-05 13:20           ` Lars Schneider
2016-08-03 16:42   ` [PATCH v4 00/12] Git filter protocol larsxschneider
2016-08-03 16:42     ` [PATCH v4 01/12] pkt-line: extract set_packet_header() larsxschneider
2016-08-03 20:18       ` Junio C Hamano
2016-08-03 21:12         ` Jeff King
2016-08-03 21:27           ` Jeff King
2016-08-04 16:14           ` Junio C Hamano
2016-08-05 14:55             ` Lars Schneider
2016-08-05 16:31               ` Junio C Hamano
2016-08-05 17:31             ` Lars Schneider
2016-08-05 17:41               ` Junio C Hamano
2016-08-03 21:56         ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 02/12] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-08-03 16:42     ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() larsxschneider
2016-08-03 21:39       ` Jeff King
2016-08-03 22:56         ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-03 22:56           ` [PATCH 1/7] trace: handle NULL argument in trace_disable() Jeff King
2016-08-03 22:58           ` [PATCH 2/7] trace: stop using write_or_whine_pipe() Jeff King
2016-08-03 22:58           ` [PATCH 3/7] trace: use warning() for printing trace errors Jeff King
2016-08-04 20:41             ` Junio C Hamano
2016-08-04 21:21               ` Jeff King
2016-08-04 21:28                 ` Junio C Hamano
2016-08-05  7:56                   ` Jeff King
2016-08-05  7:59                   ` Christian Couder
2016-08-05 18:41                     ` Junio C Hamano
2016-08-03 23:00           ` [PATCH 4/7] trace: cosmetic fixes for error messages Jeff King
2016-08-04 20:42             ` Junio C Hamano
2016-08-05  8:00               ` Jeff King
2016-08-03 23:00           ` [PATCH 5/7] trace: correct variable name in write() error message Jeff King
2016-08-03 23:01           ` [PATCH 6/7] trace: disable key after write error Jeff King
2016-08-04 20:45             ` Junio C Hamano
2016-08-04 21:22               ` Jeff King
2016-08-05  7:58                 ` Jeff King
2016-08-03 23:01           ` [PATCH 7/7] write_or_die: drop write_or_whine_pipe() Jeff King
2016-08-03 23:04           ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-04 16:16         ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() Junio C Hamano
2016-08-03 16:42     ` [PATCH v4 04/12] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-08-03 16:42     ` [PATCH v4 05/12] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-03 16:42     ` [PATCH v4 06/12] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-03 16:42     ` [PATCH v4 07/12] run-command: add clean_on_exit_handler larsxschneider
2016-08-03 21:24       ` Jeff King
2016-08-03 22:15         ` Lars Schneider
2016-08-03 22:53           ` Jeff King
2016-08-03 23:09             ` Lars Schneider
2016-08-03 23:15               ` Jeff King
2016-08-05 13:08                 ` Lars Schneider
2016-08-05 21:19                   ` Torsten Bögershausen
2016-08-05 21:50                     ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 08/12] convert: quote filter names in error messages larsxschneider
2016-08-03 16:42     ` [PATCH v4 09/12] convert: modernize tests larsxschneider
2016-08-03 16:42     ` [PATCH v4 10/12] convert: generate large test files only once larsxschneider
2016-08-03 16:42     ` [PATCH v4 11/12] convert: add filter.<driver>.process option larsxschneider
2016-08-03 17:45       ` Junio C Hamano
2016-08-03 21:48         ` Lars Schneider
2016-08-03 22:46           ` Jeff King
2016-08-05 12:53             ` Lars Schneider
2016-08-03 20:29       ` Junio C Hamano
2016-08-03 21:37         ` Lars Schneider
2016-08-03 21:43           ` Junio C Hamano
2016-08-03 22:01             ` Lars Schneider
2016-08-05 21:34       ` Torsten Bögershausen
2016-08-05 21:49         ` Lars Schneider
2016-08-05 22:06         ` Junio C Hamano
2016-08-05 22:27           ` Jeff King
2016-08-06 11:55             ` Lars Schneider
2016-08-06 12:14               ` Jeff King
2016-08-06 18:19                 ` Lars Schneider
2016-08-08 15:02                   ` Jeff King
2016-08-08 16:21                     ` Lars Schneider
2016-08-08 16:26                       ` Jeff King
2016-08-06 20:40           ` Torsten Bögershausen
2016-08-03 16:42     ` [PATCH v4 12/12] convert: add filter.<driver>.process shutdown command option larsxschneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6765D972-876A-4F94-A170-468002498296@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).