From: "Jakub Narębski" <jnareb@gmail.com>
To: Lars Schneider <larsxschneider@gmail.com>, git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Torsten Bögershausen" <tboegi@web.de>,
"Martin-Louis Bright" <mlbright@gmail.com>,
"Eric Wong" <e@80x24.org>, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option
Date: Sun, 31 Jul 2016 11:42:11 +0200 [thread overview]
Message-ID: <69988611-06ec-048d-12e7-7b87882ddc6a@gmail.com> (raw)
In-Reply-To: <b4c9ac5d-bd6b-141b-5b85-ab4aa719ccb0@gmail.com>
[Excuse me replying to myself, but there are a few things I forgot,
or realized only later]
W dniu 31.07.2016 o 00:05, Jakub Narębski pisze:
> W dniu 30.07.2016 o 01:38, larsxschneider@gmail.com pisze:
>> From: Lars Schneider <larsxschneider@gmail.com>
>>
>> Git's clean/smudge mechanism invokes an external filter process for every
>> single blob that is affected by a filter. If Git filters a lot of blobs
>> then the startup time of the external filter processes can become a
>> significant part of the overall Git execution time.
>>
>> This patch adds the filter.<driver>.process string option which, if used,
>> keeps the external filter process running and processes all blobs with
>> the following packet format (pkt-line) based protocol over standard input
>> and standard output.
>
> I think it would be nice to have here at least summary of the benchmarks
> you did in https://github.com/github/git-lfs/pull/1382
Note that this feature is especially useful if startup time is long,
that is if you are using an operating system with costly fork / new process
startup time like MS Windows (which you have mentioned), or writing
filter in a programming language with large startup time like Java
or Python (the latter may have changed since).
https://gnustavo.wordpress.com/2012/06/28/programming-languages-start-up-times/
[...]
> I was thinking about having possible responses to receiving file
> contents (or starting receiving in the streaming case) to be:
>
> packet: git< ok size=7\n (or "ok 7\n", if size is known)
>
> or
>
> packet: git< ok\n (if filter does not know size upfront)
>
> or
>
> packet: git< fail <msg>\n (or just "fail" + packet with msg)
>
> The last would be when filter knows upfront that it cannot perform
> the operation. Though sending an empty file with non-"success" final
> would work as well.
[...]
>> In case the filter cannot process the content, it is expected
>> to respond with the result content size 0 (only if "stream" is
>> not defined) and a "reject" packet.
>> ------------------------
>> packet: git< size=0\n (omitted with capability "stream")
>> packet: git< reject\n
>> ------------------------
>
> This is *wrong* idea! Empty file, with size=0, can be a perfectly
> legitimate response.
Actually, I think I have misunderstood your intent. If you want to have
simpler protocol, with only one place to signal errors, that is after
sending a response, then proper way of signaling the error condition
would be to send empty file and then "reject" instead of "success":
packet: git< size=0\n (omitted with capability "stream")
packet: git< 0000 (we need this flush packet)
packet: git< reject\n
Otherwise in the case without size upfront (capability "stream")
file with contents "reject" would be mistaken for the "reject" packet.
See below for proposal with two places to signal errors: before sending
first byte, and after.
NOTE: there is a bit of mixed and possibly confusing notation, that
is 0000 is flush packet, not packet with 0000 as content. Perhaps
write pkt-line in full?
[...]
>> ---
>> Documentation/gitattributes.txt | 84 ++++++++-
>> convert.c | 400 +++++++++++++++++++++++++++++++++++++--
>> t/t0021-conversion.sh | 405 ++++++++++++++++++++++++++++++++++++++++
>> t/t0021/rot13-filter.pl | 177 ++++++++++++++++++
>> 4 files changed, 1053 insertions(+), 13 deletions(-)
>> create mode 100755 t/t0021/rot13-filter.pl
Wouldn't it be better for easier review to split it into separate patches?
Perhaps at least the new test...
[...]
> I would assume that we have two error conditions.
>
> First situation is when the filter knows upfront (after receiving name
> and size of file, and after receiving contents for not-streaming filters)
> that it cannot process the file (like e.g. LFS filter with artifactory
> replica/shard being a bit behind master, and not including contents of
> the file being filtered).
>
> My proposal is to reply with "fail" _in place of_ size of reply:
>
> packet: git< fail\n (any case: size known or not, stream or not)
>
> It could be "reject", or "error" instead of "fail".
>
>
> Another situation is if filter encounters error during output,
> either with streaming filter (or non-stream, but not storing whole
> input upfront) realizing in the middle of output that there is something
> wrong with input (e.g. converting between encoding, and encountering
> character that cannot be represented in output encoding), or e.g. filter
> process being killed, or network connection dropping with LFS filter, etc.
> The filter has send some packets with output already. In this case
> filter should flush, and send "reject" or "error" packet.
>
> <error condition>
> packet: git< "0000" (flush packet)
> packet: git< reject\n
>
> Should there be a place for an error message, or would standard error
> (stderr) be used for this?
next prev parent reply other threads:[~2016-07-31 9:42 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20160727000605.49982-1-larsxschneider%40gmail.com/>
2016-07-29 23:37 ` [PATCH v3 00/10] Git filter protocol larsxschneider
2016-07-29 23:37 ` [PATCH v3 01/10] pkt-line: extract set_packet_header() larsxschneider
2016-07-30 10:30 ` Jakub Narębski
2016-08-01 11:33 ` Lars Schneider
2016-08-03 20:05 ` Jakub Narębski
2016-08-05 11:52 ` Lars Schneider
2016-07-29 23:37 ` [PATCH v3 02/10] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-07-30 10:49 ` Jakub Narębski
2016-08-01 12:00 ` Lars Schneider
2016-08-03 20:12 ` Jakub Narębski
2016-08-05 12:02 ` Lars Schneider
2016-07-29 23:37 ` [PATCH v3 03/10] pkt-line: add packet_flush_gentle() larsxschneider
2016-07-30 12:04 ` Jakub Narębski
2016-08-01 12:28 ` Lars Schneider
2016-07-31 20:36 ` Torstem Bögershausen
2016-07-31 21:45 ` Lars Schneider
2016-08-02 19:56 ` Torsten Bögershausen
2016-08-05 9:59 ` Lars Schneider
2016-07-29 23:37 ` [PATCH v3 04/10] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-07-30 12:29 ` Jakub Narębski
2016-08-01 12:18 ` Lars Schneider
2016-08-03 20:15 ` Jakub Narębski
2016-07-29 23:37 ` [PATCH v3 05/10] pack-protocol: fix maximum pkt-line size larsxschneider
2016-07-30 13:58 ` Jakub Narębski
2016-08-01 12:23 ` Lars Schneider
2016-07-29 23:37 ` [PATCH v3 06/10] run-command: add clean_on_exit_handler larsxschneider
2016-07-30 9:50 ` Johannes Sixt
2016-08-01 11:14 ` Lars Schneider
2016-08-02 5:53 ` Johannes Sixt
2016-08-02 7:41 ` Lars Schneider
2016-07-29 23:37 ` [PATCH v3 07/10] convert: quote filter names in error messages larsxschneider
2016-07-29 23:37 ` [PATCH v3 08/10] convert: modernize tests larsxschneider
2016-07-29 23:38 ` [PATCH v3 09/10] convert: generate large test files only once larsxschneider
2016-07-29 23:38 ` [PATCH v3 10/10] convert: add filter.<driver>.process option larsxschneider
2016-07-30 22:05 ` Jakub Narębski
2016-07-31 9:42 ` Jakub Narębski [this message]
2016-07-31 19:49 ` Lars Schneider
2016-07-31 22:59 ` Jakub Narębski
2016-08-01 13:32 ` Lars Schneider
2016-08-03 18:30 ` Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option) Jakub Narębski
2016-08-05 10:32 ` Lars Schneider
2016-08-06 18:24 ` Lars Schneider
2016-08-03 22:47 ` [PATCH v3 10/10] convert: add filter.<driver>.process option Jakub Narębski
2016-07-31 22:19 ` Jakub Narębski
2016-08-01 17:55 ` Lars Schneider
2016-08-04 0:42 ` Jakub Narębski
2016-08-03 13:10 ` Lars Schneider
2016-08-04 10:18 ` Jakub Narębski
2016-08-05 13:20 ` Lars Schneider
2016-08-03 16:42 ` [PATCH v4 00/12] Git filter protocol larsxschneider
2016-08-03 16:42 ` [PATCH v4 01/12] pkt-line: extract set_packet_header() larsxschneider
2016-08-03 20:18 ` Junio C Hamano
2016-08-03 21:12 ` Jeff King
2016-08-03 21:27 ` Jeff King
2016-08-04 16:14 ` Junio C Hamano
2016-08-05 14:55 ` Lars Schneider
2016-08-05 16:31 ` Junio C Hamano
2016-08-05 17:31 ` Lars Schneider
2016-08-05 17:41 ` Junio C Hamano
2016-08-03 21:56 ` Lars Schneider
2016-08-03 16:42 ` [PATCH v4 02/12] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-08-03 16:42 ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() larsxschneider
2016-08-03 21:39 ` Jeff King
2016-08-03 22:56 ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-03 22:56 ` [PATCH 1/7] trace: handle NULL argument in trace_disable() Jeff King
2016-08-03 22:58 ` [PATCH 2/7] trace: stop using write_or_whine_pipe() Jeff King
2016-08-03 22:58 ` [PATCH 3/7] trace: use warning() for printing trace errors Jeff King
2016-08-04 20:41 ` Junio C Hamano
2016-08-04 21:21 ` Jeff King
2016-08-04 21:28 ` Junio C Hamano
2016-08-05 7:56 ` Jeff King
2016-08-05 7:59 ` Christian Couder
2016-08-05 18:41 ` Junio C Hamano
2016-08-03 23:00 ` [PATCH 4/7] trace: cosmetic fixes for error messages Jeff King
2016-08-04 20:42 ` Junio C Hamano
2016-08-05 8:00 ` Jeff King
2016-08-03 23:00 ` [PATCH 5/7] trace: correct variable name in write() error message Jeff King
2016-08-03 23:01 ` [PATCH 6/7] trace: disable key after write error Jeff King
2016-08-04 20:45 ` Junio C Hamano
2016-08-04 21:22 ` Jeff King
2016-08-05 7:58 ` Jeff King
2016-08-03 23:01 ` [PATCH 7/7] write_or_die: drop write_or_whine_pipe() Jeff King
2016-08-03 23:04 ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-04 16:16 ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() Junio C Hamano
2016-08-03 16:42 ` [PATCH v4 04/12] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-08-03 16:42 ` [PATCH v4 05/12] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-03 16:42 ` [PATCH v4 06/12] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-03 16:42 ` [PATCH v4 07/12] run-command: add clean_on_exit_handler larsxschneider
2016-08-03 21:24 ` Jeff King
2016-08-03 22:15 ` Lars Schneider
2016-08-03 22:53 ` Jeff King
2016-08-03 23:09 ` Lars Schneider
2016-08-03 23:15 ` Jeff King
2016-08-05 13:08 ` Lars Schneider
2016-08-05 21:19 ` Torsten Bögershausen
2016-08-05 21:50 ` Lars Schneider
2016-08-03 16:42 ` [PATCH v4 08/12] convert: quote filter names in error messages larsxschneider
2016-08-03 16:42 ` [PATCH v4 09/12] convert: modernize tests larsxschneider
2016-08-03 16:42 ` [PATCH v4 10/12] convert: generate large test files only once larsxschneider
2016-08-03 16:42 ` [PATCH v4 11/12] convert: add filter.<driver>.process option larsxschneider
2016-08-03 17:45 ` Junio C Hamano
2016-08-03 21:48 ` Lars Schneider
2016-08-03 22:46 ` Jeff King
2016-08-05 12:53 ` Lars Schneider
2016-08-03 20:29 ` Junio C Hamano
2016-08-03 21:37 ` Lars Schneider
2016-08-03 21:43 ` Junio C Hamano
2016-08-03 22:01 ` Lars Schneider
2016-08-05 21:34 ` Torsten Bögershausen
2016-08-05 21:49 ` Lars Schneider
2016-08-05 22:06 ` Junio C Hamano
2016-08-05 22:27 ` Jeff King
2016-08-06 11:55 ` Lars Schneider
2016-08-06 12:14 ` Jeff King
2016-08-06 18:19 ` Lars Schneider
2016-08-08 15:02 ` Jeff King
2016-08-08 16:21 ` Lars Schneider
2016-08-08 16:26 ` Jeff King
2016-08-06 20:40 ` Torsten Bögershausen
2016-08-03 16:42 ` [PATCH v4 12/12] convert: add filter.<driver>.process shutdown command option larsxschneider
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=69988611-06ec-048d-12e7-7b87882ddc6a@gmail.com \
--to=jnareb@gmail.com \
--cc=e@80x24.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=larsxschneider@gmail.com \
--cc=mlbright@gmail.com \
--cc=peff@peff.net \
--cc=tboegi@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).