git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Lars Schneider <larsxschneider@gmail.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Martin-Louis Bright" <mlbright@gmail.com>,
	"Eric Wong" <e@80x24.org>, "Jeff King" <peff@peff.net>
Subject: Re: Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option)
Date: Fri, 5 Aug 2016 12:32:27 +0200	[thread overview]
Message-ID: <FE7B04D2-C017-42AF-BD30-6160D251028C@gmail.com> (raw)
In-Reply-To: <607c07fe-5b6f-fd67-13e1-705020c267ee@gmail.com>


> On 03 Aug 2016, at 20:30, Jakub Narębski <jnareb@gmail.com> wrote:
> 
> The ultimate goal is to be able to run filter drivers faster for both `clean`
> and `smudge` operations.  This is done by starting filter driver once per
> git command invocation, instead of once per file being processed.  Git needs
> to pass actual contents of files to filter driver, and get its output.
> 
> We want the protocol between Git and filter driver process to be extensible,
> so that new features can be added without modifying protocol.
> 
> 
> 1. CONFIGURATION
> 
> As I wrote, there are different ways of configuring new-type filter driver:
> 
> ...
> 
> # Using a single variable for new filter type, and decide on which phase
>   (which operation) is supported by filter driver during the handshake
>   *(current approach)*
> 
>   	[filter "protocol"]
>   		process = rot13-filtes.pl
> 
>   PROS: per-file and per-command filters possible with precedence rule;
>         extensible to other types of drivers: textconv, diff, etc.
>         only one invocation for commands which use both clean and smudge
>   CONS: need single driver to be responsible for both clean and smudge;
>         need to run driver to know that it does not support given
>           operation (workaround exists)
> 
> 
> 2. HANDSHAKE (INITIALIZATION)
> 
> Next, there is deciding on and designing the handshake between Git (between
> Git command) and the filter driver process.  With the `filter.<driver>.process`
> solution the driver needs to tell which operations among (for now) "clean"
> and "smudge" it does support.  Plus it provides a way to extend protocol,
> adding new features, like support for streaming, cleaning from file or
> smudging to file, providing size upfront, perhaps even progress report.
> 
> Current handshake consist of filter driver printing a signature, version
> number and capabilities, in that order.  Git checks that it is well formed
> and matches expectations, and notes which of "clean" and "smudge" operations
> are supported by the filter.
> 
> There is no interaction from the Git side in the handshake, for example to
> set options and expectations common to all files being filtered.  Take
> one possible extension of protocol: supporting streaming.  The filter
> driver needs to know whether it needs to read all the input, or whether
> it can start printing output while input is incoming (e.g. to reduce
> memory consumption)... though we may simply decide it to be next version
> of the protocol.
> 
> On the other hand if the handshake began with Git sending some initializer
> info to the filter driver, we probably could detect one-shot filter
> misconfigured as process-filter.

OK, I'll look into this.


> Note that we need some way of deciding where handshake ends, either by
> specifying number of entries (currently: three lines / pkt-line packets),
> or providing some terminator ("smart" transport protocol uses flush packet
> for this).
> 
> ...

Would you be OK with Peff's suggestion?

  version=2
  clean=true
  smudge=true
  0000

http://public-inbox.org/git/20160803224619.bwtbvmslhuicx2qi%40sigill.intra.peff.net/



> 3. SENDING CONTENTS (FILE TO BE FILTERED AND FILTER OUTPUT)
> 
> Next thing to design is decision how to send contents to be filtered
> to the filter driver process, and how to get filtered output from the
> filter driver process.
> 
> ...
> 
> # Send/receive data file by file, using some kind of chunking,
>   with a end-of-file marker.  The solution used by Git is
>   pkt-line, with flush packet used to signal end of file.
> 
>   This is protocol used by the current implementation.
> 
>   PROS:
>   - no need to know size upfront, so easier streaming support
>   - you can signal error that happened during output, after
>     some data were sent, as well as error known upfront
>   - tracing support for free (GIT_TRACE_PACKET)
>   CONS:
>   - filter driver program slightly more difficult to implement
>   - some negligible amount of overhead
> 
> If we want in the end to implement streaming, then the last solution
> is the way to go.
> 
> 
> 4. PER-FILE HANDSHAKE - SENDING FILE TO FILTER
> 
> Let's assume that for simplicity we want to implement (for now) only
> the synchronous (non-streaming) case, where we send whole contents
> of a file to filter driver process, and *then* read filter driver
> output.
> ...
> 
> If we are using pkt-line, then the convention is that text lines
> are terminated using LF ("\n") character.  This needs to be stated
> explicitly in the documentation for filter.<driver>.process writers.
> 
>    git> packet:  [operation] clean size=67\n
> 
> We could denote that it is operation name, but it is obvious from
> position in the stream, thus not really needed.

I would prefer not to mix command and size in one packet as it
makes parsing a little more difficult.


> ...
> 
> The Git would sent contents of the file to be filtered, using
> as many pack lines as needed (note: large file support needs
> to be tested, at least as expensive test).  Flush packet is
> used to signal the end of the file.
> 
>    git> packets:  <file contents>
>    git> flush packet

If expensive tests are enabled the test suite will process data
larger then max pkt size.


> 5. FILTER DRIVER PROCESS RESPONSE
> 
> First filter should, in my opinion, reply that it received the
> request (or the command, in the case of streaming supported).
> Also, in this response it can provide further information to
> Git process.
> 
>    git< packet: [received]  ok size=67\n

I think this would be different for real streaming and the current
non-streaming... therefore it would complicate the protocol?!
I wonder if it is truly necessary.


> This response could be used to refuse to filter specific file
> upfront (for example if the file is not present in the artifactory
> for git-LFS solutions).
> 
>   git< packet: [rejected]  reject\n

Reject is already supported in v4.


> We can even provide the reasoning to Git (maybe in the future
> extension)... or filter driver can print the explanation to the
> standard error (but then, no --quiet / --verbose support).
> 
>   git< packet: [rejected]  reject with-message\n
>   git< packet: [message]   File not found on server\n
>   git< flush packet

I think Git shouldn't care about these details. If the filter
needs to tell something then it should use stderr. 


> Another response, which I think should be standarized, or at
> least described in the documentation, is filter driver refusing
> to filter further (e.g. git-LFS and network is down), to be not
> restarted by Git.
> 
>   git< packet: [quit]      quit msg=Server error\n
> 
> or
> 
>   git< packet: [quit]      quit Server error\n
> 
> or
> 
>   git< packet: [quit]      quit with-message\n
>   git< packet: [message]   Server error\n
>   git< flush packet
> 
> Maybe this is over-engineering, but I don't think so.

Interesting idea! I will look into this for v5!


> Next comes the output from the filter driver (filtered contents),
> using possibly multiple pkt-lines, ending with a flush packet:
> 
>    git< packets:  <filtered contents>
>    git< flush packet
> 
> Note that empty file would consist of zero pack lines of contents,
> and one flush packet.
> 
> Finally, to allow handling of [resumable] errors that occurred
> during sending file contents, especially for the future streaming
> filters case, we want to confirm that we send whole file
> successfully.
> 
>    git< packet: [status]   success\n
> 
> If there was an error during process, making data receives so far
> invalid, filter driver should tell about it
> 
>    git< packet: [status]   fail\n
> 
> or
> 
>    git< packet: [status]   reject\n
> 
> This may happen for example for UCS-2 <-> UTF-8 filter when invalid
> byte sequence is encountered.  This may happen for git-LFS if the
> server fails during fetch, and spare / slave server doesn't have
> a file.

Correct!


> We may want to quit filtering at this point, and not to send another
> file.
> 
>   git< packet: [status]    quit\n

I don't get this one. Git would restart the filter as soon as it finds
another file that needs to be filtered, right?


Thanks a lot for this write up!

- Lars

  reply	other threads:[~2016-08-05 10:32 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20160727000605.49982-1-larsxschneider%40gmail.com/>
2016-07-29 23:37 ` [PATCH v3 00/10] Git filter protocol larsxschneider
2016-07-29 23:37   ` [PATCH v3 01/10] pkt-line: extract set_packet_header() larsxschneider
2016-07-30 10:30     ` Jakub Narębski
2016-08-01 11:33       ` Lars Schneider
2016-08-03 20:05         ` Jakub Narębski
2016-08-05 11:52           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 02/10] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-07-30 10:49     ` Jakub Narębski
2016-08-01 12:00       ` Lars Schneider
2016-08-03 20:12         ` Jakub Narębski
2016-08-05 12:02           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 03/10] pkt-line: add packet_flush_gentle() larsxschneider
2016-07-30 12:04     ` Jakub Narębski
2016-08-01 12:28       ` Lars Schneider
2016-07-31 20:36     ` Torstem Bögershausen
2016-07-31 21:45       ` Lars Schneider
2016-08-02 19:56         ` Torsten Bögershausen
2016-08-05  9:59           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 04/10] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-07-30 12:29     ` Jakub Narębski
2016-08-01 12:18       ` Lars Schneider
2016-08-03 20:15         ` Jakub Narębski
2016-07-29 23:37   ` [PATCH v3 05/10] pack-protocol: fix maximum pkt-line size larsxschneider
2016-07-30 13:58     ` Jakub Narębski
2016-08-01 12:23       ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 06/10] run-command: add clean_on_exit_handler larsxschneider
2016-07-30  9:50     ` Johannes Sixt
2016-08-01 11:14       ` Lars Schneider
2016-08-02  5:53         ` Johannes Sixt
2016-08-02  7:41           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 07/10] convert: quote filter names in error messages larsxschneider
2016-07-29 23:37   ` [PATCH v3 08/10] convert: modernize tests larsxschneider
2016-07-29 23:38   ` [PATCH v3 09/10] convert: generate large test files only once larsxschneider
2016-07-29 23:38   ` [PATCH v3 10/10] convert: add filter.<driver>.process option larsxschneider
2016-07-30 22:05     ` Jakub Narębski
2016-07-31  9:42       ` Jakub Narębski
2016-07-31 19:49         ` Lars Schneider
2016-07-31 22:59           ` Jakub Narębski
2016-08-01 13:32       ` Lars Schneider
2016-08-03 18:30         ` Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option) Jakub Narębski
2016-08-05 10:32           ` Lars Schneider [this message]
2016-08-06 18:24           ` Lars Schneider
2016-08-03 22:47         ` [PATCH v3 10/10] convert: add filter.<driver>.process option Jakub Narębski
2016-07-31 22:19     ` Jakub Narębski
2016-08-01 17:55       ` Lars Schneider
2016-08-04  0:42         ` Jakub Narębski
2016-08-03 13:10       ` Lars Schneider
2016-08-04 10:18         ` Jakub Narębski
2016-08-05 13:20           ` Lars Schneider
2016-08-03 16:42   ` [PATCH v4 00/12] Git filter protocol larsxschneider
2016-08-03 16:42     ` [PATCH v4 01/12] pkt-line: extract set_packet_header() larsxschneider
2016-08-03 20:18       ` Junio C Hamano
2016-08-03 21:12         ` Jeff King
2016-08-03 21:27           ` Jeff King
2016-08-04 16:14           ` Junio C Hamano
2016-08-05 14:55             ` Lars Schneider
2016-08-05 16:31               ` Junio C Hamano
2016-08-05 17:31             ` Lars Schneider
2016-08-05 17:41               ` Junio C Hamano
2016-08-03 21:56         ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 02/12] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-08-03 16:42     ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() larsxschneider
2016-08-03 21:39       ` Jeff King
2016-08-03 22:56         ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-03 22:56           ` [PATCH 1/7] trace: handle NULL argument in trace_disable() Jeff King
2016-08-03 22:58           ` [PATCH 2/7] trace: stop using write_or_whine_pipe() Jeff King
2016-08-03 22:58           ` [PATCH 3/7] trace: use warning() for printing trace errors Jeff King
2016-08-04 20:41             ` Junio C Hamano
2016-08-04 21:21               ` Jeff King
2016-08-04 21:28                 ` Junio C Hamano
2016-08-05  7:56                   ` Jeff King
2016-08-05  7:59                   ` Christian Couder
2016-08-05 18:41                     ` Junio C Hamano
2016-08-03 23:00           ` [PATCH 4/7] trace: cosmetic fixes for error messages Jeff King
2016-08-04 20:42             ` Junio C Hamano
2016-08-05  8:00               ` Jeff King
2016-08-03 23:00           ` [PATCH 5/7] trace: correct variable name in write() error message Jeff King
2016-08-03 23:01           ` [PATCH 6/7] trace: disable key after write error Jeff King
2016-08-04 20:45             ` Junio C Hamano
2016-08-04 21:22               ` Jeff King
2016-08-05  7:58                 ` Jeff King
2016-08-03 23:01           ` [PATCH 7/7] write_or_die: drop write_or_whine_pipe() Jeff King
2016-08-03 23:04           ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-04 16:16         ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() Junio C Hamano
2016-08-03 16:42     ` [PATCH v4 04/12] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-08-03 16:42     ` [PATCH v4 05/12] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-03 16:42     ` [PATCH v4 06/12] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-03 16:42     ` [PATCH v4 07/12] run-command: add clean_on_exit_handler larsxschneider
2016-08-03 21:24       ` Jeff King
2016-08-03 22:15         ` Lars Schneider
2016-08-03 22:53           ` Jeff King
2016-08-03 23:09             ` Lars Schneider
2016-08-03 23:15               ` Jeff King
2016-08-05 13:08                 ` Lars Schneider
2016-08-05 21:19                   ` Torsten Bögershausen
2016-08-05 21:50                     ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 08/12] convert: quote filter names in error messages larsxschneider
2016-08-03 16:42     ` [PATCH v4 09/12] convert: modernize tests larsxschneider
2016-08-03 16:42     ` [PATCH v4 10/12] convert: generate large test files only once larsxschneider
2016-08-03 16:42     ` [PATCH v4 11/12] convert: add filter.<driver>.process option larsxschneider
2016-08-03 17:45       ` Junio C Hamano
2016-08-03 21:48         ` Lars Schneider
2016-08-03 22:46           ` Jeff King
2016-08-05 12:53             ` Lars Schneider
2016-08-03 20:29       ` Junio C Hamano
2016-08-03 21:37         ` Lars Schneider
2016-08-03 21:43           ` Junio C Hamano
2016-08-03 22:01             ` Lars Schneider
2016-08-05 21:34       ` Torsten Bögershausen
2016-08-05 21:49         ` Lars Schneider
2016-08-05 22:06         ` Junio C Hamano
2016-08-05 22:27           ` Jeff King
2016-08-06 11:55             ` Lars Schneider
2016-08-06 12:14               ` Jeff King
2016-08-06 18:19                 ` Lars Schneider
2016-08-08 15:02                   ` Jeff King
2016-08-08 16:21                     ` Lars Schneider
2016-08-08 16:26                       ` Jeff King
2016-08-06 20:40           ` Torsten Bögershausen
2016-08-03 16:42     ` [PATCH v4 12/12] convert: add filter.<driver>.process shutdown command option larsxschneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=FE7B04D2-C017-42AF-BD30-6160D251028C@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).