git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Jakub Narębski" <jnareb@gmail.com>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	mlbright@gmail.com, remi.galan-alfonso@ensimag.grenoble-inp.fr,
	pclouds@gmail.com, "Eric Wong" <e@80x24.org>,
	"Ramsay Jones" <ramsay@ramsayjones.plus.com>,
	"Jeff King" <peff@peff.net>
Subject: Re: [PATCH v2 0/5] Git filter protocol
Date: Thu, 28 Jul 2016 12:42:40 +0200	[thread overview]
Message-ID: <5799E1A0.2070902@gmail.com> (raw)
In-Reply-To: <BFED7ED8-40DB-46B5-A1B7-4F49624D5A62@gmail.com>

W dniu 2016-07-28 o 09:16, Lars Schneider pisze:
>> On 27 Jul 2016, at 21:08, Jakub Narębski <jnareb@gmail.com> wrote:
>> W dniu 2016-07-27 o 02:06, larsxschneider@gmail.com pisze:
>>>
>>> thanks a lot for the extensive reviews. I tried to address all mentioned
>>> concerns and summarized them below. The most prominent changes since v1 are
>>> the following:
>>> * Git offers a number of filter capabilities that a filter can request
>>>  (right now only "smudge" and "clean" - in the future maybe "cleanFromFile",
>>>  "smudgeToFile", and/or "stream")
>>> * pipe communication uses a packet format (pkt-line) based protocol
>>
>> I wonder if it would make sense to support both whole-file pipe communication,
>> and packet format (pkt-line) pipe communication.
>>
>> The problem with whole-file pipe communication (original proposal for
>> new filter protocol is that it needs file size upfront.  For some types
>> of filters it is not a problem:
>> - if a filtered file has the same size as original, like for rot13
>>   example in the test for the feature
>> - if you can calculate the resulting file size from original size,
>>   like for most if not all encryption formats (that includes GPG,
>>   uudecode, base64, quoted-printable, hex, etc.); same for decryption,
>>   and from converting between fixed-width encodings
>> - if resulting file size is saved somewhere that is easy to get, like
>>   for LFS solutions (I think).
>>
>> For other filters it is serious problem.  For example indent, keyword
>> expansion, rezipping with zero compression (well, maybe not this one,
>> but at least the reverse of it), converting between encodings where
>> at least one is variable width (like UTF-8),...
>>
>> IMHO writing whole-file persistent filters is easier than using pkt-line.
>> On the other hand using pkt-line allow for more detailed progress report.
> 
> I initially wanted to support only "while-file" pipe, too.
> 
> But Peff ($gmane/299902), Duy, and Eric, seemed to prefer the pkt-line
> solution (gmane is down - otherwise I would have given you the links).

As GMane is down (at least the web interface; NNTP seems to be running)
I cannot examine their arguments.  Could you summarize?

> 
> After I have looked at it I think the pkt-line solution is indeed nicer
> for the following reasons:
> 
> (1) A stream optimized version (read/write in separate threads) of the filter
>     protocol can be implemented in the future without changing the protocol

I think the more important thing is that with pkt-line the filter does
not need to know the size of the output upfront.  Separate threads are
independent of protocol used, I think; and anyway Git never writes to
filter and reads from filter in the same command, isn't it?  The lifetime
of filter driver command is one Git command for now.

Oh, you meant having separate threads for writing to filter, and
separate thread for receiving output, so you don't have to wait to
send whole file to filter before starting receiving?  Note that
I think original filter implementation does it; at least async_start()
used in it hints about that (I need to examine how it works to tell
more).

> (2) pkt-line is a simple and easy to implement format

But it is more complicated than whole-file based protocol.  You need
to loop over packets... well you need that tool with whole-file, but
it is covered by existing helper functions (read_in_full()).  It is
easy to redirect file descriptors (copy_fd()), while you need to
convert contents into packets on write side, and unpack and unsplit
on the receive side in Git.

You also need to take care documenting if trailing "\n\0", "\n", "\0"
is a part of packet.

> (3) Reuse of existing Git communication infrastructure
>     -> code and documentation are less surprising to people that know Git

Whole-file read is not that difficult...

>     -> you can use GIT_TRACE_PACKET to easily inspect the
>        communication between Git and the filter process

...but this is a nice advantage.

If deemed necessary, we could also reuse progress report meters from
fetch / push side (percent, bandwidth/throughput), I guess.

> (4) The overheads is neglect able (4 byte header vs 65516 byte content)

Right.

> 
> 
>>> * a long running filter application is defined with "filter.<driver>.process"
>>
>> I hope that won't confuse Git users into trying to use single-shot
>> filters with a new protocol...
> 
> Yes, that was my intention for this new config.

All right, but you need to document the precedence rules: that is 

(1) if, accordingly to the operation, `clean` or `smudge` per-file filter
    is present, it is used; 
(2) if `process` semi-persistent protocol based filter is present,
    and it offers, accordingly to the operation, `clean` or `smudge`
    capabilities, it is used;
(3) otherwise, no filtering is performed.  

`clean` or `smudge` set to empty string means identity filter; I don't know
about rule for the new `process` filter if it is set to empty string.

At least in the commit message you would need to describe why this particular
solution was chosen.  I guess it is to avoid starting `protocol` filter,
only to realize that it does not offer "smudge" capability, and that `smudge`
filter is to be used because it is set.

Best,
-- 
Jakub Narębski


  reply	other threads:[~2016-07-28 10:43 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-22 15:48 [PATCH v1 0/3] Git filter protocol larsxschneider
2016-07-22 15:48 ` [PATCH v1 1/3] convert: quote filter names in error messages larsxschneider
2016-07-22 15:48 ` [PATCH v1 2/3] convert: modernize tests larsxschneider
2016-07-26 15:18   ` Remi Galan Alfonso
2016-07-26 20:40     ` Junio C Hamano
2016-07-22 15:49 ` [PATCH v1 3/3] convert: add filter.<driver>.useProtocol option larsxschneider
2016-07-22 22:32   ` Torsten Bögershausen
2016-07-24 12:09     ` Lars Schneider
2016-07-22 23:19   ` Ramsay Jones
2016-07-22 23:28     ` Ramsay Jones
2016-07-24 17:16     ` Lars Schneider
2016-07-24 22:36       ` Ramsay Jones
2016-07-24 23:22         ` Jakub Narębski
2016-07-25 20:32           ` Lars Schneider
2016-07-26 10:58             ` Jakub Narębski
2016-07-25 20:24         ` Lars Schneider
2016-07-23  0:11   ` Jakub Narębski
2016-07-23  7:27     ` Eric Wong
2016-07-26 20:00       ` Jeff King
2016-07-24 18:36     ` Lars Schneider
2016-07-24 20:14       ` Jakub Narębski
2016-07-24 21:30         ` Jakub Narębski
2016-07-25 20:16           ` Lars Schneider
2016-07-26 12:24             ` Jakub Narębski
2016-07-25 20:09         ` Lars Schneider
2016-07-26 14:18           ` Jakub Narębski
2016-07-23  8:14   ` Eric Wong
2016-07-24 19:11     ` Lars Schneider
2016-07-25  7:27       ` Eric Wong
2016-07-25 15:48       ` Duy Nguyen
2016-07-22 21:39 ` [PATCH v1 0/3] Git filter protocol Junio C Hamano
2016-07-24 11:24   ` Lars Schneider
2016-07-26 20:11     ` Jeff King
2016-07-27  0:06 ` [PATCH v2 0/5] " larsxschneider
2016-07-27  0:06   ` [PATCH v2 1/5] convert: quote filter names in error messages larsxschneider
2016-07-27 20:01     ` Jakub Narębski
2016-07-28  8:23       ` Lars Schneider
2016-07-27  0:06   ` [PATCH v2 2/5] convert: modernize tests larsxschneider
2016-07-27  0:06   ` [PATCH v2 3/5] pkt-line: extract and use `set_packet_header` function larsxschneider
2016-07-27  0:20     ` Junio C Hamano
2016-07-27  9:13       ` Lars Schneider
2016-07-27 16:31         ` Junio C Hamano
2016-07-27  0:06   ` [PATCH v2 4/5] convert: generate large test files only once larsxschneider
2016-07-27  2:35     ` Torsten Bögershausen
2016-07-27 13:32       ` Jeff King
2016-07-27 16:50         ` Lars Schneider
2016-07-27  0:06   ` [PATCH v2 5/5] convert: add filter.<driver>.process option larsxschneider
2016-07-27  1:32     ` Jeff King
2016-07-27 17:31       ` Lars Schneider
2016-07-27 18:11         ` Jeff King
2016-07-28 12:10           ` Lars Schneider
2016-07-28 13:35             ` Jeff King
2016-07-27  9:41     ` Eric Wong
2016-07-29 10:38       ` Lars Schneider
2016-07-29 11:24         ` Jakub Narębski
2016-07-29 11:31           ` Lars Schneider
2016-08-05 18:55         ` Eric Wong
2016-08-05 23:26           ` Lars Schneider
2016-08-05 23:38             ` Eric Wong
2016-07-27 23:31     ` Jakub Narębski
2016-07-29  8:04       ` Lars Schneider
2016-07-29 17:35         ` Junio C Hamano
2016-07-29 23:11           ` Jakub Narębski
2016-07-29 23:44             ` Lars Schneider
2016-07-30  9:32               ` Jakub Narębski
2016-07-28 10:32     ` Torsten Bögershausen
2016-07-27 19:08   ` [PATCH v2 0/5] Git filter protocol Jakub Narębski
2016-07-28  7:16     ` Lars Schneider
2016-07-28 10:42       ` Jakub Narębski [this message]
2016-07-28 13:29       ` Jeff King
2016-07-29  7:40         ` Jakub Narębski
2016-07-29  8:14           ` Lars Schneider
2016-07-29 15:57             ` Jeff King
2016-07-29 16:20               ` Lars Schneider
2016-07-29 16:50                 ` Jeff King
2016-07-29 17:43                   ` Lars Schneider
2016-07-29 18:27                     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5799E1A0.2070902@gmail.com \
    --to=jnareb@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=remi.galan-alfonso@ensimag.grenoble-inp.fr \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).