git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Jakub Narębski" <jnareb@gmail.com>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Martin-Louis Bright" <mlbright@gmail.com>,
	"Eric Wong" <e@80x24.org>, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option
Date: Thu, 4 Aug 2016 00:47:35 +0200	[thread overview]
Message-ID: <986d7918-d701-66ba-c04f-4cda56cf598e@gmail.com> (raw)
In-Reply-To: <ABE7D2DB-C45F-4F29-8CC2-8D873FD6C36A@gmail.com>

[Note that some of this might have been invalidated by v4]

W dniu 01.08.2016 o 15:32, Lars Schneider pisze:
>> On 31 Jul 2016, at 00:05, Jakub Narębski <jnareb@gmail.com> wrote:
>> W dniu 30.07.2016 o 01:38, larsxschneider@gmail.com pisze:

>>> Git starts the filter on first usage and expects a welcome
>>> message, protocol version number, and filter capabilities
>>> separated by spaces:
>>> ------------------------
>>> packet:          git< git-filter-protocol\n
>>> packet:          git< version 2\n
>>> packet:          git< capabilities clean smudge\n
>>
>> Sorry for going back and forth, but now I think that 'capabilities' are
>> not really needed here, though they are in line with "version" in
>> the second packet / line, namely "version 2".  If it does not make
>> parsing more difficult...
> 
> I don't understand what you mean with "they are not really needed"?
> The field is necessary to understand the protocol, no?
> 
> In the last roll I added the "key=value" format to the protocol upon
> yours and Peff's suggestion. Would it be OK to change the startup
> sequence accordingly?
> 
> packet:          git< version=2\n
> packet:          git< capabilities=clean smudge\n
 
With current implementation, Git checks second packet of the handshake
for version, and third packet for capabilities.  The "capabilities" or
"capabilities=" is entirely redundant; it is the position of packet
(it is the packet number) that matters.  At least for now.

The only thing that "version" in "version 2" and "capabilities"
in "capabilities: clean smudge" helps is self-describing of the protocol.

To really make use of them you would have to end handshake with flush
packet, and do a parsing: loop over every packet, and match known
patterns.  Well, perhaps with exception of known header: it doesn't
makes sense to have "version N" anywhere else than second packet,
and it doesn't makes sense to repeat it.

We also don't want to proliferate packets unnecessarily.  Each packet
is a bit (a tiny bit) of a performance hit.
 
>>> ------------------------
>>> Supported filter capabilities are "clean", "smudge", "stream",
>>> and "shutdown".
>>
>> I'd rather put "stream" and "shutdown" capabilities into separate
>> patches, for easier review.
> 
> I agree with "shutdown". I think I would like to remove the "stream"
> option and make it the default for the following reasons:
> 
> (1) As you mentioned elsewhere, "stream" is not really streaming at this
> point because we don't read/write in parallel.

We could, following the example of original per-file filter drivers.
It is as simple as starting writer using start_async(), as if we did
writing from Git in a child process.

Though that might be left for later (assuming that protocol is flexible
enough), as synchronous protocol (write, then read) is a bit simpler to
implement.

> (2) Junio and you pointed out that if we transmit size and flush packet
> then we have redundancy in the protocol.

Providing size upfront can be a hint for filter or Git.  For example
HTTP provides Content-Length: header, though it is not strictly necessary.

> (3) With the newly introduced "success"/"reject"/"failure" packet at the 
> end of a filter operation, a filter process has a way to signal Git that
> something went wrong. Initially I had the idea that a filter process just
> stops writing and Git would detect the mismatch between expected bytes
> and received bytes. But the final status packet is a much clearer solution.

The solution with stopping writing wouldn't work, I don't think.

> (4) Maintaining two slightly different protocols is a waste of resources 
> and only increases the size of this (already large) patch.

Right, better to design and implement basic protocol, taking care that
it is extensible, and only then add to it.

> My only argument for the size packet was that this allows efficient buffer
> allocation. However, in non of my benchmarks this was actually a problem.
> Therefore this is probably a epsilon optimization and should be removed.
> 
> OK with everyone?

All right.

>>> After the filter has processed a blob it is expected to wait for
>>> the next command. A demo implementation can be found in
>>> `t/t0021/rot13-filter.pl` located in the Git core repository.
>>
>> If filter does not support "shutdown" capability (or if said
>> capability is postponed for later patch), it should behave sanely
>> when Git command reaps it (SIGTERM + wait + SIGKILL?, SIGCHLD?).
> 
> How would you do this? Don't you think the current solution is
> good enough for processes that don't need a proper shutdown?

Actually... couldn't filter driver register atexit() / signal handler
to do a clean exit, if it is needed?
 
 
>> I wonder if it would be worth it to explain the reasoning behind
>> this solution and show alternate ones.
>>
>> * Using a separate variable to signal that filters are invoked
>>   per-command rather than per-file, and use pkt-line interface,
>>   like boolean-valued `useProtocol`, or `protocolVersion` set
>>   to '2' or 'v2', or `persistence` set to 'per-command', there
>>   is high risk of user's trying to use exiting one-shot per-file
>>   filters... and Git hanging.
>>
>> * Using new variables for each capability, e.g. `processSmudge`
>>   and `processClean` would lead to explosion of variable names;
>>   I think.
>>
>> * Current solution of using `process` in addition to `clean`
>>   and `smudge` clearly says that you need to use different
>>   command for per-file (`clean` and `smudge`), and per-command
>>   filter, while allowing to use them together.
>>
>>   The possible disadvantage is Git command starting `process`
>>   filter, only to see that it doesn't offer required capability,
>>   for example offering only "clean" but not "smudge".  There
>>   is simple workaround - set `smudge` variable (same as not
>>   present capability) to empty string.
> 
> If you think it is necessary to have this discussion in the
> commit message, then I will add it.

I think it would be good idea (not necessary, but helpful), though
possibly not in such exacting detail.  Just why this one, one sentence
or more.
 
 >>> +single filter invocation for the entire life of a single Git
>>> +command. This is achieved by using the following packet
>>> +format (pkt-line, see protocol-common.txt) based protocol over
>>
>> Can we linkgit-it (to technical documentation)?
> 
> I don't think that is possible because it was never done. See:
> git grep "linkgit:tech"

A pity.  Well, not your problem, anyway.

>>> +Git starts the filter on first usage and expects a welcome
>>
>> Is "usage" here correct?  Perhaps it would be more readable
>> to say that Git starts filter when encountering first file
>> that needs cleaning or smudgeing.
> 
> OK. How about this:
> 
> Git starts the filter when it encounters the first file
> that needs to be cleaned or smudged. After the filter started
> Git expects a welcome message, protocol version number, and 
> filter capabilities separated by spaces:

Better. 

>>> +
>>> +After the filter has processed a blob it is expected to wait for
>>> +the next command. A demo implementation can be found in
>>> +`t/t0021/rot13-filter.pl` located in the Git core repository.
>>
>> It is actually in Git sources.  Is it the best way to refer to
>> such files?
> 
> Well, I could add a github.com link but I don't think everyone
> would like that. What would you suggest?

Sorry, I wasn't clear.  What I meant is if "<file> located in the
Git core repository" is the best way to refer to such files, and
if we could do better.

But I think it is all right as it is.

Later we might want to provide some example filter.<driver>.process
filters e.g. in contrib/.  But that's for the future.
 
>>> +
>>> +Please note that you cannot use an existing filter.<driver>.clean
>>> +or filter.<driver>.smudge command as filter.<driver>.process
>>> +command. As soon as Git would detect a file that needs to be
>>> +processed by this filter, it would stop responding.
>>
>> This isn't.
> 
> Would that be better?
> 
> 
> Please note that you cannot use an existing `filter.<driver>.clean`
> or `filter.<driver>.smudge` command as `filter.<driver>.process`
> command because the former two use a different inter process 
> communication protocol than the latter one. As soon as Git would detect 
> a file that needs to be processed by such an invalid "process" filter, 
> it would wait for a proper protocol handshake and appear "hanging".

This is better.

-- 
Jakub Narębski


  parent reply	other threads:[~2016-08-03 22:48 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20160727000605.49982-1-larsxschneider%40gmail.com/>
2016-07-29 23:37 ` [PATCH v3 00/10] Git filter protocol larsxschneider
2016-07-29 23:37   ` [PATCH v3 01/10] pkt-line: extract set_packet_header() larsxschneider
2016-07-30 10:30     ` Jakub Narębski
2016-08-01 11:33       ` Lars Schneider
2016-08-03 20:05         ` Jakub Narębski
2016-08-05 11:52           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 02/10] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-07-30 10:49     ` Jakub Narębski
2016-08-01 12:00       ` Lars Schneider
2016-08-03 20:12         ` Jakub Narębski
2016-08-05 12:02           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 03/10] pkt-line: add packet_flush_gentle() larsxschneider
2016-07-30 12:04     ` Jakub Narębski
2016-08-01 12:28       ` Lars Schneider
2016-07-31 20:36     ` Torstem Bögershausen
2016-07-31 21:45       ` Lars Schneider
2016-08-02 19:56         ` Torsten Bögershausen
2016-08-05  9:59           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 04/10] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-07-30 12:29     ` Jakub Narębski
2016-08-01 12:18       ` Lars Schneider
2016-08-03 20:15         ` Jakub Narębski
2016-07-29 23:37   ` [PATCH v3 05/10] pack-protocol: fix maximum pkt-line size larsxschneider
2016-07-30 13:58     ` Jakub Narębski
2016-08-01 12:23       ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 06/10] run-command: add clean_on_exit_handler larsxschneider
2016-07-30  9:50     ` Johannes Sixt
2016-08-01 11:14       ` Lars Schneider
2016-08-02  5:53         ` Johannes Sixt
2016-08-02  7:41           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 07/10] convert: quote filter names in error messages larsxschneider
2016-07-29 23:37   ` [PATCH v3 08/10] convert: modernize tests larsxschneider
2016-07-29 23:38   ` [PATCH v3 09/10] convert: generate large test files only once larsxschneider
2016-07-29 23:38   ` [PATCH v3 10/10] convert: add filter.<driver>.process option larsxschneider
2016-07-30 22:05     ` Jakub Narębski
2016-07-31  9:42       ` Jakub Narębski
2016-07-31 19:49         ` Lars Schneider
2016-07-31 22:59           ` Jakub Narębski
2016-08-01 13:32       ` Lars Schneider
2016-08-03 18:30         ` Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option) Jakub Narębski
2016-08-05 10:32           ` Lars Schneider
2016-08-06 18:24           ` Lars Schneider
2016-08-03 22:47         ` Jakub Narębski [this message]
2016-07-31 22:19     ` [PATCH v3 10/10] convert: add filter.<driver>.process option Jakub Narębski
2016-08-01 17:55       ` Lars Schneider
2016-08-04  0:42         ` Jakub Narębski
2016-08-03 13:10       ` Lars Schneider
2016-08-04 10:18         ` Jakub Narębski
2016-08-05 13:20           ` Lars Schneider
2016-08-03 16:42   ` [PATCH v4 00/12] Git filter protocol larsxschneider
2016-08-03 16:42     ` [PATCH v4 01/12] pkt-line: extract set_packet_header() larsxschneider
2016-08-03 20:18       ` Junio C Hamano
2016-08-03 21:12         ` Jeff King
2016-08-03 21:27           ` Jeff King
2016-08-04 16:14           ` Junio C Hamano
2016-08-05 14:55             ` Lars Schneider
2016-08-05 16:31               ` Junio C Hamano
2016-08-05 17:31             ` Lars Schneider
2016-08-05 17:41               ` Junio C Hamano
2016-08-03 21:56         ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 02/12] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-08-03 16:42     ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() larsxschneider
2016-08-03 21:39       ` Jeff King
2016-08-03 22:56         ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-03 22:56           ` [PATCH 1/7] trace: handle NULL argument in trace_disable() Jeff King
2016-08-03 22:58           ` [PATCH 2/7] trace: stop using write_or_whine_pipe() Jeff King
2016-08-03 22:58           ` [PATCH 3/7] trace: use warning() for printing trace errors Jeff King
2016-08-04 20:41             ` Junio C Hamano
2016-08-04 21:21               ` Jeff King
2016-08-04 21:28                 ` Junio C Hamano
2016-08-05  7:56                   ` Jeff King
2016-08-05  7:59                   ` Christian Couder
2016-08-05 18:41                     ` Junio C Hamano
2016-08-03 23:00           ` [PATCH 4/7] trace: cosmetic fixes for error messages Jeff King
2016-08-04 20:42             ` Junio C Hamano
2016-08-05  8:00               ` Jeff King
2016-08-03 23:00           ` [PATCH 5/7] trace: correct variable name in write() error message Jeff King
2016-08-03 23:01           ` [PATCH 6/7] trace: disable key after write error Jeff King
2016-08-04 20:45             ` Junio C Hamano
2016-08-04 21:22               ` Jeff King
2016-08-05  7:58                 ` Jeff King
2016-08-03 23:01           ` [PATCH 7/7] write_or_die: drop write_or_whine_pipe() Jeff King
2016-08-03 23:04           ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-04 16:16         ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() Junio C Hamano
2016-08-03 16:42     ` [PATCH v4 04/12] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-08-03 16:42     ` [PATCH v4 05/12] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-03 16:42     ` [PATCH v4 06/12] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-03 16:42     ` [PATCH v4 07/12] run-command: add clean_on_exit_handler larsxschneider
2016-08-03 21:24       ` Jeff King
2016-08-03 22:15         ` Lars Schneider
2016-08-03 22:53           ` Jeff King
2016-08-03 23:09             ` Lars Schneider
2016-08-03 23:15               ` Jeff King
2016-08-05 13:08                 ` Lars Schneider
2016-08-05 21:19                   ` Torsten Bögershausen
2016-08-05 21:50                     ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 08/12] convert: quote filter names in error messages larsxschneider
2016-08-03 16:42     ` [PATCH v4 09/12] convert: modernize tests larsxschneider
2016-08-03 16:42     ` [PATCH v4 10/12] convert: generate large test files only once larsxschneider
2016-08-03 16:42     ` [PATCH v4 11/12] convert: add filter.<driver>.process option larsxschneider
2016-08-03 17:45       ` Junio C Hamano
2016-08-03 21:48         ` Lars Schneider
2016-08-03 22:46           ` Jeff King
2016-08-05 12:53             ` Lars Schneider
2016-08-03 20:29       ` Junio C Hamano
2016-08-03 21:37         ` Lars Schneider
2016-08-03 21:43           ` Junio C Hamano
2016-08-03 22:01             ` Lars Schneider
2016-08-05 21:34       ` Torsten Bögershausen
2016-08-05 21:49         ` Lars Schneider
2016-08-05 22:06         ` Junio C Hamano
2016-08-05 22:27           ` Jeff King
2016-08-06 11:55             ` Lars Schneider
2016-08-06 12:14               ` Jeff King
2016-08-06 18:19                 ` Lars Schneider
2016-08-08 15:02                   ` Jeff King
2016-08-08 16:21                     ` Lars Schneider
2016-08-08 16:26                       ` Jeff King
2016-08-06 20:40           ` Torsten Bögershausen
2016-08-03 16:42     ` [PATCH v4 12/12] convert: add filter.<driver>.process shutdown command option larsxschneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=986d7918-d701-66ba-c04f-4cda56cf598e@gmail.com \
    --to=jnareb@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).