From: "Jakub Narębski" <jnareb@gmail.com>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
"Torsten Bögershausen" <tboegi@web.de>,
"Martin-Louis Bright" <mlbright@gmail.com>,
"Eric Wong" <e@80x24.org>, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option
Date: Thu, 4 Aug 2016 00:47:35 +0200 [thread overview]
Message-ID: <986d7918-d701-66ba-c04f-4cda56cf598e@gmail.com> (raw)
In-Reply-To: <ABE7D2DB-C45F-4F29-8CC2-8D873FD6C36A@gmail.com>
[Note that some of this might have been invalidated by v4]
W dniu 01.08.2016 o 15:32, Lars Schneider pisze:
>> On 31 Jul 2016, at 00:05, Jakub Narębski <jnareb@gmail.com> wrote:
>> W dniu 30.07.2016 o 01:38, larsxschneider@gmail.com pisze:
>>> Git starts the filter on first usage and expects a welcome
>>> message, protocol version number, and filter capabilities
>>> separated by spaces:
>>> ------------------------
>>> packet: git< git-filter-protocol\n
>>> packet: git< version 2\n
>>> packet: git< capabilities clean smudge\n
>>
>> Sorry for going back and forth, but now I think that 'capabilities' are
>> not really needed here, though they are in line with "version" in
>> the second packet / line, namely "version 2". If it does not make
>> parsing more difficult...
>
> I don't understand what you mean with "they are not really needed"?
> The field is necessary to understand the protocol, no?
>
> In the last roll I added the "key=value" format to the protocol upon
> yours and Peff's suggestion. Would it be OK to change the startup
> sequence accordingly?
>
> packet: git< version=2\n
> packet: git< capabilities=clean smudge\n
With current implementation, Git checks second packet of the handshake
for version, and third packet for capabilities. The "capabilities" or
"capabilities=" is entirely redundant; it is the position of packet
(it is the packet number) that matters. At least for now.
The only thing that "version" in "version 2" and "capabilities"
in "capabilities: clean smudge" helps is self-describing of the protocol.
To really make use of them you would have to end handshake with flush
packet, and do a parsing: loop over every packet, and match known
patterns. Well, perhaps with exception of known header: it doesn't
makes sense to have "version N" anywhere else than second packet,
and it doesn't makes sense to repeat it.
We also don't want to proliferate packets unnecessarily. Each packet
is a bit (a tiny bit) of a performance hit.
>>> ------------------------
>>> Supported filter capabilities are "clean", "smudge", "stream",
>>> and "shutdown".
>>
>> I'd rather put "stream" and "shutdown" capabilities into separate
>> patches, for easier review.
>
> I agree with "shutdown". I think I would like to remove the "stream"
> option and make it the default for the following reasons:
>
> (1) As you mentioned elsewhere, "stream" is not really streaming at this
> point because we don't read/write in parallel.
We could, following the example of original per-file filter drivers.
It is as simple as starting writer using start_async(), as if we did
writing from Git in a child process.
Though that might be left for later (assuming that protocol is flexible
enough), as synchronous protocol (write, then read) is a bit simpler to
implement.
> (2) Junio and you pointed out that if we transmit size and flush packet
> then we have redundancy in the protocol.
Providing size upfront can be a hint for filter or Git. For example
HTTP provides Content-Length: header, though it is not strictly necessary.
> (3) With the newly introduced "success"/"reject"/"failure" packet at the
> end of a filter operation, a filter process has a way to signal Git that
> something went wrong. Initially I had the idea that a filter process just
> stops writing and Git would detect the mismatch between expected bytes
> and received bytes. But the final status packet is a much clearer solution.
The solution with stopping writing wouldn't work, I don't think.
> (4) Maintaining two slightly different protocols is a waste of resources
> and only increases the size of this (already large) patch.
Right, better to design and implement basic protocol, taking care that
it is extensible, and only then add to it.
> My only argument for the size packet was that this allows efficient buffer
> allocation. However, in non of my benchmarks this was actually a problem.
> Therefore this is probably a epsilon optimization and should be removed.
>
> OK with everyone?
All right.
>>> After the filter has processed a blob it is expected to wait for
>>> the next command. A demo implementation can be found in
>>> `t/t0021/rot13-filter.pl` located in the Git core repository.
>>
>> If filter does not support "shutdown" capability (or if said
>> capability is postponed for later patch), it should behave sanely
>> when Git command reaps it (SIGTERM + wait + SIGKILL?, SIGCHLD?).
>
> How would you do this? Don't you think the current solution is
> good enough for processes that don't need a proper shutdown?
Actually... couldn't filter driver register atexit() / signal handler
to do a clean exit, if it is needed?
>> I wonder if it would be worth it to explain the reasoning behind
>> this solution and show alternate ones.
>>
>> * Using a separate variable to signal that filters are invoked
>> per-command rather than per-file, and use pkt-line interface,
>> like boolean-valued `useProtocol`, or `protocolVersion` set
>> to '2' or 'v2', or `persistence` set to 'per-command', there
>> is high risk of user's trying to use exiting one-shot per-file
>> filters... and Git hanging.
>>
>> * Using new variables for each capability, e.g. `processSmudge`
>> and `processClean` would lead to explosion of variable names;
>> I think.
>>
>> * Current solution of using `process` in addition to `clean`
>> and `smudge` clearly says that you need to use different
>> command for per-file (`clean` and `smudge`), and per-command
>> filter, while allowing to use them together.
>>
>> The possible disadvantage is Git command starting `process`
>> filter, only to see that it doesn't offer required capability,
>> for example offering only "clean" but not "smudge". There
>> is simple workaround - set `smudge` variable (same as not
>> present capability) to empty string.
>
> If you think it is necessary to have this discussion in the
> commit message, then I will add it.
I think it would be good idea (not necessary, but helpful), though
possibly not in such exacting detail. Just why this one, one sentence
or more.
>>> +single filter invocation for the entire life of a single Git
>>> +command. This is achieved by using the following packet
>>> +format (pkt-line, see protocol-common.txt) based protocol over
>>
>> Can we linkgit-it (to technical documentation)?
>
> I don't think that is possible because it was never done. See:
> git grep "linkgit:tech"
A pity. Well, not your problem, anyway.
>>> +Git starts the filter on first usage and expects a welcome
>>
>> Is "usage" here correct? Perhaps it would be more readable
>> to say that Git starts filter when encountering first file
>> that needs cleaning or smudgeing.
>
> OK. How about this:
>
> Git starts the filter when it encounters the first file
> that needs to be cleaned or smudged. After the filter started
> Git expects a welcome message, protocol version number, and
> filter capabilities separated by spaces:
Better.
>>> +
>>> +After the filter has processed a blob it is expected to wait for
>>> +the next command. A demo implementation can be found in
>>> +`t/t0021/rot13-filter.pl` located in the Git core repository.
>>
>> It is actually in Git sources. Is it the best way to refer to
>> such files?
>
> Well, I could add a github.com link but I don't think everyone
> would like that. What would you suggest?
Sorry, I wasn't clear. What I meant is if "<file> located in the
Git core repository" is the best way to refer to such files, and
if we could do better.
But I think it is all right as it is.
Later we might want to provide some example filter.<driver>.process
filters e.g. in contrib/. But that's for the future.
>>> +
>>> +Please note that you cannot use an existing filter.<driver>.clean
>>> +or filter.<driver>.smudge command as filter.<driver>.process
>>> +command. As soon as Git would detect a file that needs to be
>>> +processed by this filter, it would stop responding.
>>
>> This isn't.
>
> Would that be better?
>
>
> Please note that you cannot use an existing `filter.<driver>.clean`
> or `filter.<driver>.smudge` command as `filter.<driver>.process`
> command because the former two use a different inter process
> communication protocol than the latter one. As soon as Git would detect
> a file that needs to be processed by such an invalid "process" filter,
> it would wait for a proper protocol handshake and appear "hanging".
This is better.
--
Jakub Narębski
next prev parent reply other threads:[~2016-08-03 22:48 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20160727000605.49982-1-larsxschneider%40gmail.com/>
2016-07-29 23:37 ` [PATCH v3 00/10] Git filter protocol larsxschneider
2016-07-29 23:37 ` [PATCH v3 01/10] pkt-line: extract set_packet_header() larsxschneider
2016-07-30 10:30 ` Jakub Narębski
2016-08-01 11:33 ` Lars Schneider
2016-08-03 20:05 ` Jakub Narębski
2016-08-05 11:52 ` Lars Schneider
2016-07-29 23:37 ` [PATCH v3 02/10] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-07-30 10:49 ` Jakub Narębski
2016-08-01 12:00 ` Lars Schneider
2016-08-03 20:12 ` Jakub Narębski
2016-08-05 12:02 ` Lars Schneider
2016-07-29 23:37 ` [PATCH v3 03/10] pkt-line: add packet_flush_gentle() larsxschneider
2016-07-30 12:04 ` Jakub Narębski
2016-08-01 12:28 ` Lars Schneider
2016-07-31 20:36 ` Torstem Bögershausen
2016-07-31 21:45 ` Lars Schneider
2016-08-02 19:56 ` Torsten Bögershausen
2016-08-05 9:59 ` Lars Schneider
2016-07-29 23:37 ` [PATCH v3 04/10] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-07-30 12:29 ` Jakub Narębski
2016-08-01 12:18 ` Lars Schneider
2016-08-03 20:15 ` Jakub Narębski
2016-07-29 23:37 ` [PATCH v3 05/10] pack-protocol: fix maximum pkt-line size larsxschneider
2016-07-30 13:58 ` Jakub Narębski
2016-08-01 12:23 ` Lars Schneider
2016-07-29 23:37 ` [PATCH v3 06/10] run-command: add clean_on_exit_handler larsxschneider
2016-07-30 9:50 ` Johannes Sixt
2016-08-01 11:14 ` Lars Schneider
2016-08-02 5:53 ` Johannes Sixt
2016-08-02 7:41 ` Lars Schneider
2016-07-29 23:37 ` [PATCH v3 07/10] convert: quote filter names in error messages larsxschneider
2016-07-29 23:37 ` [PATCH v3 08/10] convert: modernize tests larsxschneider
2016-07-29 23:38 ` [PATCH v3 09/10] convert: generate large test files only once larsxschneider
2016-07-29 23:38 ` [PATCH v3 10/10] convert: add filter.<driver>.process option larsxschneider
2016-07-30 22:05 ` Jakub Narębski
2016-07-31 9:42 ` Jakub Narębski
2016-07-31 19:49 ` Lars Schneider
2016-07-31 22:59 ` Jakub Narębski
2016-08-01 13:32 ` Lars Schneider
2016-08-03 18:30 ` Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option) Jakub Narębski
2016-08-05 10:32 ` Lars Schneider
2016-08-06 18:24 ` Lars Schneider
2016-08-03 22:47 ` Jakub Narębski [this message]
2016-07-31 22:19 ` [PATCH v3 10/10] convert: add filter.<driver>.process option Jakub Narębski
2016-08-01 17:55 ` Lars Schneider
2016-08-04 0:42 ` Jakub Narębski
2016-08-03 13:10 ` Lars Schneider
2016-08-04 10:18 ` Jakub Narębski
2016-08-05 13:20 ` Lars Schneider
2016-08-03 16:42 ` [PATCH v4 00/12] Git filter protocol larsxschneider
2016-08-03 16:42 ` [PATCH v4 01/12] pkt-line: extract set_packet_header() larsxschneider
2016-08-03 20:18 ` Junio C Hamano
2016-08-03 21:12 ` Jeff King
2016-08-03 21:27 ` Jeff King
2016-08-04 16:14 ` Junio C Hamano
2016-08-05 14:55 ` Lars Schneider
2016-08-05 16:31 ` Junio C Hamano
2016-08-05 17:31 ` Lars Schneider
2016-08-05 17:41 ` Junio C Hamano
2016-08-03 21:56 ` Lars Schneider
2016-08-03 16:42 ` [PATCH v4 02/12] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-08-03 16:42 ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() larsxschneider
2016-08-03 21:39 ` Jeff King
2016-08-03 22:56 ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-03 22:56 ` [PATCH 1/7] trace: handle NULL argument in trace_disable() Jeff King
2016-08-03 22:58 ` [PATCH 2/7] trace: stop using write_or_whine_pipe() Jeff King
2016-08-03 22:58 ` [PATCH 3/7] trace: use warning() for printing trace errors Jeff King
2016-08-04 20:41 ` Junio C Hamano
2016-08-04 21:21 ` Jeff King
2016-08-04 21:28 ` Junio C Hamano
2016-08-05 7:56 ` Jeff King
2016-08-05 7:59 ` Christian Couder
2016-08-05 18:41 ` Junio C Hamano
2016-08-03 23:00 ` [PATCH 4/7] trace: cosmetic fixes for error messages Jeff King
2016-08-04 20:42 ` Junio C Hamano
2016-08-05 8:00 ` Jeff King
2016-08-03 23:00 ` [PATCH 5/7] trace: correct variable name in write() error message Jeff King
2016-08-03 23:01 ` [PATCH 6/7] trace: disable key after write error Jeff King
2016-08-04 20:45 ` Junio C Hamano
2016-08-04 21:22 ` Jeff King
2016-08-05 7:58 ` Jeff King
2016-08-03 23:01 ` [PATCH 7/7] write_or_die: drop write_or_whine_pipe() Jeff King
2016-08-03 23:04 ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-04 16:16 ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() Junio C Hamano
2016-08-03 16:42 ` [PATCH v4 04/12] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-08-03 16:42 ` [PATCH v4 05/12] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-03 16:42 ` [PATCH v4 06/12] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-03 16:42 ` [PATCH v4 07/12] run-command: add clean_on_exit_handler larsxschneider
2016-08-03 21:24 ` Jeff King
2016-08-03 22:15 ` Lars Schneider
2016-08-03 22:53 ` Jeff King
2016-08-03 23:09 ` Lars Schneider
2016-08-03 23:15 ` Jeff King
2016-08-05 13:08 ` Lars Schneider
2016-08-05 21:19 ` Torsten Bögershausen
2016-08-05 21:50 ` Lars Schneider
2016-08-03 16:42 ` [PATCH v4 08/12] convert: quote filter names in error messages larsxschneider
2016-08-03 16:42 ` [PATCH v4 09/12] convert: modernize tests larsxschneider
2016-08-03 16:42 ` [PATCH v4 10/12] convert: generate large test files only once larsxschneider
2016-08-03 16:42 ` [PATCH v4 11/12] convert: add filter.<driver>.process option larsxschneider
2016-08-03 17:45 ` Junio C Hamano
2016-08-03 21:48 ` Lars Schneider
2016-08-03 22:46 ` Jeff King
2016-08-05 12:53 ` Lars Schneider
2016-08-03 20:29 ` Junio C Hamano
2016-08-03 21:37 ` Lars Schneider
2016-08-03 21:43 ` Junio C Hamano
2016-08-03 22:01 ` Lars Schneider
2016-08-05 21:34 ` Torsten Bögershausen
2016-08-05 21:49 ` Lars Schneider
2016-08-05 22:06 ` Junio C Hamano
2016-08-05 22:27 ` Jeff King
2016-08-06 11:55 ` Lars Schneider
2016-08-06 12:14 ` Jeff King
2016-08-06 18:19 ` Lars Schneider
2016-08-08 15:02 ` Jeff King
2016-08-08 16:21 ` Lars Schneider
2016-08-08 16:26 ` Jeff King
2016-08-06 20:40 ` Torsten Bögershausen
2016-08-03 16:42 ` [PATCH v4 12/12] convert: add filter.<driver>.process shutdown command option larsxschneider
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=986d7918-d701-66ba-c04f-4cda56cf598e@gmail.com \
--to=jnareb@gmail.com \
--cc=e@80x24.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=larsxschneider@gmail.com \
--cc=mlbright@gmail.com \
--cc=peff@peff.net \
--cc=tboegi@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).