git@vger.kernel.org mailing list mirror (one of many)
From: Lars Schneider <larsxschneider@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, peff@peff.net, sbeller@google.com,
	Johannes.Schindelin@gmx.de, jnareb@gmail.com, mlbright@gmail.com
Subject: Re: [PATCH v6 12/13] convert: add filter.<driver>.process option
Date: Tue, 30 Aug 2016 18:27:17 +0200
Message-ID: <7A378528-58DE-4C17-9F00-9708947716DD@gmail.com>
In-Reply-To: <xmqqzinv6wtg.fsf@gitster.mtv.corp.google.com>


> On 30 Aug 2016, at 00:21, Junio C Hamano <gitster@pobox.com> wrote:
> 
> larsxschneider@gmail.com writes:
> 
>> +In case the filter cannot or does not want to process the content,
>> +it is expected to respond with an "error" status. Depending on the
>> +`filter.<driver>.required` flag Git will interpret that as error
>> +but it will not stop or restart the filter process.
>> +------------------------
>> +packet:          git< status=error\n
>> +packet:          git< 0000
>> +------------------------
>> +
>> +In case the filter cannot or does not want to process the content
>> +as well as any future content for the lifetime of the Git process,
>> +it is expected to respond with an "error-all" status. Depending on
>> +the `filter.<driver>.required` flag Git will interpret that as error
>> +but it will not stop or restart the filter process.
>> +------------------------
>> +packet:          git< status=error-all\n
>> +packet:          git< 0000
>> +------------------------
> 
> This part of the document is well-written to help filter-writers.

Thanks!


> One thing that was unclear from the above to me, when read as a
> potential filter-writer, is when I am supposed to exit(2).  After I
> tell Git with error-all (I would have called it "abort", but that's
> OK) that I desire no further communication, am I free to go?  Or do
> I wait until Git somehow disconnects (perhaps by closing the packet
> stream I have been reading)?

The filter can exit right after the "error-all". If the filter does
not exit then Git will kill the filter. I'll add this to the docs.
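
Here is a minimal Python sketch of the filter side. It assumes the standard pkt-line framing: four hex digits give the total length, including those four bytes, and "0000" is the flush packet. The helper names `pkt_line` and `respond_error_all` are hypothetical and are not part of Git.

```python
def pkt_line(payload: bytes) -> bytes:
    """Encode one pkt-line: 4 lowercase hex digits holding the total
    length (payload plus the 4 length bytes), followed by the payload."""
    return b"%04x" % (len(payload) + 4) + payload


FLUSH = b"0000"  # flush packet that terminates a key=value list


def respond_error_all(out) -> None:
    """Tell Git this filter refuses the current and all future content."""
    out.write(pkt_line(b"status=error-all\n"))
    out.write(FLUSH)
    out.flush()
```

After calling `respond_error_all(sys.stdout.buffer)`, the filter may simply exit. If it lingers, Git kills it.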

"abort" could be ambiguous because it could be read as "abort only
this file". "abort-all" would work, though. Would you prefer to see
"error" replaced by "abort" and "error-all" by "abort-all"?


>> +If the filter dies during the communication or does not adhere to
>> +the protocol then Git will stop the filter process and restart it
>> +with the next file that needs to be processed.
> 
> Hmph, is there a reason not to retry a half-converted-and-failed
> blob with the fresh process?  Note that this is not "you must do it
> that way", and it is not even "I think doing so may be a better
> idea".  I merely want to know the reason behind this decision.

A filter that dies during communication or does not adhere to the
protocol is a faulty filter. Feeding the same blob to the restarted
filter would most likely trigger the same error.
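
The following hypothetical Python sketch illustrates the restart policy described above. It is not Git's actual C code.

- The process is started lazily for the first file.
- A process that dies or breaks the protocol is discarded.
- The blob it was working on is reported as failed rather than retried.
- A fresh process is started for the next file.

`start_process` and the `.filter()` method are assumptions.

```python
class FilterProcessManager:
    """Hypothetical model of one long-running filter per driver."""

    def __init__(self, start_process):
        self._start = start_process  # callable returning a new process handle
        self._proc = None            # no process until the first file arrives

    def convert(self, blob: bytes):
        if self._proc is None:
            self._proc = self._start()  # lazy (re)start for this file
        try:
            return self._proc.filter(blob)
        except (OSError, ValueError):   # process died or violated the protocol
            # Drop the handle (a real implementation would also stop the
            # process); the next file gets a fresh one.
            self._proc = None
            return None                 # this blob is reported as failed, not retried
```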

There are legitimate reasons for retries, e.g. if the filter
communicates over the network. In these cases I expect the filter to
handle the retry logic itself. Git just writes to and reads from pipes,
and I don't expect frequent problems in that area. Plus, the existing
filter mechanism has no retry logic either.

Later on we could easily add a "retry" capability if we deem it necessary,
though.
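
If a filter talks to the network, a retry loop inside the filter could look roughly like this sketch. The name `operation` stands for a hypothetical download from some remote store. The attempt count and the exponential backoff delays are arbitrary assumptions. When retries are exhausted, the exception propagates and the filter would answer "status=error" for that file.

```python
import time


def with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on ConnectionError, wait with exponential backoff
    (base_delay, 2*base_delay, ...) and try again, up to `attempts` calls."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up; the caller answers status=error
            sleep(base_delay * 2 ** attempt)
```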


>> +After the filter has processed a blob it is expected to wait for
>> +the next "key=value" list containing a command. When the Git process
>> +terminates, it will send a kill signal to the filter in that stage.
> 
> The "kill" may not be very nice.  As Git side _knows_ that the
> filter is waiting for the next command, having an explicit
> "shutdown" command would give the filter a chance to implement a
> clean exit--it may have some housekeeping tasks it wants to perform
> once it is done.  The "explicit shutdown" could just be "the pipe
> gets closed", so from the implementation point of view there may not
> be anything you need to further add to this patch (after all, when
> we exit, the pipes to them would be closed), but the shutdown
> protocol and the expectation on the behaviour of filter processes
> would need to be documented.

I implemented a shutdown command in v4 [1][2] but dropped it in v5 after
a discussion with Peff [3].

[1] http://public-inbox.org/git/20160803164225.46355-8-larsxschneider@gmail.com/
[2] http://public-inbox.org/git/20160803164225.46355-13-larsxschneider@gmail.com/
[3] http://public-inbox.org/git/20160803225313.pk3tfe5ovz4y3i7l@sigill.intra.peff.net/

My main reasons for dropping it:

A) There is no central place in Git that could execute code *after*
   all filter operations are complete and *before* Git exits. Therefore,
   I had to add a "clean_on_exit_handler()" to "run-command" [1]. This
   change made this series even larger and therefore harder to review.

B) If we communicate "shutdown" to the filter then we need to give the
   filter some time to exit before it is killed on Git exit. I wasn't
   able to come up with a good answer to how long Git should wait for
   the filter to exit.

Do you think I should resurrect the "shutdown" patch?
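
For reference, the "close the pipe, wait, then kill" shutdown Junio describes could be sketched as follows. The filter program is a hypothetical stand-in that treats EOF on stdin as the shutdown signal. The 2-second timeout is an arbitrary assumption, which is exactly the open question in B).

```python
import subprocess

# Hypothetical filter: reads stdin until EOF (the pipe was closed),
# then does its housekeeping and exits cleanly.
FILTER_SRC = (
    "import sys\n"
    "sys.stdin.buffer.read()\n"  # blocks until the pipe is closed (EOF)
    "sys.stdout.write('clean exit\\n')\n"
)


def shutdown(proc: subprocess.Popen, timeout: float = 2.0) -> int:
    """Close the filter's stdin as the 'shutdown' signal, give it
    `timeout` seconds to finish, and kill it only if it does not."""
    proc.stdin.close()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()
```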


>> +If a `filter.<driver>.clean` or `filter.<driver>.smudge` command
>> +is configured then these commands always take precedence over
>> +a configured `filter.<driver>.process` command.
> 
> It may make more sense to give precedence to the .process (which is
> a late-comer) if defined, ignoring .clean and .smudge, than the
> other way around.

I agree.
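
Here is a sketch of the precedence Junio proposes: a configured `filter.<driver>.process` wins, and `.clean`/`.smudge` are ignored. The flat dictionary standing in for Git's config and the function name are hypothetical.

```python
def pick_filter_mode(config: dict, driver: str) -> dict:
    """Prefer the long-running .process command if it is set;
    otherwise fall back to the one-shot .clean/.smudge commands."""
    key = "filter." + driver + "."
    if config.get(key + "process"):
        return {"process": config[key + "process"]}
    return {"clean": config.get(key + "clean"),
            "smudge": config.get(key + "smudge")}
```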


>> +Please note that you cannot use an existing `filter.<driver>.clean`
>> +or `filter.<driver>.smudge` command with `filter.<driver>.process`
>> +because the former two use a different inter process communication
>> +protocol than the latter one.
> 
> Would it be a useful sample program we can ship in contrib/ if you
> created a "filter adapter" that reads these two configuration
> variables and act as a filter.<driver>.process?

You mean a v2 filter that would use v1 filters under the hood?
If we were to drop v1, then this would be useful. Otherwise I don't
see any need for such a thing.
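
For illustration only, the core of such a hypothetical adapter might look like this. It answers a v2 "command=clean" or "command=smudge" request by running the configured v1 shell command with the blob on stdin. The pkt-line framing and `%f` pathname substitution are left out, and the function name is an assumption.

```python
import subprocess


def run_v1_command(config: dict, driver: str, command: str, content: bytes) -> bytes:
    """Run the v1 one-shot filter configured for `command` on `content`."""
    if command not in ("clean", "smudge"):
        raise ValueError("unsupported command: " + command)
    shell_cmd = config.get("filter." + driver + "." + command)
    if shell_cmd is None:
        return content  # no v1 filter configured: content passes through
    result = subprocess.run(shell_cmd, shell=True, input=content,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout
```

A non-zero exit of the v1 command raises `subprocess.CalledProcessError`. The adapter would then report "status=error" for that file.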


> During an imaginary session of "git add .", I think I found where
> you start THE filter process upon the first path that needs to be
> filtered with one for the configured <driver>, and I think the same
> place is where you reuse THE filter process, but I am not sure where
> you are cleaning up by killing the filter once all paths are added.

I don't clean them up. The filter gets killed when Git exits.


> Wouldn't you need some hooks at strategic places after such bulk
> operation to tell the multi-file-filter machinery to walk all the
> entries in cmd_process_map and tell the remaining filter processes
> that they have no more tasks, or something?

That would be nice. I tried to find these strategic places but it
turned out to be harder than I thought because Git can exit in a lot
of places.


>  Are you relying on
> these processes to exit upon a read failure after we exit and the
> pipe going to the filter is severed?

Not really. The filter process is killed by the run-command machinery
with the atexit child_cleanup_handler.

Thanks,
Lars

Thread overview: 66+ messages
2016-08-25 11:07 [PATCH v6 00/13] Git filter protocol larsxschneider
2016-08-25 11:07 ` [PATCH v6 01/13] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
2016-08-25 11:07 ` [PATCH v6 02/13] pkt-line: extract set_packet_header() larsxschneider
2016-08-25 11:07 ` [PATCH v6 03/13] pkt-line: add packet_write_fmt_gently() larsxschneider
2016-08-25 18:12   ` Stefan Beller
2016-08-25 18:47     ` Lars Schneider
2016-08-25 21:41   ` Junio C Hamano
2016-08-26  9:17     ` Lars Schneider
2016-08-26 17:10       ` Junio C Hamano
2016-08-26 17:23         ` Jeff King
2016-08-25 11:07 ` [PATCH v6 04/13] pkt-line: add packet_flush_gently() larsxschneider
2016-08-25 11:07 ` [PATCH v6 05/13] pkt-line: add packet_write_gently() larsxschneider
2016-08-25 21:50   ` Junio C Hamano
2016-08-26  9:40     ` Lars Schneider
2016-08-26 17:15       ` Junio C Hamano
2016-08-29  9:40         ` Lars Schneider
2016-08-25 11:07 ` [PATCH v6 06/13] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-25 18:46   ` Stefan Beller
2016-08-25 19:33     ` Lars Schneider
2016-08-25 22:31     ` Junio C Hamano
2016-08-26  0:55       ` Jacob Keller
2016-08-26 17:02         ` Stefan Beller
2016-08-26 17:21           ` Jeff King
2016-08-26 17:17         ` Junio C Hamano
2016-08-25 22:27   ` Junio C Hamano
2016-08-26 10:13     ` Lars Schneider
2016-08-26 17:21       ` Junio C Hamano
2016-08-29  9:43         ` Lars Schneider
2016-08-25 11:07 ` [PATCH v6 07/13] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-25 18:59   ` Stefan Beller
2016-08-25 19:35     ` Lars Schneider
2016-08-26 19:44       ` Junio C Hamano
2016-08-25 11:07 ` [PATCH v6 08/13] convert: quote filter names in error messages larsxschneider
2016-08-26 19:45   ` Junio C Hamano
2016-08-25 11:07 ` [PATCH v6 09/13] convert: modernize tests larsxschneider
2016-08-26 20:03   ` Junio C Hamano
2016-08-29 10:09     ` Lars Schneider
2016-08-25 11:07 ` [PATCH v6 10/13] convert: generate large test files only once larsxschneider
2016-08-25 19:17   ` Stefan Beller
2016-08-25 19:54     ` Lars Schneider
2016-08-29 17:52       ` Junio C Hamano
2016-08-30 11:47         ` Lars Schneider
2016-08-30 16:55           ` Junio C Hamano
2016-08-29 17:46   ` Junio C Hamano
2016-08-30 11:41     ` Lars Schneider
2016-08-30 16:37       ` Jeff King
2016-08-25 11:07 ` [PATCH v6 11/13] convert: make apply_filter() adhere to standard Git error handling larsxschneider
2016-08-25 11:07 ` [PATCH v6 12/13] convert: add filter.<driver>.process option larsxschneider
2016-08-29 22:21   ` Junio C Hamano
2016-08-30 16:27     ` Lars Schneider [this message]
2016-08-30 18:59       ` Junio C Hamano
2016-08-30 20:38         ` Lars Schneider
2016-08-30 22:23           ` Junio C Hamano
2016-08-31  4:57             ` Torsten Bögershausen
2016-08-31 13:14               ` Jakub Narębski
2016-08-30 20:46         ` Jakub Narębski
2016-09-05 19:47           ` Lars Schneider
2016-08-25 11:07 ` [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes larsxschneider
2016-08-29 18:05   ` Junio C Hamano
2016-08-29 19:03     ` Lars Schneider
2016-08-29 19:45       ` Junio C Hamano
2016-08-30 12:32         ` Lars Schneider
2016-08-30 14:54           ` Torsten Bögershausen
2016-09-01 17:15             ` Junio C Hamano
2016-08-29 15:39 ` [PATCH v6 00/13] Git filter protocol Lars Schneider
2016-08-29 18:09   ` Junio C Hamano
