git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Lars Schneider <larsxschneider@gmail.com>
To: Taylor Blau <ttaylorr@github.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, e@80x24.org, jnareb@gmail.com
Subject: Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
Date: Wed, 11 Jan 2017 11:13:00 +0100	[thread overview]
Message-ID: <F533857D-9B51-44C1-8889-AA0542AD8250@gmail.com> (raw)
In-Reply-To: <20170109233816.GA70151@Ida>


> On 10 Jan 2017, at 00:38, Taylor Blau <ttaylorr@github.com> wrote:
> 
> I've been considering some alternative approaches in order to make the
> communication between Git and any extension that implements this protocol more
> intuitive.
> 
> In particular, I'm considering alternatives to:
> 
>> 	for each delayed paths:
>> 		ensure filter process finished processing for path
>> 		fetch the thing to buf from the process
>> 		do the caller's thing to use buf
> 
> As I understand it, the above sequence of steps would force Git to either:
> 
> a) loop over all delayed paths and ask the filter if it's done processing,
>   creating a busy-loop between the filter and Git, or...
> b) loop over all delayed paths sequentially, checking out each path in sequence
> 
> I would like to avoid both of those situations, and instead opt for an
> asynchronous approach. In (a), the protocol is far too chatty. In (b), the
> protocol is much less chatty, but forces the checkout to be the very last step,
> which has negative performance implications on checkouts with many large files.
> 
> For instance, checking out several multi-gigabyte files one after the other
> means that a significant amount of time is lost while the filter has some of the
> items ready. Instead of checking them out as they become available, Git waits
> until the very end when they are all available.
> 
> I think it would be preferable for the protocol to specify a sort of "done"
> signal against each path such that Git could check out delayed paths as they
> become available. If implemented this way, Git could checkout files
> asynchronously, while the filter continues to do work on the other end.

In v1 I implemented a) with the busy-loop problem in mind. 

My thinking was this:

If the filter sees at least one filter request twice then the filter knows that
Git has already requested all files that require filtering. At that point the
filter could just block the "delayed" answer to the latest filter request until
at least one of the previously delayed requests can be fulfilled. Then the filter
answers "delay" to Git until Git requests the blob that can be fulfilled. This
process cycles until all requests can be fulfilled. Wouldn't that work?

I think a "done" message by the filter is not easy. Right now the protocol works 
in a mode were Git always asks and the filter always answers. I believe changing
the filter to be able to initiate a "done" message would complicated the protocol.


> Additionally, the protocol should specify a sentinel "no more entries" value
> that could be sent from Git to the filter to signal that there are no more files
> to checkout. Some filters may implement mechanisms for converting files that
> require a signal to know when all files have been sent. Specifically, Git LFS
> (https://git-lfs.github.com) batches files to be transferred together, and needs
> to know when all files have been announced to truncate and send the last batch,
> if it is not yet full. I'm sure other filter implementations use a similar
> mechanism and would benefit from this as well.

I agree. I think the filter already has this info implicitly as explained above
but an explicit message would be better!

Thanks,
Lars

  parent reply	other threads:[~2017-01-11 10:13 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-08 19:17 [PATCH v1] convert: add "status=delayed" to filter process protocol larsxschneider
2017-01-08 20:14 ` Torsten Bögershausen
2017-01-11  9:48   ` Lars Schneider
2017-01-08 20:45 ` Eric Wong
2017-01-11  9:51   ` Lars Schneider
2017-01-08 23:42 ` Junio C Hamano
2017-01-10 22:11   ` Jakub Narębski
2017-01-10 23:33     ` Taylor Blau
2017-01-11 10:20     ` Lars Schneider
2017-01-11 14:53       ` Jakub Narębski
2017-01-11 20:41         ` Junio C Hamano
2017-01-11  9:43   ` Lars Schneider
2017-01-11 20:45     ` Junio C Hamano
     [not found]   ` <20170109233816.GA70151@Ida>
2017-01-11 10:13     ` Lars Schneider [this message]
2017-01-11 17:59       ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F533857D-9B51-44C1-8889-AA0542AD8250@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=ttaylorr@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).