git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <ttaylorr@github.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Lars Schneider <larsxschneider@gmail.com>,
	git@vger.kernel.org, Eric Wong <e@80x24.org>
Subject: Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
Date: Tue, 10 Jan 2017 16:33:11 -0700	[thread overview]
Message-ID: <20170110233311.GA1572@Ida> (raw)
In-Reply-To: <ec8078ef-8ff2-d26f-ef73-5ef612737eee@gmail.com>

On Tue, Jan 10, 2017 at 11:11:01PM +0100, Jakub Narębski wrote:
> W dniu 09.01.2017 o 00:42, Junio C Hamano pisze:
> > larsxschneider@gmail.com writes:
> >> From: Lars Schneider <larsxschneider@gmail.com>
> >>
> >> Some `clean` / `smudge` filters might require a significant amount of
> >> time to process a single blob. During this process the Git checkout
> >> operation is blocked and Git needs to wait until the filter is done to
> >> continue with the checkout.
>
> Lars, what is expected use case for this feature; that is when do you
> think this problem may happen?  Is it something that happened IRL?
>
> >>
> >> Teach the filter process protocol (introduced in edcc858) to accept the
> >> status "delayed" as response to a filter request. Upon this response Git
> >> continues with the checkout operation and asks the filter to process the
> >> blob again after all other blobs have been processed.
> >
> > Hmm, I would have expected that the basic flow would become
> >
> > 	for each paths to be processed:
> > 		convert-to-worktree to buf
> > 		if not delayed:
> > 			do the caller's thing to use buf
> > 		else:
> > 			remember path
> >
> > 	for each delayed paths:
> > 		ensure filter process finished processing for path
> > 		fetch the thing to buf from the process
> > 		do the caller's thing to use buf
>
> I would expect here to have a kind of event loop, namely
>
>         while there are delayed paths:
>                 get path that is ready from filter
>                 fetch the thing to buf (supporting "delayed")
>                 if path done
>                         do the caller's thing to use buf
>                         (e.g. finish checkout path, eof convert, etc.)
>
> We can either trust filter process to tell us when it finished sending
> delayed paths, or keep list of paths that are being delayed in Git.

This makes a lot of sense to me. The "get path that is ready from filter" should
block until the filter has data that it is ready to send. This way Git isn't
wasting time in a busy-loop asking whether the filter has data ready to be sent.
It also means that if the filter has one large chunk that it's ready to write,
Git can work on that while the filter continues to process more data,
theoretically improving the performance of checkouts with many large delayed
objects.

>
> >
> > and that would make quite a lot of sense.  However, what is actually
> > implemented is a bit disappointing from that point of view.  While
> > its first part is the same as above, the latter part instead does:
> >
> > 	for each delayed paths:
> > 		checkout the path
> >
> > Presumably, checkout_entry() does the "ensure that the process is
> > done converting" (otherwise the result is simply buggy), but what
> > disappoints me is that this does not allow callers that call
> > "convert-to-working-tree", whose interface is obtain the bytestream
> > in-core in the working tree representation, given an object in the
> > object-db representation in an in-core buffer, to _use_ the result
> > of the conversion.  The caller does not have a chance to even see
> > the result as it is written straight to the filesystem, once it
> > calls checkout_delayed_entries().
> >
>

In addition to the above, I'd also like to investigate adding a "no more items"
message into the filter protocol. This would be useful for filters that
batch delayed items into groups. In particular, if the batch size is `N`, and Git
sends `2N-1` items, the second batch will be under-filled. The filter on the
other end needs some mechanism to send the second batch, even though it hasn't
hit max capacity.

Specifically, this is how Git LFS implements object transfers for data it does
not have locally, but I'm sure that this sort of functionality would be useful
for other filter implementations as well.

--
Thanks,
Taylor Blau

ttaylorr@github.com

  reply	other threads:[~2017-01-10 23:33 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-08 19:17 [PATCH v1] convert: add "status=delayed" to filter process protocol larsxschneider
2017-01-08 20:14 ` Torsten Bögershausen
2017-01-11  9:48   ` Lars Schneider
2017-01-08 20:45 ` Eric Wong
2017-01-11  9:51   ` Lars Schneider
2017-01-08 23:42 ` Junio C Hamano
2017-01-10 22:11   ` Jakub Narębski
2017-01-10 23:33     ` Taylor Blau [this message]
2017-01-11 10:20     ` Lars Schneider
2017-01-11 14:53       ` Jakub Narębski
2017-01-11 20:41         ` Junio C Hamano
2017-01-11  9:43   ` Lars Schneider
2017-01-11 20:45     ` Junio C Hamano
     [not found]   ` <20170109233816.GA70151@Ida>
2017-01-11 10:13     ` Lars Schneider
2017-01-11 17:59       ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170110233311.GA1572@Ida \
    --to=ttaylorr@github.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=larsxschneider@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).