git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Jakub Narębski" <jnareb@gmail.com>
To: Jeff King <peff@peff.net>, Lars Schneider <larsxschneider@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	gitster@pobox.com, tboegi@web.de, mlbright@gmail.com,
	remi.galan-alfonso@ensimag.grenoble-inp.fr, pclouds@gmail.com,
	e@80x24.org, ramsay@ramsayjones.plus.com
Subject: Re: [PATCH v2 0/5] Git filter protocol
Date: Fri, 29 Jul 2016 09:40:47 +0200	[thread overview]
Message-ID: <579B087F.7090108@gmail.com> (raw)
In-Reply-To: <20160728132906.GA21311@sigill.intra.peff.net>

W dniu 2016-07-28 o 15:29, Jeff King pisze:
> On Thu, Jul 28, 2016 at 09:16:18AM +0200, Lars Schneider wrote:
> 
>> But Peff ($gmane/299902), Duy, and Eric, seemed to prefer the pkt-line
>> solution (gmane is down - otherwise I would have given you the links).
> 
> FWIW, I think there are arguments for transmitting size + content
> (namely, that it is simpler); the downside is that it doesn't allow
> streaming.

And that it requires for the filter to know the size of its output
upfront (which, as I wrote, might be easy to do based on size of input
and data stored elsewhere, or might need generating whole output to
know).

I don't know how parallel Git is, but if it is parallel enough,
and other limits do not apply (limited amount of CPU cores, I/O limits),
without streaming new filter protocol might be slower, unless startup
time dominates (MS Windows?):

Current parallel:

   |   startup   | processing 1 |
    |  startup    | processing 2  |
   | startup |  processing 3 |
     |  startup  |  processing 4  |

Protocol v2:

   |  startup  | processing 1 | processing 2 | processing 3 | processing 4 |

> 
> So I think there are two viable alternatives:
> 
>   1. Total size of data in ASCII decimal, newline, then that many bytes
>      of content.
> 
>   2. No size header, then a series of pkt-lines followed by a flush
>      packet.

    3. Optional size header[2][3], then a series of pkt-lines followed
       by a flush packet[4].

[2] Git should always provide size, because it is easy to do, and
    I think quite cheap (stored with blob, stored in index, or stat()
    on file away).  Filter can provide size if it is easy to calculate,
    or approximation of size / size hint[5] - it helps to avoid
    reallocation.
[3] It is also a place where filter can pass error conditions that
    are known before starting processing a file.
[4] On one hand you need to catch cases where real size is larger than
    size sent upfront, or smaller than size sent upfront; on the
    other hand it might be a place where to send warnings and errors...
    unless we utilize stderr of a process (but then there is a problem
    of deadlocking, I think).
[5] I suggest

        <size as ascii decimal>
        "approx" SPC <size as ascii decimal>
        "unknown"
        "fail"

> And you should choose between the two based on whether it's more
> important to allow streaming, or more important to make the filter
> implementations simple[1].
> 
> Any solution that is in between those (like sending a size header and
> then using pktlines anyway) is sacrificing simplicity but not getting
> the streaming benefits.
> 
> -Peff
> 
> [1] I haven't thought hard enough about it to have a real opinion. My
>     gut says to go with the streaming, just because we've had to
>     retrofit streaming in other areas when dealing with blobs, so I
>     think we'll end up there eventually. So choosing a simpler protocol
>     like (1) would probably mean eventually implementing a next-version
>     protocol that does (2), and having to support both.
> 
> PS Jakub asked for links, but gmane is down. Here are the relevant threads:
> 
>    http://public-inbox.org/git/20160720134916.GB19359@sigill.intra.peff.net
> 
>    http://public-inbox.org/git/20160722154900.19477-1-larsxschneider%40gmail.com/t/#u
> 


  reply	other threads:[~2016-07-29  7:41 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-22 15:48 [PATCH v1 0/3] Git filter protocol larsxschneider
2016-07-22 15:48 ` [PATCH v1 1/3] convert: quote filter names in error messages larsxschneider
2016-07-22 15:48 ` [PATCH v1 2/3] convert: modernize tests larsxschneider
2016-07-26 15:18   ` Remi Galan Alfonso
2016-07-26 20:40     ` Junio C Hamano
2016-07-22 15:49 ` [PATCH v1 3/3] convert: add filter.<driver>.useProtocol option larsxschneider
2016-07-22 22:32   ` Torsten Bögershausen
2016-07-24 12:09     ` Lars Schneider
2016-07-22 23:19   ` Ramsay Jones
2016-07-22 23:28     ` Ramsay Jones
2016-07-24 17:16     ` Lars Schneider
2016-07-24 22:36       ` Ramsay Jones
2016-07-24 23:22         ` Jakub Narębski
2016-07-25 20:32           ` Lars Schneider
2016-07-26 10:58             ` Jakub Narębski
2016-07-25 20:24         ` Lars Schneider
2016-07-23  0:11   ` Jakub Narębski
2016-07-23  7:27     ` Eric Wong
2016-07-26 20:00       ` Jeff King
2016-07-24 18:36     ` Lars Schneider
2016-07-24 20:14       ` Jakub Narębski
2016-07-24 21:30         ` Jakub Narębski
2016-07-25 20:16           ` Lars Schneider
2016-07-26 12:24             ` Jakub Narębski
2016-07-25 20:09         ` Lars Schneider
2016-07-26 14:18           ` Jakub Narębski
2016-07-23  8:14   ` Eric Wong
2016-07-24 19:11     ` Lars Schneider
2016-07-25  7:27       ` Eric Wong
2016-07-25 15:48       ` Duy Nguyen
2016-07-22 21:39 ` [PATCH v1 0/3] Git filter protocol Junio C Hamano
2016-07-24 11:24   ` Lars Schneider
2016-07-26 20:11     ` Jeff King
2016-07-27  0:06 ` [PATCH v2 0/5] " larsxschneider
2016-07-27  0:06   ` [PATCH v2 1/5] convert: quote filter names in error messages larsxschneider
2016-07-27 20:01     ` Jakub Narębski
2016-07-28  8:23       ` Lars Schneider
2016-07-27  0:06   ` [PATCH v2 2/5] convert: modernize tests larsxschneider
2016-07-27  0:06   ` [PATCH v2 3/5] pkt-line: extract and use `set_packet_header` function larsxschneider
2016-07-27  0:20     ` Junio C Hamano
2016-07-27  9:13       ` Lars Schneider
2016-07-27 16:31         ` Junio C Hamano
2016-07-27  0:06   ` [PATCH v2 4/5] convert: generate large test files only once larsxschneider
2016-07-27  2:35     ` Torsten Bögershausen
2016-07-27 13:32       ` Jeff King
2016-07-27 16:50         ` Lars Schneider
2016-07-27  0:06   ` [PATCH v2 5/5] convert: add filter.<driver>.process option larsxschneider
2016-07-27  1:32     ` Jeff King
2016-07-27 17:31       ` Lars Schneider
2016-07-27 18:11         ` Jeff King
2016-07-28 12:10           ` Lars Schneider
2016-07-28 13:35             ` Jeff King
2016-07-27  9:41     ` Eric Wong
2016-07-29 10:38       ` Lars Schneider
2016-07-29 11:24         ` Jakub Narębski
2016-07-29 11:31           ` Lars Schneider
2016-08-05 18:55         ` Eric Wong
2016-08-05 23:26           ` Lars Schneider
2016-08-05 23:38             ` Eric Wong
2016-07-27 23:31     ` Jakub Narębski
2016-07-29  8:04       ` Lars Schneider
2016-07-29 17:35         ` Junio C Hamano
2016-07-29 23:11           ` Jakub Narębski
2016-07-29 23:44             ` Lars Schneider
2016-07-30  9:32               ` Jakub Narębski
2016-07-28 10:32     ` Torsten Bögershausen
2016-07-27 19:08   ` [PATCH v2 0/5] Git filter protocol Jakub Narębski
2016-07-28  7:16     ` Lars Schneider
2016-07-28 10:42       ` Jakub Narębski
2016-07-28 13:29       ` Jeff King
2016-07-29  7:40         ` Jakub Narębski [this message]
2016-07-29  8:14           ` Lars Schneider
2016-07-29 15:57             ` Jeff King
2016-07-29 16:20               ` Lars Schneider
2016-07-29 16:50                 ` Jeff King
2016-07-29 17:43                   ` Lars Schneider
2016-07-29 18:27                     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=579B087F.7090108@gmail.com \
    --to=jnareb@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=remi.galan-alfonso@ensimag.grenoble-inp.fr \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).