From: Lars Schneider <larsxschneider@gmail.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Git Mailing List" <git@vger.kernel.org>,
"Torsten Bögershausen" <tboegi@web.de>,
mlbright@gmail.com,
"Remi Galan Alfonso" <remi.galan-alfonso@ensimag.grenoble-inp.fr>,
"Nguyen Thai Ngoc Duy" <pclouds@gmail.com>,
"Eric Wong" <e@80x24.org>,
"Ramsay Jones" <ramsay@ramsayjones.plus.com>,
"Jeff King" <peff@peff.net>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: Re: [PATCH v2 5/5] convert: add filter.<driver>.process option
Date: Sat, 30 Jul 2016 01:44:40 +0200
Message-ID: <2435ACEE-19BE-4995-B929-BCEF658F278E@gmail.com>
In-Reply-To: <1a009e19-8830-7dea-2811-d475cf482ea3@gmail.com>

> On 30 Jul 2016, at 01:11, Jakub Narębski <jnareb@gmail.com> wrote:
>
> W dniu 2016-07-29 o 19:35, Junio C Hamano pisze:
>> Lars Schneider <larsxschneider@gmail.com> writes:
>>
>>> I think sending it upfront is nice for buffer allocations of big files
>>> and it doesn't cost us anything to do it.
>>
>> While I do NOT think "total size upfront" MUST BE avoided at all costs,
>> I do not think the above statement to justify it makes ANY sense.
>>
>> Big files are by definition something you cannot afford to hold in
>> core in their entirety, so you do not want to be told that you'd be fed 40GB
>> and ask xmalloc to allocate that much.
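Junio's objection can be illustrated with a minimal C sketch (all names, including the 64 MiB cap, are hypothetical and not Git's API): the receiver treats an announced size only as a hint, caps any preallocation, and otherwise streams through a fixed-size buffer so that memory use does not grow with the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cap: never preallocate more than this, whatever size
 * the sender announces. */
#define PREALLOC_CAP ((size_t)64 * 1024 * 1024)

/* Use an announced size for the first allocation only when it is
 * small enough to hold in core; otherwise fall back to the cap. */
static size_t initial_alloc_hint(size_t announced)
{
	return announced < PREALLOC_CAP ? announced : PREALLOC_CAP;
}

/* Copy everything from `in` to `out` through a fixed-size buffer, so
 * memory use stays constant however large the content is.
 * Returns the number of bytes copied, or (size_t)-1 on error. */
static size_t stream_copy(FILE *in, FILE *out)
{
	char buf[8192];
	size_t total = 0, n;

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n)
			return (size_t)-1;
		total += n;
	}
	return ferror(in) ? (size_t)-1 : total;
}
```

With this shape, a 40GB announcement costs at most PREALLOC_CAP bytes up front, and the copy loop never needs more than its 8 KiB buffer.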
>
> I don't know much about how filter drivers work internally, but in some cases
> Git reads from or writes to a file (file descriptor), and in other cases it reads
> from or writes to a str+len pair (which probably predates strbuf). I think in
> those cases the file needs to fit in memory (in size_t). So in some cases
> Git reads the file into memory. Whether it uses xmalloc or mmap, I don't
> know.
>
>>
>> It allows the reader to be lazy for buffer allocations as long as
>> you know the file fits in-core, at the cost of forcing the writer to
>> somehow come up with the total number of bytes even before sending a
>> single byte (in other words, if the writer cannot produce and hold
>> the data in-core, it may even have to spool the data in a temporary
>> file only to count, and then play it back after showing the total
>> size).
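The writer-side cost Junio describes can be made concrete. A writer that produces output incrementally, but must still announce the total first, has to spool everything to a temporary file and play it back. This is a minimal C sketch; the `"<size>\n"` header, the producer callback and all function names are hypothetical and are not the wire format of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical producer callback: fills buf with up to cap bytes and
 * returns how many it wrote, 0 when there is no more output. */
typedef size_t (*produce_fn)(char *buf, size_t cap, void *state);

/* Example producer: emits *(size_t *)state copies of 'x' in chunks. */
static size_t produce_xs(char *buf, size_t cap, void *state)
{
	size_t *remaining = state;
	size_t n = *remaining < cap ? *remaining : cap;

	memset(buf, 'x', n);
	*remaining -= n;
	return n;
}

/* Spool all output to a temporary file just to learn its size, then
 * write a hypothetical "<size>\n" header followed by the data. */
static int send_with_size_upfront(produce_fn produce, void *state, FILE *out)
{
	char buf[8192];
	size_t n, total = 0;
	FILE *spool = tmpfile();

	if (!spool)
		return -1;
	while ((n = produce(buf, sizeof(buf), state)) > 0) {
		if (fwrite(buf, 1, n, spool) != n)
			goto fail;
		total += n;
	}
	/* Only now is the total known: announce it, then play back. */
	fprintf(out, "%zu\n", total);
	rewind(spool);
	while ((n = fread(buf, 1, sizeof(buf), spool)) > 0)
		if (fwrite(buf, 1, n, out) != n)
			goto fail;
	fclose(spool);
	return 0;
fail:
	fclose(spool);
	return -1;
}
```

Every byte is written and read twice, and nothing reaches the receiver until the producer has finished. That is the "limitation, not a feature" side of a mandatory upfront size.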
>
> For some types of filters you can know the size upfront:
> - for filters such as rot13, with a 1-to-1 transformation, you know
> that the output size is the same as the input size
> - for block encodings, and for constant-width to constant-width
> encoding conversion, the filter can calculate the output size from the
> input size (e.g. <output size> = 2*<input size>)
> - the filter may get the size from somewhere else; for example, the LFS
> filter stub has a constant size, and files are stored in Artifactory with
> their length
>
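The first two cases in the list above can be sketched in C. rot13 maps every byte to exactly one byte. Hex encoding maps every byte to two; it is used here only as an illustration of the `<output size> = 2*<input size>` case and is not mentioned in the thread. Both filters can therefore state their output size before producing a single byte. Function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* rot13: one output byte per input byte, so output size == input size. */
static void rot13(const char *in, char *out, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		char c = in[i];

		if (c >= 'a' && c <= 'z')
			c = 'a' + (c - 'a' + 13) % 26;
		else if (c >= 'A' && c <= 'Z')
			c = 'A' + (c - 'A' + 13) % 26;
		out[i] = c;
	}
}

/* Hex encoding: two output bytes per input byte, so the size to
 * announce is known from the input size alone. */
static size_t hex_output_size(size_t in_len)
{
	return 2 * in_len;
}

static void hex_encode(const unsigned char *in, size_t len, char *out)
{
	static const char digits[] = "0123456789abcdef";

	for (size_t i = 0; i < len; i++) {
		out[2 * i] = digits[in[i] >> 4];
		out[2 * i + 1] = digits[in[i] & 0xf];
	}
}
```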
>>
>> It is good that you allow both modes of operation and the size of
>> the data can either be given upfront (which allows a single fixed
>> allocation upfront without realloc, as long as the data fits in
>> core), or be left "(atend)".
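The two modes Junio describes can be sketched as a single reader. The function is hypothetical, and a negative size stands in for "(atend)". When the size is known, the reader allocates once and reads exactly that many bytes. Otherwise it grows the buffer geometrically with realloc until EOF.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read content from `in`. known_size >= 0: size was given upfront, so
 * allocate once. known_size < 0 ("(atend)"): grow until EOF.
 * Returns a malloc'd buffer and sets *len, or returns NULL on error. */
static char *read_content(FILE *in, long long known_size, size_t *len)
{
	char *buf;
	size_t used = 0, alloc;

	if (known_size >= 0) {
		alloc = (size_t)known_size;
		buf = malloc(alloc ? alloc : 1);
		if (!buf)
			return NULL;
		used = fread(buf, 1, alloc, in);
		if (used != alloc) {	/* sender announced more than it sent */
			free(buf);
			return NULL;
		}
		*len = used;
		return buf;
	}

	alloc = 4096;
	buf = malloc(alloc);
	if (!buf)
		return NULL;
	for (;;) {
		used += fread(buf + used, 1, alloc - used, in);
		if (used < alloc)	/* short read: EOF or error */
			break;
		char *tmp = realloc(buf, alloc * 2);
		if (!tmp) {
			free(buf);
			return NULL;
		}
		buf = tmp;
		alloc *= 2;
	}
	if (ferror(in)) {
		free(buf);
		return NULL;
	}
	*len = used;
	return buf;
}
```

The known-size branch is the "single fixed allocation without realloc" case. It is only safe when the announced size fits in core, which is why the (atend) branch is still needed.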
>
> I think the protocol should be either: <size> + <contents>, or
> <size unknown> + <contents> + <flush>, that is do not use flush
> packet if size is known upfront -- it would be a second point
> of truth (SPOT principle).
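Jakub's two alternatives can be sketched on top of Git's pkt-line framing. Each packet starts with four hex digits giving its length, which includes those four header bytes, and "0000" is the flush-pkt. LARGE_PACKET_MAX in Git is 65520. The function names are hypothetical. How the size itself would be announced in the size-known variant is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>

#define PKT_MAX 65520			/* Git's LARGE_PACKET_MAX */
#define PKT_DATA_MAX (PKT_MAX - 4)	/* payload bytes per packet */

/* Write one pkt-line: 4 lowercase hex digits (length including the
 * header itself), then the payload. */
static int write_pkt(FILE *out, const char *data, size_t len)
{
	assert(len > 0 && len <= PKT_DATA_MAX);
	if (fprintf(out, "%04zx", len + 4) != 4)
		return -1;
	return fwrite(data, 1, len, out) == len ? 0 : -1;
}

/* Send content split across pkt-lines. use_flush != 0: terminate with
 * a flush-pkt, so no size is needed upfront. use_flush == 0: the
 * reader must already know the size (the <size> + <contents> variant). */
static int send_content(FILE *out, const char *data, size_t len, int use_flush)
{
	while (len > 0) {
		size_t n = len < PKT_DATA_MAX ? len : PKT_DATA_MAX;

		if (write_pkt(out, data, n))
			return -1;
		data += n;
		len -= n;
	}
	if (use_flush && fputs("0000", out) == EOF)
		return -1;
	return 0;
}
```

For example, sending "hi" with use_flush set produces `0006hi0000`.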
As I mentioned elsewhere, a <flush> packet is always sent right now.
I have no strong opinion on whether this is good or bad. The implementation
was a little bit simpler and that's why I did it. I will implement
whatever option the majority prefers :-)
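Because a flush packet always terminates the content in the current version, a receiver can read packets until the flush-pkt and never depend on an announced size. This is a hypothetical C sketch of such a reader, using the same pkt-line framing (4-hex-digit length including the header, "0000" as flush-pkt); it is not the patch's code.

```c
#include <assert.h>
#include <stdio.h>

#define PKT_MAX 65520	/* Git's LARGE_PACKET_MAX */

/* Parse a 4-hex-digit pkt-line header; returns -1 if malformed. */
static int parse_pkt_len(const char *hdr)
{
	int len = 0;

	for (int i = 0; i < 4; i++) {
		char c = hdr[i];
		int v;

		if (c >= '0' && c <= '9')
			v = c - '0';
		else if (c >= 'a' && c <= 'f')
			v = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			v = c - 'A' + 10;
		else
			return -1;
		len = len * 16 + v;
	}
	return len;
}

/* Copy each packet's payload from `in` to `out` until a flush-pkt.
 * Returns the total payload size, or -1 on error or premature EOF. */
static long long read_until_flush(FILE *in, FILE *out)
{
	static char buf[PKT_MAX];
	char hdr[4];
	long long total = 0;

	for (;;) {
		if (fread(hdr, 1, 4, in) != 4)
			return -1;	/* EOF before the flush-pkt */
		int len = parse_pkt_len(hdr);
		if (len == 0)
			return total;	/* flush-pkt: content complete */
		if (len < 4 || len > PKT_MAX)
			return -1;	/* malformed header */
		size_t n = (size_t)len - 4;
		if (fread(buf, 1, n, in) != n || fwrite(buf, 1, n, out) != n)
			return -1;
		total += n;
	}
}
```

If a size is also announced upfront, such a reader can still use it as an allocation hint, but the flush-pkt remains the single point that decides where the content ends.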
Cheers,
Lars
>
>> I just don't want to see it oversold as a "feature" that the size
>> has to come before data. That is a limitation, not a feature.
>>
>> Thanks.
>>
>