From: Lars Schneider <larsxschneider@gmail.com>
To: Ramsay Jones <ramsay@ramsayjones.plus.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
"Jeff King" <peff@peff.net>,
jnareb@gmail.com, "Torsten Bögershausen" <tboegi@web.de>,
mlbright@gmail.com
Subject: Re: [PATCH v1 3/3] convert: add filter.<driver>.useProtocol option
Date: Mon, 25 Jul 2016 22:24:44 +0200
Message-ID: <940904FE-93EB-45E9-B3F2-54C07BBF7E54@gmail.com>
In-Reply-To: <194ea810-76ff-f32c-0f8a-57e8e60b65f5@ramsayjones.plus.com>
On 25 Jul 2016, at 00:36, Ramsay Jones <ramsay@ramsayjones.plus.com> wrote:
> On 24/07/16 18:16, Lars Schneider wrote:
>>
>> On 23 Jul 2016, at 01:19, Ramsay Jones <ramsay@ramsayjones.plus.com> wrote:
>>
>>> On 22/07/16 16:49, larsxschneider@gmail.com wrote:
>>>> From: Lars Schneider <larsxschneider@gmail.com>
>>>>
>>>> Git's clean/smudge mechanism invokes an external filter process for every
>>>> single blob that is affected by a filter. If Git filters a lot of blobs
>>>> then the startup time of the external filter processes can become a
>>>> significant part of the overall Git execution time.
>>>>
>>>> This patch adds the filter.<driver>.useProtocol option which, if enabled,
>>>> keeps the external filter process running and processes all blobs with
>>>> the following protocol over stdin/stdout.
>>>>
>>>> 1. Git starts the filter on first usage and expects a welcome message
>>>> with protocol version number:
>>>> Git <-- Filter: "git-filter-protocol\n"
>>>> Git <-- Filter: "version 1"
>>>
>>> Hmm, I was a bit surprised to see a 'filter' talk first (but so long as the
>>> interaction is fully defined, I guess it doesn't matter).
>>>
>>> [If you wanted to check for a version, you could add a "version" command
>>> instead, just like "clean" and "smudge".]
>>
>> It was a conscious decision to have the `filter` talk first. My reasoning was:
>>
>> (1) I want a reliable way to distinguish the existing filter protocol ("single-shot
>> invocation") from the new one ("long running"). I don't think there would be a
>> situation where the existing protocol would talk first. Therefore users could
>> not accidentally mix the two and end up with a half-working, undetermined outcome.
>
> If a 'single-shot' filter were incorrectly configured, instead of a new one, then
> the interaction could last a little while - since it would result in deadlock! ;-)
>
> [If Git talks first instead, configuring a 'single-shot' filter _may_ still result
> in a deadlock - depending on pipe size, etc.]
Do you think this is an issue that needs to be addressed in the first version?
If yes, I would probably look into select() to put a timeout on reading from
the filter. However, wouldn't the current "single-shot" clean/smudge filters
block in the same way if they don't write anything?
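
To illustrate the timeout idea, here is a rough sketch, assuming a POSIX
system, of waiting for the filter's first output with select(2).
wait_readable() is a hypothetical helper, not an existing Git function, and
the timeout value would be left to the caller:

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Git): wait until `fd` becomes readable
 * or `timeout_ms` milliseconds elapse.
 * Returns 1 if data (or EOF) can be read, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
	fd_set rfds;
	struct timeval tv;
	int ret;

	/* select() can only watch descriptors below FD_SETSIZE. */
	assert(fd >= 0 && fd < FD_SETSIZE);
	do {
		/* Both the set and the timeout may be modified by select(). */
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		tv.tv_sec = timeout_ms / 1000;
		tv.tv_usec = (timeout_ms % 1000) * 1000;
		ret = select(fd + 1, &rfds, NULL, NULL, &tv);
		/* Simplification: an interrupted wait restarts the full timeout. */
	} while (ret < 0 && errno == EINTR);
	return ret > 0 ? 1 : ret;
}
```

A caller would check the result before reading the welcome message and treat
0 as "the filter did not answer", which is what a misconfigured single-shot
filter would look like.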
>> (2) In the future we could extend the pipe protocol (see $gmane/297994, it's very
>> interesting). A filter could check Git's version and then pick the most appropriate
>> filter protocol on startup.
>>
>>
>>> [...]
>>>> +static struct cmd2process *start_protocol_filter(const char *cmd)
>>>> +{
>>>> + int ret = 1;
>>>> + struct cmd2process *entry = NULL;
>>>> + struct child_process *process = NULL;
>>>> + struct strbuf nbuf = STRBUF_INIT;
>>>> + struct string_list split = STRING_LIST_INIT_NODUP;
>>>> + const char *argv[] = { NULL, NULL };
>>>> + const char *header = "git-filter-protocol\nversion";
>>>> +
>>>> + entry = xmalloc(sizeof(*entry));
>>>> + hashmap_entry_init(entry, strhash(cmd));
>>>> + entry->cmd = cmd;
>>>> + process = &entry->process;
>>>> +
>>>> + child_process_init(process);
>>>> + argv[0] = cmd;
>>>> + process->argv = argv;
>>>> + process->use_shell = 1;
>>>> + process->in = -1;
>>>> + process->out = -1;
>>>> +
>>>> + if (start_command(process)) {
>>>> + error("cannot fork to run external persistent filter '%s'", cmd);
>>>> + return NULL;
>>>> + }
>>>> + strbuf_reset(&nbuf);
>>>> +
>>>> + sigchain_push(SIGPIPE, SIG_IGN);
>>>> + ret &= strbuf_read_once(&nbuf, process->out, 0) > 0;
>>>
>>> Hmm, how much will be read into nbuf by this single call?
>>> Since strbuf_read_once() makes a single call to xread(), with
>>> a len argument that will probably be 8192, you can not really
>>> tell how much it will read, in general. (xread() does not
>>> guarantee how many bytes it will read.)
>>>
>>> In particular, it could be less than strlen(header).
>>
>> As mentioned to Torsten in $gmane/300156, I will add a newline
>> and then read until I find the second newline. That should solve
>> the problem, right?
>>
>> (You wrote in $gmane/300119 that I should ignore your email but
>> I think you have a valid point here ;-)
>
> Heh, as I said, it was late and I was trying to do several things
> at once. (I am updating 3 installations of Linux Mint 17.3 to Linux
> Mint 18 - I decided to do a complete re-install, since I needed to
> change partition sizes anyway. I have only just got email back up ...)
>
> I stopped commenting on the patch early but, after sending the first
> email, I decided to scan the rest of your patch before going to bed
> and noticed something which would invalidate my comments ...
>
>>
>>
>>>> [...]
>>>> + sigchain_push(SIGPIPE, SIG_IGN);
>>>> + switch (entry->protocol) {
>>>> + case 1:
>>>> + if (fd >= 0 && !src) {
>>>> + ret &= fstat(fd, &fileStat) != -1;
>>>> + len = fileStat.st_size;
>>>> + }
>>>> + strbuf_reset(&nbuf);
>>>> + strbuf_addf(&nbuf, "%s\n%s\n%zu\n", filter_type, path, len);
>>>> + ret &= write_str_in_full(process->in, nbuf.buf) > 1;
>>>
>>> why not write_in_full(process->in, nbuf.buf, nbuf.len) ?
>> OK, this would save a "strlen" call. Do you think such a function could be of general
>> use? If yes, then I would add:
>>
>> static inline ssize_t write_strbuf_in_full(int fd, struct strbuf *str)
>> {
>> return write_in_full(fd, str->buf, str->len);
>> }
>
> [I don't have strong feelings either way (but I suspect it's not worth it).]
OK
>>>> + if (len > 0) {
>>>> + if (src)
>>>> + ret &= write_in_full(process->in, src, len) == len;
>>>> + else if (fd >= 0)
>>>> + ret &= copy_fd(fd, process->in) == 0;
>>>> + else
>>>> + ret &= 0;
>>>> + }
>>>> +
>>>> + strbuf_reset(&nbuf);
>>>> + while (xread(process->out, &c, 1) == 1 && c != '\n')
>>>> + strbuf_addchars(&nbuf, c, 1);
>>>> + nbuf_len = (size_t)strtol(nbuf.buf, &strtol_end, 10);
>>>> + ret &= (strtol_end != nbuf.buf && errno != ERANGE);
>>>> + strbuf_reset(&nbuf);
>>>> + if (nbuf_len > 0)
>>>> + ret &= strbuf_read_once(&nbuf, process->out, nbuf_len) == nbuf_len;
>>>
>>> Again, how many bytes will be read?
>>> Note, that in the default configuration, a _maximum_ of
>>> MAX_IO_SIZE (8MB or SSIZE_MAX, whichever is smaller) bytes
>>> will be read.
>
> ... In particular, your 2GB test case should not have worked, so
> I assumed that I had missed a loop somewhere ...
Thanks a lot for this comment. The 2GB test case was bogus... v2
will have a much improved version :-)
>> Would something like this be more appropriate?
>>
>> strbuf_reset(&nbuf);
>> if (nbuf_len > 0) {
>> strbuf_grow(&nbuf, nbuf_len);
>> ret &= read_in_full(process->out, nbuf.buf, nbuf_len) == nbuf_len;
>> }
>
> ... and this looks better. [Note: this comment would apply equally to the
> version message.]
And it works better with large files, too :D
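
For illustration, the difference between a single read and reading "in full"
can be sketched as below. read_fully() is a hypothetical stand-in for the idea
behind read_in_full(), written against plain POSIX read(2); it is not Git's
actual implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling read(2) until `count` bytes have
 * arrived, EOF is reached, or an error occurs. A single read() may return
 * fewer bytes than requested, so one call is not enough in general.
 * Returns the number of bytes read (short only at EOF), or -1 on error.
 */
static ssize_t read_fully(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t total = 0;

	assert(buf != NULL || count == 0);
	while (total < count) {
		ssize_t n = read(fd, p + total, count - total);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted, try again */
			return -1;
		}
		if (n == 0)
			break; /* EOF before `count` bytes arrived */
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```

When the destination is a strbuf's buffer, its length would presumably also
need to be updated after the read (for example with strbuf_setlen()), since
strbuf_grow() only reserves space.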
> [Hmm, now can I remember which packages I need to install ...]
:-)
Thanks,
Lars