Re: [PATCH v3 10/10] convert: add filter.<driver>.process option

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: "Jakub Narębski" <jnareb@gmail.com>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Martin-Louis Bright" <mlbright@gmail.com>,
	"Eric Wong" <e@80x24.org>, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option
Date: Thu, 4 Aug 2016 02:42:09 +0200	[thread overview]
Message-ID: <744bcf80-5d7e-d149-59a3-e12dd40cbea1@gmail.com> (raw)
In-Reply-To: <5180D54D-92C4-4875-AEB3-801663D70A8B@gmail.com>

[Some of those answers might have been invalidated by v4]

W dniu 01.08.2016 o 19:55, Lars Schneider pisze:
>> On 01 Aug 2016, at 00:19, Jakub Narębski <jnareb@gmail.com> wrote:
>> W dniu 30.07.2016 o 01:38, larsxschneider@gmail.com pisze:
>> [...]

>>> +static int multi_packet_read(int fd_in, struct strbuf *sb, size_t expected_bytes, int is_stream)
>>
>> About name of this function: `multi_packet_read` is fine, though I wonder
>> if `packet_read_in_full` with nearly the same parameters as `packet_read`,
>> or `packet_read_till_flush`, or `read_in_full_packetized` would be better.
> 
> I like `multi_packet_read` and will rename!
 
Errr... what? multi_packet_read() is the current name...
 
>> Also, the problem is that while we know that what packet_read() stores
>> would fit in memory (in size_t), it is not true for reading whole file,
>> which might be very large - for example huge graphical assets like raw
>> images or raw videos, or virtual machine images.  Isn't that the goal
>> of git-LFS solutions, which need this feature?  Shouldn't we have then
>> both `multi_packet_read_to_fd` and `multi_packet_read_to_buf`,
>> or whatever?
> 
> Git LFS works well with the current clean/smudge mechanism that uses the
> same on in memory buffers. I understand your concern but I think this
> improvement is out of scope for this patch series.

True.  

BTW. this means that it cannot share code with fetch / push codebase,
where Git spools from pkt-line to packfile on disk.


>> Also, total_bytes_read could overflow size_t, but then we would have
>> problems storing the result in strbuf.
> 
> Would that check be ok?
> 
> 		if (total_bytes_read > SIZE_MAX - bytes_read)
> 			return 1;  // `total_bytes_read` would overflow and is not representable

Well, if current code doesn't have such check, then I think it would
be all right to not have it either.

Note that we do not use C++ comments.
 
 

>>> +
>>> +	if (is_stream)
>>> +		strbuf_grow(sb, LARGE_PACKET_MAX);           // allocate space for at least one packet
>>> +	else
>>> +		strbuf_grow(sb, st_add(expected_bytes, 1));  // add one extra byte for the packet flush
>>> +
>>> +	do {
>>> +		bytes_read = packet_read(
>>> +			fd_in, NULL, NULL,
>>> +			sb->buf + total_bytes_read, sb->len - total_bytes_read - 1,
>>> +			PACKET_READ_GENTLE_ON_EOF
>>> +		);
>>> +		if (bytes_read < 0)
>>> +			return 1;  // unexpected EOF
>>
>> Don't we usually return negative numbers on error?  Ah, I see that the
>> return is a bool, which allows to use boolean expression with 'return'.
>> But I am still unsure if it is good API, this return value.
> 
> According to Peff zero for success is the usual style:
> http://public-inbox.org/git/20160728133523.GB21311%40sigill.intra.peff.net/

The usual case is 0 for success, but -1 (and not 1) for error.
But I agree with Peff that keeping existing API is better. 

>>> +	);
>>> +	strbuf_setlen(sb, total_bytes_read);
>>> +	return (is_stream ? 0 : expected_bytes != total_bytes_read);
>>> +}
>>> +
>>> +static int multi_packet_write_from_fd(const int fd_in, const int fd_out)
>>
>> Is it equivalent of copy_fd() function, but where destination uses pkt-line
>> and we need to pack data into pkt-lines?
> 
> Correct!

Yes, and we cannot keep the naming convention.  Though maybe mentioning
the equivalence in the comment above function would be good idea...

>>> +	return did_fail;
>>
>> Return true on fail?  Shouldn't we follow example of copy_fd()
>> from copy.c, and return COPY_READ_ERROR, or COPY_WRITE_ERROR,
>> or PKTLINE_WRITE_ERROR?
> 
> OK. How about this?
> 
> static int multi_packet_write_from_fd(const int fd_in, const int fd_out)
> {
> 	int did_fail = 0;
> 	ssize_t bytes_to_write;
> 	while (!did_fail) {
> 		bytes_to_write = xread(fd_in, PKTLINE_DATA_START(packet_buffer), PKTLINE_DATA_MAXLEN);
> 		if (bytes_to_write < 0)
> 			return COPY_READ_ERROR;
> 		if (bytes_to_write == 0)
> 			break;
> 		did_fail |= direct_packet_write(fd_out, packet_buffer, PKTLINE_HEADER_LEN + bytes_to_write, 1);
> 	}
> 	if (!did_fail)
> 		did_fail = packet_flush_gently(fd_out);
> 	return (did_fail ? COPY_WRITE_ERROR : 0);
> }

That's better, I think. 
 
>>> +}
>>> +
>>> +static int multi_packet_write_from_buf(const char *src, size_t len, int fd_out)
>>
>> It is equivalent of write_in_full(), with different order of parameters,
>> but where destination file descriptor expects pkt-line and we need to pack
>> data into pkt-lines?
> 
> True. Do you suggest to reorder parameters? I also would like to rename `src` to `src_in`, OK?

Well, no need to reorder parameters.  Better keep it the same as for
other function.  'src' is input ('source'), 'src_in' is tautologic.

>> NOTE: function description comments?
> 
> What do you mean here?

Sorry for being so cryptic.  What I meant is to think about adding comments
describing new functions just above them.
 
>>  Namely:
>>
>> - for git -> filter:
>>    * read from fd,      write pkt-line to fd  (off_t)
>>    * read from str+len, write pkt-line to fd  (size_t, ssize_t)
>> - for filter -> git:
>>    * read pkt-line from fd, write to fd       (off_t)
> 
> This one does not exist.

Right, because filter output goes to Git via strbuf.
 
>>    * read pkt-line from fd, write to str+len  (size_t, ssize_t)
[...]

>>> +	struct child_process process;
>>> +};
>>> +
>>> +static int cmd_process_map_initialized = 0;
>>> +static struct hashmap cmd_process_map;
>>
>> Reading Documentation/technical/api-hashmap.txt I see that:
>>
>>  `tablesize` is the allocated size of the hash table. A non-0 value indicates
>>  that the hashmap is initialized.
>>
>> So cmd_process_map_initialized is not really needed, is it?
> 
> I copied that from config.c:
> https://github.com/git/git/blob/f8f7adce9fc50a11a764d57815602dcb818d1816/config.c#L1425-L1428
> 
> `git grep "tablesize"` reveals that the check for `tablesize` is only used
> in hashmap.c ... so what approach should we use?

Well, git code is not always the best example... 

>>> +static int apply_protocol2_filter(const char *path, const char *src, size_t len,
>>> +						int fd, struct strbuf *dst, const char *cmd,
>>> +						const int wanted_capability)
>>
[...]

>> This is equivalent to
>>
>>   static int apply_filter(const char *path, const char *src, size_t len, int fd,
>>                           struct strbuf *dst, const char *cmd)
>>
>> Could we have extended that one instead?
> 
> Initially I had one function but that got kind of long ... I prefer two for now.

All right, we could always refactor to avoid code duplication later. 
 

>>> +
>>> +	fflush(NULL);
>>
>> This is the same as in apply_filter(), but I wonder what it is for.
> 
> "If the stream argument is NULL, fflush() flushes all
>  open output streams."
> 
> http://man7.org/linux/man-pages/man3/fflush.3.html

What I wanted to ask was not "what it does?",
but "why we need to flush here?".
 
>> This is very similar to apply_filter(), but the latter uses start_async()
>> from "run-command.h", with filter_buffer_or_fd() as asynchronous process,
>> which gets passed command to run in struct filter_params.  In this
>> function start_protocol2_filter() runs start_command(), synchronous API.
>>
>> Why the difference?
> 
> The protocol V2 requires a sequential processing of the packets. See
> discussion with Junio here:
> http://public-inbox.org/git/xmqqbn1th5qn.fsf%40gitster.mtv.corp.google.com/

I don't know what you want to refer to.  The linked email explains
why we fork/start_async() Git process, and the answer was to support
streaming.

There isn't anything there about why protocol v2 requires sequential /
synchronous processing of file output, that is write file contents in
full, then read, instead of having child write, and Git read and ready
to read (so filter driver can start writing immediately, and do not need
to wait for the other ed to stop writing / finish file).

Best regards,
-- 
Jakub Narębski

next prev parent reply	other threads:[~2016-08-04  0:55 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20160727000605.49982-1-larsxschneider%40gmail.com/>
2016-07-29 23:37 ` [PATCH v3 00/10] Git filter protocol larsxschneider
2016-07-29 23:37   ` [PATCH v3 01/10] pkt-line: extract set_packet_header() larsxschneider
2016-07-30 10:30     ` Jakub Narębski
2016-08-01 11:33       ` Lars Schneider
2016-08-03 20:05         ` Jakub Narębski
2016-08-05 11:52           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 02/10] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-07-30 10:49     ` Jakub Narębski
2016-08-01 12:00       ` Lars Schneider
2016-08-03 20:12         ` Jakub Narębski
2016-08-05 12:02           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 03/10] pkt-line: add packet_flush_gentle() larsxschneider
2016-07-30 12:04     ` Jakub Narębski
2016-08-01 12:28       ` Lars Schneider
2016-07-31 20:36     ` Torstem Bögershausen
2016-07-31 21:45       ` Lars Schneider
2016-08-02 19:56         ` Torsten Bögershausen
2016-08-05  9:59           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 04/10] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-07-30 12:29     ` Jakub Narębski
2016-08-01 12:18       ` Lars Schneider
2016-08-03 20:15         ` Jakub Narębski
2016-07-29 23:37   ` [PATCH v3 05/10] pack-protocol: fix maximum pkt-line size larsxschneider
2016-07-30 13:58     ` Jakub Narębski
2016-08-01 12:23       ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 06/10] run-command: add clean_on_exit_handler larsxschneider
2016-07-30  9:50     ` Johannes Sixt
2016-08-01 11:14       ` Lars Schneider
2016-08-02  5:53         ` Johannes Sixt
2016-08-02  7:41           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 07/10] convert: quote filter names in error messages larsxschneider
2016-07-29 23:37   ` [PATCH v3 08/10] convert: modernize tests larsxschneider
2016-07-29 23:38   ` [PATCH v3 09/10] convert: generate large test files only once larsxschneider
2016-07-29 23:38   ` [PATCH v3 10/10] convert: add filter.<driver>.process option larsxschneider
2016-07-30 22:05     ` Jakub Narębski
2016-07-31  9:42       ` Jakub Narębski
2016-07-31 19:49         ` Lars Schneider
2016-07-31 22:59           ` Jakub Narębski
2016-08-01 13:32       ` Lars Schneider
2016-08-03 18:30         ` Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option) Jakub Narębski
2016-08-05 10:32           ` Lars Schneider
2016-08-06 18:24           ` Lars Schneider
2016-08-03 22:47         ` [PATCH v3 10/10] convert: add filter.<driver>.process option Jakub Narębski
2016-07-31 22:19     ` Jakub Narębski
2016-08-01 17:55       ` Lars Schneider
2016-08-04  0:42         ` Jakub Narębski [this message]
2016-08-03 13:10       ` Lars Schneider
2016-08-04 10:18         ` Jakub Narębski
2016-08-05 13:20           ` Lars Schneider
2016-08-03 16:42   ` [PATCH v4 00/12] Git filter protocol larsxschneider
2016-08-03 16:42     ` [PATCH v4 01/12] pkt-line: extract set_packet_header() larsxschneider
2016-08-03 20:18       ` Junio C Hamano
2016-08-03 21:12         ` Jeff King
2016-08-03 21:27           ` Jeff King
2016-08-04 16:14           ` Junio C Hamano
2016-08-05 14:55             ` Lars Schneider
2016-08-05 16:31               ` Junio C Hamano
2016-08-05 17:31             ` Lars Schneider
2016-08-05 17:41               ` Junio C Hamano
2016-08-03 21:56         ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 02/12] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-08-03 16:42     ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() larsxschneider
2016-08-03 21:39       ` Jeff King
2016-08-03 22:56         ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-03 22:56           ` [PATCH 1/7] trace: handle NULL argument in trace_disable() Jeff King
2016-08-03 22:58           ` [PATCH 2/7] trace: stop using write_or_whine_pipe() Jeff King
2016-08-03 22:58           ` [PATCH 3/7] trace: use warning() for printing trace errors Jeff King
2016-08-04 20:41             ` Junio C Hamano
2016-08-04 21:21               ` Jeff King
2016-08-04 21:28                 ` Junio C Hamano
2016-08-05  7:56                   ` Jeff King
2016-08-05  7:59                   ` Christian Couder
2016-08-05 18:41                     ` Junio C Hamano
2016-08-03 23:00           ` [PATCH 4/7] trace: cosmetic fixes for error messages Jeff King
2016-08-04 20:42             ` Junio C Hamano
2016-08-05  8:00               ` Jeff King
2016-08-03 23:00           ` [PATCH 5/7] trace: correct variable name in write() error message Jeff King
2016-08-03 23:01           ` [PATCH 6/7] trace: disable key after write error Jeff King
2016-08-04 20:45             ` Junio C Hamano
2016-08-04 21:22               ` Jeff King
2016-08-05  7:58                 ` Jeff King
2016-08-03 23:01           ` [PATCH 7/7] write_or_die: drop write_or_whine_pipe() Jeff King
2016-08-03 23:04           ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-04 16:16         ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() Junio C Hamano
2016-08-03 16:42     ` [PATCH v4 04/12] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-08-03 16:42     ` [PATCH v4 05/12] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-03 16:42     ` [PATCH v4 06/12] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-03 16:42     ` [PATCH v4 07/12] run-command: add clean_on_exit_handler larsxschneider
2016-08-03 21:24       ` Jeff King
2016-08-03 22:15         ` Lars Schneider
2016-08-03 22:53           ` Jeff King
2016-08-03 23:09             ` Lars Schneider
2016-08-03 23:15               ` Jeff King
2016-08-05 13:08                 ` Lars Schneider
2016-08-05 21:19                   ` Torsten Bögershausen
2016-08-05 21:50                     ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 08/12] convert: quote filter names in error messages larsxschneider
2016-08-03 16:42     ` [PATCH v4 09/12] convert: modernize tests larsxschneider
2016-08-03 16:42     ` [PATCH v4 10/12] convert: generate large test files only once larsxschneider
2016-08-03 16:42     ` [PATCH v4 11/12] convert: add filter.<driver>.process option larsxschneider
2016-08-03 17:45       ` Junio C Hamano
2016-08-03 21:48         ` Lars Schneider
2016-08-03 22:46           ` Jeff King
2016-08-05 12:53             ` Lars Schneider
2016-08-03 20:29       ` Junio C Hamano
2016-08-03 21:37         ` Lars Schneider
2016-08-03 21:43           ` Junio C Hamano
2016-08-03 22:01             ` Lars Schneider
2016-08-05 21:34       ` Torsten Bögershausen
2016-08-05 21:49         ` Lars Schneider
2016-08-05 22:06         ` Junio C Hamano
2016-08-05 22:27           ` Jeff King
2016-08-06 11:55             ` Lars Schneider
2016-08-06 12:14               ` Jeff King
2016-08-06 18:19                 ` Lars Schneider
2016-08-08 15:02                   ` Jeff King
2016-08-08 16:21                     ` Lars Schneider
2016-08-08 16:26                       ` Jeff King
2016-08-06 20:40           ` Torsten Bögershausen
2016-08-03 16:42     ` [PATCH v4 12/12] convert: add filter.<driver>.process shutdown command option larsxschneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=744bcf80-5d7e-d149-59a3-e12dd40cbea1@gmail.com \
    --to=jnareb@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).