Re: [PATCH v3 10/10] convert: add filter.<driver>.process option

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Lars Schneider <larsxschneider@gmail.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Martin-Louis Bright" <mlbright@gmail.com>,
	"Eric Wong" <e@80x24.org>, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option
Date: Wed, 3 Aug 2016 15:10:19 +0200	[thread overview]
Message-ID: <9DDA993E-2AFD-4C69-8E22-58601EEC8A40@gmail.com> (raw)
In-Reply-To: <2f4743d1-3c93-406d-8b44-da0eb075e65c@gmail.com>


> On 01 Aug 2016, at 00:19, Jakub Narębski <jnareb@gmail.com> wrote:
> 
> W dniu 30.07.2016 o 01:38, larsxschneider@gmail.com pisze:
> 

[LONG SNIP]

First part answered here:
http://public-inbox.org/git/5180D54D-92C4-4875-AEB3-801663D70A8B%40gmail.com/

> 
>> +	}
>> +	process = &entry->process;
>> +
>> +	if (!(wanted_capability & entry->supported_capabilities))
>> +		return 1;  // it is OK if the wanted capability is not supported
>> +
>> +	if FILTER_SUPPORTS_CLEAN(wanted_capability)
>> +		filter_type = "clean";
>> +	else if FILTER_SUPPORTS_SMUDGE(wanted_capability)
>> +		filter_type = "smudge";
>> +	else
>> +		die("unexpected filter type");
> 
> Style: it should be
> 
>  +	if (FILTER_SUPPORTS_CLEAN(wanted_capability))
>  +		filter_type = "clean";
>  +	else if (FILTER_SUPPORTS_SMUDGE(wanted_capability))
>  +		filter_type = "smudge";
>  +	else
>  +		die("unexpected filter type");
> 
> even though by accident the macro provides the parentheses to "if".

Agreed.


> Can we make an error/die message more detailed?  Maybe it is
> not possible...

Yeah, I don't see an easy way...

> 
>> +
>> +	if (fd >= 0 && !src) {
>> +		if (fstat(fd, &file_stat) == -1)
>> +			return 0;
>> +		len = file_stat.st_size;
>> +	}
> 
> All right, when fstat() can fail?  Could we then send contents without
> size upfront, or is it better to require size to make it more consistent
> for filter drivers scripts?

If fstat() fails then there is clearly something wrong and the filter
should fail.


> Could this whole "send single file" be put in a separate function?
> Or is it not worth it?

This function would have almost the same signature as apply_protocol2_filter
and therefore I would say it's not worth it since the function is not
crazy long.


> 
>> +
>> +	sigchain_push(SIGPIPE, SIG_IGN);
> 
> Hmmm... ignoring SIGPIPE was good for one-shot filters.  Is it still
> O.K. for per-command persistent ones?

Very good question. You are right... we don't want to ignore any errors
during the protocol... I will remove it.


> 
>> +
>> +	packet_buf_write(&nbuf, "%s\n", filter_type);
>> +	ret &= !direct_packet_write(process->in, nbuf.buf, nbuf.len, 1);
>> +
>> +	if (ret) {
>> +		strbuf_reset(&nbuf);
>> +		packet_buf_write(&nbuf, "filename=%s\n", path);
>> +		ret = !direct_packet_write(process->in, nbuf.buf, nbuf.len, 1);
>> +	}
> 
> Perhaps a better solution would be
> 
>        if (err)
>        	goto fin_error;
> 
> rather than this.

OK, I change it to goto error handling style.

> 
>> +
>> +	if (ret) {
>> +		strbuf_reset(&nbuf);
>> +		packet_buf_write(&nbuf, "size=%"PRIuMAX"\n", (uintmax_t)len);
>> +		ret = !direct_packet_write(process->in, nbuf.buf, nbuf.len, 1);
>> +	}
> 
> Or maybe extract writing the header for a file into a separate function?
> This one gets a bit long...

Maybe... but I think that would make it harder to understand the protocol. I
think I would prefer to have all the communication in one function layer.


>> +
>> +	if (ret) {
>> +		if (fd >= 0)
>> +			ret = !multi_packet_write_from_fd(fd, process->in);
>> +		else
>> +			ret = !multi_packet_write_from_buf(src, len, process->in);
>> +	}
> 
> This is not streaming.  The above sends whole file, or whole string to
> the filter process, without draining filter output.  If the filter were
> to read some, then write some, it might deadlock on full buffers, isn't it?
> Or am I mistaken?

Correct.


>> +
>> +	if (ret && !FILTER_SUPPORTS_STREAM(entry->supported_capabilities)) {
>> +		strbuf = packet_read_line(process->out, NULL);
>> +		if (strlen(strbuf) > 5 && !strncmp("size=", strbuf, 5)) {
>> +			expected_bytes = (off_t)strtol(strbuf + 5, &strtol_end, 10);
>> +			ret = (strtol_end != strbuf && errno != ERANGE);
>> +		} else {
>> +			ret = 0;
>> +		}
>> +	}
>> +
>> +	if (ret) {
>> +		strbuf_reset(&nbuf);
>> +		ret = !multi_packet_read(process->out, &nbuf, expected_bytes,
>> +			FILTER_SUPPORTS_STREAM(entry->supported_capabilities));
>> +	}
> 
> What happens if the output of filter does not fit in size_t?  I see that
> (I think) this problem is inherited from the original implementation.

Correct. And therefore I would prefer not to change this in this series.


>> +
>> +	if (ret) {
>> +		filter_result = packet_read_line(process->out, NULL);
>> +		ret = !strcmp(filter_result, "success");
>> +	}
>> +
>> +	sigchain_pop(SIGPIPE);
>> +
>> +	if (ret) {
>> +		strbuf_swap(dst, &nbuf);
>> +	} else {
>> +		if (!filter_result || strcmp(filter_result, "reject")) {
>> +			// Something went wrong with the protocol filter. Force shutdown!
>> +			error("external filter '%s' failed", cmd);
>> +			kill_protocol2_filter(&cmd_process_map, entry);
>> +		}
>> +	}
> 
> So if Git gets finish signal "success" from filter, it accepts the output.
> If Git gets finish signal "reject" from filter, it restarts filter (and
> reject the output - user can retry the command himself / herself).
> If Git gets any other finish signal, for example "error" (but this is not
> standarized), then it rejects the output, keeping the unfiltered result,
> but keeps filtering.
> 
> I think it is not described in this detail in the documentation of the
> new protocol.

Agreed, will add!

> 
>> +	strbuf_release(&nbuf);
>> +	return ret;
>> +}
> 
> I wonder if this point might be start of the new patch... but then you
> would have no way to test what you wrote.
> 
>> +
>> static struct convert_driver {
>> 	const char *name;
>> 	struct convert_driver *next;
>> 	const char *smudge;
>> 	const char *clean;
>> +	const char *process;
>> 	int required;
>> } *user_convert, **user_convert_tail;
> 
> All right.
> 
>> 
>> @@ -526,6 +871,10 @@ static int read_convert_config(const char *var, const char *value, void *cb)
>> 	if (!strcmp("clean", key))
>> 		return git_config_string(&drv->clean, var, value);
>> 
>> +	if (!strcmp("process", key)) {
>> +		return git_config_string(&drv->process, var, value);
>> +	}
>> +
> 
> All right.
> 
>> 	if (!strcmp("required", key)) {
>> 		drv->required = git_config_bool(var, value);
>> 		return 0;
>> @@ -823,7 +1172,12 @@ int would_convert_to_git_filter_fd(const char *path)
>> 	if (!ca.drv->required)
>> 		return 0;
>> 
>> -	return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
>> +	if (!ca.drv->clean && ca.drv->process)
>> +		return apply_protocol2_filter(
>> +			path, NULL, 0, -1, NULL, ca.drv->process, FILTER_CAPABILITIES_CLEAN
>> +		);
>> +	else
>> +		return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
> 
> Could we augment apply_filter() instead, so that the invocation is
> 
>        return apply_filter(path, NULL, 0, -1, NULL, ca.drv, FILTER_CLEAN);
> 
> Though I am not sure if moving this conditional to apply_filter would
> be a good idea; maybe wrapper around augmented apply_filter_do()?

Yes, a wrapper makes it way cleaner!


>> }
>> 
>> const char *get_convert_attr_ascii(const char *path)
>> @@ -856,17 +1210,24 @@ int convert_to_git(const char *path, const char *src, size_t len,
>>                    struct strbuf *dst, enum safe_crlf checksafe)
>> {
>> 	int ret = 0;
>> -	const char *filter = NULL;
>> +	const char *clean_filter = NULL;
>> +	const char *process_filter = NULL;
>> 	int required = 0;
>> 	struct conv_attrs ca;
>> 
>> 	convert_attrs(&ca, path);
>> 	if (ca.drv) {
>> -		filter = ca.drv->clean;
>> +		clean_filter = ca.drv->clean;
>> +		process_filter = ca.drv->process;
>> 		required = ca.drv->required;
>> 	}
> 
> All right (assuming un-augmented apply_filter()).
> 
>> 
>> -	ret |= apply_filter(path, src, len, -1, dst, filter);
>> +	if (!clean_filter && process_filter)
>> +		ret |= apply_protocol2_filter(
>> +			path, src, len, -1, dst, process_filter, FILTER_CAPABILITIES_CLEAN
>> +		);
>> +	else
>> +		ret |= apply_filter(path, src, len, -1, dst, clean_filter);
> 
> I wonder if it would be more readable to write it like this
> (and of course elsewhere too):
> 
>  +	if (!clean_filter && process_filter)
>  +		ret |= apply_protocol2_filter(
>  +			path, src, len, -1, dst, process_filter, FILTER_CAPABILITIES_CLEAN
>  +		);
>  +	else
>  +		ret |= apply_filter(
>  +			path, src, len, -1, dst, clean_filter);
>  +		);
> 
> 
> Though it would screw up "git blame -C -C -w"

Obsolete with the wrapper mentioned above.


>> 	if (!ret && required)
>> 		die("%s: clean filter '%s' failed", path, ca.drv->name);
>> 
>> @@ -885,13 +1246,21 @@ int convert_to_git(const char *path, const char *src, size_t len,
>> void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
>> 			      enum safe_crlf checksafe)
>> {
>> +	int ret = 0;
> 
> Right, 'ret' is needed because we now have two possibilities:
> `clean` filter and `process` filter.
> 
>> 	struct conv_attrs ca;
>> 	convert_attrs(&ca, path);
>> 
>> 	assert(ca.drv);
>> -	assert(ca.drv->clean);
>> +	assert(ca.drv->clean || ca.drv->process);
>> +
>> +	if (!ca.drv->clean && ca.drv->process)
>> +		ret = apply_protocol2_filter(
>> +			path, NULL, 0, fd, dst, ca.drv->process, FILTER_CAPABILITIES_CLEAN
>> +		);
>> +	else
>> +		ret = apply_filter(path, NULL, 0, fd, dst, ca.drv->clean);
>> 
>> -	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
>> +	if (!ret)
>> 		die("%s: clean filter '%s' failed", path, ca.drv->name);
>> 
>> 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
>> @@ -902,14 +1271,16 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
>> 					    size_t len, struct strbuf *dst,
>> 					    int normalizing)
>> {
>> -	int ret = 0, ret_filter = 0;
>> -	const char *filter = NULL;
>> +	int ret = 0, ret_filter;
> 
> Why the change:
> 
>  -	int ret = 0, ret_filter = 0;
>  +	int ret = 0, ret_filter;

Reverted with the wrapper.


>> +	const char *smudge_filter = NULL;
>> +	const char *process_filter = NULL;
>> 	int required = 0;
>> 	struct conv_attrs ca;
>> 
>> 	convert_attrs(&ca, path);
>> 	if (ca.drv) {
>> -		filter = ca.drv->smudge;
>> +		process_filter = ca.drv->process;
>> +		smudge_filter = ca.drv->smudge;
>> 		required = ca.drv->required;
>> 	}
> 
> All right, the same.
> 
> [...]
>> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
>> index 34c8eb9..e8a7703 100755
>> --- a/t/t0021-conversion.sh
>> +++ b/t/t0021-conversion.sh
>> @@ -296,4 +296,409 @@ test_expect_success 'disable filter with empty override' '
>> 	test_must_be_empty err
>> '
>> 
>> +test_expect_success PERL 'required process filter should filter data' '
>> +	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge shutdown" &&
>> +	test_config_global filter.protocol.required true &&
>> +	rm -rf repo &&
>> +	mkdir repo &&
>> +	(
>> +		cd repo &&
>> +		git init &&
>> +
>> +		echo "*.r filter=protocol" >.gitattributes &&
>> +		git add . &&
>> +		git commit . -m "test commit" &&
> 
> This is more of "Initial commit", not that it matters
> 
>> +		git branch empty &&
>> +
>> +		cat ../test.o >test.r &&
> 
> Err, the above is just copying file, isn't it?
> Maybe it was copied from other tests, I have not checked.

It was created in the "setup" test.


>> +		echo "test22" >test2.r &&
>> +		mkdir testsubdir &&
>> +		echo "test333" >testsubdir/test3.r &&
> 
> All right, we test text file, we test binary file (I assume), we test
> file in a subdirectory.  What about testing empty file?  Or large file
> which would not fit in the stdin/stdout buffer (as EXPENSIVE test)?

No binary file. The main reason for this test is to check multiple files.
I'll add a empty file. A large file is tested in the next test.

> 
>> +
>> +		rm -f rot13-filter.log &&
>> +		git add . &&
> 
> So this runs "clean" filter, storing cleaned contents in the index.

Correct.


>> +		sort rot13-filter.log | uniq -c | sed "s/^[ ]*//" >uniq-rot13-filter.log &&
>> +		cat >expected_add.log <<-\EOF &&
>> +			1 IN: clean test.r 57 [OK] -- OUT: 57 [OK]
>> +			1 IN: clean test2.r 7 [OK] -- OUT: 7 [OK]
>> +			1 IN: clean testsubdir/test3.r 8 [OK] -- OUT: 8 [OK]
> 
> And we check the "know size upfront" case (mistakenly called non-"stream").

Correct - however, I removed non-stream


>> +			1 IN: shutdown -- [OK]
> 
> And test "shutdown" capability (not as separate test).

Fixed.


>> +			1 start
>> +			1 wrote filter header
>> +		EOF
> 
> And we are required to keep the expected_add.log file sorted by hand???

Well, the clean invocations (and therefore their order of appearance)
are not deterministic. See my discussion with Junio here:
http://public-inbox.org/git/xmqqshv18i8i.fsf%40gitster.mtv.corp.google.com/

> 
>> +		test_cmp expected_add.log uniq-rot13-filter.log &&
>> +
>> +		>rot13-filter.log &&
> 
> Truncate log. Still in the same test.
> 
>> +		git commit . -m "test commit" &&
> 
> This is test commit with files undergoing "clean" part of filter.
> 
>> +		sort rot13-filter.log | uniq -c | sed "s/^[ ]*//" |
>> +			sed "s/^\([0-9]\) IN: clean/x IN: clean/" >uniq-rot13-filter.log &&
> 
> There is known performance regression, in that filter is run more
> than once on given file.
> 
> Actually... why it does not use cleaned-up contents from the index?

See discussion here: http://public-inbox.org/git/20160722152753.GA6859%40sigill.intra.peff.net/


>> +		cat >expected_commit.log <<-\EOF &&
>> +			x IN: clean test.r 57 [OK] -- OUT: 57 [OK]
>> +			x IN: clean test2.r 7 [OK] -- OUT: 7 [OK]
>> +			x IN: clean testsubdir/test3.r 8 [OK] -- OUT: 8 [OK]
>> +			1 IN: shutdown -- [OK]
>> +			1 start
>> +			1 wrote filter header
> 
> Right, this is the goal of the patch series: for filter to be started
> only once per git command invocation.
> 
>> +		EOF
>> +		test_cmp expected_commit.log uniq-rot13-filter.log &&
>> +
> 
> Still in the same test, even though we would be testing "smudge"
> capability now.  
> 
> It's a pity that t/test-lib.sh does not support subtests from
> the TAP specification (Test Anything Protocol that Git testsuite
> uses).
> 
>> +		>rot13-filter.log &&
>> +		rm -f test?.r testsubdir/test3.r &&
>> +		git checkout . &&
> 
> All right, we removed some files so that "git checkout ." could
> restore them to life.
> 
>> +		cat rot13-filter.log | grep -v "IN: clean" >smudge-rot13-filter.log &&
> 
> Useless use of cat
> 
>  +		grep -v "IN: clean"  rot13-filter.log  >smudge-rot13-filter.log &&

Fixed, thanks!


> Also: why 'git checkout <path>' would run "clean" filter?
> Is it existing strange behaviour?

AFAIK, that's existing behavior.


>> +		cat >expected_checkout.log <<-\EOF &&
>> +			start
>> +			wrote filter header
>> +			IN: smudge test2.r 7 [OK] -- OUT: 7 [OK]
>> +			IN: smudge testsubdir/test3.r 8 [OK] -- OUT: 8 [OK]
>> +			IN: shutdown -- [OK]
>> +		EOF
> 
> This time without 'sort | uniq -c'.

Yes, because the smudge calls are deterministic!


>  Is it really needed for the
> "good" case, or is it there for two cases to look similar?

I am not sure what you mean?!


>> +		test_cmp expected_checkout.log smudge-rot13-filter.log &&
>> +
>> +		git checkout empty &&
> 
> Shouldn't we check that switching to branch 'empty' does not run
> filters, or is it covered by other tests?  Or perhaps this simply
> does not matter here, is it?

Easy enough to check. I will add this.

> 
>> +
>> +		>rot13-filter.log &&
>> +		git checkout master &&
> 
> Does it test different callpath than 'git checkout .'?  Well, the
> set of files is different...
> 
>> +		cat rot13-filter.log | grep -v "IN: clean" >smudge-rot13-filter.log &&
>> +		cat >expected_checkout_master.log <<-\EOF &&
>> +			start
>> +			wrote filter header
>> +			IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
>> +			IN: smudge test2.r 7 [OK] -- OUT: 7 [OK]
>> +			IN: smudge testsubdir/test3.r 8 [OK] -- OUT: 8 [OK]
>> +			IN: shutdown -- [OK]
>> +		EOF
>> +		test_cmp expected_checkout_master.log smudge-rot13-filter.log &&
>> +
> 
> And here we start checking that the filter did filter,
> that is the content in the repository is "clean"ed-up.
> Still the same test.
> 
>> +		./../rot13.sh <test.r >expected &&
>> +		git cat-file blob :test.r >actual &&
>> +		test_cmp expected actual &&
>> +
>> +		./../rot13.sh <test2.r >expected &&
>> +		git cat-file blob :test2.r >actual &&
>> +		test_cmp expected actual &&
>> +
>> +		./../rot13.sh <testsubdir/test3.r >expected &&
>> +		git cat-file blob :testsubdir/test3.r >actual &&
>> +		test_cmp expected actual
>> +	)
>> +'
>> +
>> +test_expect_success PERL 'required process filter should filter data stream' '
>> +	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl stream clean smudge" &&
>> +	test_config_global filter.protocol.required true &&
> 
> Errr... I don't see how it is different from the previous test.
> [...]

stream/non-stream ... but this is obsolete in the next roll. fixed!

> 
>> +
>> +test_expect_success PERL 'required process filter should filter smudge data and one-shot filter should clean' '
> 
> All right, so this tests the precedence... well, it doesn't.
> 
> It tests that `process` filter with "smudge" capability only works well
> with one-shot `clean` filter.

True. Isn't that what the test description indicates?


>> +	test_config_global filter.protocol.clean ./../rot13.sh &&
>> +	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl smudge" &&
> 
> Why the difference in pathnames (the directory part) between those two?

rot13.sh is generated in the header of the file.
rot13-filter.pl is part of the test suite


>> +	test_config_global filter.protocol.required true &&
>> +	rm -rf repo &&
>> +	mkdir repo &&
>> +	(
>> +		cd repo &&
>> +		git init &&
>> +
>> +		echo "*.r filter=protocol" >.gitattributes &&
>> +		git add . &&
>> +		git commit . -m "test commit" &&
>> +		git branch empty &&
>> +
>> +		cat ../test.o >test.r &&
>> +		echo "test22" >test2.r &&
>> +		mkdir testsubdir &&
>> +		echo "test333" >testsubdir/test3.r &&
>> +
>> +		rm -f rot13-filter.log &&
>> +		git add . &&
>> +		test_must_be_empty rot13-filter.log &&
>> +
>> +		>rot13-filter.log &&
>> +		git commit . -m "test commit" &&
>> +		test_must_be_empty rot13-filter.log &&
> 
> All right, these tests that `process` filter is not ran.  But we don't
> know if it is because it lacks capability, or because it is overriden
> by one-shot filter (well, that comes later).

Only the clean one shot filter is configured. Therefore that shouldn't be a
problem, right?


>> +
>> +		>rot13-filter.log &&
>> +		rm -f test?.r testsubdir/test3.r &&
>> +		git checkout . &&
>> +		cat rot13-filter.log | grep -v "IN: clean" >smudge-rot13-filter.log &&
>> +		cat >expected_checkout.log <<-\EOF &&
>> +			start
>> +			wrote filter header
>> +			IN: smudge test2.r 7 [OK] -- OUT: 7 [OK]
>> +			IN: smudge testsubdir/test3.r 8 [OK] -- OUT: 8 [OK]
>> +		EOF
>> +		test_cmp expected_checkout.log smudge-rot13-filter.log &&
> 
> This part is repeated many, many times.  Maybe add some helper
> shell function for this?

Good idea! Will add!


> [...]
>> +		./../rot13.sh <test.r >expected &&
>> +		git cat-file blob :test.r >actual &&
>> +		test_cmp expected actual &&
>> +
>> +		./../rot13.sh <test2.r >expected &&
>> +		git cat-file blob :test2.r >actual &&
>> +		test_cmp expected actual &&
>> +
>> +		./../rot13.sh <testsubdir/test3.r >expected &&
>> +		git cat-file blob :testsubdir/test3.r >actual &&
>> +		test_cmp expected actual
> 
> Here we test that equivalent one-shot cleanup filter was run.
> Here also we have repeated contents; maybe some helper function
> would make it shorter?

Agreed!


>> +	)
>> +'
> 
> Here I am stopping examining tests in detail.
> 
>> +test_expect_success PERL 'required process filter should clean only' '
>> +test_expect_success PERL 'required process filter should process files larger LARGE_PACKET_MAX' '
> 
> Those two tests do not depend on being required or not; it is only
> that without required they would fail softly in case of latter test
> (which we can detect too).

True, but since they fail hard it is easier to check.


>> +test_expect_success PERL 'required process filter should with clean error should fail' '
>> +test_expect_success PERL 'process filter should restart after unexpected write failure' '
> 
> So these two are sort of complimentary.  When `process` is required,
> then it should fail if it cannot filter some file.  If it is not,
> it should keep processing other files.

True.


>> +test_expect_success PERL 'process filter should not restart after intentionally rejected file' '
> 
> Uh... all right, so "reject" means that filter cannot continue?
> Strange meaning for 'reject', though ;-)

No, with reject a filter can say "I don't want to process that file". This is a legitimate
response and I don't Git to restart the filter in that case.


>> test_done
>> diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
>> new file mode 100755
>> index 0000000..cb0925d
>> --- /dev/null
>> +++ b/t/t0021/rot13-filter.pl
>> @@ -0,0 +1,177 @@
>> +#!/usr/bin/perl
>> +#
>> +# Example implementation for the Git filter protocol version 2
>> +# See Documentation/gitattributes.txt, section "Filter Protocol"
>> +#
>> +# The script takes the list of supported protocol capabilities as
>> +# arguments ("stream", "clean", and "smudge" are supported).
> 
> What about "shutdown"?

Will fix.


>> +#
>> +# This implementation supports three special test cases:
>> +# (1) If data with the filename "clean-write-fail.r" is processed with
>> +#     a "clean" operation then the write operation will die.
>> +# (2) If data with the filename "smudge-write-fail.r" is processed with
>> +#     a "smudge" operation then the write operation will die.
> 
> All right, so it is hard failure with filter script dying.

Correct.

> 
>> +# (3) If data with the filename "failure.r" is processed with any
>> +#     operation then the filter signals that the operation was not
>> +#     successful.
> 
> All right, so it is failure detected by filter script and signalled to Git.
> 
>> +#
>> +
>> +use strict;
>> +use warnings;
> 
> So no more "use autodie", because of compatibility with old Perls.
> 
>> +
>> +my $MAX_PACKET_CONTENT_SIZE = 65516;
>> +my @capabilities            = @ARGV;
> 
> No autoflush this time?

Eric recommended to disable it:
http://public-inbox.org/git/20160723072721.GA20875%40starla/


>> +
>> +sub rot13 {
>> +    my ($str) = @_;
>> +    $str =~ y/A-Za-z/N-ZA-Mn-za-m/;
>> +    return $str;
>> +}
>> +
>> +sub packet_read {
>> +    my $buffer;
>> +    my $bytes_read = read STDIN, $buffer, 4;
>> +    if ( $bytes_read == 0 ) {
>> +        return;
>> +    }
>> +    elsif ( $bytes_read != 4 ) {
>> +        die "invalid packet size '$bytes_read' field";
>> +    }
>> +    my $pkt_size = hex($buffer);
>> +    if ( $pkt_size == 0 ) {
>> +        return ( 1, "" );
> 
> Unusual return convention.  Though it is a test script, so
> it doesn't matter much.
> 
>> +    }
>> +    elsif ( $pkt_size > 4 ) {
>> +        my $content_size = $pkt_size - 4;
>> +        $bytes_read = read STDIN, $buffer, $content_size;
>> +        if ( $bytes_read != $content_size ) {
>> +            die "invalid packet";
> 
> More detailed error message, maybe?

OK


>> +        }
>> +        return ( 0, $buffer );
>> +    }
>> +    else {
>> +        die "invalid packet size";
>> +    }
>> +}
>> +
>> +sub packet_write {
>> +    my ($packet) = @_;
>> +    print STDOUT sprintf( "%04x", length($packet) + 4 );
>> +    print STDOUT $packet;
>> +    STDOUT->flush();
>> +}
>> +
>> +sub packet_flush {
>> +    print STDOUT sprintf( "%04x", 0 );
>> +    STDOUT->flush();
>> +}
>> +
>> +open my $debug, ">>", "rot13-filter.log";
>> +print $debug "start\n";
>> +$debug->flush();
>> +
>> +packet_write("git-filter-protocol\n");
>> +packet_write("version 2\n");
>> +packet_write( "capabilities " . join( ' ', @capabilities ) . "\n" );
>> +print $debug "wrote filter header\n";
>> +$debug->flush();
>> +
>> +while (1) {
>> +    my $command = packet_read();
>> +    unless ( defined($command) ) {
>> +        exit();
>> +    }
>> +    chomp $command;
>> +    print $debug "IN: $command";
>> +    $debug->flush();
>> +
>> +    if ( $command eq "shutdown" ) {
>> +        print $debug " -- [OK]";
>> +        $debug->flush();
>> +        packet_write("done\n");
>> +        exit();
>> +    }
>> +
>> +    my ($filename) = packet_read() =~ /filename=([^=]+)\n/;
>> +    print $debug " $filename";
>> +    $debug->flush();
>> +    my ($filelen) = packet_read() =~ /size=([^=]+)\n/;
>> +    chomp $filelen;
> 
> I think this chomp is not needed, as "\n" is not included.
> Though the regexp should probably be anchored.

Agreed.


>> +    print $debug " $filelen";
>> +    $debug->flush();
>> +
>> +    $filelen =~ /\A\d+\z/ or die "bad filelen: $filelen";
>> +    my $output;
>> +
>> +    if ( $filelen > 0 ) {
> 
> So here is a special case for $filelen = 0.
> Negative $filelen is not allowed, via regexp.

Obsolete in v4.


>> +        my $input = "";
>> +        {
>> +            binmode(STDIN);
>> +            my $buffer;
>> +            my $done = 0;
>> +            while ( !$done ) {
>> +                ( $done, $buffer ) = packet_read();
>> +                $input .= $buffer;
>> +            }
>> +            print $debug " [OK] -- ";
>> +            $debug->flush();
>> +        }
>> +
>> +        if ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
>> +            $output = rot13($input);
>> +        }
>> +        elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
>> +            $output = rot13($input);
>> +        }
> 
> These two conditionals could be shortened, but then they would be less
> readable.  Or not:
> 
>           if ( grep { $_ eq $command } @capabilities ) {
>           	$output = rot13($input);
>           }

I would like to keep it that way for readability since
the test script also serves as example implementation.


>> +        else {
>> +            die "bad command $command";
>> +        }
>> +    }
>> +
>> +    my $output_len = length($output);
>> +    if ( $filename eq "reject.r" ) {
>> +        $output_len = 0;
>> +    }
>> +
>> +    if ( grep( /^stream$/, @capabilities ) ) {
>> +        print $debug "OUT: STREAM ";
>> +    }
>> +    else {
>> +        packet_write("size=$output_len\n");
>> +        print $debug "OUT: $output_len ";
>> +    }
>> +    $debug->flush();
>> +
>> +    if ( $filename eq "reject.r" ) {
>> +        packet_write("reject\n");
>> +        print $debug "[REJECT]\n";    # Could also be an error
> 
> How if could be an error?

Removed.


> 
>> +        $debug->flush();
>> +    }
>> +
>> +    if ( $output_len > 0 ) {
>> +        if (( $command eq "clean" and $filename eq "clean-write-fail.r" )
>> +            or
>> +            ( $command eq "smudge" and $filename eq "smudge-write-fail.r" ))
> 
> Perhaps simply:
> 
>  +        if ( $filename eq "${command}-write-fail.r" ) {

Nice! Will fix!


>> +        {
>> +            print $debug "[WRITE FAIL]\n";
>> +            $debug->flush();
>> +            die "write error";
>> +        }
>> +        else {
>> +            while ( length($output) > 0 ) {
>> +                my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
>> +                packet_write($packet);
>> +                if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
>> +                    $output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
>> +                }
>> +                else {
>> +                    $output = "";
>> +                }
>> +            }
>> +            packet_flush();
>> +            packet_write("success\n");
>> +            print $debug "[OK]\n";
>> +            $debug->flush();
>> +        }
>> +    }
>> +}
>> 
> 


Thank you very much (again!) for your extensive review,
Lars

next prev parent reply	other threads:[~2016-08-03 13:11 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20160727000605.49982-1-larsxschneider%40gmail.com/>
2016-07-29 23:37 ` [PATCH v3 00/10] Git filter protocol larsxschneider
2016-07-29 23:37   ` [PATCH v3 01/10] pkt-line: extract set_packet_header() larsxschneider
2016-07-30 10:30     ` Jakub Narębski
2016-08-01 11:33       ` Lars Schneider
2016-08-03 20:05         ` Jakub Narębski
2016-08-05 11:52           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 02/10] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-07-30 10:49     ` Jakub Narębski
2016-08-01 12:00       ` Lars Schneider
2016-08-03 20:12         ` Jakub Narębski
2016-08-05 12:02           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 03/10] pkt-line: add packet_flush_gentle() larsxschneider
2016-07-30 12:04     ` Jakub Narębski
2016-08-01 12:28       ` Lars Schneider
2016-07-31 20:36     ` Torstem Bögershausen
2016-07-31 21:45       ` Lars Schneider
2016-08-02 19:56         ` Torsten Bögershausen
2016-08-05  9:59           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 04/10] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-07-30 12:29     ` Jakub Narębski
2016-08-01 12:18       ` Lars Schneider
2016-08-03 20:15         ` Jakub Narębski
2016-07-29 23:37   ` [PATCH v3 05/10] pack-protocol: fix maximum pkt-line size larsxschneider
2016-07-30 13:58     ` Jakub Narębski
2016-08-01 12:23       ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 06/10] run-command: add clean_on_exit_handler larsxschneider
2016-07-30  9:50     ` Johannes Sixt
2016-08-01 11:14       ` Lars Schneider
2016-08-02  5:53         ` Johannes Sixt
2016-08-02  7:41           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 07/10] convert: quote filter names in error messages larsxschneider
2016-07-29 23:37   ` [PATCH v3 08/10] convert: modernize tests larsxschneider
2016-07-29 23:38   ` [PATCH v3 09/10] convert: generate large test files only once larsxschneider
2016-07-29 23:38   ` [PATCH v3 10/10] convert: add filter.<driver>.process option larsxschneider
2016-07-30 22:05     ` Jakub Narębski
2016-07-31  9:42       ` Jakub Narębski
2016-07-31 19:49         ` Lars Schneider
2016-07-31 22:59           ` Jakub Narębski
2016-08-01 13:32       ` Lars Schneider
2016-08-03 18:30         ` Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option) Jakub Narębski
2016-08-05 10:32           ` Lars Schneider
2016-08-06 18:24           ` Lars Schneider
2016-08-03 22:47         ` [PATCH v3 10/10] convert: add filter.<driver>.process option Jakub Narębski
2016-07-31 22:19     ` Jakub Narębski
2016-08-01 17:55       ` Lars Schneider
2016-08-04  0:42         ` Jakub Narębski
2016-08-03 13:10       ` Lars Schneider [this message]
2016-08-04 10:18         ` Jakub Narębski
2016-08-05 13:20           ` Lars Schneider
2016-08-03 16:42   ` [PATCH v4 00/12] Git filter protocol larsxschneider
2016-08-03 16:42     ` [PATCH v4 01/12] pkt-line: extract set_packet_header() larsxschneider
2016-08-03 20:18       ` Junio C Hamano
2016-08-03 21:12         ` Jeff King
2016-08-03 21:27           ` Jeff King
2016-08-04 16:14           ` Junio C Hamano
2016-08-05 14:55             ` Lars Schneider
2016-08-05 16:31               ` Junio C Hamano
2016-08-05 17:31             ` Lars Schneider
2016-08-05 17:41               ` Junio C Hamano
2016-08-03 21:56         ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 02/12] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-08-03 16:42     ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() larsxschneider
2016-08-03 21:39       ` Jeff King
2016-08-03 22:56         ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-03 22:56           ` [PATCH 1/7] trace: handle NULL argument in trace_disable() Jeff King
2016-08-03 22:58           ` [PATCH 2/7] trace: stop using write_or_whine_pipe() Jeff King
2016-08-03 22:58           ` [PATCH 3/7] trace: use warning() for printing trace errors Jeff King
2016-08-04 20:41             ` Junio C Hamano
2016-08-04 21:21               ` Jeff King
2016-08-04 21:28                 ` Junio C Hamano
2016-08-05  7:56                   ` Jeff King
2016-08-05  7:59                   ` Christian Couder
2016-08-05 18:41                     ` Junio C Hamano
2016-08-03 23:00           ` [PATCH 4/7] trace: cosmetic fixes for error messages Jeff King
2016-08-04 20:42             ` Junio C Hamano
2016-08-05  8:00               ` Jeff King
2016-08-03 23:00           ` [PATCH 5/7] trace: correct variable name in write() error message Jeff King
2016-08-03 23:01           ` [PATCH 6/7] trace: disable key after write error Jeff King
2016-08-04 20:45             ` Junio C Hamano
2016-08-04 21:22               ` Jeff King
2016-08-05  7:58                 ` Jeff King
2016-08-03 23:01           ` [PATCH 7/7] write_or_die: drop write_or_whine_pipe() Jeff King
2016-08-03 23:04           ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-04 16:16         ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() Junio C Hamano
2016-08-03 16:42     ` [PATCH v4 04/12] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-08-03 16:42     ` [PATCH v4 05/12] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-03 16:42     ` [PATCH v4 06/12] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-03 16:42     ` [PATCH v4 07/12] run-command: add clean_on_exit_handler larsxschneider
2016-08-03 21:24       ` Jeff King
2016-08-03 22:15         ` Lars Schneider
2016-08-03 22:53           ` Jeff King
2016-08-03 23:09             ` Lars Schneider
2016-08-03 23:15               ` Jeff King
2016-08-05 13:08                 ` Lars Schneider
2016-08-05 21:19                   ` Torsten Bögershausen
2016-08-05 21:50                     ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 08/12] convert: quote filter names in error messages larsxschneider
2016-08-03 16:42     ` [PATCH v4 09/12] convert: modernize tests larsxschneider
2016-08-03 16:42     ` [PATCH v4 10/12] convert: generate large test files only once larsxschneider
2016-08-03 16:42     ` [PATCH v4 11/12] convert: add filter.<driver>.process option larsxschneider
2016-08-03 17:45       ` Junio C Hamano
2016-08-03 21:48         ` Lars Schneider
2016-08-03 22:46           ` Jeff King
2016-08-05 12:53             ` Lars Schneider
2016-08-03 20:29       ` Junio C Hamano
2016-08-03 21:37         ` Lars Schneider
2016-08-03 21:43           ` Junio C Hamano
2016-08-03 22:01             ` Lars Schneider
2016-08-05 21:34       ` Torsten Bögershausen
2016-08-05 21:49         ` Lars Schneider
2016-08-05 22:06         ` Junio C Hamano
2016-08-05 22:27           ` Jeff King
2016-08-06 11:55             ` Lars Schneider
2016-08-06 12:14               ` Jeff King
2016-08-06 18:19                 ` Lars Schneider
2016-08-08 15:02                   ` Jeff King
2016-08-08 16:21                     ` Lars Schneider
2016-08-08 16:26                       ` Jeff King
2016-08-06 20:40           ` Torsten Bögershausen
2016-08-03 16:42     ` [PATCH v4 12/12] convert: add filter.<driver>.process shutdown command option larsxschneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9DDA993E-2AFD-4C69-8E22-58601EEC8A40@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).