git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol
@ 2017-06-01  8:21 Lars Schneider
  2017-06-01  8:21 ` [PATCH v5 1/5] t0021: keep filter log files on comparison Lars Schneider
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Lars Schneider @ 2017-06-01  8:21 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

Hi,

here is the 5th iteration of my "status delayed" topic. Patches 1 to 3 are
minor t0021 test adjustments and have not changed since v3. Patch 4 is a
minor "extract method" refactoring in convert.c. Patch 5 is the new
feature.

Changes since v4:

* rebased the topic on top of master to resolve conflicts and to be able to
  share code with Ben Peart's bp/sub-process-convert-filter topic
  http://public-inbox.org/git/xmqq1srgm9kq.fsf@gitster.mtv.corp.google.com/

* removed invalid errno check
  http://public-inbox.org/git/64b1fda4-9f79-1bd8-ad6d-43196b808d61@web.de/

* minor documentation improvement
  http://public-inbox.org/git/E2C871B0-DCDF-40C5-A559-C396F3C8AA66@gmail.com/

If you review this series, please read the "Delay" section in
"Documentation/gitattributes.txt" first for an overview of the delay mechanism.
The changes in "t/t0021/rot13-filter.pl" are easier to review if you ignore
whitespace changes (e.g. with "git diff -w").

I also noticed that the diff around "static int apply_multi_file_filter"
looks odd because my new function "int async_query_available_blobs" is
placed right after it and contains similar lines.

Thanks,
Lars


RFC: http://public-inbox.org/git/D10F7C47-14E8-465B-8B7A-A09A1B28A39F@gmail.com/
v1: http://public-inbox.org/git/20170108191736.47359-1-larsxschneider@gmail.com/
v2: http://public-inbox.org/git/20170226184816.30010-1-larsxschneider@gmail.com/
v3: http://public-inbox.org/git/20170409191107.20547-1-larsxschneider@gmail.com/
v4: http://public-inbox.org/git/20170522135001.54506-1-larsxschneider@gmail.com/



Base Ref: master
Web-Diff: https://github.com/larsxschneider/git/commit/23001c0fe2
Checkout: git fetch https://github.com/larsxschneider/git filter-process/delay-v5 && git checkout 23001c0fe2

Lars Schneider (5):
  t0021: keep filter log files on comparison
  t0021: make debug log file name configurable
  t0021: write "OUT" only on success
  convert: move multiple file filter error handling to separate function
  convert: add "status=delayed" to filter process protocol

 Documentation/gitattributes.txt |  65 ++++++++++++-
 builtin/checkout.c              |   3 +
 cache.h                         |   6 +-
 convert.c                       | 162 ++++++++++++++++++++++++--------
 convert.h                       |  21 +++++
 entry.c                         | 110 +++++++++++++++++++++-
 t/t0021-conversion.sh           | 136 ++++++++++++++++++++-------
 t/t0021/rot13-filter.pl         | 199 ++++++++++++++++++++++++++--------------
 unpack-trees.c                  |   2 +
 9 files changed, 559 insertions(+), 145 deletions(-)


base-commit: 0339965c70d68fd3831c9a5306443c869de3f6a8
--
2.13.0



* [PATCH v5 1/5] t0021: keep filter log files on comparison
  2017-06-01  8:21 [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol Lars Schneider
@ 2017-06-01  8:21 ` Lars Schneider
  2017-06-01  8:22 ` [PATCH v5 2/5] t0021: make debug log file name configurable Lars Schneider
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Lars Schneider @ 2017-06-01  8:21 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

The test helpers used to modify the filter log files in place before comparing
them. Fix this by writing the modified logs to temporary files and comparing
those, so that the original log files are kept.

This is useful for the subsequent patch 'convert: add "status=delayed" to
filter process protocol'.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 t/t0021-conversion.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 161f560446..ff2424225b 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -42,10 +42,10 @@ test_cmp_count () {
 	for FILE in "$expect" "$actual"
 	do
 		sort "$FILE" | uniq -c |
-		sed -e "s/^ *[0-9][0-9]*[ 	]*IN: /x IN: /" >"$FILE.tmp" &&
-		mv "$FILE.tmp" "$FILE" || return
+		sed -e "s/^ *[0-9][0-9]*[ 	]*IN: /x IN: /" >"$FILE.tmp"
 	done &&
-	test_cmp "$expect" "$actual"
+	test_cmp "$expect.tmp" "$actual.tmp" &&
+	rm "$expect.tmp" "$actual.tmp"
 }
 
 # Compare two files but exclude all `clean` invocations because Git can
@@ -56,10 +56,10 @@ test_cmp_exclude_clean () {
 	actual=$2
 	for FILE in "$expect" "$actual"
 	do
-		grep -v "IN: clean" "$FILE" >"$FILE.tmp" &&
-		mv "$FILE.tmp" "$FILE"
+		grep -v "IN: clean" "$FILE" >"$FILE.tmp"
 	done &&
-	test_cmp "$expect" "$actual"
+	test_cmp "$expect.tmp" "$actual.tmp" &&
+	rm "$expect.tmp" "$actual.tmp"
 }
 
 # Check that the contents of two files are equal and that their rot13 version
-- 
2.13.0



* [PATCH v5 2/5] t0021: make debug log file name configurable
  2017-06-01  8:21 [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol Lars Schneider
  2017-06-01  8:21 ` [PATCH v5 1/5] t0021: keep filter log files on comparison Lars Schneider
@ 2017-06-01  8:22 ` Lars Schneider
  2017-06-01  8:22 ` [PATCH v5 3/5] t0021: write "OUT" only on success Lars Schneider
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Lars Schneider @ 2017-06-01  8:22 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

The "rot13-filter.pl" helper wrote its debug logs always to "rot13-filter.log".
Make this configurable by defining the log file as first parameter of
"rot13-filter.pl".

This is useful if "rot13-filter.pl" is configured multiple times similar to the
subsequent patch 'convert: add "status=delayed" to filter process protocol'.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 t/t0021-conversion.sh   | 44 ++++++++++++++++++++++----------------------
 t/t0021/rot13-filter.pl |  8 +++++---
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index ff2424225b..0139b460e7 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -28,7 +28,7 @@ file_size () {
 }
 
 filter_git () {
-	rm -f rot13-filter.log &&
+	rm -f *.log &&
 	git "$@"
 }
 
@@ -342,7 +342,7 @@ test_expect_success 'diff does not reuse worktree files that need cleaning' '
 '
 
 test_expect_success PERL 'required process filter should filter data' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -375,7 +375,7 @@ test_expect_success PERL 'required process filter should filter data' '
 			IN: clean testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
 			STOP
 		EOF
-		test_cmp_count expected.log rot13-filter.log &&
+		test_cmp_count expected.log debug.log &&
 
 		git commit -m "test commit 2" &&
 		rm -f test2.r "testsubdir/test3 '\''sq'\'',\$x=.r" &&
@@ -388,7 +388,7 @@ test_expect_success PERL 'required process filter should filter data' '
 			IN: smudge testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		filter_git checkout --quiet --no-progress empty-branch &&
 		cat >expected.log <<-EOF &&
@@ -397,7 +397,7 @@ test_expect_success PERL 'required process filter should filter data' '
 			IN: clean test.r $S [OK] -- OUT: $S . [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		filter_git checkout --quiet --no-progress master &&
 		cat >expected.log <<-EOF &&
@@ -409,7 +409,7 @@ test_expect_success PERL 'required process filter should filter data' '
 			IN: smudge testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r &&
 		test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r &&
@@ -419,7 +419,7 @@ test_expect_success PERL 'required process filter should filter data' '
 
 test_expect_success PERL 'required process filter takes precedence' '
 	test_config_global filter.protocol.clean false &&
-	test_config_global filter.protocol.process "rot13-filter.pl clean" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -439,12 +439,12 @@ test_expect_success PERL 'required process filter takes precedence' '
 			IN: clean test.r $S [OK] -- OUT: $S . [OK]
 			STOP
 		EOF
-		test_cmp_count expected.log rot13-filter.log
+		test_cmp_count expected.log debug.log
 	)
 '
 
 test_expect_success PERL 'required process filter should be used only for "clean" operation only' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -462,7 +462,7 @@ test_expect_success PERL 'required process filter should be used only for "clean
 			IN: clean test.r $S [OK] -- OUT: $S . [OK]
 			STOP
 		EOF
-		test_cmp_count expected.log rot13-filter.log &&
+		test_cmp_count expected.log debug.log &&
 
 		rm test.r &&
 
@@ -474,12 +474,12 @@ test_expect_success PERL 'required process filter should be used only for "clean
 			init handshake complete
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log
+		test_cmp_exclude_clean expected.log debug.log
 	)
 '
 
 test_expect_success PERL 'required process filter should process multiple packets' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 
 	rm -rf repo &&
@@ -514,7 +514,7 @@ test_expect_success PERL 'required process filter should process multiple packet
 			IN: clean 3pkt_2+1.file $(($S*2+1)) [OK] -- OUT: $(($S*2+1)) ... [OK]
 			STOP
 		EOF
-		test_cmp_count expected.log rot13-filter.log &&
+		test_cmp_count expected.log debug.log &&
 
 		rm -f *.file &&
 
@@ -529,7 +529,7 @@ test_expect_success PERL 'required process filter should process multiple packet
 			IN: smudge 3pkt_2+1.file $(($S*2+1)) [OK] -- OUT: $(($S*2+1)) ... [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		for FILE in *.file
 		do
@@ -539,7 +539,7 @@ test_expect_success PERL 'required process filter should process multiple packet
 '
 
 test_expect_success PERL 'required process filter with clean error should fail' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -558,7 +558,7 @@ test_expect_success PERL 'required process filter with clean error should fail'
 '
 
 test_expect_success PERL 'process filter should restart after unexpected write failure' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -579,7 +579,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f
 		git add . &&
 		rm -f *.r &&
 
-		rm -f rot13-filter.log &&
+		rm -f debug.log &&
 		git checkout --quiet --no-progress . 2>git-stderr.log &&
 
 		grep "smudge write error at" git-stderr.log &&
@@ -595,7 +595,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f
 			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r &&
 		test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r &&
@@ -609,7 +609,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f
 '
 
 test_expect_success PERL 'process filter should not be restarted if it signals an error' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -639,7 +639,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a
 			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r &&
 		test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r &&
@@ -648,7 +648,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a
 '
 
 test_expect_success PERL 'process filter abort stops processing of all further files' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -676,7 +676,7 @@ test_expect_success PERL 'process filter abort stops processing of all further f
 			IN: smudge abort.r $SA [OK] -- OUT: 0 [ABORT]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		test_cmp "$TEST_ROOT/test.o" test.r &&
 		test_cmp "$TEST_ROOT/test2.o" test2.r &&
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 617f581e56..0b943bb377 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -2,8 +2,9 @@
 # Example implementation for the Git filter protocol version 2
 # See Documentation/gitattributes.txt, section "Filter Protocol"
 #
-# The script takes the list of supported protocol capabilities as
-# arguments ("clean", "smudge", etc).
+# The first argument defines a debug log file that the script writes to.
+# All remaining arguments define a list of supported protocol
+# capabilities ("clean", "smudge", etc).
 #
 # This implementation supports special test cases:
 # (1) If data with the pathname "clean-write-fail.r" is processed with
@@ -24,9 +25,10 @@ use warnings;
 use IO::File;
 
 my $MAX_PACKET_CONTENT_SIZE = 65516;
+my $log_file                = shift @ARGV;
 my @capabilities            = @ARGV;
 
-open my $debug, ">>", "rot13-filter.log" or die "cannot open log file: $!";
+open my $debug, ">>", $log_file or die "cannot open log file: $!";
 
 sub rot13 {
 	my $str = shift;
-- 
2.13.0



* [PATCH v5 3/5] t0021: write "OUT" only on success
  2017-06-01  8:21 [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol Lars Schneider
  2017-06-01  8:21 ` [PATCH v5 1/5] t0021: keep filter log files on comparison Lars Schneider
  2017-06-01  8:22 ` [PATCH v5 2/5] t0021: make debug log file name configurable Lars Schneider
@ 2017-06-01  8:22 ` Lars Schneider
  2017-06-01  8:22 ` [PATCH v5 4/5] convert: move multiple file filter error handling to separate function Lars Schneider
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Lars Schneider @ 2017-06-01  8:22 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

"rot13-filter.pl" used to write "OUT <size>" to the debug log even in case of
an abort or error. Fix this by writing "OUT <size>" to the debug log only in
the successful case if output is actually written.

This is useful for the subsequent patch 'convert: add "status=delayed" to
filter process protocol'.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 t/t0021-conversion.sh   | 6 +++---
 t/t0021/rot13-filter.pl | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 0139b460e7..0c04d346a1 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -588,7 +588,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f
 		cat >expected.log <<-EOF &&
 			START
 			init handshake complete
-			IN: smudge smudge-write-fail.r $SF [OK] -- OUT: $SF [WRITE FAIL]
+			IN: smudge smudge-write-fail.r $SF [OK] -- [WRITE FAIL]
 			START
 			init handshake complete
 			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
@@ -634,7 +634,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a
 		cat >expected.log <<-EOF &&
 			START
 			init handshake complete
-			IN: smudge error.r $SE [OK] -- OUT: 0 [ERROR]
+			IN: smudge error.r $SE [OK] -- [ERROR]
 			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
 			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
 			STOP
@@ -673,7 +673,7 @@ test_expect_success PERL 'process filter abort stops processing of all further f
 		cat >expected.log <<-EOF &&
 			START
 			init handshake complete
-			IN: smudge abort.r $SA [OK] -- OUT: 0 [ABORT]
+			IN: smudge abort.r $SA [OK] -- [ABORT]
 			STOP
 		EOF
 		test_cmp_exclude_clean expected.log debug.log &&
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 0b943bb377..5e43faeec1 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -153,9 +153,6 @@ while (1) {
 		die "bad command '$command'";
 	}
 
-	print $debug "OUT: " . length($output) . " ";
-	$debug->flush();
-
 	if ( $pathname eq "error.r" ) {
 		print $debug "[ERROR]\n";
 		$debug->flush();
@@ -178,6 +175,9 @@ while (1) {
 			die "${command} write error";
 		}
 
+		print $debug "OUT: " . length($output) . " ";
+		$debug->flush();
+
 		while ( length($output) > 0 ) {
 			my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
 			packet_bin_write($packet);
-- 
2.13.0



* [PATCH v5 4/5] convert: move multiple file filter error handling to separate function
  2017-06-01  8:21 [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol Lars Schneider
                   ` (2 preceding siblings ...)
  2017-06-01  8:22 ` [PATCH v5 3/5] t0021: write "OUT" only on success Lars Schneider
@ 2017-06-01  8:22 ` Lars Schneider
  2017-06-18  7:20   ` Torsten Bögershausen
  2017-06-01  8:22 ` [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol Lars Schneider
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Lars Schneider @ 2017-06-01  8:22 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

This is useful for the subsequent patch 'convert: add "status=delayed" to
filter process protocol'.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 convert.c | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/convert.c b/convert.c
index f1e168bc30..a5e09bb0e8 100644
--- a/convert.c
+++ b/convert.c
@@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 	return err;
 }
 
+static void handle_filter_error(const struct strbuf *filter_status,
+				struct cmd2process *entry,
+				const unsigned int wanted_capability) {
+	if (!strcmp(filter_status->buf, "error")) {
+		/* The filter signaled a problem with the file. */
+	} else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
+		/*
+		 * The filter signaled a permanent problem. Don't try to filter
+		 * files with the same command for the lifetime of the current
+		 * Git process.
+		 */
+		 entry->supported_capabilities &= ~wanted_capability;
+	} else {
+		/*
+		 * Something went wrong with the protocol filter.
+		 * Force shutdown and restart if another blob requires filtering.
+		 */
+		error("external filter '%s' failed", entry->subprocess.cmd);
+		subprocess_stop(&subprocess_map, &entry->subprocess);
+		free(entry);
+	}
+}
+
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
 				   int fd, struct strbuf *dst, const char *cmd,
 				   const unsigned int wanted_capability)
@@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 done:
 	sigchain_pop(SIGPIPE);
 
-	if (err) {
-		if (!strcmp(filter_status.buf, "error")) {
-			/* The filter signaled a problem with the file. */
-		} else if (!strcmp(filter_status.buf, "abort")) {
-			/*
-			 * The filter signaled a permanent problem. Don't try to filter
-			 * files with the same command for the lifetime of the current
-			 * Git process.
-			 */
-			 entry->supported_capabilities &= ~wanted_capability;
-		} else {
-			/*
-			 * Something went wrong with the protocol filter.
-			 * Force shutdown and restart if another blob requires filtering.
-			 */
-			error("external filter '%s' failed", cmd);
-			subprocess_stop(&subprocess_map, &entry->subprocess);
-			free(entry);
-		}
-	} else {
+	if (err)
+		handle_filter_error(&filter_status, entry, wanted_capability);
+	else
 		strbuf_swap(dst, &nbuf);
-	}
 	strbuf_release(&nbuf);
 	return !err;
 }
-- 
2.13.0



* [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol
  2017-06-01  8:21 [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol Lars Schneider
                   ` (3 preceding siblings ...)
  2017-06-01  8:22 ` [PATCH v5 4/5] convert: move multiple file filter error handling to separate function Lars Schneider
@ 2017-06-01  8:22 ` Lars Schneider
  2017-06-02  2:21   ` Junio C Hamano
  2017-06-24 14:19   ` Jeff King
  2017-06-01  9:44 ` [PATCH v5 0/5] " Junio C Hamano
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 20+ messages in thread
From: Lars Schneider @ 2017-06-01  8:22 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

Some `clean` / `smudge` filters might require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.

Teach the filter process protocol (introduced in edcc858) to accept the
status "delayed" as a response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout", which queries the filter for the
remaining blobs. If the filter is still working on them, then it is
expected to block until at least one blob becomes available. If the
filter has completed all remaining blobs, then an empty response is
expected.

Git has multiple code paths that check out a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
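Not part of this patch, just for illustration: a rough filter-side sketch
of the exchange described above. It reuses the packet_* helpers from
"t/t0021/rot13-filter.pl" and omits the handshake and the "clean" command;
the %delayed bookkeeping and the is_ready() / smudged_content() helpers
are made-up placeholders.

my %delayed;	# pathname => 1 for every blob whose smudging was deferred

sub handle_smudge {
	my ( $pathname, $can_delay ) = @_;
	if ( $can_delay and not is_ready($pathname) ) {
		# Defer the blob: no content, only a "delayed" status.
		$delayed{$pathname} = 1;
		packet_txt_write("status=delayed");
		packet_flush();
		return;
	}
	packet_txt_write("status=success");
	packet_flush();
	# A real filter would chunk content larger than 65516 bytes.
	packet_bin_write( smudged_content($pathname) );
	packet_flush();	# end of content
	packet_flush();	# empty list, keep "status=success" unchanged
}

sub handle_list_available_blobs {
	# A real filter would block here until at least one delayed
	# blob became available.
	foreach my $pathname ( grep { is_ready($_) } sort keys %delayed ) {
		packet_txt_write("pathname=$pathname");
		delete $delayed{$pathname};
	}
	packet_flush();
	packet_txt_write("status=success");
	packet_flush();
}
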
 Documentation/gitattributes.txt |  65 +++++++++++++-
 builtin/checkout.c              |   3 +
 cache.h                         |   6 +-
 convert.c                       | 119 +++++++++++++++++++++----
 convert.h                       |  21 +++++
 entry.c                         | 110 +++++++++++++++++++++--
 t/t0021-conversion.sh           |  74 ++++++++++++++++
 t/t0021/rot13-filter.pl         | 189 ++++++++++++++++++++++++++--------------
 unpack-trees.c                  |   2 +
 9 files changed, 498 insertions(+), 91 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 4736483865..5489e8b723 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -425,8 +425,8 @@ packet:          git< capability=clean
 packet:          git< capability=smudge
 packet:          git< 0000
 ------------------------
-Supported filter capabilities in version 2 are "clean" and
-"smudge".
+Supported filter capabilities in version 2 are "clean", "smudge",
+and "delay".
 
 Afterwards Git sends a list of "key=value" pairs terminated with
 a flush packet. The list will contain at least the filter command
@@ -512,12 +512,69 @@ the protocol then Git will stop the filter process and restart it
 with the next file that needs to be processed. Depending on the
 `filter.<driver>.required` flag Git will interpret that as error.
 
-After the filter has processed a blob it is expected to wait for
-the next "key=value" list containing a command. Git will close
+After the filter has processed a command it is expected to wait for
+a "key=value" list containing the next command. Git will close
 the command pipe on exit. The filter is expected to detect EOF
 and exit gracefully on its own. Git will wait until the filter
 process has stopped.
 
+Delay
+^^^^^
+
+If the filter supports the "delay" capability, then Git can send the
+flag "can-delay" after the filter command and pathname. This flag
+denotes that the filter can delay filtering the current blob (e.g. to
+compensate network latencies) by responding with no content but with
+the status "delayed" and a flush packet.
+------------------------
+packet:          git> command=smudge
+packet:          git> pathname=path/testfile.dat
+packet:          git> can-delay=1
+packet:          git> 0000
+packet:          git> CONTENT
+packet:          git> 0000
+packet:          git< status=delayed
+packet:          git< 0000
+------------------------
+
+If the filter supports the "delay" capability then it must support the
+"list_available_blobs" command. If Git sends this command, then the
+filter is expected to return a list of pathnames of blobs that are
+available. The list must be terminated with a flush packet followed
+by a "success" status that is also terminated with a flush packet. If
+no blobs for the delayed paths are available, yet, then the filter is
+expected to block the response until at least one blob becomes
+available. The filter can tell Git that it has no more delayed blobs
+by sending an empty list.
+------------------------
+packet:          git> command=list_available_blobs
+packet:          git> 0000
+packet:          git< pathname=path/testfile.dat
+packet:          git< pathname=path/otherfile.dat
+packet:          git< 0000
+packet:          git< status=success
+packet:          git< 0000
+------------------------
+
+After Git received the pathnames, it will request the corresponding
+blobs again. These requests contain a pathname and an empty content
+section. The filter is expected to respond with the smudged content
+in the usual way as explained above.
+------------------------
+packet:          git> command=smudge
+packet:          git> pathname=path/testfile.dat
+packet:          git> 0000
+packet:          git> 0000  # empty content!
+packet:          git< status=success
+packet:          git< 0000
+packet:          git< SMUDGED_CONTENT
+packet:          git< 0000
+packet:          git< 0000  # empty list, keep "status=success" unchanged!
+------------------------
+
+Example
+^^^^^^^
+
 A long running filter demo implementation can be found in
 `contrib/long-running-filter/example.pl` located in the Git
 core repository. If you develop your own long running filter
diff --git a/builtin/checkout.c b/builtin/checkout.c
index a6b2af39d3..c1a256df8d 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -376,6 +376,8 @@ static int checkout_paths(const struct checkout_opts *opts,
 	state.force = 1;
 	state.refresh_cache = 1;
 	state.istate = &the_index;
+
+	enable_delayed_checkout(&state);
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
@@ -390,6 +392,7 @@ static int checkout_paths(const struct checkout_opts *opts,
 			pos = skip_same_name(ce, pos) - 1;
 		}
 	}
+	errs |= finish_delayed_checkout(&state);
 
 	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
diff --git a/cache.h b/cache.h
index ae4c45d379..523ead1d95 100644
--- a/cache.h
+++ b/cache.h
@@ -1543,16 +1543,20 @@ extern int ident_cmp(const struct ident_split *, const struct ident_split *);
 struct checkout {
 	struct index_state *istate;
 	const char *base_dir;
+	struct delayed_checkout *delayed_checkout;
 	int base_dir_len;
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
 		 refresh_cache:1;
 };
-#define CHECKOUT_INIT { NULL, "" }
+#define CHECKOUT_INIT { NULL, "", NULL }
+
 
 #define TEMPORARY_FILENAME_LENGTH 25
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
+extern void enable_delayed_checkout(struct checkout *state);
+extern int finish_delayed_checkout(struct checkout *state);
 
 struct cache_def {
 	struct strbuf path;
diff --git a/convert.c b/convert.c
index a5e09bb0e8..c4df174378 100644
--- a/convert.c
+++ b/convert.c
@@ -496,6 +496,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 
 #define CAP_CLEAN    (1u<<0)
 #define CAP_SMUDGE   (1u<<1)
+#define CAP_DELAY    (1u<<2)
 
 struct cmd2process {
 	struct subprocess_entry subprocess; /* must be the first member! */
@@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 	if (err)
 		goto done;
 
-	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
+	err = packet_writel(process->in,
+		"capability=clean", "capability=smudge", "capability=delay", NULL);
 
 	for (;;) {
 		cap_buf = packet_read_line(process->out, NULL);
@@ -549,6 +551,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 			entry->supported_capabilities |= CAP_CLEAN;
 		} else if (!strcmp(cap_name, "smudge")) {
 			entry->supported_capabilities |= CAP_SMUDGE;
+		} else if (!strcmp(cap_name, "delay")) {
+			entry->supported_capabilities |= CAP_DELAY;
 		} else {
 			warning(
 				"external filter '%s' requested unsupported filter capability '%s'",
@@ -590,9 +594,11 @@ static void handle_filter_error(const struct strbuf *filter_status,
 
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
 				   int fd, struct strbuf *dst, const char *cmd,
-				   const unsigned int wanted_capability)
+				   const unsigned int wanted_capability,
+				   struct delayed_checkout *dco)
 {
 	int err;
+	int can_delay = 0;
 	struct cmd2process *entry;
 	struct child_process *process;
 	struct strbuf nbuf = STRBUF_INIT;
@@ -647,6 +653,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
+	if (CAP_DELAY & entry->supported_capabilities &&
+	    dco && dco->state == CE_CAN_DELAY) {
+		can_delay = 1;
+		err = packet_write_fmt_gently(process->in, "can-delay=1\n");
+		if (err)
+			goto done;
+	}
+
 	err = packet_flush_gently(process->in);
 	if (err)
 		goto done;
@@ -662,14 +676,78 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	err = strcmp(filter_status.buf, "success");
+	if (can_delay && !strcmp(filter_status.buf, "delayed")) {
+		dco->state = CE_DELAYED;
+		string_list_insert(&dco->filters, cmd);
+		string_list_insert(&dco->paths, path);
+	} else {
+		/* The filter got the blob and wants to send us a response. */
+		err = strcmp(filter_status.buf, "success");
+		if (err)
+			goto done;
+
+		err = read_packetized_to_strbuf(process->out, &nbuf) < 0;
+		if (err)
+			goto done;
+
+		err = subprocess_read_status(process->out, &filter_status);
+		if (err)
+			goto done;
+
+		err = strcmp(filter_status.buf, "success");
+	}
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err)
+		handle_filter_error(&filter_status, entry, wanted_capability);
+	else
+		strbuf_swap(dst, &nbuf);
+	strbuf_release(&nbuf);
+	return !err;
+}
+
+
+int async_query_available_blobs(const char *cmd, struct string_list *delayed_paths)
+{
+	int err;
+	char *line;
+	struct cmd2process *entry;
+	struct child_process *process;
+	struct strbuf filter_status = STRBUF_INIT;
+
+	assert(subprocess_map_initialized);
+	entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd);
+	if (!entry) {
+		error("external filter '%s' is not available anymore although "
+		      "not all paths have been filtered", cmd);
+		return 0;
+	}
+	process = &entry->subprocess.process;
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_fmt_gently(
+		process->in, "command=list_available_blobs\n");
 	if (err)
 		goto done;
 
-	err = read_packetized_to_strbuf(process->out, &nbuf) < 0;
+	err = packet_flush_gently(process->in);
 	if (err)
 		goto done;
 
+	for (;;) {
+		const char* pre = "pathname=";
+		const int pre_len = strlen(pre);
+		line = packet_read_line(process->out, NULL);
+		if (!line)
+			break;
+		err = strlen(line) <= pre_len || strncmp(line, pre, pre_len);
+		if (err)
+			goto done;
+		string_list_insert(delayed_paths, xstrdup(line+pre_len));
+	}
+
 	err = subprocess_read_status(process->out, &filter_status);
 	if (err)
 		goto done;
@@ -680,10 +758,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	sigchain_pop(SIGPIPE);
 
 	if (err)
-		handle_filter_error(&filter_status, entry, wanted_capability);
-	else
-		strbuf_swap(dst, &nbuf);
-	strbuf_release(&nbuf);
+		handle_filter_error(&filter_status, entry, 0);
 	return !err;
 }
 
@@ -698,7 +773,8 @@ static struct convert_driver {
 
 static int apply_filter(const char *path, const char *src, size_t len,
 			int fd, struct strbuf *dst, struct convert_driver *drv,
-			const unsigned int wanted_capability)
+			const unsigned int wanted_capability,
+			struct delayed_checkout *dco)
 {
 	const char *cmd = NULL;
 
@@ -716,7 +792,8 @@ static int apply_filter(const char *path, const char *src, size_t len,
 	if (cmd && *cmd)
 		return apply_single_file_filter(path, src, len, fd, dst, cmd);
 	else if (drv->process && *drv->process)
-		return apply_multi_file_filter(path, src, len, fd, dst, drv->process, wanted_capability);
+		return apply_multi_file_filter(path, src, len, fd, dst,
+			drv->process, wanted_capability, dco);
 
 	return 0;
 }
@@ -1057,7 +1134,7 @@ int would_convert_to_git_filter_fd(const char *path)
 	if (!ca.drv->required)
 		return 0;
 
-	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN);
+	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL);
 }
 
 const char *get_convert_attr_ascii(const char *path)
@@ -1094,7 +1171,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 
 	convert_attrs(&ca, path);
 
-	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN);
+	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN, NULL);
 	if (!ret && ca.drv && ca.drv->required)
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -1119,7 +1196,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	assert(ca.drv);
 	assert(ca.drv->clean || ca.drv->process);
 
-	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN))
+	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN, NULL))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
@@ -1128,7 +1205,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 
 static int convert_to_working_tree_internal(const char *path, const char *src,
 					    size_t len, struct strbuf *dst,
-					    int normalizing)
+					    int normalizing, struct delayed_checkout *dco)
 {
 	int ret = 0, ret_filter = 0;
 	struct conv_attrs ca;
@@ -1153,21 +1230,29 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 		}
 	}
 
-	ret_filter = apply_filter(path, src, len, -1, dst, ca.drv, CAP_SMUDGE);
+	ret_filter = apply_filter(
+		path, src, len, -1, dst, ca.drv, CAP_SMUDGE, dco);
 	if (!ret_filter && ca.drv && ca.drv->required)
 		die("%s: smudge filter %s failed", path, ca.drv->name);
 
 	return ret | ret_filter;
 }
 
+int async_convert_to_working_tree(const char *path, const char *src,
+				  size_t len, struct strbuf *dst,
+				  void *dco)
+{
+	return convert_to_working_tree_internal(path, src, len, dst, 0, dco);
+}
+
 int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
 {
-	return convert_to_working_tree_internal(path, src, len, dst, 0);
+	return convert_to_working_tree_internal(path, src, len, dst, 0, NULL);
 }
 
 int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
 {
-	int ret = convert_to_working_tree_internal(path, src, len, dst, 1);
+	int ret = convert_to_working_tree_internal(path, src, len, dst, 1, NULL);
 	if (ret) {
 		src = dst->buf;
 		len = dst->len;
diff --git a/convert.h b/convert.h
index 82871a11d5..c4beaa5101 100644
--- a/convert.h
+++ b/convert.h
@@ -4,6 +4,8 @@
 #ifndef CONVERT_H
 #define CONVERT_H
 
+#include "string-list.h"
+
 enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
@@ -32,6 +34,21 @@ enum eol {
 #endif
 };
 
+enum ce_delay_state {
+	CE_NO_DELAY = 0,
+	CE_CAN_DELAY = 1,
+	CE_DELAYED = 2,
+	CE_RETRY = 3
+};
+
+struct delayed_checkout {
+	enum ce_delay_state state;
+	/* List of filter drivers that signaled delayed blobs. */
+	struct string_list filters;
+	/* List of delayed blobs identified by their path. */
+	struct string_list paths;
+};
+
 extern enum eol core_eol;
 extern const char *get_cached_convert_stats_ascii(const char *path);
 extern const char *get_wt_convert_stats_ascii(const char *path);
@@ -42,6 +59,10 @@ extern int convert_to_git(const char *path, const char *src, size_t len,
 			  struct strbuf *dst, enum safe_crlf checksafe);
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
+extern int async_convert_to_working_tree(const char *path, const char *src,
+					 size_t len, struct strbuf *dst,
+					 void *dco);
+extern int async_query_available_blobs(const char *cmd, struct string_list *delayed_paths);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
 			      struct strbuf *dst);
 static inline int would_convert_to_git(const char *path)
diff --git a/entry.c b/entry.c
index d6b263f78e..c9cb557559 100644
--- a/entry.c
+++ b/entry.c
@@ -137,6 +137,81 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path,
 	return result;
 }
 
+void enable_delayed_checkout(struct checkout *state)
+{
+	if (!state->delayed_checkout) {
+		state->delayed_checkout = xmalloc(sizeof(*state->delayed_checkout));
+		state->delayed_checkout->state = CE_CAN_DELAY;
+		string_list_init(&state->delayed_checkout->filters, 0);
+		string_list_init(&state->delayed_checkout->paths, 0);
+	}
+}
+
+static int remove_available_paths(struct string_list_item *item, void *cb_data)
+{
+	struct string_list *available_paths = cb_data;
+	return !string_list_has_string(available_paths, item->string);
+}
+
+int finish_delayed_checkout(struct checkout *state)
+{
+	int errs = 0;
+	struct string_list_item *filter, *path;
+	struct delayed_checkout *dco = state->delayed_checkout;
+
+	if (!state->delayed_checkout) {
+		return errs;
+	}
+
+	while (dco->filters.nr > 0) {
+		for_each_string_list_item(filter, &dco->filters) {
+			struct string_list available_paths;
+			string_list_init(&available_paths, 0);
+
+			if (!async_query_available_blobs(filter->string, &available_paths)) {
+				/* Filter reported an error */
+				errs = 1;
+				filter->string = "";
+				continue;
+			}
+			if (available_paths.nr <= 0) {
+				/* Filter responded with no entries. That means
+				   the filter is done and we can remove the
+				   filter from the list (see
+				   "string_list_remove_empty_items" call below).
+				*/
+				filter->string = "";
+				continue;
+			}
+
+			/* In dco->paths we store a list of all delayed paths.
+			   The filter just sent us a list of available paths.
+			   Remove them from the list.
+			*/
+			filter_string_list(&dco->paths, 0,
+				&remove_available_paths, &available_paths);
+
+			for_each_string_list_item(path, &available_paths) {
+				struct cache_entry* ce = index_file_exists(
+					state->istate, path->string,
+					strlen(path->string), 0);
+				dco->state = CE_RETRY;
+				errs |= (ce ? checkout_entry(ce, state, NULL) : 1);
+			}
+		}
+		string_list_remove_empty_items(&dco->filters, 0);
+	}
+	string_list_clear(&dco->filters, 0);
+
+	/* At this point we should not have any delayed paths anymore. */
+	errs |= dco->paths.nr;
+
+	free(dco);
+	state->delayed_checkout = NULL;
+
+	return errs;
+}
+
 static int write_entry(struct cache_entry *ce,
 		       char *path, const struct checkout *state, int to_tempfile)
 {
@@ -179,11 +254,36 @@ static int write_entry(struct cache_entry *ce,
 		/*
 		 * Convert from git internal format to working tree format
 		 */
-		if (ce_mode_s_ifmt == S_IFREG &&
-		    convert_to_working_tree(ce->name, new, size, &buf)) {
-			free(new);
-			new = strbuf_detach(&buf, &newsize);
-			size = newsize;
+		if (ce_mode_s_ifmt == S_IFREG) {
+			struct delayed_checkout *dco = state->delayed_checkout;
+			if (dco && dco->state != CE_NO_DELAY) {
+				/* Do not send the blob in case of a retry. */
+				if (dco->state == CE_RETRY) {
+					new = NULL;
+					size = 0;
+				}
+				ret = async_convert_to_working_tree(
+					ce->name, new, size, &buf, dco);
+				if (ret && dco->state == CE_DELAYED) {
+					free(new);
+					/* Reset the state of the next blob */
+					dco->state = CE_CAN_DELAY;
+					goto finish;
+				}
+			} else
+				ret = convert_to_working_tree(
+					ce->name, new, size, &buf);
+
+			if (ret) {
+				free(new);
+				new = strbuf_detach(&buf, &newsize);
+				size = newsize;
+			}
+			/*
+			 * No "else" here as errors from convert are OK at this
+			 * point. If the error would have been fatal (e.g.
+			 * filter is required), then we would have died already.
+			 */
 		}
 
 		fd = open_output_fd(path, ce, to_tempfile);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 0c04d346a1..4b5a45fd43 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -701,4 +701,78 @@ test_expect_success PERL 'invalid process filter must fail (and not hang!)' '
 	)
 '
 
+test_expect_success PERL 'delayed checkout in process filter' '
+	test_config_global filter.a.process "rot13-filter.pl a.log clean smudge delay" &&
+	test_config_global filter.a.required true &&
+	test_config_global filter.b.process "rot13-filter.pl b.log clean smudge delay" &&
+	test_config_global filter.b.required true &&
+
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+		echo "*.a filter=a" >.gitattributes &&
+		echo "*.b filter=b" >>.gitattributes &&
+		cp "$TEST_ROOT/test.o" test.a &&
+		cp "$TEST_ROOT/test.o" test-delay10.a &&
+		cp "$TEST_ROOT/test.o" test-delay11.a &&
+		cp "$TEST_ROOT/test.o" test-delay20.a &&
+		cp "$TEST_ROOT/test.o" test-delay10.b &&
+		git add . &&
+		git commit -m "test commit 1"
+	) &&
+
+	S=$(file_size "$TEST_ROOT/test.o") &&
+	cat >a.exp <<-EOF &&
+		START
+		init handshake complete
+		IN: smudge test.a $S [OK] -- OUT: $S . [OK]
+		IN: smudge test-delay10.a $S [OK] -- [DELAYED]
+		IN: smudge test-delay11.a $S [OK] -- [DELAYED]
+		IN: smudge test-delay20.a $S [OK] -- [DELAYED]
+		IN: list_available_blobs test-delay10.a test-delay11.a [OK]
+		IN: smudge test-delay10.a 0 [OK] -- OUT: $S . [OK]
+		IN: smudge test-delay11.a 0 [OK] -- OUT: $S . [OK]
+		IN: list_available_blobs test-delay20.a [OK]
+		IN: smudge test-delay20.a 0 [OK] -- OUT: $S . [OK]
+		IN: list_available_blobs [OK]
+		STOP
+	EOF
+	cat >b.exp <<-EOF &&
+		START
+		init handshake complete
+		IN: smudge test-delay10.b $S [OK] -- [DELAYED]
+		IN: list_available_blobs test-delay10.b [OK]
+		IN: smudge test-delay10.b 0 [OK] -- OUT: $S . [OK]
+		IN: list_available_blobs [OK]
+		STOP
+	EOF
+
+	rm -rf repo-cloned &&
+	filter_git clone repo repo-cloned &&
+	test_cmp_count a.exp repo-cloned/a.log &&
+	test_cmp_count b.exp repo-cloned/b.log &&
+
+	(
+		cd repo-cloned &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay11.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay20.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.b &&
+
+		rm *.a *.b &&
+		filter_git checkout . &&
+		test_cmp_count ../a.exp a.log &&
+		test_cmp_count ../b.exp b.log &&
+
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay11.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay20.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.b
+	)
+'
+
 test_done
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 5e43faeec1..f0dc0aff4a 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -18,6 +18,11 @@
 #     operation then the filter signals that it cannot or does not want
 #     to process the file and any file after that is processed with the
 #     same command.
+# (5) If data with a pathname that is a key in the DELAY hash is
+#     requested (e.g. 'test-delay10.a') then the filter responds with
+#     a "delay" status and sets the "requested" field in the DELAY hash.
+#     The filter will signal the availability of this object after
+#     "count" (field in DELAY hash) "list_available_blobs" commands.
 #
 
 use strict;
@@ -30,6 +35,13 @@ my @capabilities            = @ARGV;
 
 open my $debug, ">>", $log_file or die "cannot open log file: $!";
 
+my %DELAY = (
+	'test-delay10.a' => { "requested" => 0, "count" => 1 },
+	'test-delay11.a' => { "requested" => 0, "count" => 1 },
+	'test-delay20.a' => { "requested" => 0, "count" => 2 },
+	'test-delay10.b' => { "requested" => 0, "count" => 1 },
+);
+
 sub rot13 {
 	my $str = shift;
 	$str =~ y/A-Za-z/N-ZA-Mn-za-m/;
@@ -66,7 +78,7 @@ sub packet_bin_read {
 
 sub packet_txt_read {
 	my ( $res, $buf ) = packet_bin_read();
-	unless ( $buf =~ s/\n$// ) {
+	unless ( $buf eq '' or $buf =~ s/\n$// ) {
 		die "A non-binary line MUST be terminated by an LF.";
 	}
 	return ( $res, $buf );
@@ -101,6 +113,7 @@ packet_flush();
 
 ( packet_txt_read() eq ( 0, "capability=clean" ) )  || die "bad capability";
 ( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
+( packet_txt_read() eq ( 0, "capability=delay" ) )  || die "bad capability";
 ( packet_bin_read() eq ( 1, "" ) )                  || die "bad capability end";
 
 foreach (@capabilities) {
@@ -115,84 +128,132 @@ while (1) {
 	print $debug "IN: $command";
 	$debug->flush();
 
-	my ($pathname) = packet_txt_read() =~ /^pathname=(.+)$/;
-	print $debug " $pathname";
-	$debug->flush();
-
-	if ( $pathname eq "" ) {
-		die "bad pathname '$pathname'";
-	}
+	if ( $command eq "list_available_blobs" ) {
+		# Flush
+		packet_bin_read();
 
-	# Flush
-	packet_bin_read();
-
-	my $input = "";
-	{
-		binmode(STDIN);
-		my $buffer;
-		my $done = 0;
-		while ( !$done ) {
-			( $done, $buffer ) = packet_bin_read();
-			$input .= $buffer;
+		foreach my $pathname (sort keys %DELAY) {
+			if ( $DELAY{$pathname}{"requested"} >= 1 ) {
+				$DELAY{$pathname}{"count"} = $DELAY{$pathname}{"count"} - 1;
+				if ($DELAY{$pathname}{"count"} == 0 ) {
+					print $debug " $pathname";
+					packet_txt_write("pathname=$pathname");
+				}
+			}
 		}
-		print $debug " " . length($input) . " [OK] -- ";
-		$debug->flush();
-	}
-
-	my $output;
-	if ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
-		$output = "";
-	}
-	elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
-		$output = rot13($input);
-	}
-	elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
-		$output = rot13($input);
-	}
-	else {
-		die "bad command '$command'";
-	}
 
-	if ( $pathname eq "error.r" ) {
-		print $debug "[ERROR]\n";
-		$debug->flush();
-		packet_txt_write("status=error");
 		packet_flush();
-	}
-	elsif ( $pathname eq "abort.r" ) {
-		print $debug "[ABORT]\n";
+
+		print $debug " [OK]\n";
 		$debug->flush();
-		packet_txt_write("status=abort");
+		packet_txt_write("status=success");
 		packet_flush();
 	}
 	else {
-		packet_txt_write("status=success");
-		packet_flush();
+		my ($pathname) = packet_txt_read() =~ /^pathname=(.+)$/;
+		print $debug " $pathname";
+		$debug->flush();
+
+		if ( $pathname eq "" ) {
+			die "bad pathname '$pathname'";
+		}
+
+		# Read until flush
+		my ( $done, $buffer ) = packet_txt_read();
+		while ( $buffer ne '' ) {
+			if ( $buffer eq "can-delay=1" ) {
+				if ( exists $DELAY{$pathname} and $DELAY{$pathname}{"requested"} == 0 ) {
+					$DELAY{$pathname}{"requested"} = 1;
+				}
+			} else {
+				die "Unknown message '$buffer'";
+			}
 
-		if ( $pathname eq "${command}-write-fail.r" ) {
-			print $debug "[WRITE FAIL]\n";
+			( $done, $buffer ) = packet_txt_read();
+		}
+
+		my $input = "";
+		{
+			binmode(STDIN);
+			my $buffer;
+			my $done = 0;
+			while ( !$done ) {
+				( $done, $buffer ) = packet_bin_read();
+				$input .= $buffer;
+			}
+			print $debug " " . length($input) . " [OK] -- ";
 			$debug->flush();
-			die "${command} write error";
 		}
 
-		print $debug "OUT: " . length($output) . " ";
-		$debug->flush();
+		my $output;
+		if ( exists $DELAY{$pathname} and exists $DELAY{$pathname}{"output"} ) {
+			$output = $DELAY{$pathname}{"output"}
+		}
+		elsif ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
+			$output = "";
+		}
+		elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
+			$output = rot13($input);
+		}
+		elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
+			$output = rot13($input);
+		}
+		else {
+			die "bad command '$command'";
+		}
+
+		if ( $pathname eq "error.r" ) {
+			print $debug "[ERROR]\n";
+			$debug->flush();
+			packet_txt_write("status=error");
+			packet_flush();
+		}
+		elsif ( $pathname eq "abort.r" ) {
+			print $debug "[ABORT]\n";
+			$debug->flush();
+			packet_txt_write("status=abort");
+			packet_flush();
+		}
+		elsif ( $command eq "smudge" and
+			exists $DELAY{$pathname} and
+			$DELAY{$pathname}{"requested"} == 1
+		) {
+			print $debug "[DELAYED]\n";
+			$debug->flush();
+			packet_txt_write("status=delayed");
+			packet_flush();
+			$DELAY{$pathname}{"requested"} = 2;
+			$DELAY{$pathname}{"output"} = $output;
+		}
+		else {
+			packet_txt_write("status=success");
+			packet_flush();
 
-		while ( length($output) > 0 ) {
-			my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
-			packet_bin_write($packet);
-			# dots represent the number of packets
-			print $debug ".";
-			if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
-				$output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+			if ( $pathname eq "${command}-write-fail.r" ) {
+				print $debug "[WRITE FAIL]\n";
+				$debug->flush();
+				die "${command} write error";
 			}
-			else {
-				$output = "";
+
+			print $debug "OUT: " . length($output) . " ";
+			$debug->flush();
+
+			while ( length($output) > 0 ) {
+				my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
+				packet_bin_write($packet);
+				# dots represent the number of packets
+				print $debug ".";
+				if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
+					$output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+				}
+				else {
+					$output = "";
+				}
 			}
+			packet_flush();
+			print $debug " [OK]\n";
+			$debug->flush();
+			packet_flush();
 		}
-		packet_flush();
-		print $debug " [OK]\n";
-		$debug->flush();
-		packet_flush();
 	}
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index d38c37e38c..491a2562ad 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -379,6 +379,7 @@ static int check_updates(struct unpack_trees_options *o)
 	if (should_update_submodules() && o->update && !o->dry_run)
 		reload_gitmodules_file(index, &state);
 
+	enable_delayed_checkout(&state);
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -393,6 +394,7 @@ static int check_updates(struct unpack_trees_options *o)
 			}
 		}
 	}
+	errs |= finish_delayed_checkout(&state);
 	stop_progress(&progress);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
-- 
2.13.0



* Re: [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol
  2017-06-01  8:21 [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol Lars Schneider
                   ` (4 preceding siblings ...)
  2017-06-01  8:22 ` [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol Lars Schneider
@ 2017-06-01  9:44 ` Junio C Hamano
  2017-06-02  2:06 ` Junio C Hamano
  2017-06-24 14:23 ` Jeff King
  7 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2017-06-01  9:44 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben

Lars Schneider <larsxschneider@gmail.com> writes:

> here is the 5th iteration of my "status delayed" topic. Patch 1 to 3 are
> minor t0021 test adjustments and haven't been changed since v3. Patch 4
> is a minor "extract method" refactoring in convert. Patch 5 is the new
> feature.

Thanks.  

Will replace but it will take a while for me to get to it (I expect
that others may very well beat me to reviewing this series).



* Re: [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol
  2017-06-01  8:21 [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol Lars Schneider
                   ` (5 preceding siblings ...)
  2017-06-01  9:44 ` [PATCH v5 0/5] " Junio C Hamano
@ 2017-06-02  2:06 ` Junio C Hamano
  2017-06-24 14:23 ` Jeff King
  7 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2017-06-02  2:06 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben

Lars Schneider <larsxschneider@gmail.com> writes:

> If you review this series then please read the "Delay" section in
> "Documentation/gitattributes.txt" first for an overview of the delay mechanism.

OK.


* Re: [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol
  2017-06-01  8:22 ` [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol Lars Schneider
@ 2017-06-02  2:21   ` Junio C Hamano
  2017-06-05 11:36     ` Lars Schneider
  2017-06-24 14:19   ` Jeff King
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2017-06-02  2:21 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben

Lars Schneider <larsxschneider@gmail.com> writes:

> +Delay
> +^^^^^
> +
> +If the filter supports the "delay" capability, then Git can send the
> +flag "can-delay" after the filter command and pathname. This flag
> +denotes that the filter can delay filtering the current blob (e.g. to
> +compensate network latencies) by responding with no content but with
> +the status "delayed" and a flush packet.
> +------------------------
> +packet:          git> command=smudge
> +packet:          git> pathname=path/testfile.dat
> +packet:          git> can-delay=1
> +packet:          git> 0000
> +packet:          git> CONTENT
> +packet:          git> 0000
> +packet:          git< status=delayed
> +packet:          git< 0000
> +------------------------
> +
> +If the filter supports the "delay" capability then it must support the
> +"list_available_blobs" command. If Git sends this command, then the
> +filter is expected to return a list of pathnames of blobs that are
> +available. The list must be terminated with a flush packet followed
> +by a "success" status that is also terminated with a flush packet. If
> +no blobs for the delayed paths are available, yet, then the filter is
> +expected to block the response until at least one blob becomes
> +available. The filter can tell Git that it has no more delayed blobs
> +by sending an empty list.
> +------------------------
> +packet:          git> command=list_available_blobs
> +packet:          git> 0000
> +packet:          git< pathname=path/testfile.dat
> +packet:          git< pathname=path/otherfile.dat
> +packet:          git< 0000
> +packet:          git< status=success
> +packet:          git< 0000
> +------------------------
> +
> +After Git received the pathnames, it will request the corresponding
> +blobs again. These requests contain a pathname and an empty content
> +section. The filter is expected to respond with the smudged content
> +in the usual way as explained above.
> +------------------------
> +packet:          git> command=smudge
> +packet:          git> pathname=path/testfile.dat
> +packet:          git> 0000
> +packet:          git> 0000  # empty content!
> +packet:          git< status=success
> +packet:          git< 0000
> +packet:          git< SMUDGED_CONTENT
> +packet:          git< 0000
> +packet:          git< 0000  # empty list, keep "status=success" unchanged!
> +------------------------

Random things that come to mind (which I suspect have been dealt with
in code but not described in the above in detail):

 * Using pathname as a "delay key" would mean that we assume that a
   single helper process is spawned to serve only a single Git
   invocation (as opposed to sitting as a daemon accepting
   requests from instances of Git invoked much later than the daemon
   started), which is OK, but it is somewhat unclear from the
   write-up when a filter is allowed to discard requests that used
   to be outstanding.  Imagine Git asked for A and B, both of which
   were delayed, and then asked for a list of available ones and
   learned that A is now ready.

   - Is it an error to ask for B at this point, and if so how is the
     error handled?  Or will it turn into a blocking operation?

   - As A is available, Git can ask for it.  After retrieving
     smudged content for A, can Git ask for it again?  Or is it an
     error to do so, without starting over with another can-delay=1
     request for the same path?

 * The pathname does not have to be encoded/quoted in any way as it
   is just a payload on the tail end of a packet line, so I am
   guessing it is just a sequence of raw bytes.  It is somewhat unclear
   in the above write-up.  Also, is this considered to be a part of
   "textual" exchange over the packet-line protocol, where it is
   customary to remove the trailing LF from the packet?  "We do not
   support a file whose name ends with LF" may or may not be an
   acceptable stance (I am personally OK, but some people who use
   Git may not), but it needs to be documented if there is such an
   issue.

 * The continued request sends an empty content section in the above
   illustration.  Is a filter allowed/encouraged to assume that an
   empty content in the request is a continuation?  I am guessing
   that the answer is NO (otherwise you cannot filter an empty
   file), and it somehow needs to be documented, perhaps?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol
  2017-06-02  2:21   ` Junio C Hamano
@ 2017-06-05 11:36     ` Lars Schneider
  0 siblings, 0 replies; 20+ messages in thread
From: Lars Schneider @ 2017-06-05 11:36 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Jeff King, Torsten Bögershausen, e,
	ttaylorr, peartben


> On 02 Jun 2017, at 04:21, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>> ...
>> +After Git received the pathnames, it will request the corresponding
>> +blobs again. These requests contain a pathname and an empty content
>> +section. The filter is expected to respond with the smudged content
>> +in the usual way as explained above.
>> ...
> 
> Random things that come to mind (which I suspect has been dealt with
> in code but not described in the above in detail):
> 
> * Using pathname as a "delay key" would mean that we assume that a
>   single helper process is spawned to serve only a single Git
>   invocation (as opposed to be sitting as a daemon accepting
>   requests from instances of Git invoked much later than the daemon
>   started), which is OK, but it is somewhat unclear when a filter
>   is allowed to discard the requests that used to be outstanding
>   from the write-up.  Imagine Git asked for A and B, both of which
>   was delayed, and then asked for a list of available ones and
>   learned that A is now ready.  
> 
>   - Is it an error to ask for B at this point, and if so how is the
>     error handled?  Or will it turn into a blocking operation?

It is no error per se and it should just work. If CE_DELAYED is set
then the filter could even delay the blob. However, I would consider
this a bug in Git that would need to be fixed.

Would this make the documentation clear enough?

    After Git received the pathnames, it will request the 
    corresponding blobs again (and only these blobs).


>   - As A is available, Git can ask for it.  After retrieving
>     smudged content for A, can Git ask for it again?  Or is it an
>     error to do so, without starting over with another can-delay=1
>     request for the same path?

I'd consider it a bug if Git asks for a blob again. How about this?

    The filter is expected to respond with the smudged content
    in the usual way as explained above. The filter can free any
    resources associated with the blob after Git acknowledged
    the delivery. Git will not ask for the blob again. 


> * The pathname does not have to be encoded/quoted in any way as it
>   is just a payload on the tail end of a packet line, so I am
>   guessing it is just a sequence raw bytes.  It is somewhat unclear
>   in the above write-up.  Also, is this considered to be a part of
>   "textual" exchange over the packet-line protocol, where it is
>   customary to remove the trailing LF from the packet?  "We do not
>   support a file whose name ends with LF" may or may not be an
>   acceptable stance (I am personally OK, but some people who use
>   Git may not), but it needs to be documented if there is such an
>   issue.

How about this?

    If the filter supports the "delay" capability then it must support 
    the "list_available_blobs" command. If Git sends this command, then 
    the filter is expected to return a list of pathnames of blobs that 
    are available. The pathname packets sent by the filter must exactly
    match the pathname packets that Git requested initially.

Maybe even "must match bit-exact"?


> * The continued request throws an empty content in the above
>   illustration.  Is a filter allowed/encouraged to assume that an
>   empty content in the request is a continuation?  I am guessing
>   that the answer is NO (otherwise you cannot filter an empty
>   file), and it somehow need to be documented, perhaps?

Correct, the answer is NO. Would "irrelevant content section" be
OK or should I try to find a more explicit way to document that?

    After Git received the pathnames, it will request the corresponding
    blobs again. These requests contain a pathname and an irrelevant 
    (usually empty) content section. The filter is expected to respond 
    with the smudged content in the usual way as explained above.



Thanks a lot for the review,
Lars

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 4/5] convert: move multiple file filter error handling to separate function
  2017-06-01  8:22 ` [PATCH v5 4/5] convert: move multiple file filter error handling to separate function Lars Schneider
@ 2017-06-18  7:20   ` Torsten Bögershausen
  2017-06-18 11:47     ` Lars Schneider
  0 siblings, 1 reply; 20+ messages in thread
From: Torsten Bögershausen @ 2017-06-18  7:20 UTC (permalink / raw)
  To: Lars Schneider, git; +Cc: gitster, peff, e, ttaylorr, peartben


On 2017-06-01 10:22, Lars Schneider wrote:
> This is useful for the subsequent patch 'convert: add "status=delayed" to
> filter process protocol'.

Maybe
 Collecting all filter error handling is useful for the subsequent patch
 'convert: add "status=delayed" to filter process protocol'.

> 
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
>  convert.c | 47 ++++++++++++++++++++++++++---------------------
>  1 file changed, 26 insertions(+), 21 deletions(-)
> 
> diff --git a/convert.c b/convert.c
> index f1e168bc30..a5e09bb0e8 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>  	return err;
>  }
>  
> +static void handle_filter_error(const struct strbuf *filter_status,
> +				struct cmd2process *entry,
> +				const unsigned int wanted_capability) {
> +	if (!strcmp(filter_status->buf, "error")) {
> +		/* The filter signaled a problem with the file. */
> +	} else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
> +		/*
> +		 * The filter signaled a permanent problem. Don't try to filter
> +		 * files with the same command for the lifetime of the current
> +		 * Git process.
> +		 */
> +		 entry->supported_capabilities &= ~wanted_capability;
> +	} else {
> +		/*
> +		 * Something went wrong with the protocol filter.
> +		 * Force shutdown and restart if another blob requires filtering.
> +		 */
> +		error("external filter '%s' failed", entry->subprocess.cmd);
> +		subprocess_stop(&subprocess_map, &entry->subprocess);
> +		free(entry);
> +	}
> +}
> +
>  static int apply_multi_file_filter(const char *path, const char *src, size_t len,
>  				   int fd, struct strbuf *dst, const char *cmd,
>  				   const unsigned int wanted_capability)
> @@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>  done:
>  	sigchain_pop(SIGPIPE);
>  
> -	if (err) {
> -		if (!strcmp(filter_status.buf, "error")) {
> -			/* The filter signaled a problem with the file. */
                Note1: Do we need a line with a single ";" here ?
                Question: What should/will happen to the file ?
                Will Git complain later ? Retry later ?
> -		} else if (!strcmp(filter_status.buf, "abort")) {
> -			/*
> -			 * The filter signaled a permanent problem. Don't try to filter
> -			 * files with the same command for the lifetime of the current
> -			 * Git process.
> -			 */
> -			 entry->supported_capabilities &= ~wanted_capability;
                         Hm, could this be clarified somewhat ?
                         Mapping "abort" to "permanent problem" makes sense as
                         such, but the only action that is done is to reset
                         a capability.

		/*
		 * The filter signaled a missing capabilty. Don't try to filter
		 * files with the same command for the lifetime of the current
		 * Git process.
		 */

                 # And then there is a question why the answer is "abort" and not
                 # "unsupported"
> -		} else {
> -			/*
> -			 * Something went wrong with the protocol filter.
> -			 * Force shutdown and restart if another blob requires filtering.
> -			 */
> -			error("external filter '%s' failed", cmd);
> -			subprocess_stop(&subprocess_map, &entry->subprocess);
> -			free(entry);
> -		}
> -	} else {
> +	if (err)
> +		handle_filter_error(&filter_status, entry, wanted_capability);
> +	else
>  		strbuf_swap(dst, &nbuf);
> -	}
>  	strbuf_release(&nbuf);
>  	return !err;
>  }
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 4/5] convert: move multiple file filter error handling to separate function
  2017-06-18  7:20   ` Torsten Bögershausen
@ 2017-06-18 11:47     ` Lars Schneider
  2017-06-19 17:18       ` Torsten Bögershausen
  0 siblings, 1 reply; 20+ messages in thread
From: Lars Schneider @ 2017-06-18 11:47 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Git Mailing List, Junio C Hamano, peff, e, ttaylorr, peartben


> On 18 Jun 2017, at 09:20, Torsten Bögershausen <tboegi@web.de> wrote:
> 
> 
> On 2017-06-01 10:22, Lars Schneider wrote:
>> This is useful for the subsequent patch 'convert: add "status=delayed" to
>> filter process protocol'.
> 
> May be
> Collecting all filter error handling is useful for the subsequent patch
> 'convert: add "status=delayed" to filter process protocol'.

I think I get your point. However, I feel "Collecting" is not the right word.

How about:
"Refactoring filter error handling is useful..."?


>> 
>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>> ---
>> convert.c | 47 ++++++++++++++++++++++++++---------------------
>> 1 file changed, 26 insertions(+), 21 deletions(-)
>> 
>> diff --git a/convert.c b/convert.c
>> index f1e168bc30..a5e09bb0e8 100644
>> --- a/convert.c
>> +++ b/convert.c
>> @@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>> 	return err;
>> }
>> 
>> +static void handle_filter_error(const struct strbuf *filter_status,
>> +				struct cmd2process *entry,
>> +				const unsigned int wanted_capability) {
>> +	if (!strcmp(filter_status->buf, "error")) {
>> +		/* The filter signaled a problem with the file. */
>> +	} else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
>> +		/*
>> +		 * The filter signaled a permanent problem. Don't try to filter
>> +		 * files with the same command for the lifetime of the current
>> +		 * Git process.
>> +		 */
>> +		 entry->supported_capabilities &= ~wanted_capability;
>> +	} else {
>> +		/*
>> +		 * Something went wrong with the protocol filter.
>> +		 * Force shutdown and restart if another blob requires filtering.
>> +		 */
>> +		error("external filter '%s' failed", entry->subprocess.cmd);
>> +		subprocess_stop(&subprocess_map, &entry->subprocess);
>> +		free(entry);
>> +	}
>> +}
>> +
>> static int apply_multi_file_filter(const char *path, const char *src, size_t len,
>> 				   int fd, struct strbuf *dst, const char *cmd,
>> 				   const unsigned int wanted_capability)
>> @@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>> done:
>> 	sigchain_pop(SIGPIPE);
>> 
>> -	if (err) {
>> -		if (!strcmp(filter_status.buf, "error")) {
>> -			/* The filter signaled a problem with the file. */
>                Note1: Do we need a line with a single ";" here ?

The single ";" wouldn't hurt but I don't think it is helpful either.
I can't find anything in the coding guidelines about this...


>                Question: What should/will happen to the file ?
>                Will Git complain later ? Retry later ?

The file is not filtered and Git does not try again.
That might be OK for certain filters:
If "filter.foo.required = false" then this would be OK. 
If "filter.foo.required = true" then this would cause an error.


>> -		} else if (!strcmp(filter_status.buf, "abort")) {
>> -			/*
>> -			 * The filter signaled a permanent problem. Don't try to filter
>> -			 * files with the same command for the lifetime of the current
>> -			 * Git process.
>> -			 */
>> -			 entry->supported_capabilities &= ~wanted_capability;
>                         Hm, could this be clarified somewhat ?
>                         Mapping "abort" to "permanent problem" makes sense as
>                         such, but the only action that is done is to reset
>                         a capablity.

I am not sure what you mean by "reset capability". We disable the capability here.
That means Git will not use the capability for the active session.


> 		/*
> 		 * The filter signaled a missing capabilty. Don't try to filter
> 		 * files with the same command for the lifetime of the current
> 		 * Git process.
> 		 */

I like the original version better because the capability is not "missing".
There is "a permanent problem" and the filter wants to signal Git to not use
this capability for the current session anymore.


>                 # And the there is a question why the answer is "abort" and not
>                 # "unsupported"

Because the filter supports the capability. There is just some kind of failure
that causes the capability to not work as expected. Again, depending on the filter
"required" flag this is an error or not.


Thanks for the review,
Lars


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 4/5] convert: move multiple file filter error handling to separate function
  2017-06-18 11:47     ` Lars Schneider
@ 2017-06-19 17:18       ` Torsten Bögershausen
  2017-06-19 17:47         ` Lars Schneider
  0 siblings, 1 reply; 20+ messages in thread
From: Torsten Bögershausen @ 2017-06-19 17:18 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Git Mailing List, Junio C Hamano, peff, e, ttaylorr, peartben

On 2017-06-18 13:47, Lars Schneider wrote:
> 
>> On 18 Jun 2017, at 09:20, Torsten Bögershausen <tboegi@web.de> wrote:
>>
>>
>> On 2017-06-01 10:22, Lars Schneider wrote:
>>> This is useful for the subsequent patch 'convert: add "status=delayed" to
>>> filter process protocol'.
>>
>> May be
>> Collecting all filter error handling is useful for the subsequent patch
>> 'convert: add "status=delayed" to filter process protocol'.
> 
> I think I get your point. However, I feel "Collecting" is not the right word.
> 
> How about:
> "Refactoring filter error handling is useful..."?

OK


> 
>>>
>>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>>> ---
>>> convert.c | 47 ++++++++++++++++++++++++++---------------------
>>> 1 file changed, 26 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/convert.c b/convert.c
>>> index f1e168bc30..a5e09bb0e8 100644
>>> --- a/convert.c
>>> +++ b/convert.c
>>> @@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>>> 	return err;
>>> }
>>>
>>> +static void handle_filter_error(const struct strbuf *filter_status,
>>> +				struct cmd2process *entry,
>>> +				const unsigned int wanted_capability) {
>>> +	if (!strcmp(filter_status->buf, "error")) {
>>> +		/* The filter signaled a problem with the file. */
>>> +	} else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
>>> +		/*
>>> +		 * The filter signaled a permanent problem. Don't try to filter
>>> +		 * files with the same command for the lifetime of the current
>>> +		 * Git process.
>>> +		 */
>>> +		 entry->supported_capabilities &= ~wanted_capability;
>>> +	} else {
>>> +		/*
>>> +		 * Something went wrong with the protocol filter.
>>> +		 * Force shutdown and restart if another blob requires filtering.
>>> +		 */
>>> +		error("external filter '%s' failed", entry->subprocess.cmd);
>>> +		subprocess_stop(&subprocess_map, &entry->subprocess);
>>> +		free(entry);
>>> +	}
>>> +}
>>> +
>>> static int apply_multi_file_filter(const char *path, const char *src, size_t len,
>>> 				   int fd, struct strbuf *dst, const char *cmd,
>>> 				   const unsigned int wanted_capability)
>>> @@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>>> done:
>>> 	sigchain_pop(SIGPIPE);
>>>
>>> -	if (err) {
>>> -		if (!strcmp(filter_status.buf, "error")) {
>>> -			/* The filter signaled a problem with the file. */
>>                Note1: Do we need a line with a single ";" here ?
> 
> The single ";" wouldn't hurt but I don't think it is helpful either.
> I can't find anything in the coding guidelines about this...

OK about that. I was thinking of removing the {}, and then we -need- the ";"

> 
> 
>>                Question: What should/will happen to the file ?
>>                Will Git complain later ? Retry later ?
> 
> The file is not filtered and Git does not try, again. 
> That might be OK for certain filters:
> If "filter.foo.required = false" then this would be OK. 
> If "filter.foo.required = true" then this would cause an error.

Does it make sense to add it as a comment ?
I know, everything is documented elsewhere, but when looking at the source
code only, the context may be useful, or it may not be.

> 
> 
>>> -		} else if (!strcmp(filter_status.buf, "abort")) {
>>> -			/*
>>> -			 * The filter signaled a permanent problem. Don't try to filter
>>> -			 * files with the same command for the lifetime of the current
>>> -			 * Git process.
>>> -			 */
>>> -			 entry->supported_capabilities &= ~wanted_capability;
>>                         Hm, could this be clarified somewhat ?
>>                         Mapping "abort" to "permanent problem" makes sense as
>>                         such, but the only action that is done is to reset
>>                         a capablity.
> 
> I am not sure what you mean with "reset capability". We disable the capability here.
> That means Git will not use the capability for the active session.

Sorry, my mistake - "reset" is a bad word here.
"Git will not use the capability for the active session" is perfect!

> 
> 
>> 		/*
>> 		 * The filter signaled a missing capabilty. Don't try to filter
>> 		 * files with the same command for the lifetime of the current
>> 		 * Git process.
>> 		 */
> 
> I like the original version better because the capability is not "missing".
> There is "a permanent problem" and the filter wants to signal Git to not use
> this capability for the current session anymore.

Git and the filter are in a negotiation phase to find out which side is able
to do what. So I don't think there is a "problem" (in the sense that I understand
it) at all.

And back to the "abort":
I still think that the word feels too harsh...
"abort" in my understanding smells too much of "a program is terminated".
If it is not too late, does it make sense to use something like "nack" ?


> 
> 
>>                 # And the there is a question why the answer is "abort" and not
>>                 # "unsupported"
> 
> Because the filter supports the capability. There is just some kind of failure that 
> that causes the capability to not work as expected. Again, depending on the filter
> "required" flag this is an error or not.
> 

Maybe I misunderstood the whole sequence, and abort is the right thing.
If yes, how about this ?

	} else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
		/*
		 * Don't try to filter files with this capability for lifetime
		 * of the current Git process.
		 */
		 entry->supported_capabilities &= ~wanted_capability;



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 4/5] convert: move multiple file filter error handling to separate function
  2017-06-19 17:18       ` Torsten Bögershausen
@ 2017-06-19 17:47         ` Lars Schneider
  0 siblings, 0 replies; 20+ messages in thread
From: Lars Schneider @ 2017-06-19 17:47 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Git Mailing List, Junio C Hamano, Jeff King, e, ttaylorr,
	peartben


> On 19 Jun 2017, at 19:18, Torsten Bögershausen <tboegi@web.de> wrote:
> 
> On 2017-06-18 13:47, Lars Schneider wrote:
>> 
>>> On 18 Jun 2017, at 09:20, Torsten Bögershausen <tboegi@web.de> wrote:
>>> 
>>> 
>>> On 2017-06-01 10:22, Lars Schneider wrote:
>>>> This is useful for the subsequent patch 'convert: add "status=delayed" to
>>>> filter process protocol'.
>>> 
>>> May be
>>> Collecting all filter error handling is useful for the subsequent patch
>>> 'convert: add "status=delayed" to filter process protocol'.
>> 
>> I think I get your point. However, I feel "Collecting" is not the right word.
>> 
>> How about:
>> "Refactoring filter error handling is useful..."?
> 
> OK

OK, I'll change it in the next round.

>>>> 
>>>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>>>> ---
>>>> convert.c | 47 ++++++++++++++++++++++++++---------------------
>>>> 1 file changed, 26 insertions(+), 21 deletions(-)
>>>> 
>>>> diff --git a/convert.c b/convert.c
>>>> index f1e168bc30..a5e09bb0e8 100644
>>>> --- a/convert.c
>>>> +++ b/convert.c
>>>> @@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>>>> 	return err;
>>>> }
>>>> 
>>>> +static void handle_filter_error(const struct strbuf *filter_status,
>>>> +				struct cmd2process *entry,
>>>> +				const unsigned int wanted_capability) {
>>>> +	if (!strcmp(filter_status->buf, "error")) {
>>>> +		/* The filter signaled a problem with the file. */
>>>> +	} else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
>>>> +		/*
>>>> +		 * The filter signaled a permanent problem. Don't try to filter
>>>> +		 * files with the same command for the lifetime of the current
>>>> +		 * Git process.
>>>> +		 */
>>>> +		 entry->supported_capabilities &= ~wanted_capability;
>>>> +	} else {
>>>> +		/*
>>>> +		 * Something went wrong with the protocol filter.
>>>> +		 * Force shutdown and restart if another blob requires filtering.
>>>> +		 */
>>>> +		error("external filter '%s' failed", entry->subprocess.cmd);
>>>> +		subprocess_stop(&subprocess_map, &entry->subprocess);
>>>> +		free(entry);
>>>> +	}
>>>> +}
>>>> +
>>>> static int apply_multi_file_filter(const char *path, const char *src, size_t len,
>>>> 				   int fd, struct strbuf *dst, const char *cmd,
>>>> 				   const unsigned int wanted_capability)
>>>> @@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>>>> done:
>>>> 	sigchain_pop(SIGPIPE);
>>>> 
>>>> -	if (err) {
>>>> -		if (!strcmp(filter_status.buf, "error")) {
>>>> -			/* The filter signaled a problem with the file. */
>>>               Note1: Do we need a line with a single ";" here ?
>> 
>> The single ";" wouldn't hurt but I don't think it is helpful either.
>> I can't find anything in the coding guidelines about this...
> 
> OK about that. I was thinking to remove the {}, and the we -need- the ";"

True. However, I prefer the {} style here. Would that be OK with you?
Plus, this commit is not supposed to change code. I just want to move the
code to a different function.


>>>               Question: What should/will happen to the file ?
>>>               Will Git complain later ? Retry later ?
>> 
>> The file is not filtered and Git does not try, again. 
>> That might be OK for certain filters:
>> If "filter.foo.required = false" then this would be OK. 
>> If "filter.foo.required = true" then this would cause an error.
> 
> Does it make sense to add it as a comment ?
> I know, everything is documented otherwise, but when looking at the source
> code only, the context may be useful, it may be not.

I agree. I'll add a comment!

>> 
>>>> -		} else if (!strcmp(filter_status.buf, "abort")) {
>>>> -			/*
>>>> -			 * The filter signaled a permanent problem. Don't try to filter
>>>> -			 * files with the same command for the lifetime of the current
>>>> -			 * Git process.
>>>> -			 */
>>>> -			 entry->supported_capabilities &= ~wanted_capability;
>>>                        Hm, could this be clarified somewhat ?
>>>                        Mapping "abort" to "permanent problem" makes sense as
>>>                        such, but the only action that is done is to reset
>>>                        a capablity.
>> 
>> I am not sure what you mean with "reset capability". We disable the capability here.
>> That means Git will not use the capability for the active session.
> 
> Sorry, my wrong - reset is a bad word here.
> "Git will not use the capability for the active session" is perfect!

OK :)


>>> 		/*
>>> 		 * The filter signaled a missing capabilty. Don't try to filter
>>> 		 * files with the same command for the lifetime of the current
>>> 		 * Git process.
>>> 		 */
>> 
>> I like the original version better because the capability is not "missing".
>> There is "a permanent problem" and the filter wants to signal Git to not use
>> this capability for the current session anymore.
> 
> Git and the filter are in a negotiation phase to find out which side is able
> to do what.So I don't think there is a "problem" (in the sense that I understand
> it) at all.

No, at this point they are past the negotiation phase. A problem actually
happened.


> And back to the "abort":
> I still think that the word feels to harsh...
> "abort" in my understanding smells too much "a program is terminated".
> If it is not too late, does it may sense to use something like "nack" ?

Well, at this point it is too late because we don't retry.

Plus, I would prefer to not change code here as this commit is supposed
to just move existing code. Changing "abort" would change the protocol
that was released with Git 2.11.


>> 
>>>                # And the there is a question why the answer is "abort" and not
>>>                # "unsupported"
>> 
>> Because the filter supports the capability. There is just some kind of failure that 
>> that causes the capability to not work as expected. Again, depending on the filter
>> "required" flag this is an error or not.
>> 
> 
> May be I misunderstood the whole sequence, and abort is the right thing.
> If yes, how about this ?
> 
> 	} else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
> 		/*
> 		 * Don't try to filter files with this capability for lifetime
> 		 * of the current Git process.
> 		 */
> 		 entry->supported_capabilities &= ~wanted_capability;

How about this:

The filter signaled a problem. Don't try to filter files with this capability 
for the lifetime of the current Git process.

?

Thanks,
Lars

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol
  2017-06-01  8:22 ` [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol Lars Schneider
  2017-06-02  2:21   ` Junio C Hamano
@ 2017-06-24 14:19   ` Jeff King
  2017-06-24 17:22     ` Lars Schneider
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff King @ 2017-06-24 14:19 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, gitster, tboegi, e, ttaylorr, peartben

On Thu, Jun 01, 2017 at 10:22:03AM +0200, Lars Schneider wrote:

> Some `clean` / `smudge` filters might require a significant amount of
> time to process a single blob (e.g. the Git LFS smudge filter might
> perform network requests). During this process the Git checkout
> operation is blocked and Git needs to wait until the filter is done to
> continue with the checkout.
> 
> Teach the filter process protocol (introduced in edcc858) to accept the
> status "delayed" as response to a filter request. Upon this response Git
> continues with the checkout operation. After the checkout operation Git
> calls "finish_delayed_checkout" which queries the filter for remaining
> blobs. If the filter is still working on the completion, then the filter
> is expected to block. If the filter has completed all remaining blobs
> then an empty response is expected.
> 
> Git has a multiple code paths that checkout a blob. Support delayed
> checkouts only in `clone` (in unpack-trees.c) and `checkout` operations.

It might be worth giving a reason in this last paragraph. I think the
reason is "because it's more complicated for the caller, as they have to
be OK with out-of-order processing and remembering to go back and handle
the delayed cases".

Speaking of which, _are_ we OK with out-of-order processing in things
like checkout? Certainly we care about deleting items before checking
out new ones (so getting rid of "foo/bar" to make room for "foo" and
vice versa).  I guess it's OK as long as the delayed items are always
filters that check out new items.

> +Delay
> +^^^^^
> +
> +If the filter supports the "delay" capability, then Git can send the
> +flag "can-delay" after the filter command and pathname. This flag
> +denotes that the filter can delay filtering the current blob (e.g. to
> +compensate network latencies) by responding with no content but with
> +the status "delayed" and a flush packet.
> +------------------------
> +packet:          git> command=smudge
> +packet:          git> pathname=path/testfile.dat
> +packet:          git> can-delay=1
> +packet:          git> 0000
> +packet:          git> CONTENT
> +packet:          git> 0000
> +packet:          git< status=delayed
> +packet:          git< 0000
> +------------------------

OK, so the client has to opt into delay for each entry. Makes sense.

> +If the filter supports the "delay" capability then it must support the
> +"list_available_blobs" command. If Git sends this command, then the
> +filter is expected to return a list of pathnames of blobs that are
> +available. The list must be terminated with a flush packet followed
> +by a "success" status that is also terminated with a flush packet. If
> +no blobs for the delayed paths are available, yet, then the filter is
> +expected to block the response until at least one blob becomes
> +available. The filter can tell Git that it has no more delayed blobs
> +by sending an empty list.
> +------------------------
> +packet:          git> command=list_available_blobs
> +packet:          git> 0000
> +packet:          git< pathname=path/testfile.dat
> +packet:          git< pathname=path/otherfile.dat
> +packet:          git< 0000
> +packet:          git< status=success
> +packet:          git< 0000
> +------------------------

Earlier I proposed just having the client ask "give me something that's
ready". This introduces an extra round-trip to say "show me what's
ready" and ask for it. But I think that's OK, as it should be a local
process running over a pipe. The important thing is that the client is
able to always fetch without blocking when something is ready, and to
block when nothing is ready, which this does. Good.

> +After Git received the pathnames, it will request the corresponding
> +blobs again. These requests contain a pathname and an empty content
> +section. The filter is expected to respond with the smudged content
> +in the usual way as explained above.
> +------------------------
> +packet:          git> command=smudge
> +packet:          git> pathname=path/testfile.dat
> +packet:          git> 0000
> +packet:          git> 0000  # empty content!
> +packet:          git< status=success
> +packet:          git< 0000
> +packet:          git< SMUDGED_CONTENT
> +packet:          git< 0000
> +packet:          git< 0000  # empty list, keep "status=success" unchanged!
> +------------------------

Out of curiosity, what should happen if we re-ask for a pathname that
wasn't mentioned in list_available_blobs? I guess it would depend on the
filter implementation, but probably most would just block (since we
wouldn't have asked for can-delay). The other option is returning an
error, I suppose, but blocking and filling the request sounds
friendlier.

> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index a6b2af39d3..c1a256df8d 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -376,6 +376,8 @@ static int checkout_paths(const struct checkout_opts *opts,
>  	state.force = 1;
>  	state.refresh_cache = 1;
>  	state.istate = &the_index;
> +
> +	enable_delayed_checkout(&state);
>  	for (pos = 0; pos < active_nr; pos++) {
>  		struct cache_entry *ce = active_cache[pos];
>  		if (ce->ce_flags & CE_MATCHED) {
> @@ -390,6 +392,7 @@ static int checkout_paths(const struct checkout_opts *opts,
>  			pos = skip_same_name(ce, pos) - 1;
>  		}
>  	}
> +	errs |= finish_delayed_checkout(&state);

The changes to the caller here are satisfyingly small.

> diff --git a/cache.h b/cache.h
> index ae4c45d379..523ead1d95 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1543,16 +1543,20 @@ extern int ident_cmp(const struct ident_split *, const struct ident_split *);
>  struct checkout {
>  	struct index_state *istate;
>  	const char *base_dir;
> +	struct delayed_checkout *delayed_checkout;
>  	int base_dir_len;

Should base_dir_len and base_dir be kept together?

I suspect you ordered it this way because...

>  	unsigned force:1,
>  		 quiet:1,
>  		 not_new:1,
>  		 refresh_cache:1;
>  };
> -#define CHECKOUT_INIT { NULL, "" }
> +#define CHECKOUT_INIT { NULL, "", NULL }
> +

...you wanted to add the NULL to the initializer here. But it's not
necessary. Once one element of a struct is initialized, all the
remaining members are initialized according to the usual static rules
(so NULL for pointers). We're already using that to zero out
base_dir_len and the flags (the "" is necessary because we want a real
empty string, not NULL).

Since you want NULL for your new member, you can just leave the
initializer as-is and it will do the right thing.
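
To make that concrete, a tiny standalone illustration (not from the
patch) of that initializer rule:

  struct example {
	void *istate;
	const char *base_dir;
	void *delayed_checkout;
	int base_dir_len;
  };

  /*
   * Everything after "" is zero-initialized, so delayed_checkout
   * ends up NULL and base_dir_len ends up 0.
   */
  static struct example e = { NULL, "" };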

> @@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>  	if (err)
>  		goto done;
>  
> -	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
> +	err = packet_writel(process->in,
> +		"capability=clean", "capability=smudge", "capability=delay", NULL);

Why do we need to tell the filter we know about delay? Shouldn't it just
need to tell us that it knows about delay, and then we choose whether to
ask for can-delay for particular entries?

> @@ -647,6 +653,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>  	if (err)
>  		goto done;
>  
> +	if (CAP_DELAY & entry->supported_capabilities &&
> +	    dco && dco->state == CE_CAN_DELAY) {

In a complicated conditional with bit operations, we usually put the bit
operation in its own parentheses so it's more obvious that it wasn't
supposed to be "&&". Like:

  if ((CAP_DELAY & entry->supported_capabilities) &&
      dco && dco->state == CE_CAN_DELAY))

The operator precedence is such that it works without them, so this is
just a style question (I'd also usually put the flags field before the
flag itself, but that's really getting into aesthetics).
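
For what it's worth, a minimal standalone check of that precedence
claim (illustration only):

  #include <assert.h>

  int main(void)
  {
	unsigned caps = 2, flag = 2;
	int dco_ok = 1;

	/* '&' binds tighter than '&&', so both spellings parse the same */
	assert((flag & caps && dco_ok) == ((flag & caps) && dco_ok));
	return 0;
  }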

The dco check here is basically asking if can-delay is enabled for the
whole checkout. Would we ever need to mark a specific entry as
not-delayable? Probably not from the start, but I wondered if something
might get marked as a failure and need to be retried without delay or
similar. It's probably not worth going down that road until some caller
needs it.

> @@ -662,14 +676,78 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>  	if (err)
>  		goto done;
>  
> -	err = strcmp(filter_status.buf, "success");
> +	if (can_delay && !strcmp(filter_status.buf, "delayed")) {
> +		dco->state = CE_DELAYED;
> +		string_list_insert(&dco->filters, cmd);
> +		string_list_insert(&dco->paths, path);

These string_list_insert()s have bad worst-case performance, because
they keep the list sorted. So the worst case is reverse-sorted input,
where each entry goes at the front of the existing array and we have to
memmove() the whole array on each insert, and we do O(n^2) byte copies.

The best case is when the input is already sorted. And I think in
general that's how we'll iterate over paths to checkout. The cmds are
more likely to be random, but we're also not likely to have more than a
handful of them. So this is probably an OK data structure to use, at
least for now.
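
If the insertion order ever did turn out to be hostile, here is a sketch
of the obvious alternative using the existing string-list API (the
helper names around it are made up):

  #include "string-list.h"

  /* collect unsorted, O(1) amortized per path ... */
  static void remember_delayed_path(struct string_list *paths, const char *path)
  {
	string_list_append(paths, path);
  }

  /* ... and sort once, O(n log n), before any lookups or filtering */
  static void prepare_delayed_paths(struct string_list *paths)
  {
	string_list_sort(paths);
  }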

> +	for (;;) {
> +		const char* pre = "pathname=";
> +		const int pre_len = strlen(pre);
> +		line = packet_read_line(process->out, NULL);
> +		if (!line)
> +			break;
> +		err = strlen(line) <= pre_len || strncmp(line, pre, pre_len);
> +		if (err)
> +			goto done;
> +		string_list_insert(delayed_paths, xstrdup(line+pre_len));
> +	}

I think the prefix matching can be done more simply (and slightly more
efficiently) with:

  const char *path;
  ...
  err = !skip_prefix(line, "pathname=", &path);
  if (err)
	goto done;
  string_list_insert(delayed_paths, xstrdup(path));

I notice we loop here, though. Would we want to be resilient to seeing
other keys besides pathname? Something like:

  while ((line = packet_read_line(process->out, NULL))) {
	const char *path;
	if (skip_prefix(line, "pathname=", &path))
		string_list_insert(delayed_paths, xstrdup(path));
	else
		; /* ignore unknown keys */
  }

I can imagine some future where the filter passes back additional
information about the delayed item. It would be nice if older versions
of Git would just quietly ignore that if they don't know how to handle
it.

> +int finish_delayed_checkout(struct checkout *state)
> +{
> +	int errs = 0;
> +	struct string_list_item *filter, *path;
> +	struct delayed_checkout *dco = state->delayed_checkout;
> +
> +	if (!state->delayed_checkout) {
> +		return errs;
> +	}

Style: we'd usually omit the braces here.

> +			if (available_paths.nr <= 0) {
> +				/* Filter responded with no entries. That means
> +				   the filter is done and we can remove the
> +				   filter from the list (see
> +				   "string_list_remove_empty_items" call below).
> +				*/

This is true because we know the filter would block until it had at
least one item to return.

It does seem to put a lot of faith in the filter. What happens if we hit
this conditional and we know that we still have entries in dco->paths to
be handled?

> +			/* In dco->paths we store a list of all delayed paths.
> +			   The filter just send us a list of available paths.
> +			   Remove them from the list.
> +			*/
> +			filter_string_list(&dco->paths, 0,
> +				&remove_available_paths, &available_paths);

This is O(m log n), which is fine. Since both lists are sorted, we could
technically do it in an O(m+n) merge, but I don't think we have a
function to do that easily.

Sorry to harp on run-times, but it's one place I think it would be easy
to subtly screw this up, so I'm trying to pay close attention.
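
Purely as a sketch of what such a merge could look like (not an existing
helper; it assumes both lists stay sorted and ignores freeing the
dropped strings):

  #include <string.h>
  #include "string-list.h"

  /* keep in 'paths' only the entries that are not in 'available' */
  static void remove_sorted_subset(struct string_list *paths,
				   const struct string_list *available)
  {
	unsigned int i, j = 0, kept = 0;

	for (i = 0; i < paths->nr; i++) {
		const char *p = paths->items[i].string;

		while (j < available->nr &&
		       strcmp(available->items[j].string, p) < 0)
			j++;
		if (j < available->nr &&
		    !strcmp(available->items[j].string, p))
			continue;	/* available now; drop it */
		paths->items[kept++] = paths->items[i];
	}
	paths->nr = kept;
  }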

> +			for_each_string_list_item(path, &available_paths) {
> +				struct cache_entry* ce = index_file_exists(
> +					state->istate, path->string,
> +					strlen(path->string), 0);
> +				dco->state = CE_RETRY;
> +				errs |= (ce ? checkout_entry(ce, state, NULL) : 1);
> +			}

So once we know something is available, we try again. Which makes sense.
We presumably set dco->state to retry here so that we don't re-ask for
it to be delayed. I guess that sort-of answers my earlier question about
per-entry state. We just use the dco->state as a big hammer. It gets
set to retry on the first time through the loop, and then that applies
to everything for the rest of the checkout. But since we do all of the
initial requests in a single pass, followed by all delayed-resolution in
a second pass, that works fine.

> +	/* At this point we should not have any delayed paths anymore. */
> +	errs |= dco->paths.nr;

And I guess this answers my question above about a filter with no
entries. When we bail from the loop, we'll hit this point and complain.

We probably should call string_list_clear(&dco->paths) here to avoid
leaking the paths on error (possibly we should also print them? Or maybe
they'd be warned about elsewhere).

> @@ -179,11 +254,36 @@ static int write_entry(struct cache_entry *ce,
>  		/*
>  		 * Convert from git internal format to working tree format
>  		 */
> -		if (ce_mode_s_ifmt == S_IFREG &&
> -		    convert_to_working_tree(ce->name, new, size, &buf)) {
> -			free(new);
> -			new = strbuf_detach(&buf, &newsize);
> -			size = newsize;
> +		if (ce_mode_s_ifmt == S_IFREG) {
> +			struct delayed_checkout *dco = state->delayed_checkout;
> +			if (dco && dco->state != CE_NO_DELAY) {
> +				/* Do not send the blob in case of a retry. */
> +				if (dco->state == CE_RETRY) {
> +					new = NULL;
> +					size = 0;
> +				}
> +				ret = async_convert_to_working_tree(
> +					ce->name, new, size, &buf, dco);
> +				if (ret && dco->state == CE_DELAYED) {
> +					free(new);
> +					/* Reset the state of the next blob */
> +					dco->state = CE_CAN_DELAY;
> +					goto finish;
> +				}

Hmm. This "reset the state" bit at the end surprised me. I guess it's
not wrong, but it goes against the mental model I had formed above. ;)

We really are using dco->state as a per-entry state flag. It just
happens to be in a persistent shared struct. I don't think it's wrong,
it was mostly just surprising. I don't know if it's worth trying to
simplify, but I think you could do it by:

  1. Passing back the "was delayed" state from async_convert... in the
     return value or via a separate out-parameter.

  2. Setting dco->state to CE_RETRY at the top of finish_delayed... so
     that it's clear that it's about what phase of the conversation
     we're in.

But I'm OK with it as-is, too.
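
Roughly what (1) could look like; every name below is made up for the
illustration and the conversion itself is stubbed out:

  struct delayed_checkout;	/* opaque for this sketch */

  /* hypothetical variant that reports "was delayed" via an out-parameter */
  static int async_convert_sketch(const char *path,
				  struct delayed_checkout *dco,
				  int *was_delayed)
  {
	(void)path; (void)dco;
	*was_delayed = 1;	/* stub: pretend the filter delayed it */
	return 1;
  }

  static int write_entry_sketch(const char *path, struct delayed_checkout *dco)
  {
	int delayed = 0;
	int ret = async_convert_sketch(path, dco, &delayed);

	if (!ret)
		return -1;	/* conversion failed */
	if (delayed)
		return 0;	/* leave it for finish_delayed_checkout() */
	/* ... otherwise write the converted content as usual ... */
	return 0;
  }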

> diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
> index 5e43faeec1..f0dc0aff4a 100644
> --- a/t/t0021/rot13-filter.pl
> +++ b/t/t0021/rot13-filter.pl

I'll admit to skimming the perl changes here, but I didn't see anything
obviously wrong.

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol
  2017-06-01  8:21 [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol Lars Schneider
                   ` (6 preceding siblings ...)
  2017-06-02  2:06 ` Junio C Hamano
@ 2017-06-24 14:23 ` Jeff King
  7 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2017-06-24 14:23 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, gitster, tboegi, e, ttaylorr, peartben

On Thu, Jun 01, 2017 at 10:21:58AM +0200, Lars Schneider wrote:

> here is the 5th iteration of my "status delayed" topic. Patch 1 to 3 are
> minor t0021 test adjustments and haven't been changed since v3. Patch 4
> is a minor "extract method" refactoring in convert. Patch 5 is the new
> feature.

Yeah, patches 1-4 were pretty straightforward and obvious. I just left
some comments after a detailed read-through of patch 5. Overall the
direction looks good. I had a few minor comments on the code, but I
think with those addressed it will probably be ready for 'next'.

Note that I'll be offline all next week, so don't expect a timely
review on the next iteration (unlike this not-so-timely one ;) ).

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol
  2017-06-24 14:19   ` Jeff King
@ 2017-06-24 17:22     ` Lars Schneider
  2017-06-24 18:51       ` Junio C Hamano
  2017-06-24 20:32       ` Jeff King
  0 siblings, 2 replies; 20+ messages in thread
From: Lars Schneider @ 2017-06-24 17:22 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, tboegi, e, ttaylorr, peartben


> On 24 Jun 2017, at 16:19, Jeff King <peff@peff.net> wrote:
> 
> On Thu, Jun 01, 2017 at 10:22:03AM +0200, Lars Schneider wrote:
> 
>> Some `clean` / `smudge` filters might require a significant amount of
>> time to process a single blob (e.g. the Git LFS smudge filter might
>> perform network requests). During this process the Git checkout
>> operation is blocked and Git needs to wait until the filter is done to
>> continue with the checkout.
>> 
>> Teach the filter process protocol (introduced in edcc858) to accept the
>> status "delayed" as response to a filter request. Upon this response Git
>> continues with the checkout operation. After the checkout operation Git
>> calls "finish_delayed_checkout" which queries the filter for remaining
>> blobs. If the filter is still working on the completion, then the filter
>> is expected to block. If the filter has completed all remaining blobs
>> then an empty response is expected.
>> 
>> Git has a multiple code paths that checkout a blob. Support delayed
>> checkouts only in `clone` (in unpack-trees.c) and `checkout` operations.
> 
> It might be worth giving a reason in this last paragraph. I think the
> reason is "because it's more complicated for the caller, as they have to
> be OK with out-of-order processing and remembering to go back and handle
> the delayed cases".

Correct! However, my real reason was that these code paths process all
files of the tree. Therefore the "out-of-order" processing can be
effective. 

How about this:

Git has multiple code paths that check out a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all 
files of the tree are processed.


> Speaking of which, _are_ we OK with out-of-order processing in things
> like checkout? Certainly we care about deleting items before checking
> out new ones (so getting rid of "foo/bar" to make room for "foo" and
> vice versa).  I guess it's OK as long as the delayed items are always
> filters that check out new items.

Junio noticed that too but thinks we should be OK:
"[...] We checkout removals first to make room so
that creation of a path X can succeed if an existing path X/Y
that used to want to see X as a directory can succeed [...]"

http://public-inbox.org/git/xmqqvavotych.fsf@gitster.mtv.corp.google.com/


>> ...
> 
>> +After Git received the pathnames, it will request the corresponding
>> +blobs again. These requests contain a pathname and an empty content
>> +section. The filter is expected to respond with the smudged content
>> +in the usual way as explained above.
>> +------------------------
>> +packet:          git> command=smudge
>> +packet:          git> pathname=path/testfile.dat
>> +packet:          git> 0000
>> +packet:          git> 0000  # empty content!
>> +packet:          git< status=success
>> +packet:          git< 0000
>> +packet:          git< SMUDGED_CONTENT
>> +packet:          git< 0000
>> +packet:          git< 0000  # empty list, keep "status=success" unchanged!
>> +------------------------
> 
> Out of curiosity, what should happen if we re-ask for a pathname that
> wasn't mentioned in list_available_blobs? I guess it would depend on the
> filter implementation, but probably most would just block (since we
> wouldn't have asked for can-delay). The other option is returning an
> error, I suppose, but blocking and filling the request sounds
> friendlier.

I agree that is up to the filter. I would expect the blocking strategy.
The filter cannot delay it anymore as we have not sent the "can-delay"
flag.


> ...
>> diff --git a/cache.h b/cache.h
>> index ae4c45d379..523ead1d95 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -1543,16 +1543,20 @@ extern int ident_cmp(const struct ident_split *, const struct ident_split *);
>> struct checkout {
>> 	struct index_state *istate;
>> 	const char *base_dir;
>> +	struct delayed_checkout *delayed_checkout;
>> 	int base_dir_len;
> 
> Should base_dir_len and base_dir be kept together?
> 
> I suspect you ordered it this way because...
> 
>> 	unsigned force:1,
>> 		 quiet:1,
>> 		 not_new:1,
>> 		 refresh_cache:1;
>> };
>> -#define CHECKOUT_INIT { NULL, "" }
>> +#define CHECKOUT_INIT { NULL, "", NULL }
>> +
> 
> ...you wanted to add the NULL to the initializer here. But it's not
> necessary. Once one element of a struct is initialized, all the
> remaining members are initialized according to the usual static rules
> (so NULL for pointers). We're already using that to zero out
> base_dir_len and the flags (the "" is necessary because we want a real
> empty string, not NULL).
> 
> Since you want NULL for your new member, you can just leave the
> initializer as-is and it will do the right thing.

That was indeed my reasoning. I am no C expert and I wasn't sure about
the initialization. The "Once one element of a struct is initialized"
thing is news to me :-) Thanks!

I'll keep the base_dir* together then!


>> @@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>> 	if (err)
>> 		goto done;
>> 
>> -	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
>> +	err = packet_writel(process->in,
>> +		"capability=clean", "capability=smudge", "capability=delay", NULL);
> 
> Why do we need to tell the filter we know about delay? Shouldn't it just
> need to tell us that it knows about delay, and then we choose whether to
> ask for can-delay for particular entries?

Because in the protocol I defined, the filter needs to answer with
a strict subset of this list [1]. I thought that this would make the protocol
more future-proof/backward-compatible, because the filter is not allowed to
answer with something that Git does not understand.

[1] https://github.com/git/git/blob/5402b1352f5181247405fbff1887008a0cb3b04a/Documentation/gitattributes.txt#L408-L411
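
To make that concrete, here is roughly how the handshake looks with the
change above (sketched in the notation of the gitattributes docs, not
copied from the patch). Git announces everything it understands, and an
older filter that knows nothing about delaying answers with the subset
it supports:

------------------------
packet:          git> capability=clean
packet:          git> capability=smudge
packet:          git> capability=delay
packet:          git> 0000
packet:          git< capability=clean
packet:          git< capability=smudge
packet:          git< 0000
------------------------

Git will then never send "can-delay" to such a filter.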


> 
>> @@ -647,6 +653,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>> 	if (err)
>> 		goto done;
>> 
>> +	if (CAP_DELAY & entry->supported_capabilities &&
>> +	    dco && dco->state == CE_CAN_DELAY) {
> 
> In a complicated conditional with bit operations, we usually put the bit
> operation in its own parentheses so it's more obvious that it wasn't
> supposed to be "&&". Like:
> 
>  if ((CAP_DELAY & entry->supported_capabilities) &&
>      dco && dco->state == CE_CAN_DELAY))

Agreed!


> The operator precedence is such that it works without them, so this is
> just a style question (I'd also usually put the flags field before the
> flag itself, but that's really getting into aesthetics).

You mean (entry & CAP_DELAY) instead of (CAP_DELAY & entry)?


> The dco check here is basically asking if can-delay is enabled for the
> whole checkout. Would we ever need to mark a specific entry as
> not-delayable? Probably not from the start, but I wondered if something
> might get marked as a failure and need to be retried without delay or
> similar. It's probably not worth going down that road until some caller
> needs it.

Agreed.

> 
>> @@ -662,14 +676,78 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>> 	if (err)
>> 		goto done;
>> 
>> -	err = strcmp(filter_status.buf, "success");
>> +	if (can_delay && !strcmp(filter_status.buf, "delayed")) {
>> +		dco->state = CE_DELAYED;
>> +		string_list_insert(&dco->filters, cmd);
>> +		string_list_insert(&dco->paths, path);
> 
> These string_list_insert()s have bad worst-case performance, because
> they keep the list sorted. So the worst case is reverse-sorted input,
> where each entry goes at the front of the existing array and we have to
> memmove() the whole array on each insert, and we do O(n^2) byte copies.
> 
> The best case is when the input is already sorted. And I think in
> general that's how we'll iterate over paths to checkout. The cmds are
> more likely to be random, but we're also not likely to have more than a
> handful of them. So this is probably an OK data structure to use, at
> least for now.

Agreed! Up until now I tested the delayed technique only with my small test 
case. It will become really interesting when GitLFS implements this feature.
Then I can test it with our repos that have +20k filtered files. At this point
I will be able to identify the spots that require more optimization.


>> +	for (;;) {
>> +		const char* pre = "pathname=";
>> +		const int pre_len = strlen(pre);
>> +		line = packet_read_line(process->out, NULL);
>> +		if (!line)
>> +			break;
>> +		err = strlen(line) <= pre_len || strncmp(line, pre, pre_len);
>> +		if (err)
>> +			goto done;
>> +		string_list_insert(delayed_paths, xstrdup(line+pre_len));
>> +	}
> 
> I think the prefix matching can be done more simply (and slightly more
> efficiently) with:
> 
>  const char *path;
>  ...
>  err = !skip_prefix(line, "pathname=", &path);
>  if (err)
> 	goto done;
>  string_list_insert(delayed_paths, xstrdup(path));
> 
> I notice we loop here, though. Would we want to be resilient to seeing
> other keys besides pathname? Something like:
> 
>  while ((line = packet_read_line(process->out, NULL)) {
> 	const char *path;
> 	if (skip_prefix(line, "pathname=", &path))
> 		string_list_insert(delayed_paths, xstrdup(path);
> 	else
> 		; /* ignore unknown keys */
>  }
> 
> I can imagine some future where the filter passes back additional
> information about the delayed item. It would be nice if older versions
> of Git would just quietly ignore that if they don't know how to handle
> it.

I agree with all the above. skip_prefix is neat! Although it's so
common in the source code, I somehow never noticed it.


>> +int finish_delayed_checkout(struct checkout *state)
>> +{
>> +	int errs = 0;
>> +	struct string_list_item *filter, *path;
>> +	struct delayed_checkout *dco = state->delayed_checkout;
>> +
>> +	if (!state->delayed_checkout) {
>> +		return errs;
>> +	}
> 
> Style: we'd usually omit the braces here.

Sure! I'll remove it.

(This is the only rule of the Git style guide that I
don't agree with :-) )

> 
>> +			if (available_paths.nr <= 0) {
>> +				/* Filter responded with no entries. That means
>> +				   the filter is done and we can remove the
>> +				   filter from the list (see
>> +				   "string_list_remove_empty_items" call below).
>> +				*/
> 
> This is true because we know the filter would block until it had at
> least one item to return.
> 
> It does seem to put a lot of faith in the filter. What happens if we hit
> this conditional and we know that we still have entries in dco->paths to
> be handled?

We check this later as you noticed below.


>> +			/* In dco->paths we store a list of all delayed paths.
>> +			   The filter just send us a list of available paths.
>> +			   Remove them from the list.
>> +			*/
>> +			filter_string_list(&dco->paths, 0,
>> +				&remove_available_paths, &available_paths);
> 
> This is O(m log n), which is fine. Since both lists are sorted, we could
> technically do it in an O(m+n) merge, but I don't think we have a
> function to do that easily.
> 
> Sorry to harp on run-times, but it's one place I think it would be easy
> to subtly screw this up, so I'm trying to pay close attention.

No worries at all! Thanks for thinking it through on that level.
As I wrote above, I plan to really profile it under "real life" conditions
as soon as we have full GitLFS support.
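
If it ever does show up in a profile, the O(m+n) merge you mention could look
roughly like this (just a sketch; the helper name is made up and Git has no
such function today):

/*
 * Remove from "all" every string that also appears in "avail".
 * Both lists must be sorted; each list is walked at most once.
 */
static void remove_sorted_subset(struct string_list *all,
				 const struct string_list *avail)
{
	int i = 0, j = 0, dst = 0;

	while (i < all->nr) {
		while (j < avail->nr &&
		       strcmp(avail->items[j].string,
			      all->items[i].string) < 0)
			j++;
		if (j < avail->nr &&
		    !strcmp(avail->items[j].string, all->items[i].string)) {
			if (all->strdup_strings)
				free(all->items[i].string);
			i++; /* drop this entry */
		} else {
			all->items[dst++] = all->items[i++];
		}
	}
	all->nr = dst;
}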


>> ...
> 
>> +	/* At this point we should not have any delayed paths anymore. */
>> +	errs |= dco->paths.nr;
> 
> And I guess this answers my question above about a filter with no
> entries. When we bail from the loop, we'll hit this point and complain.
> 
> We probably should call string_list_clear(&dco->paths) here to avoid
> leaking the paths on error (possibly we should also print them? Or maybe
> they'd be warned about elsewhere).

Agreed, I will clear the list. I'll also print the items because as far
as I can see they are not printed elsewhere.

Git just says:
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

How about this?

	errs |= dco->paths.nr;
	for_each_string_list_item(path, &dco->paths) {
		warning("%s was not processed properly.", path->string);
	}
	string_list_clear(&dco->paths, 0);

The output would be:

warning: test-delay10.a was not processed properly.
warning: test-delay10.b was not processed properly.
warning: test-delay11.a was not processed properly.
warning: test-delay20.a was not processed properly.
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

I thought about the warning text.
"$FILE was not filtered properly." is technically more
correct, but maybe it would confuse the user?

Plus, at this point I don't really know which filter should
have given me the files. That's why I can't name it in
the warning message.

> 
>> @@ -179,11 +254,36 @@ static int write_entry(struct cache_entry *ce,
>> 		/*
>> 		 * Convert from git internal format to working tree format
>> 		 */
>> -		if (ce_mode_s_ifmt == S_IFREG &&
>> -		    convert_to_working_tree(ce->name, new, size, &buf)) {
>> -			free(new);
>> -			new = strbuf_detach(&buf, &newsize);
>> -			size = newsize;
>> +		if (ce_mode_s_ifmt == S_IFREG) {
>> +			struct delayed_checkout *dco = state->delayed_checkout;
>> +			if (dco && dco->state != CE_NO_DELAY) {
>> +				/* Do not send the blob in case of a retry. */
>> +				if (dco->state == CE_RETRY) {
>> +					new = NULL;
>> +					size = 0;
>> +				}
>> +				ret = async_convert_to_working_tree(
>> +					ce->name, new, size, &buf, dco);
>> +				if (ret && dco->state == CE_DELAYED) {
>> +					free(new);
>> +					/* Reset the state of the next blob */
>> +					dco->state = CE_CAN_DELAY;
>> +					goto finish;
>> +				}
> 
> Hmm. This "reset the state" bit at the end surprised me. I guess it's
> not wrong, but it goes against the mental model I had formed above. ;)
> 
> We really are using dco->state as a per-entry state flag. It just
> happens to be in a persistent shared struct. I don't think it's wrong,
> it was mostly just surprising. I don't know if it's worth trying to
> simplify, but I think you could do it by:
> 
>  1. Passing back the "was delayed" state from async_convert... in the
>     return value or via a separate out-parameter.

In the beginning I had it implemented that way. But that meant that I
had to pass two variables through the entire convert stack:

async_convert_to_working_tree
-> convert_to_working_tree_internal
--> apply_filter
---> apply_multi_file_filter

> 
>  2. Setting dco->state to CE_RETRY at the top of finish_delayed... so
>     that it's clear that it's about what phase of the conversation
>     we're in.

I could do that. However, I thought it would be safer to set the state *before*
every checkout operation in case convert.c messes with this field (it
should not in this phase).


> But I'm OK with it as-is, too.

I'll try 2.


>> diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
>> index 5e43faeec1..f0dc0aff4a 100644
>> --- a/t/t0021/rot13-filter.pl
>> +++ b/t/t0021/rot13-filter.pl
> 
> I'll admit to skimming the perl changes here, but I didn't see anything
> obviously wrong.


Thanks a lot for the review,
Lars


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol
  2017-06-24 17:22     ` Lars Schneider
@ 2017-06-24 18:51       ` Junio C Hamano
  2017-06-24 20:36         ` Jeff King
  2017-06-24 20:32       ` Jeff King
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2017-06-24 18:51 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Jeff King, git, tboegi, e, ttaylorr, peartben

Lars Schneider <larsxschneider@gmail.com> writes:

>> On 24 Jun 2017, at 16:19, Jeff King <peff@peff.net> wrote:
>> 
>> Speaking of which, _are_ we OK with out-of-order processing in things
>> like checkout? Certainly we care about deleting items before checking
>> out new ones (so getting rid of "foo/bar" to make room for "foo" and
>> vice versa).  I guess it's OK as long as the delayed items are always
>> filters that check out new items.
>
> Junio noticed that too but thinks we should be OK:
> "[...] We checkout removals first to make room so
> that creation of a path X can succeed if an existing path X/Y
> that used to want to see X as a directory can succeed [...]"
>
> http://public-inbox.org/git/xmqqvavotych.fsf@gitster.mtv.corp.google.com/

In which I said

   ...  I
   think "remove all and then create" you do not specifically have
   to worry about with the proposed change, but you may need to
   inspect and verify there aren't other kind of order dependency.

Yes, I think having two separate loops in the caller can help
guarantee this, as long as the delayed items are always filters
that check out new things.  It will break once you have delayed
removals, but I do not see how such a thing would be necessary ;-)

But there may be other kinds of order dependency--I didn't look for
them.
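
Roughly this shape, in other words (a sketch only, loosely modelled on
check_updates() in unpack-trees.c; the actual code in the patch differs):

static int check_updates_sketch(struct index_state *index)
{
	struct checkout state = CHECKOUT_INIT;
	struct delayed_checkout dco = { CE_CAN_DELAY, STRING_LIST_INIT_DUP,
					STRING_LIST_INIT_DUP };
	int i, errs = 0;

	state.delayed_checkout = &dco;

	/* pass 1: removals first, to make room for new paths */
	for (i = 0; i < index->cache_nr; i++) {
		const struct cache_entry *ce = index->cache[i];
		if (ce->ce_flags & CE_WT_REMOVE)
			unlink_entry(ce);
	}

	/* pass 2: creations; the filter may delay some of them */
	for (i = 0; i < index->cache_nr; i++) {
		struct cache_entry *ce = index->cache[i];
		if (ce->ce_flags & CE_UPDATE)
			errs |= checkout_entry(ce, &state, NULL);
	}

	/* pass 3: collect everything the filter delayed */
	errs |= finish_delayed_checkout(&state);
	return errs;
}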

>>> @@ -647,6 +653,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>>> 	if (err)
>>> 		goto done;
>>> 
>>> +	if (CAP_DELAY & entry->supported_capabilities &&
>>> +	    dco && dco->state == CE_CAN_DELAY) {
>> 
>> In a complicated conditional with bit operations, we usually put the bit
>> operation in its own parentheses so it's more obvious that it wasn't
>> supposed to be "&&". Like:
>> 
>>  if ((CAP_DELAY & entry->supported_capabilities) &&
>>      dco && dco->state == CE_CAN_DELAY))
>
> Agreed!

Why wasn't this caught earlier?  I thought this was something gcc warns about.

>> The operator precedence is such that it works without them, so this is
>> just a style question (I'd also usually put the flags field before the
>> flag itself, but that's really getting into aesthetics).
>
> You mean (entry & CAP_DELAY) instead of (CAP_DELAY & entry)?

Peff is continuing his explanation that (A & B && C) is technically
correct and that preferring ((A & B) && C) is purely stylistic.  "A & B"
binds tighter than "something && C", which means that (A & B && C)
cannot be misinterpreted as (A & (B && C)).
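
A tiny example, purely for illustration (the flag name and value are made up):

#define CAP_EXAMPLE (1u << 2)

static int may_delay(unsigned int caps, int have_dco, int can_delay)
{
	/*
	 * '&' binds tighter than '&&', so this line already parses as
	 * ((caps & CAP_EXAMPLE) && have_dco && can_delay); writing the
	 * parentheses out is purely a readability choice.
	 */
	return caps & CAP_EXAMPLE && have_dco && can_delay;
}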


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol
  2017-06-24 17:22     ` Lars Schneider
  2017-06-24 18:51       ` Junio C Hamano
@ 2017-06-24 20:32       ` Jeff King
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff King @ 2017-06-24 20:32 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, gitster, tboegi, e, ttaylorr, peartben

On Sat, Jun 24, 2017 at 07:22:40PM +0200, Lars Schneider wrote:

> > It might be worth giving a reason in this last paragraph. I think the
> > reason is "because it's more complicated for the caller, as they have to
> > be OK with out-of-order processing and remembering to go back and handle
> > the delayed cases".
> 
> Correct! However, my real reason was that these code paths process all
> files of the tree. Therefore the "out-of-order" processing can be
> effective. 
> 
> How about this:
> 
> Git has multiple code paths that check out a blob. Support delayed
> checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
> for now. The optimization is most effective in these code paths as all
> files of the tree are processed.

Sounds good.

> > Why do we need to tell the filter we know about delay? Shouldn't it just
> > need to tell us that it knows about delay, and then we choose whether to
> > ask for can-delay for particular entries?
> 
> Because the protocol I defined requires the filter to answer with
> a strict subset of this list [1]. I thought that this would make the protocol
> more future-proof/backward compatible, because the filter is then not allowed
> to answer with something that Git does not understand.
> 
> [1] https://github.com/git/git/blob/5402b1352f5181247405fbff1887008a0cb3b04a/Documentation/gitattributes.txt#L408-L411

OK. That makes sense, then.
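
So on the Git side something along these lines is all that is needed (a
sketch only, not the actual convert.c code; the function name is made up and
the capability flag names follow the patch):

/* assumes cache.h, pkt-line.h and run-command.h from Git's tree */
static unsigned int read_filter_capabilities(struct child_process *process,
					     unsigned int offered)
{
	unsigned int accepted = 0;
	char *line;
	const char *cap;

	while ((line = packet_read_line(process->out, NULL))) {
		if (!skip_prefix(line, "capability=", &cap))
			continue; /* ignore unknown keys */
		/* accept only capabilities we offered ourselves */
		if (!strcmp(cap, "clean") && (offered & CAP_CLEAN))
			accepted |= CAP_CLEAN;
		else if (!strcmp(cap, "smudge") && (offered & CAP_SMUDGE))
			accepted |= CAP_SMUDGE;
		else if (!strcmp(cap, "delay") && (offered & CAP_DELAY))
			accepted |= CAP_DELAY;
	}
	return accepted;
}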

> > The operator precedence is such that it works without them, so this is
> > just a style question (I'd also usually put the flags field before the
> > flag itself, but that's really getting into aesthetics).
> 
> You mean (entry & CAP_DELAY) instead of (CAP_DELAY & entry)?

Yes, exactly.

> How about this?
> 
> 	errs |= dco->paths.nr;
> 	for_each_string_list_item(path, &dco->paths) {
> 		warning("%s was not processed properly.", path->string);
> 	}
> 	string_list_clear(&dco->paths, 0);
> 
> The output would be:
> 
> warning: test-delay10.a was not processed properly.
> warning: test-delay10.b was not processed properly.
> warning: test-delay11.a was not processed properly.
> warning: test-delay20.a was not processed properly.
> fatal: unable to checkout working tree
> warning: Clone succeeded, but checkout failed.
> You can inspect what was checked out with 'git status'
> and retry the checkout with 'git checkout -f HEAD'

I think it may make sense to use something more specific than
"processed". The user might not even be thinking about filters during
their operation. It would be really nice if we could mention the name of
the filter. As you noted, we don't have it here but I wonder how hard it
would be. Anyway, I'm OK with leaving it more vague for now.

> I thought about the warning text.
> "$FILE was not filtered properly." is technically more
> correct, but maybe it would confuse the user?

I like it better because "filter" is a word the user might associate
with the filter feature. Whereas "processed" is vague and could mean
many things.

> > Hmm. This "reset the state" bit at the end surprised me. I guess it's
> > not wrong, but it goes against the mental model I had formed above. ;)
> > 
> > We really are using dco->state as a per-entry state flag. It just
> > happens to be in a persistent shared struct. I don't think it's wrong,
> > it was mostly just surprising. I don't know if it's worth trying to
> > simplify, but I think you could do it by:
> > 
> >  1. Passing back the "was delayed" state from async_convert... in the
> >     return value or via a separate out-parameter.
> 
> In the beginning I had it implemented that way. But that meant that I
> had to pass two variables through the entire convert stack:
> 
> async_convert_to_working_tree
> -> convert_to_working_tree_internal
> --> apply_filter
> ---> apply_multi_file_filter

Right, I see. I wonder if just a comment in the definition of the
delayed_checkout struct would make it more clear exactly how we expect
the member to be used.
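
Something in this direction, perhaps (a sketch; the struct and enum follow
the patch, the comment wording is just a suggestion):

enum ce_delay_state {
	CE_NO_DELAY = 0,
	CE_CAN_DELAY,
	CE_DELAYED,
	CE_RETRY
};

struct delayed_checkout {
	/*
	 * State of the blob currently being checked out, not of the
	 * checkout as a whole: reset to CE_CAN_DELAY before each entry
	 * is written, set to CE_DELAYED by convert.c when the filter
	 * delays the blob, and set to CE_RETRY for the second pass that
	 * asks the filter for the now-available blobs.
	 */
	enum ce_delay_state state;
	/* filters that delayed at least one blob */
	struct string_list filters;
	/* paths whose blobs are still delayed */
	struct string_list paths;
};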

> >  2. Setting dco->state to CE_RETRY at the top of finish_delayed... so
> >     that it's clear that it's about what phase of the conversation
> >     we're in.
> 
> I could do that. However, I thought it would be safer to set the state *before*
> every checkout operation in case convert.c messes with this field (it
> should not in this phase).
>
> > But I'm OK with it as-is, too.
> 
> I'll try 2.

I think you'd have to do (1) and (2) together. But if it causes pain, I
think the comment I suggested above may be the simplest way to go.

> Thanks a lot for the review,

You're welcome. The bits of your response I didn't quote all made sense
to me.

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol
  2017-06-24 18:51       ` Junio C Hamano
@ 2017-06-24 20:36         ` Jeff King
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff King @ 2017-06-24 20:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lars Schneider, git, tboegi, e, ttaylorr, peartben

On Sat, Jun 24, 2017 at 11:51:49AM -0700, Junio C Hamano wrote:

> >>  if ((CAP_DELAY & entry->supported_capabilities) &&
> >>      dco && dco->state == CE_CAN_DELAY))
> >
> > Agreed!
> 
> Why wasn't this caught earlier?  I thought this is something gcc warns about.

I thought so, too. If it warned about:

  if (A & B)

that would probably be too annoying. But:

  if (A & B && C)

is much more questionable. I wonder if it used to exist and gcc dropped
it (or dropped it from -Wall). -Wlogical-op seems like the most likely
candidate, but it does not catch it (and it has a false positive in
handle_nonblock, or perhaps I just can't see the problem).

> >> The operator precedence is such that it works without them, so this is
> >> just a style question (I'd also usually put the flags field before the
> >> flag itself, but that's really getting into aesthetics).
> >
> > You mean (entry & CAP_DELAY) instead of (CAP_DELAY & entry)?
> 
> Peff is continuing his explanation why (A & B && C) is technically
> correct and preferring ((A & B) && C) is purely stylistic.  "A & B"
> binds tighter than "something && C" which means that (A & B && C)
> cannot be misinterpreted as (A & (B && C)).

I actually meant both. The bitwise operator binds tighter so it's OK
either way. But I would write "flags & MY_FLAG" and never "MY_FLAG &
flags".

-Peff

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-06-24 20:36 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-01  8:21 [PATCH v5 0/5] convert: add "status=delayed" to filter process protocol Lars Schneider
2017-06-01  8:21 ` [PATCH v5 1/5] t0021: keep filter log files on comparison Lars Schneider
2017-06-01  8:22 ` [PATCH v5 2/5] t0021: make debug log file name configurable Lars Schneider
2017-06-01  8:22 ` [PATCH v5 3/5] t0021: write "OUT" only on success Lars Schneider
2017-06-01  8:22 ` [PATCH v5 4/5] convert: move multiple file filter error handling to separate function Lars Schneider
2017-06-18  7:20   ` Torsten Bögershausen
2017-06-18 11:47     ` Lars Schneider
2017-06-19 17:18       ` Torsten Bögershausen
2017-06-19 17:47         ` Lars Schneider
2017-06-01  8:22 ` [PATCH v5 5/5] convert: add "status=delayed" to filter process protocol Lars Schneider
2017-06-02  2:21   ` Junio C Hamano
2017-06-05 11:36     ` Lars Schneider
2017-06-24 14:19   ` Jeff King
2017-06-24 17:22     ` Lars Schneider
2017-06-24 18:51       ` Junio C Hamano
2017-06-24 20:36         ` Jeff King
2017-06-24 20:32       ` Jeff King
2017-06-01  9:44 ` [PATCH v5 0/5] " Junio C Hamano
2017-06-02  2:06 ` Junio C Hamano
2017-06-24 14:23 ` Jeff King
