git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v7 0/6] convert: add "status=delayed" to filter process protocol
@ 2017-06-27 12:10 Lars Schneider
  2017-06-27 12:10 ` [PATCH v7 1/6] t0021: keep filter log files on comparison Lars Schneider
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 12:10 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

Hi,

here is the 7th iteration of my "status delayed" topic. Patch 1 to 3 are
minor t0021 test adjustments. Patch 4 is a minor convert.c style adjustment.
Patch 5 is a minor "extract method" refactoring in convert.c with an
additional minor style adjustment. Patch 6 is the new feature.

Changes since v6 (full inter diff below!):
* delayed_checkout->state was used for two purposes:
  (1) Signal the filter if a cache entry can be delayed
  (2) Retrieve info from the filter if a cache entry actually was delayed
  --> That double use is confusing. To make it more clear I removed the
      ce_delay_state "CE_DELAYED" and added the member "is_delayed" to
      delayed_checkout. The filter can use that member to communicate back
      to Git that a cache entry was delayed (Junio/Peff)
* throw an error if the filter sends Git a path in async_query_available_blobs
  that was not delayed before (Junio)
* add test cases for the following "delay related" errors:
  (1) The filter misses to signal the availability of a file
  (2) The filter signals the availability of a blob that was not delayed before
* format comments properly (Junio)
* use STRING_LIST_INIT_NODUP (Junio)
* various commit message improvements (Junio)
* documentation improvements (Junio)
* variable naming improvements (Junio)
* remove unnecessary empty line (Junio)

If you review this series then please read the "Delay" section in
"Documentation/gitattributes.txt" first for an overview of the delay mechanism.
The changes in "t/t0021/rot13-filter.pl" are easier to review if you ignore
whitespace changes.

Thanks,
Lars


RFC: http://public-inbox.org/git/D10F7C47-14E8-465B-8B7A-A09A1B28A39F@gmail.com/
v1: http://public-inbox.org/git/20170108191736.47359-1-larsxschneider@gmail.com/
v2: http://public-inbox.org/git/20170226184816.30010-1-larsxschneider@gmail.com/
v3: http://public-inbox.org/git/20170409191107.20547-1-larsxschneider@gmail.com/
v4: http://public-inbox.org/git/20170522135001.54506-1-larsxschneider@gmail.com/
v5: http://public-inbox.org/git/20170601082203.50397-1-larsxschneider@gmail.com/
v6: http://public-inbox.org/git/20170625182125.6741-1-larsxschneider@gmail.com/


Base Ref: master
Web-Diff: https://github.com/larsxschneider/git/commit/5b124cdf43
Checkout: git fetch https://github.com/larsxschneider/git filter-process/delay-v7 && git checkout 5b124cdf43

Interdiff (v6..v7):

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 5489e8b723..4049a0b9a8 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -539,13 +539,17 @@ packet:          git< 0000

 If the filter supports the "delay" capability then it must support the
 "list_available_blobs" command. If Git sends this command, then the
-filter is expected to return a list of pathnames of blobs that are
-available. The list must be terminated with a flush packet followed
+filter is expected to return a list of pathnames representing blobs
+that have been delayed earlier and are now available.
+The list must be terminated with a flush packet followed
 by a "success" status that is also terminated with a flush packet. If
 no blobs for the delayed paths are available, yet, then the filter is
 expected to block the response until at least one blob becomes
 available. The filter can tell Git that it has no more delayed blobs
-by sending an empty list.
+by sending an empty list. As soon as the filter responds with an empty
+list, Git stops asking. All blobs that Git has not received at this
+point are considered missing and will result in an error.
+
 ------------------------
 packet:          git> command=list_available_blobs
 packet:          git> 0000
diff --git a/cache.h b/cache.h
index 2ec12c4477..69b03b5dc7 100644
--- a/cache.h
+++ b/cache.h
@@ -1552,7 +1552,6 @@ struct checkout {
 };
 #define CHECKOUT_INIT { NULL, "" }

-
 #define TEMPORARY_FILENAME_LENGTH 25
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
 extern void enable_delayed_checkout(struct checkout *state);
diff --git a/convert.c b/convert.c
index 6452ab546a..f17d822ac8 100644
--- a/convert.c
+++ b/convert.c
@@ -677,7 +677,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 		goto done;

 	if (can_delay && !strcmp(filter_status.buf, "delayed")) {
-		dco->state = CE_DELAYED;
+		dco->is_delayed = 1;
 		string_list_insert(&dco->filters, cmd);
 		string_list_insert(&dco->paths, path);
 	} else {
@@ -709,7 +709,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 }


-int async_query_available_blobs(const char *cmd, struct string_list *delayed_paths)
+int async_query_available_blobs(const char *cmd, struct string_list *available_paths)
 {
 	int err;
 	char *line;
@@ -739,7 +739,7 @@ int async_query_available_blobs(const char *cmd, struct string_list *delayed_pat
 	while ((line = packet_read_line(process->out, NULL))) {
 		const char *path;
 		if (skip_prefix(line, "pathname=", &path))
-			string_list_insert(delayed_paths, xstrdup(path));
+			string_list_insert(available_paths, xstrdup(path));
 		else
 			; /* ignore unknown keys */
 	}
diff --git a/convert.h b/convert.h
index 51f0fb7fbe..cdb91ab99a 100644
--- a/convert.h
+++ b/convert.h
@@ -37,16 +37,18 @@ enum eol {
 enum ce_delay_state {
 	CE_NO_DELAY = 0,
 	CE_CAN_DELAY = 1,
-	CE_DELAYED = 2,
-	CE_RETRY = 3
+	CE_RETRY = 2
 };

 struct delayed_checkout {
 	/* State of the currently processed cache entry. If the state is
-	CE_CAN_DELAY, then the filter can change this to CE_DELAYED. If
-	the state is CE_RETRY, then this signals the filter that the cache
-	entry was requested before. */
+	 * CE_CAN_DELAY, then the filter can change the 'is_delayed' flag
+	 * to signal that the current cache entry was delayed. If the state is
+	 * CE_RETRY, then this signals the filter that the cache entry was
+	 * requested before.
+	 */
 	enum ce_delay_state state;
+	int is_delayed;
 	/* List of filter drivers that signaled delayed blobs. */
 	struct string_list filters;
 	/* List of delayed blobs identified by their path. */
@@ -66,7 +68,7 @@ extern int convert_to_working_tree(const char *path, const char *src,
 extern int async_convert_to_working_tree(const char *path, const char *src,
 					 size_t len, struct strbuf *dst,
 					 void *dco);
-extern int async_query_available_blobs(const char *cmd, struct string_list *delayed_paths);
+extern int async_query_available_blobs(const char *cmd, struct string_list *available_paths);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
 			      struct strbuf *dst);
 static inline int would_convert_to_git(const char *path)
diff --git a/entry.c b/entry.c
index c6d5cc01dc..8406060e33 100644
--- a/entry.c
+++ b/entry.c
@@ -150,7 +150,12 @@ void enable_delayed_checkout(struct checkout *state)
 static int remove_available_paths(struct string_list_item *item, void *cb_data)
 {
 	struct string_list *available_paths = cb_data;
-	return !string_list_has_string(available_paths, item->string);
+	struct string_list_item *available;
+
+	available = string_list_lookup(available_paths, item->string);
+	if (available)
+		available->util = (void *)item->string;
+	return !available;
 }

 int finish_delayed_checkout(struct checkout *state)
@@ -165,8 +170,7 @@ int finish_delayed_checkout(struct checkout *state)
 	dco->state = CE_RETRY;
 	while (dco->filters.nr > 0) {
 		for_each_string_list_item(filter, &dco->filters) {
-			struct string_list available_paths;
-			string_list_init(&available_paths, 0);
+			struct string_list available_paths = STRING_LIST_INIT_NODUP;

 			if (!async_query_available_blobs(filter->string, &available_paths)) {
 				/* Filter reported an error */
@@ -176,24 +180,38 @@ int finish_delayed_checkout(struct checkout *state)
 			}
 			if (available_paths.nr <= 0) {
 				/* Filter responded with no entries. That means
-				   the filter is done and we can remove the
-				   filter from the list (see
-				   "string_list_remove_empty_items" call below).
+				 * the filter is done and we can remove the
+				 * filter from the list (see
+				 * "string_list_remove_empty_items" call below).
 				 */
 				filter->string = "";
 				continue;
 			}

 			/* In dco->paths we store a list of all delayed paths.
-			   The filter just send us a list of available paths.
-			   Remove them from the list.
+			 * The filter just send us a list of available paths.
+			 * Remove them from the list.
 			 */
 			filter_string_list(&dco->paths, 0,
 				&remove_available_paths, &available_paths);

 			for_each_string_list_item(path, &available_paths) {
-				struct cache_entry* ce = index_file_exists(
-					state->istate, path->string,
+				struct cache_entry* ce;
+
+				if (!path->util) {
+					error("external filter '%s' signaled that '%s' "
+					      "is now available although it has not been "
+					      "delayed earlier",
+					      filter->string, path->string);
+					errs |= 1;
+
+					/* Do not ask the filter for available blobs,
+					 * again, as the filter is likely buggy.
+					 */
+					filter->string = "";
+					continue;
+				}
+				ce = index_file_exists(state->istate, path->string,
 						       strlen(path->string), 0);
 				assert(dco->state == CE_RETRY);
 				errs |= (ce ? checkout_entry(ce, state, NULL) : 1);
@@ -206,7 +224,7 @@ int finish_delayed_checkout(struct checkout *state)
 	/* At this point we should not have any delayed paths anymore. */
 	errs |= dco->paths.nr;
 	for_each_string_list_item(path, &dco->paths) {
-		warning(_("%s was not filtered properly."), path->string);
+		error("'%s' was not filtered properly", path->string);
 	}
 	string_list_clear(&dco->paths, 0);

@@ -266,12 +284,11 @@ static int write_entry(struct cache_entry *ce,
 					new = NULL;
 					size = 0;
 				}
+				dco->is_delayed = 0;
 				ret = async_convert_to_working_tree(
 					ce->name, new, size, &buf, dco);
-				if (ret && dco->state == CE_DELAYED) {
+				if (ret && dco->is_delayed) {
 					free(new);
-					/* Reset the state of the next blob */
-					dco->state = CE_CAN_DELAY;
 					goto finish;
 				}
 			} else
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 4b5a45fd43..eb3d83744a 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -720,7 +720,7 @@ test_expect_success PERL 'delayed checkout in process filter' '
 		cp "$TEST_ROOT/test.o" test-delay20.a &&
 		cp "$TEST_ROOT/test.o" test-delay10.b &&
 		git add . &&
-		git commit -m "test commit 1"
+		git commit -m "test commit"
 	) &&

 	S=$(file_size "$TEST_ROOT/test.o") &&
@@ -775,4 +775,46 @@ test_expect_success PERL 'delayed checkout in process filter' '
 	)
 '

+test_expect_success PERL 'missing file in delayed checkout' '
+	test_config_global filter.bug.process "rot13-filter.pl bug.log clean smudge delay" &&
+	test_config_global filter.bug.required true &&
+
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+		echo "*.a filter=bug" >.gitattributes &&
+		cp "$TEST_ROOT/test.o" missing-delay.a
+		git add . &&
+		git commit -m "test commit"
+	) &&
+
+	rm -rf repo-cloned &&
+	test_must_fail git clone repo repo-cloned 2>git-stderr.log &&
+	cat git-stderr.log &&
+	grep "error: .missing-delay\.a. was not filtered properly" git-stderr.log
+'
+
+test_expect_success PERL 'invalid file in delayed checkout' '
+	test_config_global filter.bug.process "rot13-filter.pl bug.log clean smudge delay" &&
+	test_config_global filter.bug.required true &&
+
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+		echo "*.a filter=bug" >.gitattributes &&
+		cp "$TEST_ROOT/test.o" invalid-delay.a &&
+		cp "$TEST_ROOT/test.o" unfiltered
+		git add . &&
+		git commit -m "test commit"
+	) &&
+
+	rm -rf repo-cloned &&
+	test_must_fail git clone repo repo-cloned 2>git-stderr.log &&
+	grep "error: external filter .* signaled that .unfiltered. is now available although it has not been delayed earlier" git-stderr.log
+'
+
 test_done
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index f0dc0aff4a..ad685d92f8 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -19,10 +19,15 @@
 #     to process the file and any file after that is processed with the
 #     same command.
 # (5) If data with a pathname that is a key in the DELAY hash is
-#     requested (e.g. 'test-delay10.a') then the filter responds with
+#     requested (e.g. "test-delay10.a") then the filter responds with
 #     a "delay" status and sets the "requested" field in the DELAY hash.
 #     The filter will signal the availability of this object after
 #     "count" (field in DELAY hash) "list_available_blobs" commands.
+# (6) If data with the pathname "missing-delay.a" is processed that the
+#     filter will drop the path from the "list_available_blobs" response.
+# (7) If data with the pathname "invalid-delay.a" is processed that the
+#     filter will add the path "unfiltered" which was not delayed before
+#     to the "list_available_blobs" response.
 #

 use strict;
@@ -40,6 +45,8 @@ my %DELAY = (
 	'test-delay11.a' => { "requested" => 0, "count" => 1 },
 	'test-delay20.a' => { "requested" => 0, "count" => 2 },
 	'test-delay10.b' => { "requested" => 0, "count" => 1 },
+	'missing-delay.a' => { "requested" => 0, "count" => 1 },
+	'invalid-delay.a' => { "requested" => 0, "count" => 1 },
 );

 sub rot13 {
@@ -135,7 +142,13 @@ while (1) {
 		foreach my $pathname ( sort keys %DELAY ) {
 			if ( $DELAY{$pathname}{"requested"} >= 1 ) {
 				$DELAY{$pathname}{"count"} = $DELAY{$pathname}{"count"} - 1;
-				if ($DELAY{$pathname}{"count"} == 0 ) {
+				if ( $pathname eq "invalid-delay.a" ) {
+					# Send Git a pathname that was not delayed earlier
+					packet_txt_write("pathname=unfiltered");
+				}
+				if ( $pathname eq "missing-delay.a" ) {
+					# Do not signal Git that this file is available
+				} elsif ( $DELAY{$pathname}{"count"} == 0 ) {
 					print $debug " $pathname";
 					packet_txt_write("pathname=$pathname");
 				}


Lars Schneider (6):
  t0021: keep filter log files on comparison
  t0021: make debug log file name configurable
  t0021: write "OUT <size>" only on success
  convert: put the flags field before the flag itself for consistent
    style
  convert: move multiple file filter error handling to separate function
  convert: add "status=delayed" to filter process protocol

 Documentation/gitattributes.txt |  69 ++++++++++++-
 builtin/checkout.c              |   3 +
 cache.h                         |   3 +
 convert.c                       | 168 +++++++++++++++++++++++--------
 convert.h                       |  27 +++++
 entry.c                         | 131 +++++++++++++++++++++++-
 t/t0021-conversion.sh           | 178 +++++++++++++++++++++++++++------
 t/t0021/rot13-filter.pl         | 214 +++++++++++++++++++++++++++-------------
 unpack-trees.c                  |   2 +
 9 files changed, 645 insertions(+), 150 deletions(-)


base-commit: 0339965c70d68fd3831c9a5306443c869de3f6a8
--
2.13.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v7 1/6] t0021: keep filter log files on comparison
  2017-06-27 12:10 [PATCH v7 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider
@ 2017-06-27 12:10 ` Lars Schneider
  2017-06-27 12:10 ` [PATCH v7 2/6] t0021: make debug log file name configurable Lars Schneider
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 12:10 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

The filter log files are modified on comparison. That might be
unexpected by the caller. It would be even undesirable if the caller
wants to reuse the original log files.

Address these issues by using temp files for modifications. This is
useful for the subsequent patch 'convert: add "status=delayed" to
filter process protocol'.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 t/t0021-conversion.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 161f560446..ff2424225b 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -42,10 +42,10 @@ test_cmp_count () {
 	for FILE in "$expect" "$actual"
 	do
 		sort "$FILE" | uniq -c |
-		sed -e "s/^ *[0-9][0-9]*[ 	]*IN: /x IN: /" >"$FILE.tmp" &&
-		mv "$FILE.tmp" "$FILE" || return
+		sed -e "s/^ *[0-9][0-9]*[ 	]*IN: /x IN: /" >"$FILE.tmp"
 	done &&
-	test_cmp "$expect" "$actual"
+	test_cmp "$expect.tmp" "$actual.tmp" &&
+	rm "$expect.tmp" "$actual.tmp"
 }
 
 # Compare two files but exclude all `clean` invocations because Git can
@@ -56,10 +56,10 @@ test_cmp_exclude_clean () {
 	actual=$2
 	for FILE in "$expect" "$actual"
 	do
-		grep -v "IN: clean" "$FILE" >"$FILE.tmp" &&
-		mv "$FILE.tmp" "$FILE"
+		grep -v "IN: clean" "$FILE" >"$FILE.tmp"
 	done &&
-	test_cmp "$expect" "$actual"
+	test_cmp "$expect.tmp" "$actual.tmp" &&
+	rm "$expect.tmp" "$actual.tmp"
 }
 
 # Check that the contents of two files are equal and that their rot13 version
-- 
2.13.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v7 2/6] t0021: make debug log file name configurable
  2017-06-27 12:10 [PATCH v7 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider
  2017-06-27 12:10 ` [PATCH v7 1/6] t0021: keep filter log files on comparison Lars Schneider
@ 2017-06-27 12:10 ` Lars Schneider
  2017-06-27 12:10 ` [PATCH v7 3/6] t0021: write "OUT <size>" only on success Lars Schneider
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 12:10 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

The "rot13-filter.pl" helper wrote its debug logs always to "rot13-filter.log".
Make this configurable by defining the log file as first parameter of
"rot13-filter.pl".

This is useful if "rot13-filter.pl" is configured multiple times similar to the
subsequent patch 'convert: add "status=delayed" to filter process protocol'.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 t/t0021-conversion.sh   | 44 ++++++++++++++++++++++----------------------
 t/t0021/rot13-filter.pl |  8 +++++---
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index ff2424225b..0139b460e7 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -28,7 +28,7 @@ file_size () {
 }
 
 filter_git () {
-	rm -f rot13-filter.log &&
+	rm -f *.log &&
 	git "$@"
 }
 
@@ -342,7 +342,7 @@ test_expect_success 'diff does not reuse worktree files that need cleaning' '
 '
 
 test_expect_success PERL 'required process filter should filter data' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -375,7 +375,7 @@ test_expect_success PERL 'required process filter should filter data' '
 			IN: clean testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
 			STOP
 		EOF
-		test_cmp_count expected.log rot13-filter.log &&
+		test_cmp_count expected.log debug.log &&
 
 		git commit -m "test commit 2" &&
 		rm -f test2.r "testsubdir/test3 '\''sq'\'',\$x=.r" &&
@@ -388,7 +388,7 @@ test_expect_success PERL 'required process filter should filter data' '
 			IN: smudge testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		filter_git checkout --quiet --no-progress empty-branch &&
 		cat >expected.log <<-EOF &&
@@ -397,7 +397,7 @@ test_expect_success PERL 'required process filter should filter data' '
 			IN: clean test.r $S [OK] -- OUT: $S . [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		filter_git checkout --quiet --no-progress master &&
 		cat >expected.log <<-EOF &&
@@ -409,7 +409,7 @@ test_expect_success PERL 'required process filter should filter data' '
 			IN: smudge testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r &&
 		test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r &&
@@ -419,7 +419,7 @@ test_expect_success PERL 'required process filter should filter data' '
 
 test_expect_success PERL 'required process filter takes precedence' '
 	test_config_global filter.protocol.clean false &&
-	test_config_global filter.protocol.process "rot13-filter.pl clean" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -439,12 +439,12 @@ test_expect_success PERL 'required process filter takes precedence' '
 			IN: clean test.r $S [OK] -- OUT: $S . [OK]
 			STOP
 		EOF
-		test_cmp_count expected.log rot13-filter.log
+		test_cmp_count expected.log debug.log
 	)
 '
 
 test_expect_success PERL 'required process filter should be used only for "clean" operation only' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -462,7 +462,7 @@ test_expect_success PERL 'required process filter should be used only for "clean
 			IN: clean test.r $S [OK] -- OUT: $S . [OK]
 			STOP
 		EOF
-		test_cmp_count expected.log rot13-filter.log &&
+		test_cmp_count expected.log debug.log &&
 
 		rm test.r &&
 
@@ -474,12 +474,12 @@ test_expect_success PERL 'required process filter should be used only for "clean
 			init handshake complete
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log
+		test_cmp_exclude_clean expected.log debug.log
 	)
 '
 
 test_expect_success PERL 'required process filter should process multiple packets' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 
 	rm -rf repo &&
@@ -514,7 +514,7 @@ test_expect_success PERL 'required process filter should process multiple packet
 			IN: clean 3pkt_2+1.file $(($S*2+1)) [OK] -- OUT: $(($S*2+1)) ... [OK]
 			STOP
 		EOF
-		test_cmp_count expected.log rot13-filter.log &&
+		test_cmp_count expected.log debug.log &&
 
 		rm -f *.file &&
 
@@ -529,7 +529,7 @@ test_expect_success PERL 'required process filter should process multiple packet
 			IN: smudge 3pkt_2+1.file $(($S*2+1)) [OK] -- OUT: $(($S*2+1)) ... [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		for FILE in *.file
 		do
@@ -539,7 +539,7 @@ test_expect_success PERL 'required process filter should process multiple packet
 '
 
 test_expect_success PERL 'required process filter with clean error should fail' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -558,7 +558,7 @@ test_expect_success PERL 'required process filter with clean error should fail'
 '
 
 test_expect_success PERL 'process filter should restart after unexpected write failure' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -579,7 +579,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f
 		git add . &&
 		rm -f *.r &&
 
-		rm -f rot13-filter.log &&
+		rm -f debug.log &&
 		git checkout --quiet --no-progress . 2>git-stderr.log &&
 
 		grep "smudge write error at" git-stderr.log &&
@@ -595,7 +595,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f
 			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r &&
 		test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r &&
@@ -609,7 +609,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f
 '
 
 test_expect_success PERL 'process filter should not be restarted if it signals an error' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -639,7 +639,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a
 			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r &&
 		test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r &&
@@ -648,7 +648,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a
 '
 
 test_expect_success PERL 'process filter abort stops processing of all further files' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -676,7 +676,7 @@ test_expect_success PERL 'process filter abort stops processing of all further f
 			IN: smudge abort.r $SA [OK] -- OUT: 0 [ABORT]
 			STOP
 		EOF
-		test_cmp_exclude_clean expected.log rot13-filter.log &&
+		test_cmp_exclude_clean expected.log debug.log &&
 
 		test_cmp "$TEST_ROOT/test.o" test.r &&
 		test_cmp "$TEST_ROOT/test2.o" test2.r &&
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 617f581e56..0b943bb377 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -2,8 +2,9 @@
 # Example implementation for the Git filter protocol version 2
 # See Documentation/gitattributes.txt, section "Filter Protocol"
 #
-# The script takes the list of supported protocol capabilities as
-# arguments ("clean", "smudge", etc).
+# The first argument defines a debug log file that the script write to.
+# All remaining arguments define a list of supported protocol
+# capabilities ("clean", "smudge", etc).
 #
 # This implementation supports special test cases:
 # (1) If data with the pathname "clean-write-fail.r" is processed with
@@ -24,9 +25,10 @@ use warnings;
 use IO::File;
 
 my $MAX_PACKET_CONTENT_SIZE = 65516;
+my $log_file                = shift @ARGV;
 my @capabilities            = @ARGV;
 
-open my $debug, ">>", "rot13-filter.log" or die "cannot open log file: $!";
+open my $debug, ">>", $log_file or die "cannot open log file: $!";
 
 sub rot13 {
 	my $str = shift;
-- 
2.13.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v7 3/6] t0021: write "OUT <size>" only on success
  2017-06-27 12:10 [PATCH v7 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider
  2017-06-27 12:10 ` [PATCH v7 1/6] t0021: keep filter log files on comparison Lars Schneider
  2017-06-27 12:10 ` [PATCH v7 2/6] t0021: make debug log file name configurable Lars Schneider
@ 2017-06-27 12:10 ` Lars Schneider
  2017-06-27 18:44   ` Junio C Hamano
  2017-06-27 12:10 ` [PATCH v7 4/6] convert: put the flags field before the flag itself for consistent style Lars Schneider
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 12:10 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

"rot13-filter.pl" always writes "OUT <size>" to the debug log at the end
of a response.

This works without issues for the existing responses "abort", "error",
and "success". A new response "delayed", that will be introduced in a
subsequent patch, accepts the input without giving the filtered result
right away. Since we actually have the data already available in our
mock filter the debug log output would be wrong/misleading. Therefore,
we do not write "OUT <size>" for "delayed" responses.

To simplify the code we do not write "OUT <size>" for "abort" and
"error" responses, too, as their size is always zero.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 t/t0021-conversion.sh   | 6 +++---
 t/t0021/rot13-filter.pl | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 0139b460e7..0c04d346a1 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -588,7 +588,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f
 		cat >expected.log <<-EOF &&
 			START
 			init handshake complete
-			IN: smudge smudge-write-fail.r $SF [OK] -- OUT: $SF [WRITE FAIL]
+			IN: smudge smudge-write-fail.r $SF [OK] -- [WRITE FAIL]
 			START
 			init handshake complete
 			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
@@ -634,7 +634,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a
 		cat >expected.log <<-EOF &&
 			START
 			init handshake complete
-			IN: smudge error.r $SE [OK] -- OUT: 0 [ERROR]
+			IN: smudge error.r $SE [OK] -- [ERROR]
 			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
 			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
 			STOP
@@ -673,7 +673,7 @@ test_expect_success PERL 'process filter abort stops processing of all further f
 		cat >expected.log <<-EOF &&
 			START
 			init handshake complete
-			IN: smudge abort.r $SA [OK] -- OUT: 0 [ABORT]
+			IN: smudge abort.r $SA [OK] -- [ABORT]
 			STOP
 		EOF
 		test_cmp_exclude_clean expected.log debug.log &&
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 0b943bb377..5e43faeec1 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -153,9 +153,6 @@ while (1) {
 		die "bad command '$command'";
 	}
 
-	print $debug "OUT: " . length($output) . " ";
-	$debug->flush();
-
 	if ( $pathname eq "error.r" ) {
 		print $debug "[ERROR]\n";
 		$debug->flush();
@@ -178,6 +175,9 @@ while (1) {
 			die "${command} write error";
 		}
 
+		print $debug "OUT: " . length($output) . " ";
+		$debug->flush();
+
 		while ( length($output) > 0 ) {
 			my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
 			packet_bin_write($packet);
-- 
2.13.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v7 4/6] convert: put the flags field before the flag itself for consistent style
  2017-06-27 12:10 [PATCH v7 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider
                   ` (2 preceding siblings ...)
  2017-06-27 12:10 ` [PATCH v7 3/6] t0021: write "OUT <size>" only on success Lars Schneider
@ 2017-06-27 12:10 ` Lars Schneider
  2017-06-27 12:10 ` [PATCH v7 5/6] convert: move multiple file filter error handling to separate function Lars Schneider
  2017-06-27 12:10 ` [PATCH v7 6/6] convert: add "status=delayed" to filter process protocol Lars Schneider
  5 siblings, 0 replies; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 12:10 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 convert.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/convert.c b/convert.c
index f1e168bc30..9907e3b9ba 100644
--- a/convert.c
+++ b/convert.c
@@ -597,12 +597,12 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	}
 	process = &entry->subprocess.process;
 
-	if (!(wanted_capability & entry->supported_capabilities))
+	if (!(entry->supported_capabilities & wanted_capability))
 		return 0;
 
-	if (CAP_CLEAN & wanted_capability)
+	if (wanted_capability & CAP_CLEAN)
 		filter_type = "clean";
-	else if (CAP_SMUDGE & wanted_capability)
+	else if (wanted_capability & CAP_SMUDGE)
 		filter_type = "smudge";
 	else
 		die("unexpected filter type");
@@ -703,9 +703,9 @@ static int apply_filter(const char *path, const char *src, size_t len,
 	if (!dst)
 		return 1;
 
-	if ((CAP_CLEAN & wanted_capability) && !drv->process && drv->clean)
+	if ((wanted_capability & CAP_CLEAN) && !drv->process && drv->clean)
 		cmd = drv->clean;
-	else if ((CAP_SMUDGE & wanted_capability) && !drv->process && drv->smudge)
+	else if ((wanted_capability & CAP_SMUDGE) && !drv->process && drv->smudge)
 		cmd = drv->smudge;
 
 	if (cmd && *cmd)
-- 
2.13.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v7 5/6] convert: move multiple file filter error handling to separate function
  2017-06-27 12:10 [PATCH v7 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider
                   ` (3 preceding siblings ...)
  2017-06-27 12:10 ` [PATCH v7 4/6] convert: put the flags field before the flag itself for consistent style Lars Schneider
@ 2017-06-27 12:10 ` Lars Schneider
  2017-06-27 12:10 ` [PATCH v7 6/6] convert: add "status=delayed" to filter process protocol Lars Schneider
  5 siblings, 0 replies; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 12:10 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

Refactoring the filter error handling is useful for the subsequent patch
'convert: add "status=delayed" to filter process protocol'.

In addition, replace the parentheses around the empty "if" block with a
single semicolon to adhere to the Git style guide.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 convert.c | 47 ++++++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/convert.c b/convert.c
index 9907e3b9ba..e55c034d86 100644
--- a/convert.c
+++ b/convert.c
@@ -565,6 +565,29 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 	return err;
 }
 
+static void handle_filter_error(const struct strbuf *filter_status,
+				struct cmd2process *entry,
+				const unsigned int wanted_capability) {
+	if (!strcmp(filter_status->buf, "error"))
+		; /* The filter signaled a problem with the file. */
+	else if (!strcmp(filter_status->buf, "abort") && wanted_capability) {
+		/*
+		 * The filter signaled a permanent problem. Don't try to filter
+		 * files with the same command for the lifetime of the current
+		 * Git process.
+		 */
+		 entry->supported_capabilities &= ~wanted_capability;
+	} else {
+		/*
+		 * Something went wrong with the protocol filter.
+		 * Force shutdown and restart if another blob requires filtering.
+		 */
+		error("external filter '%s' failed", entry->subprocess.cmd);
+		subprocess_stop(&subprocess_map, &entry->subprocess);
+		free(entry);
+	}
+}
+
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
 				   int fd, struct strbuf *dst, const char *cmd,
 				   const unsigned int wanted_capability)
@@ -656,28 +679,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 done:
 	sigchain_pop(SIGPIPE);
 
-	if (err) {
-		if (!strcmp(filter_status.buf, "error")) {
-			/* The filter signaled a problem with the file. */
-		} else if (!strcmp(filter_status.buf, "abort")) {
-			/*
-			 * The filter signaled a permanent problem. Don't try to filter
-			 * files with the same command for the lifetime of the current
-			 * Git process.
-			 */
-			 entry->supported_capabilities &= ~wanted_capability;
-		} else {
-			/*
-			 * Something went wrong with the protocol filter.
-			 * Force shutdown and restart if another blob requires filtering.
-			 */
-			error("external filter '%s' failed", cmd);
-			subprocess_stop(&subprocess_map, &entry->subprocess);
-			free(entry);
-		}
-	} else {
+	if (err)
+		handle_filter_error(&filter_status, entry, wanted_capability);
+	else
 		strbuf_swap(dst, &nbuf);
-	}
 	strbuf_release(&nbuf);
 	return !err;
 }
-- 
2.13.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v7 6/6] convert: add "status=delayed" to filter process protocol
  2017-06-27 12:10 [PATCH v7 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider
                   ` (4 preceding siblings ...)
  2017-06-27 12:10 ` [PATCH v7 5/6] convert: move multiple file filter error handling to separate function Lars Schneider
@ 2017-06-27 12:10 ` Lars Schneider
  2017-06-27 19:00   ` Junio C Hamano
  5 siblings, 1 reply; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 12:10 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, tboegi, e, ttaylorr, peartben

Some `clean` / `smudge` filters may require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.

Teach the filter process protocol, introduced in edcc8581 ("convert: add
filter.<driver>.process option", 2016-10-16), to accept the status
"delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.

Git has a multiple code paths that checkout a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all
files of the tree are processed.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/gitattributes.txt |  69 +++++++++++++-
 builtin/checkout.c              |   3 +
 cache.h                         |   3 +
 convert.c                       | 115 ++++++++++++++++++----
 convert.h                       |  27 ++++++
 entry.c                         | 131 +++++++++++++++++++++++++-
 t/t0021-conversion.sh           | 116 +++++++++++++++++++++++
 t/t0021/rot13-filter.pl         | 204 +++++++++++++++++++++++++++-------------
 unpack-trees.c                  |   2 +
 9 files changed, 579 insertions(+), 91 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 4736483865..4049a0b9a8 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -425,8 +425,8 @@ packet:          git< capability=clean
 packet:          git< capability=smudge
 packet:          git< 0000
 ------------------------
-Supported filter capabilities in version 2 are "clean" and
-"smudge".
+Supported filter capabilities in version 2 are "clean", "smudge",
+and "delay".
 
 Afterwards Git sends a list of "key=value" pairs terminated with
 a flush packet. The list will contain at least the filter command
@@ -512,12 +512,73 @@ the protocol then Git will stop the filter process and restart it
 with the next file that needs to be processed. Depending on the
 `filter.<driver>.required` flag Git will interpret that as error.
 
-After the filter has processed a blob it is expected to wait for
-the next "key=value" list containing a command. Git will close
+After the filter has processed a command it is expected to wait for
+a "key=value" list containing the next command. Git will close
 the command pipe on exit. The filter is expected to detect EOF
 and exit gracefully on its own. Git will wait until the filter
 process has stopped.
 
+Delay
+^^^^^
+
+If the filter supports the "delay" capability, then Git can send the
+flag "can-delay" after the filter command and pathname. This flag
+denotes that the filter can delay filtering the current blob (e.g. to
+compensate network latencies) by responding with no content but with
+the status "delayed" and a flush packet.
+------------------------
+packet:          git> command=smudge
+packet:          git> pathname=path/testfile.dat
+packet:          git> can-delay=1
+packet:          git> 0000
+packet:          git> CONTENT
+packet:          git> 0000
+packet:          git< status=delayed
+packet:          git< 0000
+------------------------
+
+If the filter supports the "delay" capability then it must support the
+"list_available_blobs" command. If Git sends this command, then the
+filter is expected to return a list of pathnames representing blobs
+that have been delayed earlier and are now available.
+The list must be terminated with a flush packet followed
+by a "success" status that is also terminated with a flush packet. If
+no blobs for the delayed paths are available, yet, then the filter is
+expected to block the response until at least one blob becomes
+available. The filter can tell Git that it has no more delayed blobs
+by sending an empty list. As soon as the filter responds with an empty
+list, Git stops asking. All blobs that Git has not received at this
+point are considered missing and will result in an error.
+
+------------------------
+packet:          git> command=list_available_blobs
+packet:          git> 0000
+packet:          git< pathname=path/testfile.dat
+packet:          git< pathname=path/otherfile.dat
+packet:          git< 0000
+packet:          git< status=success
+packet:          git< 0000
+------------------------
+
+After Git received the pathnames, it will request the corresponding
+blobs again. These requests contain a pathname and an empty content
+section. The filter is expected to respond with the smudged content
+in the usual way as explained above.
+------------------------
+packet:          git> command=smudge
+packet:          git> pathname=path/testfile.dat
+packet:          git> 0000
+packet:          git> 0000  # empty content!
+packet:          git< status=success
+packet:          git< 0000
+packet:          git< SMUDGED_CONTENT
+packet:          git< 0000
+packet:          git< 0000  # empty list, keep "status=success" unchanged!
+------------------------
+
+Example
+^^^^^^^
+
 A long running filter demo implementation can be found in
 `contrib/long-running-filter/example.pl` located in the Git
 core repository. If you develop your own long running filter
diff --git a/builtin/checkout.c b/builtin/checkout.c
index a6b2af39d3..c1a256df8d 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -376,6 +376,8 @@ static int checkout_paths(const struct checkout_opts *opts,
 	state.force = 1;
 	state.refresh_cache = 1;
 	state.istate = &the_index;
+
+	enable_delayed_checkout(&state);
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
@@ -390,6 +392,7 @@ static int checkout_paths(const struct checkout_opts *opts,
 			pos = skip_same_name(ce, pos) - 1;
 		}
 	}
+	errs |= finish_delayed_checkout(&state);
 
 	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
diff --git a/cache.h b/cache.h
index ae4c45d379..69b03b5dc7 100644
--- a/cache.h
+++ b/cache.h
@@ -1544,6 +1544,7 @@ struct checkout {
 	struct index_state *istate;
 	const char *base_dir;
 	int base_dir_len;
+	struct delayed_checkout *delayed_checkout;
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
@@ -1553,6 +1554,8 @@ struct checkout {
 
 #define TEMPORARY_FILENAME_LENGTH 25
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
+extern void enable_delayed_checkout(struct checkout *state);
+extern int finish_delayed_checkout(struct checkout *state);
 
 struct cache_def {
 	struct strbuf path;
diff --git a/convert.c b/convert.c
index e55c034d86..f17d822ac8 100644
--- a/convert.c
+++ b/convert.c
@@ -496,6 +496,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 
 #define CAP_CLEAN    (1u<<0)
 #define CAP_SMUDGE   (1u<<1)
+#define CAP_DELAY    (1u<<2)
 
 struct cmd2process {
 	struct subprocess_entry subprocess; /* must be the first member! */
@@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 	if (err)
 		goto done;
 
-	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
+	err = packet_writel(process->in,
+		"capability=clean", "capability=smudge", "capability=delay", NULL);
 
 	for (;;) {
 		cap_buf = packet_read_line(process->out, NULL);
@@ -549,6 +551,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 			entry->supported_capabilities |= CAP_CLEAN;
 		} else if (!strcmp(cap_name, "smudge")) {
 			entry->supported_capabilities |= CAP_SMUDGE;
+		} else if (!strcmp(cap_name, "delay")) {
+			entry->supported_capabilities |= CAP_DELAY;
 		} else {
 			warning(
 				"external filter '%s' requested unsupported filter capability '%s'",
@@ -590,9 +594,11 @@ static void handle_filter_error(const struct strbuf *filter_status,
 
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
 				   int fd, struct strbuf *dst, const char *cmd,
-				   const unsigned int wanted_capability)
+				   const unsigned int wanted_capability,
+				   struct delayed_checkout *dco)
 {
 	int err;
+	int can_delay = 0;
 	struct cmd2process *entry;
 	struct child_process *process;
 	struct strbuf nbuf = STRBUF_INIT;
@@ -647,6 +653,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
+	if ((entry->supported_capabilities & CAP_DELAY) &&
+	    dco && dco->state == CE_CAN_DELAY) {
+		can_delay = 1;
+		err = packet_write_fmt_gently(process->in, "can-delay=1\n");
+		if (err)
+			goto done;
+	}
+
 	err = packet_flush_gently(process->in);
 	if (err)
 		goto done;
@@ -662,14 +676,74 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	err = strcmp(filter_status.buf, "success");
+	if (can_delay && !strcmp(filter_status.buf, "delayed")) {
+		dco->is_delayed = 1;
+		string_list_insert(&dco->filters, cmd);
+		string_list_insert(&dco->paths, path);
+	} else {
+		/* The filter got the blob and wants to send us a response. */
+		err = strcmp(filter_status.buf, "success");
+		if (err)
+			goto done;
+
+		err = read_packetized_to_strbuf(process->out, &nbuf) < 0;
+		if (err)
+			goto done;
+
+		err = subprocess_read_status(process->out, &filter_status);
+		if (err)
+			goto done;
+
+		err = strcmp(filter_status.buf, "success");
+	}
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err)
+		handle_filter_error(&filter_status, entry, wanted_capability);
+	else
+		strbuf_swap(dst, &nbuf);
+	strbuf_release(&nbuf);
+	return !err;
+}
+
+
+int async_query_available_blobs(const char *cmd, struct string_list *available_paths)
+{
+	int err;
+	char *line;
+	struct cmd2process *entry;
+	struct child_process *process;
+	struct strbuf filter_status = STRBUF_INIT;
+
+	assert(subprocess_map_initialized);
+	entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd);
+	if (!entry) {
+		error("external filter '%s' is not available anymore although "
+		      "not all paths have been filtered", cmd);
+		return 0;
+	}
+	process = &entry->subprocess.process;
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_fmt_gently(
+		process->in, "command=list_available_blobs\n");
 	if (err)
 		goto done;
 
-	err = read_packetized_to_strbuf(process->out, &nbuf) < 0;
+	err = packet_flush_gently(process->in);
 	if (err)
 		goto done;
 
+	while ((line = packet_read_line(process->out, NULL))) {
+		const char *path;
+		if (skip_prefix(line, "pathname=", &path))
+			string_list_insert(available_paths, xstrdup(path));
+		else
+			; /* ignore unknown keys */
+	}
+
 	err = subprocess_read_status(process->out, &filter_status);
 	if (err)
 		goto done;
@@ -680,10 +754,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	sigchain_pop(SIGPIPE);
 
 	if (err)
-		handle_filter_error(&filter_status, entry, wanted_capability);
-	else
-		strbuf_swap(dst, &nbuf);
-	strbuf_release(&nbuf);
+		handle_filter_error(&filter_status, entry, 0);
 	return !err;
 }
 
@@ -698,7 +769,8 @@ static struct convert_driver {
 
 static int apply_filter(const char *path, const char *src, size_t len,
 			int fd, struct strbuf *dst, struct convert_driver *drv,
-			const unsigned int wanted_capability)
+			const unsigned int wanted_capability,
+			struct delayed_checkout *dco)
 {
 	const char *cmd = NULL;
 
@@ -716,7 +788,8 @@ static int apply_filter(const char *path, const char *src, size_t len,
 	if (cmd && *cmd)
 		return apply_single_file_filter(path, src, len, fd, dst, cmd);
 	else if (drv->process && *drv->process)
-		return apply_multi_file_filter(path, src, len, fd, dst, drv->process, wanted_capability);
+		return apply_multi_file_filter(path, src, len, fd, dst,
+			drv->process, wanted_capability, dco);
 
 	return 0;
 }
@@ -1057,7 +1130,7 @@ int would_convert_to_git_filter_fd(const char *path)
 	if (!ca.drv->required)
 		return 0;
 
-	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN);
+	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL);
 }
 
 const char *get_convert_attr_ascii(const char *path)
@@ -1094,7 +1167,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 
 	convert_attrs(&ca, path);
 
-	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN);
+	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN, NULL);
 	if (!ret && ca.drv && ca.drv->required)
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -1119,7 +1192,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	assert(ca.drv);
 	assert(ca.drv->clean || ca.drv->process);
 
-	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN))
+	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN, NULL))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
@@ -1128,7 +1201,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 
 static int convert_to_working_tree_internal(const char *path, const char *src,
 					    size_t len, struct strbuf *dst,
-					    int normalizing)
+					    int normalizing, struct delayed_checkout *dco)
 {
 	int ret = 0, ret_filter = 0;
 	struct conv_attrs ca;
@@ -1153,21 +1226,29 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 		}
 	}
 
-	ret_filter = apply_filter(path, src, len, -1, dst, ca.drv, CAP_SMUDGE);
+	ret_filter = apply_filter(
+		path, src, len, -1, dst, ca.drv, CAP_SMUDGE, dco);
 	if (!ret_filter && ca.drv && ca.drv->required)
 		die("%s: smudge filter %s failed", path, ca.drv->name);
 
 	return ret | ret_filter;
 }
 
+int async_convert_to_working_tree(const char *path, const char *src,
+				  size_t len, struct strbuf *dst,
+				  void *dco)
+{
+	return convert_to_working_tree_internal(path, src, len, dst, 0, dco);
+}
+
 int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
 {
-	return convert_to_working_tree_internal(path, src, len, dst, 0);
+	return convert_to_working_tree_internal(path, src, len, dst, 0, NULL);
 }
 
 int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
 {
-	int ret = convert_to_working_tree_internal(path, src, len, dst, 1);
+	int ret = convert_to_working_tree_internal(path, src, len, dst, 1, NULL);
 	if (ret) {
 		src = dst->buf;
 		len = dst->len;
diff --git a/convert.h b/convert.h
index 82871a11d5..cdb91ab99a 100644
--- a/convert.h
+++ b/convert.h
@@ -4,6 +4,8 @@
 #ifndef CONVERT_H
 #define CONVERT_H
 
+#include "string-list.h"
+
 enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
@@ -32,6 +34,27 @@ enum eol {
 #endif
 };
 
+enum ce_delay_state {
+	CE_NO_DELAY = 0,
+	CE_CAN_DELAY = 1,
+	CE_RETRY = 2
+};
+
+struct delayed_checkout {
+	/* State of the currently processed cache entry. If the state is
+	 * CE_CAN_DELAY, then the filter can change the 'is_delayed' flag
+	 * to signal that the current cache entry was delayed. If the state is
+	 * CE_RETRY, then this signals the filter that the cache entry was
+	 * requested before.
+	 */
+	enum ce_delay_state state;
+	int is_delayed;
+	/* List of filter drivers that signaled delayed blobs. */
+	struct string_list filters;
+	/* List of delayed blobs identified by their path. */
+	struct string_list paths;
+};
+
 extern enum eol core_eol;
 extern const char *get_cached_convert_stats_ascii(const char *path);
 extern const char *get_wt_convert_stats_ascii(const char *path);
@@ -42,6 +65,10 @@ extern int convert_to_git(const char *path, const char *src, size_t len,
 			  struct strbuf *dst, enum safe_crlf checksafe);
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
+extern int async_convert_to_working_tree(const char *path, const char *src,
+					 size_t len, struct strbuf *dst,
+					 void *dco);
+extern int async_query_available_blobs(const char *cmd, struct string_list *available_paths);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
 			      struct strbuf *dst);
 static inline int would_convert_to_git(const char *path)
diff --git a/entry.c b/entry.c
index d6b263f78e..8406060e33 100644
--- a/entry.c
+++ b/entry.c
@@ -137,6 +137,103 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path,
 	return result;
 }
 
+void enable_delayed_checkout(struct checkout *state)
+{
+	if (!state->delayed_checkout) {
+		state->delayed_checkout = xmalloc(sizeof(*state->delayed_checkout));
+		state->delayed_checkout->state = CE_CAN_DELAY;
+		string_list_init(&state->delayed_checkout->filters, 0);
+		string_list_init(&state->delayed_checkout->paths, 0);
+	}
+}
+
+static int remove_available_paths(struct string_list_item *item, void *cb_data)
+{
+	struct string_list *available_paths = cb_data;
+	struct string_list_item *available;
+
+	available = string_list_lookup(available_paths, item->string);
+	if (available)
+		available->util = (void *)item->string;
+	return !available;
+}
+
+int finish_delayed_checkout(struct checkout *state)
+{
+	int errs = 0;
+	struct string_list_item *filter, *path;
+	struct delayed_checkout *dco = state->delayed_checkout;
+
+	if (!state->delayed_checkout)
+		return errs;
+
+	dco->state = CE_RETRY;
+	while (dco->filters.nr > 0) {
+		for_each_string_list_item(filter, &dco->filters) {
+			struct string_list available_paths = STRING_LIST_INIT_NODUP;
+
+			if (!async_query_available_blobs(filter->string, &available_paths)) {
+				/* Filter reported an error */
+				errs = 1;
+				filter->string = "";
+				continue;
+			}
+			if (available_paths.nr <= 0) {
+				/* Filter responded with no entries. That means
+				 * the filter is done and we can remove the
+				 * filter from the list (see
+				 * "string_list_remove_empty_items" call below).
+				 */
+				filter->string = "";
+				continue;
+			}
+
+			/* In dco->paths we store a list of all delayed paths.
+			 * The filter just send us a list of available paths.
+			 * Remove them from the list.
+			 */
+			filter_string_list(&dco->paths, 0,
+				&remove_available_paths, &available_paths);
+
+			for_each_string_list_item(path, &available_paths) {
+				struct cache_entry* ce;
+
+				if (!path->util) {
+					error("external filter '%s' signaled that '%s' "
+					      "is now available although it has not been "
+					      "delayed earlier",
+					      filter->string, path->string);
+					errs |= 1;
+
+					/* Do not ask the filter for available blobs,
+					 * again, as the filter is likely buggy.
+					 */
+					filter->string = "";
+					continue;
+				}
+				ce = index_file_exists(state->istate, path->string,
+						       strlen(path->string), 0);
+				assert(dco->state == CE_RETRY);
+				errs |= (ce ? checkout_entry(ce, state, NULL) : 1);
+			}
+		}
+		string_list_remove_empty_items(&dco->filters, 0);
+	}
+	string_list_clear(&dco->filters, 0);
+
+	/* At this point we should not have any delayed paths anymore. */
+	errs |= dco->paths.nr;
+	for_each_string_list_item(path, &dco->paths) {
+		error("'%s' was not filtered properly", path->string);
+	}
+	string_list_clear(&dco->paths, 0);
+
+	free(dco);
+	state->delayed_checkout = NULL;
+
+	return errs;
+}
+
 static int write_entry(struct cache_entry *ce,
 		       char *path, const struct checkout *state, int to_tempfile)
 {
@@ -179,11 +276,35 @@ static int write_entry(struct cache_entry *ce,
 		/*
 		 * Convert from git internal format to working tree format
 		 */
-		if (ce_mode_s_ifmt == S_IFREG &&
-		    convert_to_working_tree(ce->name, new, size, &buf)) {
-			free(new);
-			new = strbuf_detach(&buf, &newsize);
-			size = newsize;
+		if (ce_mode_s_ifmt == S_IFREG) {
+			struct delayed_checkout *dco = state->delayed_checkout;
+			if (dco && dco->state != CE_NO_DELAY) {
+				/* Do not send the blob in case of a retry. */
+				if (dco->state == CE_RETRY) {
+					new = NULL;
+					size = 0;
+				}
+				dco->is_delayed = 0;
+				ret = async_convert_to_working_tree(
+					ce->name, new, size, &buf, dco);
+				if (ret && dco->is_delayed) {
+					free(new);
+					goto finish;
+				}
+			} else
+				ret = convert_to_working_tree(
+					ce->name, new, size, &buf);
+
+			if (ret) {
+				free(new);
+				new = strbuf_detach(&buf, &newsize);
+				size = newsize;
+			}
+			/*
+			 * No "else" here as errors from convert are OK at this
+			 * point. If the error would have been fatal (e.g.
+			 * filter is required), then we would have died already.
+			 */
 		}
 
 		fd = open_output_fd(path, ce, to_tempfile);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 0c04d346a1..eb3d83744a 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -701,4 +701,120 @@ test_expect_success PERL 'invalid process filter must fail (and not hang!)' '
 	)
 '
 
+test_expect_success PERL 'delayed checkout in process filter' '
+	test_config_global filter.a.process "rot13-filter.pl a.log clean smudge delay" &&
+	test_config_global filter.a.required true &&
+	test_config_global filter.b.process "rot13-filter.pl b.log clean smudge delay" &&
+	test_config_global filter.b.required true &&
+
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+		echo "*.a filter=a" >.gitattributes &&
+		echo "*.b filter=b" >>.gitattributes &&
+		cp "$TEST_ROOT/test.o" test.a &&
+		cp "$TEST_ROOT/test.o" test-delay10.a &&
+		cp "$TEST_ROOT/test.o" test-delay11.a &&
+		cp "$TEST_ROOT/test.o" test-delay20.a &&
+		cp "$TEST_ROOT/test.o" test-delay10.b &&
+		git add . &&
+		git commit -m "test commit"
+	) &&
+
+	S=$(file_size "$TEST_ROOT/test.o") &&
+	cat >a.exp <<-EOF &&
+		START
+		init handshake complete
+		IN: smudge test.a $S [OK] -- OUT: $S . [OK]
+		IN: smudge test-delay10.a $S [OK] -- [DELAYED]
+		IN: smudge test-delay11.a $S [OK] -- [DELAYED]
+		IN: smudge test-delay20.a $S [OK] -- [DELAYED]
+		IN: list_available_blobs test-delay10.a test-delay11.a [OK]
+		IN: smudge test-delay10.a 0 [OK] -- OUT: $S . [OK]
+		IN: smudge test-delay11.a 0 [OK] -- OUT: $S . [OK]
+		IN: list_available_blobs test-delay20.a [OK]
+		IN: smudge test-delay20.a 0 [OK] -- OUT: $S . [OK]
+		IN: list_available_blobs [OK]
+		STOP
+	EOF
+	cat >b.exp <<-EOF &&
+		START
+		init handshake complete
+		IN: smudge test-delay10.b $S [OK] -- [DELAYED]
+		IN: list_available_blobs test-delay10.b [OK]
+		IN: smudge test-delay10.b 0 [OK] -- OUT: $S . [OK]
+		IN: list_available_blobs [OK]
+		STOP
+	EOF
+
+	rm -rf repo-cloned &&
+	filter_git clone repo repo-cloned &&
+	test_cmp_count a.exp repo-cloned/a.log &&
+	test_cmp_count b.exp repo-cloned/b.log &&
+
+	(
+		cd repo-cloned &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay11.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay20.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.b &&
+
+		rm *.a *.b &&
+		filter_git checkout . &&
+		test_cmp_count ../a.exp a.log &&
+		test_cmp_count ../b.exp b.log &&
+
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay11.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay20.a &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test-delay10.b
+	)
+'
+
+test_expect_success PERL 'missing file in delayed checkout' '
+	test_config_global filter.bug.process "rot13-filter.pl bug.log clean smudge delay" &&
+	test_config_global filter.bug.required true &&
+
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+		echo "*.a filter=bug" >.gitattributes &&
+		cp "$TEST_ROOT/test.o" missing-delay.a
+		git add . &&
+		git commit -m "test commit"
+	) &&
+
+	rm -rf repo-cloned &&
+	test_must_fail git clone repo repo-cloned 2>git-stderr.log &&
+	cat git-stderr.log &&
+	grep "error: .missing-delay\.a. was not filtered properly" git-stderr.log
+'
+
+test_expect_success PERL 'invalid file in delayed checkout' '
+	test_config_global filter.bug.process "rot13-filter.pl bug.log clean smudge delay" &&
+	test_config_global filter.bug.required true &&
+
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+		echo "*.a filter=bug" >.gitattributes &&
+		cp "$TEST_ROOT/test.o" invalid-delay.a &&
+		cp "$TEST_ROOT/test.o" unfiltered
+		git add . &&
+		git commit -m "test commit"
+	) &&
+
+	rm -rf repo-cloned &&
+	test_must_fail git clone repo repo-cloned 2>git-stderr.log &&
+	grep "error: external filter .* signaled that .unfiltered. is now available although it has not been delayed earlier" git-stderr.log
+'
+
 test_done
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 5e43faeec1..ad685d92f8 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -18,6 +18,16 @@
 #     operation then the filter signals that it cannot or does not want
 #     to process the file and any file after that is processed with the
 #     same command.
+# (5) If data with a pathname that is a key in the DELAY hash is
+#     requested (e.g. "test-delay10.a") then the filter responds with
+#     a "delay" status and sets the "requested" field in the DELAY hash.
+#     The filter will signal the availability of this object after
+#     "count" (field in DELAY hash) "list_available_blobs" commands.
+# (6) If data with the pathname "missing-delay.a" is processed that the
+#     filter will drop the path from the "list_available_blobs" response.
+# (7) If data with the pathname "invalid-delay.a" is processed that the
+#     filter will add the path "unfiltered" which was not delayed before
+#     to the "list_available_blobs" response.
 #
 
 use strict;
@@ -30,6 +40,15 @@ my @capabilities            = @ARGV;
 
 open my $debug, ">>", $log_file or die "cannot open log file: $!";
 
+my %DELAY = (
+	'test-delay10.a' => { "requested" => 0, "count" => 1 },
+	'test-delay11.a' => { "requested" => 0, "count" => 1 },
+	'test-delay20.a' => { "requested" => 0, "count" => 2 },
+	'test-delay10.b' => { "requested" => 0, "count" => 1 },
+	'missing-delay.a' => { "requested" => 0, "count" => 1 },
+	'invalid-delay.a' => { "requested" => 0, "count" => 1 },
+);
+
 sub rot13 {
 	my $str = shift;
 	$str =~ y/A-Za-z/N-ZA-Mn-za-m/;
@@ -66,7 +85,7 @@ sub packet_bin_read {
 
 sub packet_txt_read {
 	my ( $res, $buf ) = packet_bin_read();
-	unless ( $buf =~ s/\n$// ) {
+	unless ( $buf eq '' or $buf =~ s/\n$// ) {
 		die "A non-binary line MUST be terminated by an LF.";
 	}
 	return ( $res, $buf );
@@ -101,6 +120,7 @@ packet_flush();
 
 ( packet_txt_read() eq ( 0, "capability=clean" ) )  || die "bad capability";
 ( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
+( packet_txt_read() eq ( 0, "capability=delay" ) )  || die "bad capability";
 ( packet_bin_read() eq ( 1, "" ) )                  || die "bad capability end";
 
 foreach (@capabilities) {
@@ -111,88 +131,142 @@ print $debug "init handshake complete\n";
 $debug->flush();
 
 while (1) {
-	my ($command) = packet_txt_read() =~ /^command=(.+)$/;
+	my ( $command ) = packet_txt_read() =~ /^command=(.+)$/;
 	print $debug "IN: $command";
 	$debug->flush();
 
-	my ($pathname) = packet_txt_read() =~ /^pathname=(.+)$/;
-	print $debug " $pathname";
-	$debug->flush();
-
-	if ( $pathname eq "" ) {
-		die "bad pathname '$pathname'";
-	}
+	if ( $command eq "list_available_blobs" ) {
+		# Flush
+		packet_bin_read();
 
-	# Flush
-	packet_bin_read();
-
-	my $input = "";
-	{
-		binmode(STDIN);
-		my $buffer;
-		my $done = 0;
-		while ( !$done ) {
-			( $done, $buffer ) = packet_bin_read();
-			$input .= $buffer;
+		foreach my $pathname ( sort keys %DELAY ) {
+			if ( $DELAY{$pathname}{"requested"} >= 1 ) {
+				$DELAY{$pathname}{"count"} = $DELAY{$pathname}{"count"} - 1;
+				if ( $pathname eq "invalid-delay.a" ) {
+					# Send Git a pathname that was not delayed earlier
+					packet_txt_write("pathname=unfiltered");
+				}
+				if ( $pathname eq "missing-delay.a" ) {
+					# Do not signal Git that this file is available
+				} elsif ( $DELAY{$pathname}{"count"} == 0 ) {
+					print $debug " $pathname";
+					packet_txt_write("pathname=$pathname");
+				}
+			}
 		}
-		print $debug " " . length($input) . " [OK] -- ";
-		$debug->flush();
-	}
-
-	my $output;
-	if ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
-		$output = "";
-	}
-	elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
-		$output = rot13($input);
-	}
-	elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
-		$output = rot13($input);
-	}
-	else {
-		die "bad command '$command'";
-	}
 
-	if ( $pathname eq "error.r" ) {
-		print $debug "[ERROR]\n";
-		$debug->flush();
-		packet_txt_write("status=error");
 		packet_flush();
-	}
-	elsif ( $pathname eq "abort.r" ) {
-		print $debug "[ABORT]\n";
+
+		print $debug " [OK]\n";
 		$debug->flush();
-		packet_txt_write("status=abort");
+		packet_txt_write("status=success");
 		packet_flush();
 	}
 	else {
-		packet_txt_write("status=success");
-		packet_flush();
+		my ( $pathname ) = packet_txt_read() =~ /^pathname=(.+)$/;
+		print $debug " $pathname";
+		$debug->flush();
+
+		if ( $pathname eq "" ) {
+			die "bad pathname '$pathname'";
+		}
+
+		# Read until flush
+		my ( $done, $buffer ) = packet_txt_read();
+		while ( $buffer ne '' ) {
+			if ( $buffer eq "can-delay=1" ) {
+				if ( exists $DELAY{$pathname} and $DELAY{$pathname}{"requested"} == 0 ) {
+					$DELAY{$pathname}{"requested"} = 1;
+				}
+			} else {
+				die "Unknown message '$buffer'";
+			}
 
-		if ( $pathname eq "${command}-write-fail.r" ) {
-			print $debug "[WRITE FAIL]\n";
+			( $done, $buffer ) = packet_txt_read();
+		}
+
+		my $input = "";
+		{
+			binmode(STDIN);
+			my $buffer;
+			my $done = 0;
+			while ( !$done ) {
+				( $done, $buffer ) = packet_bin_read();
+				$input .= $buffer;
+			}
+			print $debug " " . length($input) . " [OK] -- ";
 			$debug->flush();
-			die "${command} write error";
 		}
 
-		print $debug "OUT: " . length($output) . " ";
-		$debug->flush();
+		my $output;
+		if ( exists $DELAY{$pathname} and exists $DELAY{$pathname}{"output"} ) {
+			$output = $DELAY{$pathname}{"output"}
+		}
+		elsif ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
+			$output = "";
+		}
+		elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
+			$output = rot13($input);
+		}
+		elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
+			$output = rot13($input);
+		}
+		else {
+			die "bad command '$command'";
+		}
+
+		if ( $pathname eq "error.r" ) {
+			print $debug "[ERROR]\n";
+			$debug->flush();
+			packet_txt_write("status=error");
+			packet_flush();
+		}
+		elsif ( $pathname eq "abort.r" ) {
+			print $debug "[ABORT]\n";
+			$debug->flush();
+			packet_txt_write("status=abort");
+			packet_flush();
+		}
+		elsif ( $command eq "smudge" and
+			exists $DELAY{$pathname} and
+			$DELAY{$pathname}{"requested"} == 1
+		) {
+			print $debug "[DELAYED]\n";
+			$debug->flush();
+			packet_txt_write("status=delayed");
+			packet_flush();
+			$DELAY{$pathname}{"requested"} = 2;
+			$DELAY{$pathname}{"output"} = $output;
+		}
+		else {
+			packet_txt_write("status=success");
+			packet_flush();
 
-		while ( length($output) > 0 ) {
-			my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
-			packet_bin_write($packet);
-			# dots represent the number of packets
-			print $debug ".";
-			if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
-				$output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+			if ( $pathname eq "${command}-write-fail.r" ) {
+				print $debug "[WRITE FAIL]\n";
+				$debug->flush();
+				die "${command} write error";
 			}
-			else {
-				$output = "";
+
+			print $debug "OUT: " . length($output) . " ";
+			$debug->flush();
+
+			while ( length($output) > 0 ) {
+				my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
+				packet_bin_write($packet);
+				# dots represent the number of packets
+				print $debug ".";
+				if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
+					$output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+				}
+				else {
+					$output = "";
+				}
 			}
+			packet_flush();
+			print $debug " [OK]\n";
+			$debug->flush();
+			packet_flush();
 		}
-		packet_flush();
-		print $debug " [OK]\n";
-		$debug->flush();
-		packet_flush();
 	}
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index d38c37e38c..491a2562ad 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -379,6 +379,7 @@ static int check_updates(struct unpack_trees_options *o)
 	if (should_update_submodules() && o->update && !o->dry_run)
 		reload_gitmodules_file(index, &state);
 
+	enable_delayed_checkout(&state);
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -393,6 +394,7 @@ static int check_updates(struct unpack_trees_options *o)
 			}
 		}
 	}
+	errs |= finish_delayed_checkout(&state);
 	stop_progress(&progress);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
-- 
2.13.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v7 3/6] t0021: write "OUT <size>" only on success
  2017-06-27 12:10 ` [PATCH v7 3/6] t0021: write "OUT <size>" only on success Lars Schneider
@ 2017-06-27 18:44   ` Junio C Hamano
  2017-06-27 20:51     ` Lars Schneider
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-06-27 18:44 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben

Lars Schneider <larsxschneider@gmail.com> writes:

> "rot13-filter.pl" always writes "OUT <size>" to the debug log at the end
> of a response.
>
> This works without issues for the existing responses "abort", "error",
> and "success". A new response "delayed", that will be introduced in a
> subsequent patch, accepts the input without giving the filtered result
> right away. Since we actually have the data already available in our
> mock filter the debug log output would be wrong/misleading. Therefore,
> we do not write "OUT <size>" for "delayed" responses.

I still do not get why you think it makes any difference that you
are hoarding the result in the mock program.  If the filter needs to
read a prepared result from a file in t/t0021/ before responding to
a real request after it replies to "delayed", would that change the
argument above?  From Git's and the t0021-conversion.sh test's point
of view, I do not think it makes an iota of difference---it's an
implementation detail of the mock program.

I am totally lost.

Isn't the point of removing the log output from response to "delayed"
that the filter does not give the output back to Git at that point,
hence generally the size would not be available in the real-world
use case (not in the mock program)?

> To simplify the code we do not write "OUT <size>" for "abort" and
> "error" responses, too, as their size is always zero.
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
>  t/t0021-conversion.sh   | 6 +++---
>  t/t0021/rot13-filter.pl | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index 0139b460e7..0c04d346a1 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -588,7 +588,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f
>  		cat >expected.log <<-EOF &&
>  			START
>  			init handshake complete
> -			IN: smudge smudge-write-fail.r $SF [OK] -- OUT: $SF [WRITE FAIL]
> +			IN: smudge smudge-write-fail.r $SF [OK] -- [WRITE FAIL]
>  			START
>  			init handshake complete
>  			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
> @@ -634,7 +634,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a
>  		cat >expected.log <<-EOF &&
>  			START
>  			init handshake complete
> -			IN: smudge error.r $SE [OK] -- OUT: 0 [ERROR]
> +			IN: smudge error.r $SE [OK] -- [ERROR]
>  			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
>  			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
>  			STOP
> @@ -673,7 +673,7 @@ test_expect_success PERL 'process filter abort stops processing of all further f
>  		cat >expected.log <<-EOF &&
>  			START
>  			init handshake complete
> -			IN: smudge abort.r $SA [OK] -- OUT: 0 [ABORT]
> +			IN: smudge abort.r $SA [OK] -- [ABORT]
>  			STOP
>  		EOF
>  		test_cmp_exclude_clean expected.log debug.log &&
> diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
> index 0b943bb377..5e43faeec1 100644
> --- a/t/t0021/rot13-filter.pl
> +++ b/t/t0021/rot13-filter.pl
> @@ -153,9 +153,6 @@ while (1) {
>  		die "bad command '$command'";
>  	}
>  
> -	print $debug "OUT: " . length($output) . " ";
> -	$debug->flush();
> -
>  	if ( $pathname eq "error.r" ) {
>  		print $debug "[ERROR]\n";
>  		$debug->flush();
> @@ -178,6 +175,9 @@ while (1) {
>  			die "${command} write error";
>  		}
>  
> +		print $debug "OUT: " . length($output) . " ";
> +		$debug->flush();
> +
>  		while ( length($output) > 0 ) {
>  			my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
>  			packet_bin_write($packet);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v7 6/6] convert: add "status=delayed" to filter process protocol
  2017-06-27 12:10 ` [PATCH v7 6/6] convert: add "status=delayed" to filter process protocol Lars Schneider
@ 2017-06-27 19:00   ` Junio C Hamano
  2017-06-27 20:34     ` Lars Schneider
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-06-27 19:00 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben

Lars Schneider <larsxschneider@gmail.com> writes:

> @@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>  	if (err)
>  		goto done;
>  
> -	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
> +	err = packet_writel(process->in,
> +		"capability=clean", "capability=smudge", "capability=delay", NULL);
>  
>  	for (;;) {
>  		cap_buf = packet_read_line(process->out, NULL);
> @@ -549,6 +551,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>  			entry->supported_capabilities |= CAP_CLEAN;
>  		} else if (!strcmp(cap_name, "smudge")) {
>  			entry->supported_capabilities |= CAP_SMUDGE;
> +		} else if (!strcmp(cap_name, "delay")) {
> +			entry->supported_capabilities |= CAP_DELAY;
>  		} else {
>  			warning(
>  				"external filter '%s' requested unsupported filter capability '%s'",

I thought you said something about attempting to make this more
table-driven; did the attempt produce a better result?  Just being
curious.

> @@ -590,9 +594,11 @@ static void handle_filter_error(const struct strbuf *filter_status,
>  
>  static int apply_multi_file_filter(const char *path, const char *src, size_t len,
>  				   int fd, struct strbuf *dst, const char *cmd,
> -				   const unsigned int wanted_capability)
> +				   const unsigned int wanted_capability,
> +				   struct delayed_checkout *dco)
>  {
>  	int err;
> +	int can_delay = 0;
>  	struct cmd2process *entry;
>  	struct child_process *process;
>  	struct strbuf nbuf = STRBUF_INIT;
> @@ -647,6 +653,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>  	if (err)
>  		goto done;
>  
> +	if ((entry->supported_capabilities & CAP_DELAY) &&
> +	    dco && dco->state == CE_CAN_DELAY) {
> +		can_delay = 1;
> +		err = packet_write_fmt_gently(process->in, "can-delay=1\n");
> +		if (err)
> +			goto done;
> +	}
> +
>  	err = packet_flush_gently(process->in);
>  	if (err)
>  		goto done;
> @@ -662,14 +676,74 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>  	if (err)
>  		goto done;
>  
> -	err = strcmp(filter_status.buf, "success");
> +	if (can_delay && !strcmp(filter_status.buf, "delayed")) {
> +		dco->is_delayed = 1;
> +		string_list_insert(&dco->filters, cmd);
> +		string_list_insert(&dco->paths, path);
> +	} else {
> +		/* The filter got the blob and wants to send us a response. */
> +		err = strcmp(filter_status.buf, "success");
> +		if (err)
> +			goto done;
> +
> +		err = read_packetized_to_strbuf(process->out, &nbuf) < 0;
> +		if (err)
> +			goto done;
> +
> +		err = subprocess_read_status(process->out, &filter_status);
> +		if (err)
> +			goto done;
> +
> +		err = strcmp(filter_status.buf, "success");
> +	}
> +
> +done:
> +	sigchain_pop(SIGPIPE);
> +
> +	if (err)
> +		handle_filter_error(&filter_status, entry, wanted_capability);
> +	else
> +		strbuf_swap(dst, &nbuf);
> +	strbuf_release(&nbuf);
> +	return !err;
> +}

This I can understand better than the previous round ;-)

> diff --git a/convert.h b/convert.h
> index 82871a11d5..cdb91ab99a 100644
> --- a/convert.h
> +++ b/convert.h
> @@ -4,6 +4,8 @@
>  #ifndef CONVERT_H
>  #define CONVERT_H
>  
> +#include "string-list.h"
> +
>  enum safe_crlf {
>  	SAFE_CRLF_FALSE = 0,
>  	SAFE_CRLF_FAIL = 1,
> @@ -32,6 +34,27 @@ enum eol {
>  #endif
>  };
>  
> +enum ce_delay_state {
> +	CE_NO_DELAY = 0,
> +	CE_CAN_DELAY = 1,
> +	CE_RETRY = 2
> +};

This feels more natural and makes it easy to imagine the state
transition diagram.  enable-delay will take us from no-delay to
can-delay, and we tell the filter that it is OK to delay while
making the initial pass of the requests, and then after that we go
into retry state to collect the delayed reponses.

> +struct delayed_checkout {
> +	/* State of the currently processed cache entry. If the state is
> +	 * CE_CAN_DELAY, then the filter can change the 'is_delayed' flag
> +	 * to signal that the current cache entry was delayed. If the state is
> +	 * CE_RETRY, then this signals the filter that the cache entry was
> +	 * requested before.
> +	 */

        /*
         * Our multi-line comment look like this; slash-aster 
         * and aster-slash that opens and closes the block are
         * on their own lines.
         */

> +	enum ce_delay_state state;
> +	int is_delayed;

Hmph, I do not terribly mind but is this thing really needed?

Wouldn't filters and paths being non-empty be a good enough sign
that the backend said "ok, I am allowed to give a delayed response
so I acknowledge this path but would not give a result right away"?

> +int finish_delayed_checkout(struct checkout *state)
> +{
> +	int errs = 0;
> +	struct string_list_item *filter, *path;
> +	struct delayed_checkout *dco = state->delayed_checkout;
> +
> +	if (!state->delayed_checkout)
> +		return errs;
> +
> +	dco->state = CE_RETRY;
> +	while (dco->filters.nr > 0) {
> +		for_each_string_list_item(filter, &dco->filters) {
> +			struct string_list available_paths = STRING_LIST_INIT_NODUP;
> +
> +			if (!async_query_available_blobs(filter->string, &available_paths)) {
> +				/* Filter reported an error */
> +				errs = 1;
> +				filter->string = "";
> +				continue;
> +			}
> +			if (available_paths.nr <= 0) {
> +				/* Filter responded with no entries. That means
> +				 * the filter is done and we can remove the
> +				 * filter from the list (see
> +				 * "string_list_remove_empty_items" call below).
> +				 */
> +				filter->string = "";
> +				continue;
> +			}
> +
> +			/* In dco->paths we store a list of all delayed paths.
> +			 * The filter just send us a list of available paths.
> +			 * Remove them from the list.
> +			 */
> +			filter_string_list(&dco->paths, 0,
> +				&remove_available_paths, &available_paths);
> +
> +			for_each_string_list_item(path, &available_paths) {
> +				struct cache_entry* ce;
> +
> +				if (!path->util) {
> +					error("external filter '%s' signaled that '%s' "
> +					      "is now available although it has not been "
> +					      "delayed earlier",
> +					      filter->string, path->string);
> +					errs |= 1;
> +
> +					/* Do not ask the filter for available blobs,
> +					 * again, as the filter is likely buggy.
> +					 */
> +					filter->string = "";
> +					continue;
> +				}
> +				ce = index_file_exists(state->istate, path->string,
> +						       strlen(path->string), 0);
> +				assert(dco->state == CE_RETRY);

Can anything futz with dco->state at this late in the game?  You
entered into CE_RETRY state at the beginning of this function, and
this loop is going through each delayed ones. At this point, you are
going to make , but not yet have made, a request to the backend via
another call to checkout_entry() again.

Just wondering what kind of programming errors you are protecting
yourself against.  I briefly wondered perhaps you are afraid of a
bug in checkout_entry() that may flip the state back, but it that
is the case then the assert() would be after checkout_entry().

> +				errs |= (ce ? checkout_entry(ce, state, NULL) : 1);

> +			}
> +		}
> +		string_list_remove_empty_items(&dco->filters, 0);
> +	}
> +	string_list_clear(&dco->filters, 0);
> +
> +	/* At this point we should not have any delayed paths anymore. */
> +	errs |= dco->paths.nr;
> +	for_each_string_list_item(path, &dco->paths) {
> +		error("'%s' was not filtered properly", path->string);
> +	}
> +	string_list_clear(&dco->paths, 0);
> +
> +	free(dco);
> +	state->delayed_checkout = NULL;
> +
> +	return errs;
> +}

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v7 6/6] convert: add "status=delayed" to filter process protocol
  2017-06-27 19:00   ` Junio C Hamano
@ 2017-06-27 20:34     ` Lars Schneider
  2017-06-27 21:30       ` Junio C Hamano
  2017-06-27 21:49       ` Lars Schneider
  0 siblings, 2 replies; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 20:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, tboegi, e, ttaylorr, peartben


> On 27 Jun 2017, at 21:00, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>> @@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>> 	if (err)
>> 		goto done;
>> 
>> -	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
>> +	err = packet_writel(process->in,
>> +		"capability=clean", "capability=smudge", "capability=delay", NULL);
>> 
>> 	for (;;) {
>> 		cap_buf = packet_read_line(process->out, NULL);
>> @@ -549,6 +551,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>> 			entry->supported_capabilities |= CAP_CLEAN;
>> 		} else if (!strcmp(cap_name, "smudge")) {
>> 			entry->supported_capabilities |= CAP_SMUDGE;
>> +		} else if (!strcmp(cap_name, "delay")) {
>> +			entry->supported_capabilities |= CAP_DELAY;
>> 		} else {
>> 			warning(
>> 				"external filter '%s' requested unsupported filter capability '%s'",
> 
> I thought you said something about attempting to make this more
> table-driven; did the attempt produce a better result?  Just being
> curious.

I treated that as low-prio as I got the impression that you are generally OK with
the current implementation. I promise to send a patch with this change. 
Either as part of this series or as a follow up patch.


> ...
> 
>> +struct delayed_checkout {
>> +	/* State of the currently processed cache entry. If the state is
>> +	 * CE_CAN_DELAY, then the filter can change the 'is_delayed' flag
>> +	 * to signal that the current cache entry was delayed. If the state is
>> +	 * CE_RETRY, then this signals the filter that the cache entry was
>> +	 * requested before.
>> +	 */
> 
>        /*
>         * Our multi-line comment look like this; slash-aster 
>         * and aster-slash that opens and closes the block are
>         * on their own lines.
>         */

Oops. Lesson learned.


>> +	enum ce_delay_state state;
>> +	int is_delayed;
> 
> Hmph, I do not terribly mind but is this thing really needed?
> 
> Wouldn't filters and paths being non-empty be a good enough sign
> that the backend said "ok, I am allowed to give a delayed response
> so I acknowledge this path but would not give a result right away"?

Yes. I guess that was a premature optimization on my end.
This is how "write_entry()" in entry.c would change:

-                               dco->is_delayed = 0;
                                ret = async_convert_to_working_tree(
                                        ce->name, new, size, &buf, dco);
-                               if (ret && dco->is_delayed) {
+                               if (ret && string_list_has_string(&dco->paths, ce->name)) {
                                        free(new);
                                        goto finish;
                                }

OK?


>> +int finish_delayed_checkout(struct checkout *state)
>> +{
>> +	int errs = 0;
>> +	struct string_list_item *filter, *path;
>> +	struct delayed_checkout *dco = state->delayed_checkout;
>> +
>> +	if (!state->delayed_checkout)
>> +		return errs;
>> +
>> +	dco->state = CE_RETRY;
>> +	while (dco->filters.nr > 0) {
>> +		for_each_string_list_item(filter, &dco->filters) {
>> +			struct string_list available_paths = STRING_LIST_INIT_NODUP;
>> +
>> +			if (!async_query_available_blobs(filter->string, &available_paths)) {
>> +				/* Filter reported an error */
>> +				errs = 1;
>> +				filter->string = "";
>> +				continue;
>> +			}
>> +			if (available_paths.nr <= 0) {
>> +				/* Filter responded with no entries. That means
>> +				 * the filter is done and we can remove the
>> +				 * filter from the list (see
>> +				 * "string_list_remove_empty_items" call below).
>> +				 */
>> +				filter->string = "";
>> +				continue;
>> +			}
>> +
>> +			/* In dco->paths we store a list of all delayed paths.
>> +			 * The filter just send us a list of available paths.
>> +			 * Remove them from the list.
>> +			 */
>> +			filter_string_list(&dco->paths, 0,
>> +				&remove_available_paths, &available_paths);
>> +
>> +			for_each_string_list_item(path, &available_paths) {
>> +				struct cache_entry* ce;
>> +
>> +				if (!path->util) {
>> +					error("external filter '%s' signaled that '%s' "
>> +					      "is now available although it has not been "
>> +					      "delayed earlier",
>> +					      filter->string, path->string);
>> +					errs |= 1;
>> +
>> +					/* Do not ask the filter for available blobs,
>> +					 * again, as the filter is likely buggy.
>> +					 */
>> +					filter->string = "";
>> +					continue;
>> +				}
>> +				ce = index_file_exists(state->istate, path->string,
>> +						       strlen(path->string), 0);
>> +				assert(dco->state == CE_RETRY);
> 
> Can anything futz with dco->state at this late in the game?  You
> entered into CE_RETRY state at the beginning of this function, and
> this loop is going through each delayed ones. At this point, you are
> going to make , but not yet have made, a request to the backend via
> another call to checkout_entry() again.
> 
> Just wondering what kind of programming errors you are protecting
> yourself against.  I briefly wondered perhaps you are afraid of a
> bug in checkout_entry() that may flip the state back, but it that
> is the case then the assert() would be after checkout_entry().

At this point I want to ensure that checkout_entry() is called
with the CE_RETRY state. Because checkout_entry() needs to pass 
that flag to the convert machinery.

This assert was useful when checkout_entry() changed dco->state
between CAN_DELAY and DELAYED. In the current implementation the
state should not be changed anymore.

Would you prefer if I remove it? (with or without is both fine with me)


Thanks,
Lars

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v7 3/6] t0021: write "OUT <size>" only on success
  2017-06-27 18:44   ` Junio C Hamano
@ 2017-06-27 20:51     ` Lars Schneider
  2017-06-27 21:26       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 20:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, tboegi, e, ttaylorr, peartben


> On 27 Jun 2017, at 20:44, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>> "rot13-filter.pl" always writes "OUT <size>" to the debug log at the end
>> of a response.
>> 
>> This works without issues for the existing responses "abort", "error",
>> and "success". A new response "delayed", that will be introduced in a
>> subsequent patch, accepts the input without giving the filtered result
>> right away. Since we actually have the data already available in our
>> mock filter the debug log output would be wrong/misleading. Therefore,
>> we do not write "OUT <size>" for "delayed" responses.
> 
> I still do not get why you think it makes any difference that you
> are hoarding the result in the mock program.  If the filter needs to
> read a prepared result from a file in t/t0021/ before responding to
> a real request after it replies to "delayed", would that change the
> argument above?  From Git's and the t0021-conversion.sh test's point
> of view, I do not think it makes an iota of difference---it's an
> implementation detail of the mock program.
> 
> I am totally lost.
> 
> Isn't the point of removing the log output from response to "delayed"
> that the filter does not give the output back to Git at that point,
> hence generally the size would not be available in the real-world
> use case (not in the mock program)?

Correct! Sorry for the confusion. How about this?

    "rot13-filter.pl" always writes "OUT <size>" to the debug log at the end
    of a response.

    This works perfectly for the existing responses "abort", "error", and 
    "success". A new response "delayed", that will be introduced in a
    subsequent patch, accepts the input without giving the filtered result
    right away. At this point we cannot know the size of the response.
    Therefore, we do not write "OUT <size>" for "delayed" responses.

    To simplify the code we do not write "OUT <size>" for "abort" and
    "error" responses either as their size is always zero.


Thanks,
Lars

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v7 3/6] t0021: write "OUT <size>" only on success
  2017-06-27 20:51     ` Lars Schneider
@ 2017-06-27 21:26       ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-06-27 21:26 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben

Lars Schneider <larsxschneider@gmail.com> writes:

> Correct! Sorry for the confusion. How about this?
>
>     "rot13-filter.pl" always writes "OUT <size>" to the debug log at the end
>     of a response.
>
>     This works perfectly for the existing responses "abort", "error", and 
>     "success". A new response "delayed", that will be introduced in a
>     subsequent patch, accepts the input without giving the filtered result
>     right away. At this point we cannot know the size of the response.
>     Therefore, we do not write "OUT <size>" for "delayed" responses.
>
>     To simplify the code we do not write "OUT <size>" for "abort" and
>     "error" responses either as their size is always zero.

That unconfuses me tremendously.

Unless there are other issues in v7 (which I am not aware of but
there may be somebody else slow responding), I plan to amend the log
message with the above, so no need to resend.

Thanks.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v7 6/6] convert: add "status=delayed" to filter process protocol
  2017-06-27 20:34     ` Lars Schneider
@ 2017-06-27 21:30       ` Junio C Hamano
  2017-06-27 21:58         ` Lars Schneider
  2017-06-27 21:49       ` Lars Schneider
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-06-27 21:30 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, peff, tboegi, e, ttaylorr, peartben

Lars Schneider <larsxschneider@gmail.com> writes:

> I treated that as low-prio as I got the impression that you are generally OK with
> the current implementation. I promise to send a patch with this change. 
> Either as part of this series or as a follow up patch.

Sure. I was just being curious what happened; I agree that it is a
good idea to treat this as a lower priority clean-up item and leave
it to later.

> Yes. I guess that was a premature optimization on my end.
> This is how "write_entry()" in entry.c would change:
>
> -                               dco->is_delayed = 0;
>                                 ret = async_convert_to_working_tree(
>                                         ce->name, new, size, &buf, dco);
> -                               if (ret && dco->is_delayed) {
> +                               if (ret && string_list_has_string(&dco->paths, ce->name)) {
>                                         free(new);
>                                         goto finish;
>                                 }
>
> OK?

Actually I should be asking the "OK?" question---the above was what
I had in mind when I commented, but you are a lot more familiar with
the way how this code is supposed to work, and I didn't know if the
emptyness of the list can be a 100% substitute for that bit.  If you
think they are equivalent, that is a great news.

> At this point I want to ensure that checkout_entry() is called
> with the CE_RETRY state. Because checkout_entry() needs to pass 
> that flag to the convert machinery.
>
> This assert was useful when checkout_entry() changed dco->state
> between CAN_DELAY and DELAYED. In the current implementation the
> state should not be changed anymore.
>
> Would you prefer if I remove it? (with or without is both fine with me)

If that is the reason behind the assert, I am OK with either (1)
moving it _after_ checkout_entry() returns (to ensure that the
function no longer mucks with the state, or (2) removing it.

Leaving it at the current position does not make much sense to me.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v7 6/6] convert: add "status=delayed" to filter process protocol
  2017-06-27 20:34     ` Lars Schneider
  2017-06-27 21:30       ` Junio C Hamano
@ 2017-06-27 21:49       ` Lars Schneider
  1 sibling, 0 replies; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 21:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, tboegi, e, ttaylorr, peartben


> On 27 Jun 2017, at 22:34, Lars Schneider <larsxschneider@gmail.com> wrote:
> 
> 
>> On 27 Jun 2017, at 21:00, Junio C Hamano <gitster@pobox.com> wrote:
>> 
>> Lars Schneider <larsxschneider@gmail.com> writes:
>> 
>>> @@ -533,7 +534,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>>> 	if (err)
>>> 		goto done;
>>> 
>>> -	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
>>> +	err = packet_writel(process->in,
>>> +		"capability=clean", "capability=smudge", "capability=delay", NULL);
>>> 
>>> 	for (;;) {
>>> 		cap_buf = packet_read_line(process->out, NULL);
>>> @@ -549,6 +551,8 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>>> 			entry->supported_capabilities |= CAP_CLEAN;
>>> 		} else if (!strcmp(cap_name, "smudge")) {
>>> 			entry->supported_capabilities |= CAP_SMUDGE;
>>> +		} else if (!strcmp(cap_name, "delay")) {
>>> +			entry->supported_capabilities |= CAP_DELAY;
>>> 		} else {
>>> 			warning(
>>> 				"external filter '%s' requested unsupported filter capability '%s'",
>> 
>> I thought you said something about attempting to make this more
>> table-driven; did the attempt produce a better result?  Just being
>> curious.
> 
> I treated that as low-prio as I got the impression that you are generally OK with
> the current implementation. I promise to send a patch with this change. 
> Either as part of this series or as a follow up patch.

How about this?


 static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 {
-       int err;
+       int err, i;
        struct cmd2process *entry = (struct cmd2process *)subprocess;
        struct string_list cap_list = STRING_LIST_INIT_NODUP;
        char *cap_buf;
@@ -516,6 +516,15 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
        struct child_process *process = &subprocess->process;
        const char *cmd = subprocess->cmd;

+       static const struct {
+               const char *name;
+               unsigned int cap;
+       } known_caps[] = {
+               { "clean",  CAP_CLEAN  },
+               { "smudge", CAP_SMUDGE },
+               { "delay",  CAP_DELAY  },
+       };
+
        sigchain_push(SIGPIPE, SIG_IGN);

        err = packet_writel(process->in, "git-filter-client", "version=2", NULL);
@@ -534,8 +543,15 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
        if (err)
                goto done;

-       err = packet_writel(process->in,
-               "capability=clean", "capability=smudge", "capability=delay", NULL);
+       for (i = 0; i < ARRAY_SIZE(known_caps); ++i) {
+               err = packet_write_fmt_gently(
+                       process->in, "capability=%s\n", known_caps[i].name);
+               if (err)
+                       goto done;
+       }
+       err = packet_flush_gently(process->in);
+       if (err)
+               goto done;

        for (;;) {
                cap_buf = packet_read_line(process->out, NULL);
@@ -547,18 +563,15 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
                        continue;

                cap_name = cap_list.items[1].string;
-               if (!strcmp(cap_name, "clean")) {
-                       entry->supported_capabilities |= CAP_CLEAN;
-               } else if (!strcmp(cap_name, "smudge")) {
-                       entry->supported_capabilities |= CAP_SMUDGE;
-               } else if (!strcmp(cap_name, "delay")) {
-                       entry->supported_capabilities |= CAP_DELAY;
-               } else {
-                       warning(
-                               "external filter '%s' requested unsupported filter capability '%s'",
-                               cmd, cap_name
-                       );
-               }
+               i = ARRAY_SIZE(known_caps) - 1;
+               while (i >= 0 && strcmp(cap_name, known_caps[i].name))
+                       i--;
+
+               if (i >= 0)
+                       entry->supported_capabilities |= known_caps[i].cap;
+               else
+                       warning("external filter '%s' requested unsupported filter capability '%s'",
+                       cmd, cap_name);

                string_list_clear(&cap_list, 0);






^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v7 6/6] convert: add "status=delayed" to filter process protocol
  2017-06-27 21:30       ` Junio C Hamano
@ 2017-06-27 21:58         ` Lars Schneider
  0 siblings, 0 replies; 15+ messages in thread
From: Lars Schneider @ 2017-06-27 21:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, tboegi, e, ttaylorr, peartben


> On 27 Jun 2017, at 23:30, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>> I treated that as low-prio as I got the impression that you are generally OK with
>> the current implementation. I promise to send a patch with this change. 
>> Either as part of this series or as a follow up patch.
> 
> Sure. I was just being curious what happened; I agree that it is a
> good idea to treat this as a lower priority clean-up item and leave
> it to later.

I posted it here:
http://public-inbox.org/git/7FF53F59-1240-4BED-999F-FE9C9AD7E6AC@gmail.com/

If you are OK with it, then I will post a v8 with all changes combined.


>> Yes. I guess that was a premature optimization on my end.
>> This is how "write_entry()" in entry.c would change:
>> 
>> -                               dco->is_delayed = 0;
>>                                ret = async_convert_to_working_tree(
>>                                        ce->name, new, size, &buf, dco);
>> -                               if (ret && dco->is_delayed) {
>> +                               if (ret && string_list_has_string(&dco->paths, ce->name)) {
>>                                        free(new);
>>                                        goto finish;
>>                                }
>> 
>> OK?
> 
> Actually I should be asking the "OK?" question---the above was what
> I had in mind when I commented, but you are a lot more familiar with
> the way how this code is supposed to work, and I didn't know if the
> emptyness of the list can be a 100% substitute for that bit.  If you
> think they are equivalent, that is a great news.

Yes, I think this should be equivalent.


>> At this point I want to ensure that checkout_entry() is called
>> with the CE_RETRY state. Because checkout_entry() needs to pass 
>> that flag to the convert machinery.
>> 
>> This assert was useful when checkout_entry() changed dco->state
>> between CAN_DELAY and DELAYED. In the current implementation the
>> state should not be changed anymore.
>> 
>> Would you prefer if I remove it? (with or without is both fine with me)
> 
> If that is the reason behind the assert, I am OK with either (1)
> moving it _after_ checkout_entry() returns (to ensure that the
> function no longer mucks with the state, or (2) removing it.
> 
> Leaving it at the current position does not make much sense to me.

I'll remove it.


Thanks,
Lars


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-06-27 21:59 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-27 12:10 [PATCH v7 0/6] convert: add "status=delayed" to filter process protocol Lars Schneider
2017-06-27 12:10 ` [PATCH v7 1/6] t0021: keep filter log files on comparison Lars Schneider
2017-06-27 12:10 ` [PATCH v7 2/6] t0021: make debug log file name configurable Lars Schneider
2017-06-27 12:10 ` [PATCH v7 3/6] t0021: write "OUT <size>" only on success Lars Schneider
2017-06-27 18:44   ` Junio C Hamano
2017-06-27 20:51     ` Lars Schneider
2017-06-27 21:26       ` Junio C Hamano
2017-06-27 12:10 ` [PATCH v7 4/6] convert: put the flags field before the flag itself for consistent style Lars Schneider
2017-06-27 12:10 ` [PATCH v7 5/6] convert: move multiple file filter error handling to separate function Lars Schneider
2017-06-27 12:10 ` [PATCH v7 6/6] convert: add "status=delayed" to filter process protocol Lars Schneider
2017-06-27 19:00   ` Junio C Hamano
2017-06-27 20:34     ` Lars Schneider
2017-06-27 21:30       ` Junio C Hamano
2017-06-27 21:58         ` Lars Schneider
2017-06-27 21:49       ` Lars Schneider

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).