git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: larsxschneider@gmail.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, jnareb@gmail.com, tboegi@web.de,
	mlbright@gmail.com, e@80x24.org, peff@peff.net,
	Lars Schneider <larsxschneider@gmail.com>
Subject: [PATCH v4 11/12] convert: add filter.<driver>.process option
Date: Wed,  3 Aug 2016 18:42:24 +0200	[thread overview]
Message-ID: <20160803164225.46355-12-larsxschneider@gmail.com> (raw)
In-Reply-To: <20160803164225.46355-1-larsxschneider@gmail.com>

From: Lars Schneider <larsxschneider@gmail.com>

Git's clean/smudge mechanism invokes an external filter process for every
single blob that is affected by a filter. If Git filters a lot of blobs
then the startup time of the external filter processes can become a
significant part of the overall Git execution time.

In a preliminary performance test this developer used a clean/smudge filter
written in golang to filter 12,000 files. This process took 364s with the
existing filter mechanism and 5s with the new mechanism. See details here:
https://github.com/github/git-lfs/pull/1382

This patch adds the `filter.<driver>.process` string option which, if used,
keeps the external filter process running and processes all blobs with
the packet format (pkt-line) based protocol over standard input and standard
output described below.

Git starts the filter when it encounters the first file
that needs to be cleaned or smudged. After the filter started
Git expects a welcome message, protocol version number, and
filter capabilities separated by spaces:
------------------------
packet:          git< git-filter-protocol\n
packet:          git< version=2\n
packet:          git< capabilities=clean smudge\n
------------------------
Supported filter capabilities are "clean" and "smudge".

Afterwards Git sends a command (based on the supported
capabilities), the pathname of a file relative to the
repository root, the content split in zero or more pkt-line
packets, and a flush packet at the end:
------------------------
packet:          git> command=smudge\n
packet:          git> pathname=path/testfile.dat\n
packet:          git> CONTENT
packet:          git> 0000
------------------------

The filter is expected to respond with the result content in zero
or more pkt-line packets and a flush packet at the end. Finally, a
"result=success" packet is expected if everything went well.
------------------------
packet:          git< SMUDGED_CONTENT
packet:          git< 0000
packet:          git< result=success\n
------------------------

If the result content is empty then the filter is expected to respond
only with a flush packet and a "result=success" packet.
------------------------
packet:          git< 0000
packet:          git< result=success\n
------------------------

In case the filter cannot or does not want to process the content,
it is expected to respond with a flush packet and a "result=reject"
packet. Depending on the `filter.<driver>.required` flag Git will
interpret that as error but it will not stop or restart the filter
process.
------------------------
packet:          git< 0000
packet:          git< result=reject\n
------------------------

If the filter experiences an error during processing, then it can
either die or send a flush packet and a "result=error" packet. If
Git receives such an error then it will stop and restart the filter
with the next file that needs to be processed.
------------------------
packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
packet:          git< 0000
packet:          git< result=error\n
------------------------

After the filter has processed a blob it is expected to wait for
the next command. When the Git process terminates, it will send
a kill signal to the filter in that stage.

If a `filter.<driver>.clean` or `filter.<driver>.smudge` command
is configured then these commands always take precedence over
a configured `filter.<driver>.process` command.

Helped-by: Martin-Louis Bright <mlbright@gmail.com>
Reviewed-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/gitattributes.txt |  98 +++++++++++-
 convert.c                       | 277 ++++++++++++++++++++++++++++++----
 t/t0021-conversion.sh           | 322 ++++++++++++++++++++++++++++++++++++++++
 t/t0021/rot13-filter.pl         | 148 ++++++++++++++++++
 4 files changed, 815 insertions(+), 30 deletions(-)
 create mode 100755 t/t0021/rot13-filter.pl

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 8882a3e..49514ab 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -300,7 +300,13 @@ checkout, when the `smudge` command is specified, the command is
 fed the blob object from its standard input, and its standard
 output is used to update the worktree file.  Similarly, the
 `clean` command is used to convert the contents of worktree file
-upon checkin.
+upon checkin. By default these commands process only a single
+blob and terminate.  If a long running `process` filter is used
+in place of `clean` and/or `smudge` filters, then Git can process
+all blobs with a single filter command invocation for the entire
+life of a single Git command, for example `git add --all`.  See
+section below for the description of the protocol used to
+communicate with a `process` filter.
 
 One use of the content filtering is to massage the content into a shape
 that is more convenient for the platform, filesystem, and the user to use.
@@ -375,6 +381,96 @@ substitution.  For example:
 ------------------------
 
 
+Long Running Filter Process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If the filter command (a string value) is defined via
+`filter.<driver>.process` then Git can process all blobs with a
+single filter invocation for the entire life of a single Git
+command. This is achieved by using the following packet format
+(pkt-line, see technical/protocol-common.txt) based protocol over
+standard input and standard output.
+
+Git starts the filter when it encounters the first file
+that needs to be cleaned or smudged. After the filter started
+Git expects a welcome message, protocol version number, and
+filter capabilities separated by spaces:
+------------------------
+packet:          git< git-filter-protocol\n
+packet:          git< version=2\n
+packet:          git< capabilities=clean smudge\n
+------------------------
+Supported filter capabilities are "clean" and "smudge".
+
+Afterwards Git sends a command (based on the supported
+capabilities), the pathname of a file relative to the
+repository root, the content split in zero or more pkt-line
+packets, and a flush packet at the end:
+------------------------
+packet:          git> command=smudge\n
+packet:          git> pathname=path/testfile.dat\n
+packet:          git> CONTENT
+packet:          git> 0000
+------------------------
+
+The filter is expected to respond with the result content in zero
+or more pkt-line packets and a flush packet at the end. Finally, a
+"result=success" packet is expected if everything went well.
+------------------------
+packet:          git< SMUDGED_CONTENT
+packet:          git< 0000
+packet:          git< result=success\n
+------------------------
+
+If the result content is empty then the filter is expected to respond
+only with a flush packet and a "result=success" packet.
+------------------------
+packet:          git< 0000
+packet:          git< result=success\n
+------------------------
+
+In case the filter cannot or does not want to process the content,
+it is expected to respond with a flush packet and a "result=reject"
+packet. Depending on the `filter.<driver>.required` flag Git will
+interpret that as error but it will not stop or restart the filter
+process.
+------------------------
+packet:          git< 0000
+packet:          git< result=reject\n
+------------------------
+
+If the filter experiences an error during processing, then it can
+either die or send a flush packet and a "result=error" packet. If
+Git receives such an error then it will stop and restart the filter
+with the next file that needs to be processed.
+------------------------
+packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
+packet:          git< 0000
+packet:          git< result=error\n
+------------------------
+
+After the filter has processed a blob it is expected to wait for
+the next command. When the Git process terminates, it will send
+a kill signal to the filter in that stage.
+
+A long running filter demo implementation can be found in
+`t/t0021/rot13-filter.pl` located in the Git core repository.
+If you develop your own long running filter process then the
+`GIT_TRACE_PACKET` environment variables can be very helpful
+for debugging (see linkgit:git[1]).
+
+If a `filter.<driver>.clean` or `filter.<driver>.smudge` command
+is configured then these commands always take precedence over
+a configured `filter.<driver>.process` command.
+
+Please note that you cannot use an existing `filter.<driver>.clean`
+or `filter.<driver>.smudge` command with `filter.<driver>.process`
+because the former two use a different inter process communication
+protocol than the latter one. As soon as Git would detect a file
+that needs to be processed by such an invalid "process" filter,
+it would wait for a proper protocol handshake and appear "hanging".
+
+
 Interaction between checkin/checkout attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/convert.c b/convert.c
index 522e2c5..130430a 100644
--- a/convert.c
+++ b/convert.c
@@ -3,6 +3,7 @@
 #include "run-command.h"
 #include "quote.h"
 #include "sigchain.h"
+#include "pkt-line.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -427,7 +428,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len, int fd,
+static int apply_single_file_filter(const char *path, const char *src, size_t len, int fd,
                         struct strbuf *dst, const char *cmd)
 {
 	/*
@@ -441,12 +442,6 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	struct async async;
 	struct filter_params params;
 
-	if (!cmd || !*cmd)
-		return 0;
-
-	if (!dst)
-		return 1;
-
 	memset(&async, 0, sizeof(async));
 	async.proc = filter_buffer_or_fd;
 	async.data = &params;
@@ -481,14 +476,245 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	return ret;
 }
 
+#define FILTER_CAPABILITIES_CLEAN    (1u<<0)
+#define FILTER_CAPABILITIES_SMUDGE   (1u<<1)
+#define FILTER_SUPPORTS_CLEAN(type)  ((type) & FILTER_CAPABILITIES_CLEAN)
+#define FILTER_SUPPORTS_SMUDGE(type) ((type) & FILTER_CAPABILITIES_SMUDGE)
+
+struct cmd2process {
+	struct hashmap_entry ent; /* must be the first member! */
+	int supported_capabilities;
+	const char *cmd;
+	struct child_process process;
+};
+
+static int cmd_process_map_initialized = 0;
+static struct hashmap cmd_process_map;
+
+static int cmd2process_cmp(const struct cmd2process *e1,
+                           const struct cmd2process *e2,
+                           const void *unused)
+{
+	return strcmp(e1->cmd, e2->cmd);
+}
+
+static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
+{
+	struct cmd2process key;
+	hashmap_entry_init(&key, strhash(cmd));
+	key.cmd = cmd;
+	return hashmap_get(hashmap, &key, NULL);
+}
+
+static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry)
+{
+	if (!entry)
+		return;
+	sigchain_push(SIGPIPE, SIG_IGN);
+	close(entry->process.in);
+	close(entry->process.out);
+	sigchain_pop(SIGPIPE);
+	finish_command(&entry->process);
+	child_process_clear(&entry->process);
+	hashmap_remove(hashmap, entry, NULL);
+	free(entry);
+}
+
+static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
+{
+	int did_fail;
+	struct cmd2process *entry;
+	struct child_process *process;
+	const char *argv[] = { cmd, NULL };
+	static const char cap_key[] = "capabilities=";
+	int cap_key_len = strlen(cap_key);
+	struct string_list cap_list = STRING_LIST_INIT_NODUP;
+	char *cap_buf;
+	int i;
+
+	entry = xmalloc(sizeof(*entry));
+	hashmap_entry_init(entry, strhash(cmd));
+	entry->cmd = cmd;
+	entry->supported_capabilities = 0;
+	process = &entry->process;
+
+	child_process_init(process);
+	process->argv = argv;
+	process->use_shell = 1;
+	process->in = -1;
+	process->out = -1;
+
+	if (start_command(process)) {
+		error("cannot fork to run external filter '%s'", cmd);
+		kill_multi_file_filter(hashmap, entry);
+		return NULL;
+	}
+
+	did_fail = strcmp(packet_read_line(process->out, NULL), "git-filter-protocol");
+	if (did_fail)
+		goto done;
+
+	did_fail = strcmp(packet_read_line(process->out, NULL), "version=2");
+	if (did_fail)
+		goto done;
+
+	cap_buf = packet_read_line(process->out, NULL);
+	if (!cap_buf ||
+		strlen(cap_buf) <= cap_key_len ||
+		strncmp(cap_buf, cap_key, cap_key_len)) {
+		error("filter capabilities not found");
+		did_fail = 1;
+		goto done;
+	}
+
+	string_list_split_in_place(&cap_list, &cap_buf[cap_key_len], ' ', -1);
+	if (cap_list.nr > 0) {
+		for (i = 0; i < cap_list.nr; i++) {
+			const char *requested = cap_list.items[i].string;
+			if (!strcmp(requested, "clean")) {
+				entry->supported_capabilities |= FILTER_CAPABILITIES_CLEAN;
+			} else if (!strcmp(requested, "smudge")) {
+				entry->supported_capabilities |= FILTER_CAPABILITIES_SMUDGE;
+			} else {
+				warning(
+					"external filter '%s' requested unsupported filter capability '%s'",
+					cmd, requested
+				);
+			}
+		}
+	}
+	string_list_clear(&cap_list, 0);
+
+done:
+	if (did_fail) {
+		error("initialization for external filter '%s' failed", cmd);
+		kill_multi_file_filter(hashmap, entry);
+		return NULL;
+	}
+
+	hashmap_add(hashmap, entry);
+	return entry;
+}
+
+static int apply_multi_file_filter(const char *path, const char *src, size_t len,
+                                   int fd, struct strbuf *dst, const char *cmd,
+                                   const int wanted_capability)
+{
+	int ret = 1;
+	struct cmd2process *entry;
+	struct child_process *process;
+	struct stat file_stat;
+	struct strbuf nbuf = STRBUF_INIT;
+	char *filter_type;
+	char *filter_result = NULL;
+
+	if (!cmd_process_map_initialized) {
+		cmd_process_map_initialized = 1;
+		hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
+		entry = NULL;
+	} else {
+		entry = find_multi_file_filter_entry(&cmd_process_map, cmd);
+	}
+
+	fflush(NULL);
+
+	if (!entry) {
+		entry = start_multi_file_filter(&cmd_process_map, cmd);
+		if (!entry)
+			return 0;
+	}
+	process = &entry->process;
+
+	if (!(wanted_capability & entry->supported_capabilities))
+		return 1;  // it is OK if the wanted capability is not supported
+
+	if (FILTER_SUPPORTS_CLEAN(wanted_capability))
+		filter_type = "clean";
+	else if (FILTER_SUPPORTS_SMUDGE(wanted_capability))
+		filter_type = "smudge";
+	else
+		die("unexpected filter type");
+
+	if (fd >= 0 && !src) {
+		if (fstat(fd, &file_stat) == -1)
+			return 0;
+		len = file_stat.st_size;
+	}
+
+	packet_buf_write(&nbuf, "command=%s\n", filter_type);
+	ret = !direct_packet_write(process->in, nbuf.buf, nbuf.len, 1);
+	if (!ret)
+		goto done;
+
+	strbuf_reset(&nbuf);
+	packet_buf_write(&nbuf, "pathname=%s\n", path);
+	ret = !direct_packet_write(process->in, nbuf.buf, nbuf.len, 1);
+	if (!ret)
+		goto done;
+
+	if (fd >= 0)
+		ret = !packet_write_stream_with_flush_from_fd(fd, process->in);
+	else
+		ret = !packet_write_stream_with_flush_from_buf(src, len, process->in);
+	if (!ret)
+		goto done;
+
+	strbuf_reset(&nbuf);
+	ret = packet_read_till_flush(process->out, &nbuf) >= 0;
+	if (!ret)
+		goto done;
+
+	filter_result = packet_read_line(process->out, NULL);
+	ret = !strcmp(filter_result, "result=success");
+
+done:
+	if (ret) {
+		strbuf_swap(dst, &nbuf);
+	} else {
+		if (!filter_result || strcmp(filter_result, "result=reject")) {
+			// Something went wrong with the protocol filter. Force shutdown!
+			error("external filter '%s' failed", cmd);
+			kill_multi_file_filter(&cmd_process_map, entry);
+		}
+	}
+	strbuf_release(&nbuf);
+	return ret;
+}
+
 static struct convert_driver {
 	const char *name;
 	struct convert_driver *next;
 	const char *smudge;
 	const char *clean;
+	const char *process;
 	int required;
 } *user_convert, **user_convert_tail;
 
+static int apply_filter(const char *path, const char *src, size_t len,
+                        int fd, struct strbuf *dst, struct convert_driver *drv,
+                        const int wanted_capability)
+{
+	const char* cmd = NULL;
+
+	if (!drv)
+		return 0;
+
+	if (!dst)
+		return 1;
+
+	if (FILTER_SUPPORTS_CLEAN(wanted_capability) && drv->clean)
+		cmd = drv->clean;
+	else if (FILTER_SUPPORTS_SMUDGE(wanted_capability) && drv->smudge)
+		cmd = drv->smudge;
+
+	if (cmd && *cmd)
+		return apply_single_file_filter(path, src, len, fd, dst, cmd);
+	else if (drv->process && *drv->process)
+		return apply_multi_file_filter(path, src, len, fd, dst, drv->process, wanted_capability);
+
+	return 0;
+}
+
 static int read_convert_config(const char *var, const char *value, void *cb)
 {
 	const char *key, *name;
@@ -526,6 +752,10 @@ static int read_convert_config(const char *var, const char *value, void *cb)
 	if (!strcmp("clean", key))
 		return git_config_string(&drv->clean, var, value);
 
+	if (!strcmp("process", key)) {
+		return git_config_string(&drv->process, var, value);
+	}
+
 	if (!strcmp("required", key)) {
 		drv->required = git_config_bool(var, value);
 		return 0;
@@ -823,7 +1053,7 @@ int would_convert_to_git_filter_fd(const char *path)
 	if (!ca.drv->required)
 		return 0;
 
-	return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, FILTER_CAPABILITIES_CLEAN);
 }
 
 const char *get_convert_attr_ascii(const char *path)
@@ -856,18 +1086,12 @@ int convert_to_git(const char *path, const char *src, size_t len,
                    struct strbuf *dst, enum safe_crlf checksafe)
 {
 	int ret = 0;
-	const char *filter = NULL;
-	int required = 0;
 	struct conv_attrs ca;
 
 	convert_attrs(&ca, path);
-	if (ca.drv) {
-		filter = ca.drv->clean;
-		required = ca.drv->required;
-	}
 
-	ret |= apply_filter(path, src, len, -1, dst, filter);
-	if (!ret && required)
+	ret |= apply_filter(path, src, len, -1, dst, ca.drv, FILTER_CAPABILITIES_CLEAN);
+	if (!ret && ca.drv && ca.drv->required)
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	if (ret && dst) {
@@ -889,9 +1113,9 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	convert_attrs(&ca, path);
 
 	assert(ca.drv);
-	assert(ca.drv->clean);
+	assert(ca.drv->clean || ca.drv->process);
 
-	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
+	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, FILTER_CAPABILITIES_CLEAN))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
@@ -903,15 +1127,9 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 					    int normalizing)
 {
 	int ret = 0, ret_filter = 0;
-	const char *filter = NULL;
-	int required = 0;
 	struct conv_attrs ca;
 
 	convert_attrs(&ca, path);
-	if (ca.drv) {
-		filter = ca.drv->smudge;
-		required = ca.drv->required;
-	}
 
 	ret |= ident_to_worktree(path, src, len, dst, ca.ident);
 	if (ret) {
@@ -920,9 +1138,10 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 	}
 	/*
 	 * CRLF conversion can be skipped if normalizing, unless there
-	 * is a smudge filter.  The filter might expect CRLFs.
+	 * is a smudge or process filter (even if the process filter doesn't
+	 * support smudge).  The filters might expect CRLFs.
 	 */
-	if (filter || !normalizing) {
+	if ((ca.drv && (ca.drv->smudge || ca.drv->process)) || !normalizing) {
 		ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action);
 		if (ret) {
 			src = dst->buf;
@@ -930,8 +1149,8 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 		}
 	}
 
-	ret_filter = apply_filter(path, src, len, -1, dst, filter);
-	if (!ret_filter && required)
+	ret_filter = apply_filter(path, src, len, -1, dst, ca.drv, FILTER_CAPABILITIES_SMUDGE);
+	if (!ret_filter && ca.drv && ca.drv->required)
 		die("%s: smudge filter %s failed", path, ca.drv->name);
 
 	return ret | ret_filter;
@@ -1383,7 +1602,7 @@ struct stream_filter *get_stream_filter(const char *path, const unsigned char *s
 	struct stream_filter *filter = NULL;
 
 	convert_attrs(&ca, path);
-	if (ca.drv && (ca.drv->smudge || ca.drv->clean))
+	if (ca.drv && (ca.drv->process || ca.drv->smudge || ca.drv->clean))
 		return NULL;
 
 	if (ca.crlf_action == CRLF_AUTO || ca.crlf_action == CRLF_AUTO_CRLF)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 34c8eb9..c1a22f4 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -42,6 +42,9 @@ test_expect_success setup '
 	rm -f test test.t test.i &&
 	git checkout -- test test.t test.i &&
 
+	echo "content-test2" >test2.o &&
+	echo "content-test3-subdir" >test3-subdir.o &&
+
 	mkdir generated-test-data &&
 	for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
 	do
@@ -296,4 +299,323 @@ test_expect_success 'disable filter with empty override' '
 	test_must_be_empty err
 '
 
+check_filter () {
+	rm -f rot13-filter.log actual.log &&
+	"$@" 2> git_stderr.log &&
+	test_must_be_empty git_stderr.log &&
+	cat >expected.log &&
+	sort rot13-filter.log | uniq -c | sed "s/^[ ]*//" >actual.log &&
+	test_cmp expected.log actual.log
+}
+
+check_filter_count_clean () {
+	rm -f rot13-filter.log actual.log &&
+	"$@" 2> git_stderr.log &&
+	test_must_be_empty git_stderr.log &&
+	cat >expected.log &&
+	sort rot13-filter.log | uniq -c | sed "s/^[ ]*//" |
+		sed "s/^\([0-9]\) IN: clean/x IN: clean/" >actual.log &&
+	test_cmp expected.log actual.log
+}
+
+check_filter_ignore_clean () {
+	rm -f rot13-filter.log actual.log &&
+	"$@" &&
+	cat >expected.log &&
+	grep -v "IN: clean" rot13-filter.log >actual.log &&
+	test_cmp expected.log actual.log
+}
+
+check_filter_no_call () {
+	rm -f rot13-filter.log &&
+	"$@" 2> git_stderr.log &&
+	test_must_be_empty git_stderr.log &&
+	test_must_be_empty rot13-filter.log
+}
+
+check_rot13 () {
+	test_cmp $1 $2 &&
+	./../rot13.sh <$1 >expected &&
+	git cat-file blob :$2 >actual &&
+	test_cmp expected actual
+}
+
+test_expect_success PERL 'required process filter should filter data' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+		git add . &&
+		git commit . -m "test commit" &&
+		git branch empty &&
+
+		cat ../test.o >test.r &&
+		cat ../test2.o >test2.r &&
+		mkdir testsubdir &&
+		cat ../test3-subdir.o >testsubdir/test3-subdir.r &&
+		>test4-empty.r &&
+
+		check_filter \
+			git add . \
+				<<-\EOF &&
+					1 IN: clean test.r 57 [OK] -- OUT: 57 [OK]
+					1 IN: clean test2.r 14 [OK] -- OUT: 14 [OK]
+					1 IN: clean test4-empty.r 0 [OK] -- OUT: 0 [OK]
+					1 IN: clean testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
+					1 start
+					1 wrote filter header
+				EOF
+
+		check_filter_count_clean \
+			git commit . -m "test commit" \
+				<<-\EOF &&
+					x IN: clean test.r 57 [OK] -- OUT: 57 [OK]
+					x IN: clean test2.r 14 [OK] -- OUT: 14 [OK]
+					x IN: clean test4-empty.r 0 [OK] -- OUT: 0 [OK]
+					x IN: clean testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
+					1 start
+					1 wrote filter header
+				EOF
+
+		rm -f test?.r testsubdir/test3-subdir.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+					IN: smudge testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
+				EOF
+
+		check_filter_ignore_clean \
+			git checkout empty \
+				<<-\EOF &&
+					start
+					wrote filter header
+				EOF
+
+		check_filter_ignore_clean \
+			git checkout master \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+					IN: smudge test4-empty.r 0 [OK] -- OUT: 0 [OK]
+					IN: smudge testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r &&
+		check_rot13 ../test3-subdir.o testsubdir/test3-subdir.r
+	)
+'
+
+test_expect_success PERL 'required process filter should filter smudge data and one-shot filter should clean' '
+	test_config_global filter.protocol.clean ./../rot13.sh &&
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+		git add . &&
+		git commit . -m "test commit" &&
+		git branch empty &&
+
+		cat ../test.o >test.r &&
+		cat ../test2.o >test2.r &&
+
+		check_filter_no_call \
+			git add . &&
+
+		check_filter_no_call \
+			git commit . -m "test commit" &&
+
+		rm -f test?.r testsubdir/test3-subdir.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+				EOF
+
+		git checkout empty &&
+
+		check_filter_ignore_clean \
+			git checkout master\
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r
+	)
+'
+
+test_expect_success PERL 'required process filter should clean only' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+		git add . &&
+		git commit . -m "test commit" &&
+		git branch empty &&
+
+		cat ../test.o >test.r &&
+
+		check_filter \
+			git add . \
+				<<-\EOF &&
+					1 IN: clean test.r 57 [OK] -- OUT: 57 [OK]
+					1 start
+					1 wrote filter header
+				EOF
+
+		check_filter_count_clean \
+			git commit . -m "test commit" \
+				<<-\EOF
+					x IN: clean test.r 57 [OK] -- OUT: 57 [OK]
+					1 start
+					1 wrote filter header
+				EOF
+	)
+'
+
+test_expect_success PERL 'required process filter should process files larger LARGE_PACKET_MAX' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.file filter=protocol" >.gitattributes &&
+		cat ../generated-test-data/largish.file.rot13 >large.rot13 &&
+		cat ../generated-test-data/largish.file >large.file &&
+		cat large.file >large.original &&
+
+		git add large.file .gitattributes &&
+		git commit . -m "test commit" &&
+
+		rm -f large.file &&
+		git checkout -- large.file &&
+		git cat-file blob :large.file >actual &&
+		test_cmp large.rot13 actual
+	)
+'
+
+test_expect_success PERL 'required process filter should with clean error should fail' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cat ../test.o >test.r &&
+		echo "this is going to fail" >clean-write-fail.r &&
+		echo "content-test3-subdir" >test3.r &&
+
+		# Note: There are three clean paths in convert.c we just test one here.
+		test_must_fail git add .
+	)
+'
+
+test_expect_success PERL 'process filter should restart after unexpected write failure' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cat ../test.o >test.r &&
+		cat ../test2.o >test2.r &&
+		echo "this is going to fail" >smudge-write-fail.o &&
+		cat smudge-write-fail.o >smudge-write-fail.r &&
+		git add . &&
+		git commit . -m "test commit" &&
+		rm -f *.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge smudge-write-fail.r 22 [OK] -- OUT: 22 [WRITE FAIL]
+					start
+					wrote filter header
+					IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r &&
+
+		! test_cmp smudge-write-fail.o smudge-write-fail.r && # Smudge failed!
+		./../rot13.sh <smudge-write-fail.o >expected &&
+		git cat-file blob :smudge-write-fail.r >actual &&
+		test_cmp expected actual							  # Clean worked!
+	)
+'
+
+test_expect_success PERL 'process filter should not restart after intentionally rejected file' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cat ../test.o >test.r &&
+		cat ../test2.o >test2.r &&
+		echo "this is going to be rejected" >reject.o &&
+		cat reject.o >reject.r &&
+		git add . &&
+		git commit . -m "test commit" &&
+		rm -f *.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					start
+					wrote filter header
+					IN: smudge reject.r 29 [OK] -- OUT: 0 [REJECT]
+					IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r
+	)
+'
+
 test_done
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
new file mode 100755
index 0000000..ca6d5e4
--- /dev/null
+++ b/t/t0021/rot13-filter.pl
@@ -0,0 +1,148 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git filter protocol version 2
+# See Documentation/gitattributes.txt, section "Filter Protocol"
+#
+# The script takes the list of supported protocol capabilities as
+# arguments ("clean", "smudge", etc).
+#
+# This implementation supports three special test cases:
+# (1) If data with the pathname "clean-write-fail.r" is processed with
+#     a "clean" operation then the write operation will die.
+# (2) If data with the pathname "smudge-write-fail.r" is processed with
+#     a "smudge" operation then the write operation will die.
+# (3) If data with the pathname "reject.r" is processed with any
+#     operation then the filter signals that it does not want to process
+#     the file.
+#
+
+use strict;
+use warnings;
+
+my $MAX_PACKET_CONTENT_SIZE = 65516;
+my @capabilities            = @ARGV;
+
+sub rot13 {
+    my ($str) = @_;
+    $str =~ y/A-Za-z/N-ZA-Mn-za-m/;
+    return $str;
+}
+
+sub packet_read {
+    my $buffer;
+    my $bytes_read = read STDIN, $buffer, 4;
+    if ( $bytes_read == 0 ) {
+        return;
+    }
+    elsif ( $bytes_read != 4 ) {
+        die "invalid packet size '$bytes_read' field";
+    }
+    my $pkt_size = hex($buffer);
+    if ( $pkt_size == 0 ) {
+        return ( 1, "" );
+    }
+    elsif ( $pkt_size > 4 ) {
+        my $content_size = $pkt_size - 4;
+        $bytes_read = read STDIN, $buffer, $content_size;
+        if ( $bytes_read != $content_size ) {
+            die "invalid packet ($content_size expected; $bytes_read read)";
+        }
+        return ( 0, $buffer );
+    }
+    else {
+        die "invalid packet size";
+    }
+}
+
+sub packet_write {
+    my ($packet) = @_;
+    print STDOUT sprintf( "%04x", length($packet) + 4 );
+    print STDOUT $packet;
+    STDOUT->flush();
+}
+
+sub packet_flush {
+    print STDOUT sprintf( "%04x", 0 );
+    STDOUT->flush();
+}
+
+open my $debug, ">>", "rot13-filter.log";
+print $debug "start\n";
+$debug->flush();
+
+packet_write("git-filter-protocol\n");
+packet_write("version=2\n");
+packet_write( "capabilities=" . join( ' ', @capabilities ) . "\n" );
+print $debug "wrote filter header\n";
+$debug->flush();
+
+while (1) {
+    my ($command) = packet_read() =~ /^command=([^=]+)\n$/;
+    unless ( defined($command) ) {
+        exit();
+    }
+    print $debug "IN: $command";
+    $debug->flush();
+
+    my ($pathname) = packet_read() =~ /^pathname=([^=]+)\n$/;
+    print $debug " $pathname";
+    $debug->flush();
+
+    my $input = "";
+    {
+        binmode(STDIN);
+        my $buffer;
+        my $done = 0;
+        while ( !$done ) {
+            ( $done, $buffer ) = packet_read();
+            $input .= $buffer;
+        }
+        print $debug " " . length($input) . " [OK] -- ";
+        $debug->flush();
+    }
+
+    my $output;
+    if ( $pathname eq "reject.r" ) {
+        $output = "";
+    }
+    elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
+        $output = rot13($input);
+    }
+    elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
+        $output = rot13($input);
+    }
+    else {
+        die "bad command $command";
+    }
+
+    print $debug "OUT: " . length($output) . " ";
+    $debug->flush();
+
+    if ( $pathname eq "${command}-write-fail.r" ) {
+        print $debug "[WRITE FAIL]\n";
+        $debug->flush();
+        die "write error";
+    }
+    elsif ( $pathname eq "reject.r" ) {
+        packet_flush();
+        print $debug "[REJECT]\n";
+        $debug->flush();
+        packet_write("result=reject\n");
+    }
+    else {
+        while ( length($output) > 0 ) {
+            my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
+            packet_write($packet);
+            if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
+                $output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+            }
+            else {
+                $output = "";
+            }
+        }
+        packet_flush();
+        print $debug "[OK]\n";
+        $debug->flush();
+        packet_write("result=success\n");
+    }
+}
-- 
2.9.0


  parent reply	other threads:[~2016-08-03 16:43 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20160727000605.49982-1-larsxschneider%40gmail.com/>
2016-07-29 23:37 ` [PATCH v3 00/10] Git filter protocol larsxschneider
2016-07-29 23:37   ` [PATCH v3 01/10] pkt-line: extract set_packet_header() larsxschneider
2016-07-30 10:30     ` Jakub Narębski
2016-08-01 11:33       ` Lars Schneider
2016-08-03 20:05         ` Jakub Narębski
2016-08-05 11:52           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 02/10] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-07-30 10:49     ` Jakub Narębski
2016-08-01 12:00       ` Lars Schneider
2016-08-03 20:12         ` Jakub Narębski
2016-08-05 12:02           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 03/10] pkt-line: add packet_flush_gentle() larsxschneider
2016-07-30 12:04     ` Jakub Narębski
2016-08-01 12:28       ` Lars Schneider
2016-07-31 20:36     ` Torstem Bögershausen
2016-07-31 21:45       ` Lars Schneider
2016-08-02 19:56         ` Torsten Bögershausen
2016-08-05  9:59           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 04/10] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-07-30 12:29     ` Jakub Narębski
2016-08-01 12:18       ` Lars Schneider
2016-08-03 20:15         ` Jakub Narębski
2016-07-29 23:37   ` [PATCH v3 05/10] pack-protocol: fix maximum pkt-line size larsxschneider
2016-07-30 13:58     ` Jakub Narębski
2016-08-01 12:23       ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 06/10] run-command: add clean_on_exit_handler larsxschneider
2016-07-30  9:50     ` Johannes Sixt
2016-08-01 11:14       ` Lars Schneider
2016-08-02  5:53         ` Johannes Sixt
2016-08-02  7:41           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 07/10] convert: quote filter names in error messages larsxschneider
2016-07-29 23:37   ` [PATCH v3 08/10] convert: modernize tests larsxschneider
2016-07-29 23:38   ` [PATCH v3 09/10] convert: generate large test files only once larsxschneider
2016-07-29 23:38   ` [PATCH v3 10/10] convert: add filter.<driver>.process option larsxschneider
2016-07-30 22:05     ` Jakub Narębski
2016-07-31  9:42       ` Jakub Narębski
2016-07-31 19:49         ` Lars Schneider
2016-07-31 22:59           ` Jakub Narębski
2016-08-01 13:32       ` Lars Schneider
2016-08-03 18:30         ` Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option) Jakub Narębski
2016-08-05 10:32           ` Lars Schneider
2016-08-06 18:24           ` Lars Schneider
2016-08-03 22:47         ` [PATCH v3 10/10] convert: add filter.<driver>.process option Jakub Narębski
2016-07-31 22:19     ` Jakub Narębski
2016-08-01 17:55       ` Lars Schneider
2016-08-04  0:42         ` Jakub Narębski
2016-08-03 13:10       ` Lars Schneider
2016-08-04 10:18         ` Jakub Narębski
2016-08-05 13:20           ` Lars Schneider
2016-08-03 16:42   ` [PATCH v4 00/12] Git filter protocol larsxschneider
2016-08-03 16:42     ` [PATCH v4 01/12] pkt-line: extract set_packet_header() larsxschneider
2016-08-03 20:18       ` Junio C Hamano
2016-08-03 21:12         ` Jeff King
2016-08-03 21:27           ` Jeff King
2016-08-04 16:14           ` Junio C Hamano
2016-08-05 14:55             ` Lars Schneider
2016-08-05 16:31               ` Junio C Hamano
2016-08-05 17:31             ` Lars Schneider
2016-08-05 17:41               ` Junio C Hamano
2016-08-03 21:56         ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 02/12] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-08-03 16:42     ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() larsxschneider
2016-08-03 21:39       ` Jeff King
2016-08-03 22:56         ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-03 22:56           ` [PATCH 1/7] trace: handle NULL argument in trace_disable() Jeff King
2016-08-03 22:58           ` [PATCH 2/7] trace: stop using write_or_whine_pipe() Jeff King
2016-08-03 22:58           ` [PATCH 3/7] trace: use warning() for printing trace errors Jeff King
2016-08-04 20:41             ` Junio C Hamano
2016-08-04 21:21               ` Jeff King
2016-08-04 21:28                 ` Junio C Hamano
2016-08-05  7:56                   ` Jeff King
2016-08-05  7:59                   ` Christian Couder
2016-08-05 18:41                     ` Junio C Hamano
2016-08-03 23:00           ` [PATCH 4/7] trace: cosmetic fixes for error messages Jeff King
2016-08-04 20:42             ` Junio C Hamano
2016-08-05  8:00               ` Jeff King
2016-08-03 23:00           ` [PATCH 5/7] trace: correct variable name in write() error message Jeff King
2016-08-03 23:01           ` [PATCH 6/7] trace: disable key after write error Jeff King
2016-08-04 20:45             ` Junio C Hamano
2016-08-04 21:22               ` Jeff King
2016-08-05  7:58                 ` Jeff King
2016-08-03 23:01           ` [PATCH 7/7] write_or_die: drop write_or_whine_pipe() Jeff King
2016-08-03 23:04           ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-04 16:16         ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() Junio C Hamano
2016-08-03 16:42     ` [PATCH v4 04/12] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-08-03 16:42     ` [PATCH v4 05/12] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-03 16:42     ` [PATCH v4 06/12] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-03 16:42     ` [PATCH v4 07/12] run-command: add clean_on_exit_handler larsxschneider
2016-08-03 21:24       ` Jeff King
2016-08-03 22:15         ` Lars Schneider
2016-08-03 22:53           ` Jeff King
2016-08-03 23:09             ` Lars Schneider
2016-08-03 23:15               ` Jeff King
2016-08-05 13:08                 ` Lars Schneider
2016-08-05 21:19                   ` Torsten Bögershausen
2016-08-05 21:50                     ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 08/12] convert: quote filter names in error messages larsxschneider
2016-08-03 16:42     ` [PATCH v4 09/12] convert: modernize tests larsxschneider
2016-08-03 16:42     ` [PATCH v4 10/12] convert: generate large test files only once larsxschneider
2016-08-03 16:42     ` larsxschneider [this message]
2016-08-03 17:45       ` [PATCH v4 11/12] convert: add filter.<driver>.process option Junio C Hamano
2016-08-03 21:48         ` Lars Schneider
2016-08-03 22:46           ` Jeff King
2016-08-05 12:53             ` Lars Schneider
2016-08-03 20:29       ` Junio C Hamano
2016-08-03 21:37         ` Lars Schneider
2016-08-03 21:43           ` Junio C Hamano
2016-08-03 22:01             ` Lars Schneider
2016-08-05 21:34       ` Torsten Bögershausen
2016-08-05 21:49         ` Lars Schneider
2016-08-05 22:06         ` Junio C Hamano
2016-08-05 22:27           ` Jeff King
2016-08-06 11:55             ` Lars Schneider
2016-08-06 12:14               ` Jeff King
2016-08-06 18:19                 ` Lars Schneider
2016-08-08 15:02                   ` Jeff King
2016-08-08 16:21                     ` Lars Schneider
2016-08-08 16:26                       ` Jeff King
2016-08-06 20:40           ` Torsten Bögershausen
2016-08-03 16:42     ` [PATCH v4 12/12] convert: add filter.<driver>.process shutdown command option larsxschneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160803164225.46355-12-larsxschneider@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).