[PATCH v10 00/14] Git filter protocol

git@vger.kernel.org mailing list mirror (one of many)

* [PATCH v10 00/14] Git filter protocol
@ 2016-10-08 11:25 larsxschneider
  2016-10-08 11:25 ` [PATCH v10 01/14] convert: quote filter names in error messages larsxschneider
                   ` (13 more replies)
  0 siblings, 14 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

The goal of this series is to avoid launching a new clean/smudge filter
process for each file that is filtered.

A short summary about v1 to v5 can be found here:
https://git.github.io/rev_news/2016/08/17/edition-18/

This series is also published on the web:
https://github.com/larsxschneider/git/pull/14

Patches 1 and 2 are cleanups and not strictly necessary for the series.
Patches 3 to 12 are required preparation. Patch 13 is the main patch.
Patch 14 adds an example of how to use the Git filter protocol in contrib.

Thanks a lot to
 Jakub, Junio, and Peff
for very helpful reviews,
Lars

## Changes since v9
  * replace the very specific "wait after close(stdin)" behavior with the
    more flexible run-command "clean_on_exit_handler" flag to fix flaky t0021,
    see discussion:
    http://public-inbox.org/git/xmqq37k9mm7k.fsf@gitster.mtv.corp.google.com/
    http://public-inbox.org/git/xmqq8tubitjs.fsf@gitster.mtv.corp.google.com/
  * run stop_multi_file_filter() for all filters on Git shutdown
  * actually kill filter in kill_multi_file_filter()
  * add new filter process to hashmap only if the process start was successful
  * remove superfluous fstat() call
  * avoid potential buffer overflow in packet_write_gently() packet size check
  * remove superfluous buffer in write_packetized_from_buf()
  * improve test name
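
The packet size check item above can be illustrated in isolation. The
following is a minimal sketch, not the code from pkt-line.c: `BUF_SIZE`,
`check_by_sum()`, and `check_by_difference()` are hypothetical names chosen
for this example, and the buffer size is assumed to match LARGE_PACKET_MAX.
It shows why adding 4 to `size` before the comparison can wrap around for
very large values, while comparing `size` against the buffer size minus 4
cannot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer size for this sketch (assumed LARGE_PACKET_MAX). */
#define BUF_SIZE 65520

/* v9-style check: size + 4 may wrap around when size is close to SIZE_MAX. */
static int check_by_sum(size_t size)
{
	const size_t packet_size = size + 4;	/* unsigned wrap-around possible */
	return packet_size > BUF_SIZE;		/* 1 = rejected, 0 = accepted */
}

/* v10-style check: no arithmetic is performed on the caller's value. */
static int check_by_difference(size_t size)
{
	return size > BUF_SIZE - 4;		/* 1 = rejected, 0 = accepted */
}
```

With `size` set to `SIZE_MAX - 1`, the first variant accepts the oversized
request because the sum wraps to a small number; the second variant rejects
it.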


Lars Schneider (14):
  convert: quote filter names in error messages
  convert: modernize tests
  run-command: move check_pipe() from write_or_die to run_command
  run-command: add clean_on_exit_handler
  pkt-line: rename packet_write() to packet_write_fmt()
  pkt-line: extract set_packet_header()
  pkt-line: add packet_write_fmt_gently()
  pkt-line: add packet_flush_gently()
  pkt-line: add packet_write_gently()
  pkt-line: add functions to read/write flush terminated packet streams
  convert: make apply_filter() adhere to standard Git error handling
  convert: prepare filter.<driver>.process option
  convert: add filter.<driver>.process option
  contrib/long-running-filter: add long running filter example

 Documentation/gitattributes.txt        | 159 ++++++++++-
 builtin/archive.c                      |   4 +-
 builtin/receive-pack.c                 |   4 +-
 builtin/remote-ext.c                   |   4 +-
 builtin/upload-archive.c               |   4 +-
 connect.c                              |   2 +-
 contrib/long-running-filter/example.pl | 127 +++++++++
 convert.c                              | 372 +++++++++++++++++++++---
 daemon.c                               |   2 +-
 http-backend.c                         |   2 +-
 pkt-line.c                             | 152 +++++++++-
 pkt-line.h                             |  12 +-
 run-command.c                          |  36 ++-
 run-command.h                          |   4 +-
 shallow.c                              |   2 +-
 t/t0021-conversion.sh                  | 505 ++++++++++++++++++++++++++++++---
 t/t0021/rot13-filter.pl                | 191 +++++++++++++
 upload-pack.c                          |  30 +-
 write_or_die.c                         |  13 -
 19 files changed, 1492 insertions(+), 133 deletions(-)
 create mode 100755 contrib/long-running-filter/example.pl
 create mode 100755 t/t0021/rot13-filter.pl



## Interdiff (v9..v10)

diff --git a/convert.c b/convert.c
index 88581d6..1d89632 100644
--- a/convert.c
+++ b/convert.c
@@ -516,23 +516,6 @@ static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap,
 	return hashmap_get(hashmap, &key, NULL);
 }

-static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry)
-{
-	if (!entry)
-		return;
-	sigchain_push(SIGPIPE, SIG_IGN);
-	/*
-	 * We kill the filter most likely because an error happened already.
-	 * That's why we are not interested in any error code here.
-	 */
-	close(entry->process.in);
-	close(entry->process.out);
-	sigchain_pop(SIGPIPE);
-	finish_command(&entry->process);
-	hashmap_remove(hashmap, entry, NULL);
-	free(entry);
-}
-
 static int packet_write_list(int fd, const char *line, ...)
 {
 	va_list args;
@@ -552,6 +535,49 @@ static int packet_write_list(int fd, const char *line, ...)
 	return packet_flush_gently(fd);
 }

+static void read_multi_file_filter_status(int fd, struct strbuf *status) {
+	struct strbuf **pair;
+	char *line;
+	for (;;) {
+		line = packet_read_line(fd, NULL);
+		if (!line)
+			break;
+		pair = strbuf_split_str(line, '=', 2);
+		if (pair[0] && pair[0]->len && pair[1]) {
+			/* the last "status=<foo>" line wins */
+			if (!strcmp(pair[0]->buf, "status=")) {
+				strbuf_reset(status);
+				strbuf_addbuf(status, pair[1]);
+			}
+		}
+		strbuf_list_free(pair);
+	}
+}
+
+static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry)
+{
+	if (!entry)
+		return;
+
+	entry->process.clean_on_exit = 0;
+	kill(entry->process.pid, SIGTERM);
+	finish_command(&entry->process);
+
+	hashmap_remove(hashmap, entry, NULL);
+	free(entry);
+}
+
+void stop_multi_file_filter(struct child_process *process)
+{
+	sigchain_push(SIGPIPE, SIG_IGN);
+	/* Closing the pipe signals the filter to initiate a shutdown. */
+	close(process->in);
+	close(process->out);
+	sigchain_pop(SIGPIPE);
+	/* Finish command will wait until the shutdown is complete. */
+	finish_command(process);
+}
+
 static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
 {
 	int err;
@@ -563,7 +589,6 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 	const char *cap_name;

 	entry = xmalloc(sizeof(*entry));
-	hashmap_entry_init(entry, strhash(cmd));
 	entry->cmd = cmd;
 	entry->supported_capabilities = 0;
 	process = &entry->process;
@@ -573,14 +598,16 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 	process->use_shell = 1;
 	process->in = -1;
 	process->out = -1;
-	process->wait_on_exit = 1;
+	process->clean_on_exit = 1;
+	process->clean_on_exit_handler = stop_multi_file_filter;

 	if (start_command(process)) {
 		error("cannot fork to run external filter '%s'", cmd);
-		kill_multi_file_filter(hashmap, entry);
 		return NULL;
 	}

+	hashmap_entry_init(entry, strhash(cmd));
+
 	sigchain_push(SIGPIPE, SIG_IGN);

 	err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);
@@ -635,24 +662,6 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 	return entry;
 }

-static void read_multi_file_filter_status(int fd, struct strbuf *status) {
-	struct strbuf **pair;
-	char *line;
-	for (;;) {
-		line = packet_read_line(fd, NULL);
-		if (!line)
-			break;
-		pair = strbuf_split_str(line, '=', 2);
-		if (pair[0] && pair[0]->len && pair[1]) {
-			if (!strcmp(pair[0]->buf, "status=")) {
-				strbuf_reset(status);
-				strbuf_addbuf(status, pair[1]);
-			}
-		}
-		strbuf_list_free(pair);
-	}
-}
-
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
 				   int fd, struct strbuf *dst, const char *cmd,
 				   const unsigned int wanted_capability)
@@ -660,10 +669,9 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	int err;
 	struct cmd2process *entry;
 	struct child_process *process;
-	struct stat file_stat;
 	struct strbuf nbuf = STRBUF_INIT;
 	struct strbuf filter_status = STRBUF_INIT;
-	char *filter_type;
+	const char *filter_type;

 	if (!cmd_process_map_initialized) {
 		cmd_process_map_initialized = 1;
@@ -692,12 +700,6 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	else
 		die("unexpected filter type");

-	if (fd >= 0 && !src) {
-		if (fstat(fd, &file_stat) == -1)
-			return 0;
-		len = xsize_t(file_stat.st_size);
-	}
-
 	sigchain_push(SIGPIPE, SIG_IGN);

 	assert(strlen(filter_type) < LARGE_PACKET_DATA_MAX - strlen("command=\n"));
diff --git a/pkt-line.c b/pkt-line.c
index b82aaca..0b5125f 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -174,12 +174,13 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
 static int packet_write_gently(const int fd_out, const char *buf, size_t size)
 {
 	static char packet_write_buffer[LARGE_PACKET_MAX];
-	const size_t packet_size = size + 4;
+	size_t packet_size;

-	if (packet_size > sizeof(packet_write_buffer))
+	if (size > sizeof(packet_write_buffer) - 4)
 		return error("packet write failed - data exceeds max packet size");

 	packet_trace(buf, size, 1);
+	packet_size = size + 4;
 	set_packet_header(packet_write_buffer, packet_size);
 	memcpy(packet_write_buffer + 4, buf, size);
 	if (write_in_full(fd_out, packet_write_buffer, packet_size) == packet_size)
@@ -217,14 +218,13 @@ int write_packetized_from_fd(int fd_in, int fd_out)

 int write_packetized_from_buf(const char *src_in, size_t len, int fd_out)
 {
-	static char buf[LARGE_PACKET_DATA_MAX];
 	int err = 0;
 	size_t bytes_written = 0;
 	size_t bytes_to_write;

 	while (!err) {
-		if ((len - bytes_written) > sizeof(buf))
-			bytes_to_write = sizeof(buf);
+		if ((len - bytes_written) > LARGE_PACKET_DATA_MAX)
+			bytes_to_write = LARGE_PACKET_DATA_MAX;
 		else
 			bytes_to_write = len - bytes_written;
 		if (bytes_to_write == 0)
diff --git a/run-command.c b/run-command.c
index 96c54fe..e5fd6ff 100644
--- a/run-command.c
+++ b/run-command.c
@@ -21,9 +21,7 @@ void child_process_clear(struct child_process *child)

 struct child_to_clean {
 	pid_t pid;
-	char *name;
-	int stdin;
-	int wait;
+	struct child_process *process;
 	struct child_to_clean *next;
 };
 static struct child_to_clean *children_to_clean;
@@ -31,35 +29,23 @@ static int installed_child_cleanup_handler;

 static void cleanup_children(int sig, int in_signal)
 {
-	int status;
-	struct child_to_clean *p = children_to_clean;
-
-	/* Close the the child's stdin as indicator that Git will exit soon */
-	while (p) {
-		if (p->wait)
-			if (p->stdin > 0)
-				close(p->stdin);
-		p = p->next;
-	}
-
 	while (children_to_clean) {
-		p = children_to_clean;
+		struct child_to_clean *p = children_to_clean;
 		children_to_clean = p->next;

-		if (p->wait) {
-			fprintf(stderr, _("Waiting for '%s' to finish..."), p->name);
-			while ((waitpid(p->pid, &status, 0)) < 0 && errno == EINTR)
-				;	/* nothing */
-			fprintf(stderr, _("done!\n"));
+		if (p->process && !in_signal) {
+			struct child_process *process = p->process;
+			if (process->clean_on_exit_handler) {
+				trace_printf("trace: run_command: running exit handler for pid %d", p->pid);
+				process->clean_on_exit_handler(process);
+			}
 		}

 		kill(p->pid, sig);
-		if (!in_signal) {
-			free(p->name);
+		if (!in_signal)
 			free(p);
 	}
 }
-}

 static void cleanup_children_on_signal(int sig)
 {
@@ -73,16 +59,11 @@ static void cleanup_children_on_exit(void)
 	cleanup_children(SIGTERM, 0);
 }

-static void mark_child_for_cleanup(pid_t pid, const char *name, int stdin, int wait)
+static void mark_child_for_cleanup(pid_t pid, struct child_process *process)
 {
 	struct child_to_clean *p = xmalloc(sizeof(*p));
 	p->pid = pid;
-	p->wait = wait;
-	p->stdin = stdin;
-	if (name)
-		p->name = xstrdup(name);
-	else
-		p->name = "process";
+	p->process = process;
 	p->next = children_to_clean;
 	children_to_clean = p;

@@ -93,13 +74,6 @@ static void mark_child_for_cleanup(pid_t pid, const char *name, int stdin, int w
 	}
 }

-#ifdef NO_PTHREADS
-static void mark_child_for_cleanup_no_wait(pid_t pid, const char *name, int timeout, int stdin)
-{
-	mark_child_for_cleanup(pid, NULL, 0, 0);
-}
-#endif
-
 static void clear_child_for_cleanup(pid_t pid)
 {
 	struct child_to_clean **pp;
@@ -458,9 +432,8 @@ int start_command(struct child_process *cmd)
 	}
 	if (cmd->pid < 0)
 		error_errno("cannot fork() for %s", cmd->argv[0]);
-	else if (cmd->clean_on_exit || cmd->wait_on_exit)
-		mark_child_for_cleanup(
-			cmd->pid, cmd->argv[0], cmd->in, cmd->wait_on_exit);
+	else if (cmd->clean_on_exit)
+		mark_child_for_cleanup(cmd->pid, cmd);

 	/*
 	 * Wait for child's execvp. If the execvp succeeds (or if fork()
@@ -520,9 +493,8 @@ int start_command(struct child_process *cmd)
 	failed_errno = errno;
 	if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT))
 		error_errno("cannot spawn %s", cmd->argv[0]);
-	if ((cmd->clean_on_exit || cmd->wait_on_exit) && cmd->pid >= 0)
-		mark_child_for_cleanup(
-			cmd->pid, cmd->argv[0], cmd->in, cmd->clean_on_exit_timeout);
+	if (cmd->clean_on_exit && cmd->pid >= 0)
+		mark_child_for_cleanup(cmd->pid, cmd);

 	argv_array_clear(&nargv);
 	cmd->argv = sargv;
@@ -804,7 +776,7 @@ int start_async(struct async *async)
 		exit(!!async->proc(proc_in, proc_out, async->data));
 	}

-	mark_child_for_cleanup_no_wait(async->pid);
+	mark_child_for_cleanup(async->pid, NULL);

 	if (need_in)
 		close(fdin[0]);
diff --git a/run-command.h b/run-command.h
index f7b9907..dd1c78c 100644
--- a/run-command.h
+++ b/run-command.h
@@ -42,10 +42,9 @@ struct child_process {
 	unsigned silent_exec_failure:1;
 	unsigned stdout_to_stderr:1;
 	unsigned use_shell:1;
-	 /* kill the child on Git exit */
 	unsigned clean_on_exit:1;
-	/* close the child's stdin on Git exit and wait until it terminates */
-	unsigned wait_on_exit:1;
+	void (*clean_on_exit_handler)(struct child_process *process);
+	void *clean_on_exit_handler_cbdata;
 };

 #define CHILD_PROCESS_INIT { NULL, ARGV_ARRAY_INIT, ARGV_ARRAY_INIT }
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 52b7fe9..9f892c0 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -28,9 +28,7 @@ file_size () {
 filter_git () {
 	rm -f rot13-filter.log &&
 	git "$@" 2>git-stderr.log &&
-	sed '/Waiting for/d' git-stderr.log >git-stderr-clean.log &&
-	test_must_be_empty git-stderr-clean.log &&
-	rm -f git-stderr.log git-stderr-clean.log
+	rm -f git-stderr.log
 }

 # Count unique lines in two files and compare them.
@@ -668,7 +666,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a
 	)
 '

-test_expect_success PERL 'process filter signals abort once to abort processing of all future files' '
+test_expect_success PERL 'process filter abort stops processing of all further files' '
 	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -688,6 +686,8 @@ test_expect_success PERL 'process filter signals abort once to abort processing
 		git add . &&
 		rm -f *.r &&

+		# Note: This test assumes that Git filters files in alphabetical
+		# order ("abort.r" before "test.r").
 		filter_git checkout --quiet --no-progress . &&
 		cat >expected.log <<-EOF &&
 			START

--
2.10.0



* [PATCH v10 01/14] convert: quote filter names in error messages
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 02/14] convert: modernize tests larsxschneider
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard
to read in error messages. Quote them to improve the readability.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 convert.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index 077f5e6..986c239 100644
--- a/convert.c
+++ b/convert.c
@@ -412,7 +412,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	child_process.out = out;
 
 	if (start_command(&child_process))
-		return error("cannot fork to run external filter %s", params->cmd);
+		return error("cannot fork to run external filter '%s'", params->cmd);
 
 	sigchain_push(SIGPIPE, SIG_IGN);
 
@@ -430,13 +430,13 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	if (close(child_process.in))
 		write_err = 1;
 	if (write_err)
-		error("cannot feed the input to external filter %s", params->cmd);
+		error("cannot feed the input to external filter '%s'", params->cmd);
 
 	sigchain_pop(SIGPIPE);
 
 	status = finish_command(&child_process);
 	if (status)
-		error("external filter %s failed %d", params->cmd, status);
+		error("external filter '%s' failed %d", params->cmd, status);
 
 	strbuf_release(&cmd);
 	return (write_err || status);
@@ -477,15 +477,15 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 		return 0;	/* error was already reported */
 
 	if (strbuf_read(&nbuf, async.out, len) < 0) {
-		error("read from external filter %s failed", cmd);
+		error("read from external filter '%s' failed", cmd);
 		ret = 0;
 	}
 	if (close(async.out)) {
-		error("read from external filter %s failed", cmd);
+		error("read from external filter '%s' failed", cmd);
 		ret = 0;
 	}
 	if (finish_async(&async)) {
-		error("external filter %s failed", cmd);
+		error("external filter '%s' failed", cmd);
 		ret = 0;
 	}
 
-- 
2.10.0



* [PATCH v10 02/14] convert: modernize tests
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
  2016-10-08 11:25 ` [PATCH v10 01/14] convert: quote filter names in error messages larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 03/14] run-command: move check_pipe() from write_or_die to run_command larsxschneider
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Use `test_config` to set the config, check that files are empty with
`test_must_be_empty`, compare files with `test_cmp`, and remove spaces
after ">" and "<".

Please note that the "rot13" filter configured in "setup" keeps using
`git config` instead of `test_config` because subsequent tests might
depend on it.

Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t0021-conversion.sh | 58 +++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e799e59..dc50938 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -38,8 +38,8 @@ script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
 
 test_expect_success check '
 
-	cmp test.o test &&
-	cmp test.o test.t &&
+	test_cmp test.o test &&
+	test_cmp test.o test.t &&
 
 	# ident should be stripped in the repository
 	git diff --raw --exit-code :test :test.i &&
@@ -47,10 +47,10 @@ test_expect_success check '
 	embedded=$(sed -ne "$script" test.i) &&
 	test "z$id" = "z$embedded" &&
 
-	git cat-file blob :test.t > test.r &&
+	git cat-file blob :test.t >test.r &&
 
-	./rot13.sh < test.o > test.t &&
-	cmp test.r test.t
+	./rot13.sh <test.o >test.t &&
+	test_cmp test.r test.t
 '
 
 # If an expanded ident ever gets into the repository, we want to make sure that
@@ -130,7 +130,7 @@ test_expect_success 'filter shell-escaped filenames' '
 
 	# delete the files and check them out again, using a smudge filter
 	# that will count the args and echo the command-line back to us
-	git config filter.argc.smudge "sh ./argc.sh %f" &&
+	test_config filter.argc.smudge "sh ./argc.sh %f" &&
 	rm "$normal" "$special" &&
 	git checkout -- "$normal" "$special" &&
 
@@ -141,7 +141,7 @@ test_expect_success 'filter shell-escaped filenames' '
 	test_cmp expect "$special" &&
 
 	# do the same thing, but with more args in the filter expression
-	git config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
+	test_config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
 	rm "$normal" "$special" &&
 	git checkout -- "$normal" "$special" &&
 
@@ -154,9 +154,9 @@ test_expect_success 'filter shell-escaped filenames' '
 '
 
 test_expect_success 'required filter should filter data' '
-	git config filter.required.smudge ./rot13.sh &&
-	git config filter.required.clean ./rot13.sh &&
-	git config filter.required.required true &&
+	test_config filter.required.smudge ./rot13.sh &&
+	test_config filter.required.clean ./rot13.sh &&
+	test_config filter.required.required true &&
 
 	echo "*.r filter=required" >.gitattributes &&
 
@@ -165,17 +165,17 @@ test_expect_success 'required filter should filter data' '
 
 	rm -f test.r &&
 	git checkout -- test.r &&
-	cmp test.o test.r &&
+	test_cmp test.o test.r &&
 
 	./rot13.sh <test.o >expected &&
 	git cat-file blob :test.r >actual &&
-	cmp expected actual
+	test_cmp expected actual
 '
 
 test_expect_success 'required filter smudge failure' '
-	git config filter.failsmudge.smudge false &&
-	git config filter.failsmudge.clean cat &&
-	git config filter.failsmudge.required true &&
+	test_config filter.failsmudge.smudge false &&
+	test_config filter.failsmudge.clean cat &&
+	test_config filter.failsmudge.required true &&
 
 	echo "*.fs filter=failsmudge" >.gitattributes &&
 
@@ -186,9 +186,9 @@ test_expect_success 'required filter smudge failure' '
 '
 
 test_expect_success 'required filter clean failure' '
-	git config filter.failclean.smudge cat &&
-	git config filter.failclean.clean false &&
-	git config filter.failclean.required true &&
+	test_config filter.failclean.smudge cat &&
+	test_config filter.failclean.clean false &&
+	test_config filter.failclean.required true &&
 
 	echo "*.fc filter=failclean" >.gitattributes &&
 
@@ -197,8 +197,8 @@ test_expect_success 'required filter clean failure' '
 '
 
 test_expect_success 'filtering large input to small output should use little memory' '
-	git config filter.devnull.clean "cat >/dev/null" &&
-	git config filter.devnull.required true &&
+	test_config filter.devnull.clean "cat >/dev/null" &&
+	test_config filter.devnull.required true &&
 	for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
 	echo "30MB filter=devnull" >.gitattributes &&
 	GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
@@ -207,7 +207,7 @@ test_expect_success 'filtering large input to small output should use little mem
 test_expect_success 'filter that does not read is fine' '
 	test-genrandom foo $((128 * 1024 + 1)) >big &&
 	echo "big filter=epipe" >.gitattributes &&
-	git config filter.epipe.clean "echo xyzzy" &&
+	test_config filter.epipe.clean "echo xyzzy" &&
 	git add big &&
 	git cat-file blob :big >actual &&
 	echo xyzzy >expect &&
@@ -215,20 +215,20 @@ test_expect_success 'filter that does not read is fine' '
 '
 
 test_expect_success EXPENSIVE 'filter large file' '
-	git config filter.largefile.smudge cat &&
-	git config filter.largefile.clean cat &&
+	test_config filter.largefile.smudge cat &&
+	test_config filter.largefile.clean cat &&
 	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
 	echo "2GB filter=largefile" >.gitattributes &&
 	git add 2GB 2>err &&
-	! test -s err &&
+	test_must_be_empty err &&
 	rm -f 2GB &&
 	git checkout -- 2GB 2>err &&
-	! test -s err
+	test_must_be_empty err
 '
 
 test_expect_success "filter: clean empty file" '
-	git config filter.in-repo-header.clean  "echo cleaned && cat" &&
-	git config filter.in-repo-header.smudge "sed 1d" &&
+	test_config filter.in-repo-header.clean  "echo cleaned && cat" &&
+	test_config filter.in-repo-header.smudge "sed 1d" &&
 
 	echo "empty-in-worktree    filter=in-repo-header" >>.gitattributes &&
 	>empty-in-worktree &&
@@ -240,8 +240,8 @@ test_expect_success "filter: clean empty file" '
 '
 
 test_expect_success "filter: smudge empty file" '
-	git config filter.empty-in-repo.clean "cat >/dev/null" &&
-	git config filter.empty-in-repo.smudge "echo smudged && cat" &&
+	test_config filter.empty-in-repo.clean "cat >/dev/null" &&
+	test_config filter.empty-in-repo.smudge "echo smudged && cat" &&
 
 	echo "empty-in-repo filter=empty-in-repo" >>.gitattributes &&
 	echo dead data walking >empty-in-repo &&
-- 
2.10.0



* [PATCH v10 03/14] run-command: move check_pipe() from write_or_die to run_command
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
  2016-10-08 11:25 ` [PATCH v10 01/14] convert: quote filter names in error messages larsxschneider
  2016-10-08 11:25 ` [PATCH v10 02/14] convert: modernize tests larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 04/14] run-command: add clean_on_exit_handler larsxschneider
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider, Ramsay Jones

From: Lars Schneider <larsxschneider@gmail.com>

Move check_pipe() to run_command and make it public. This is necessary
to call the function from pkt-line in a subsequent patch.

While at it, make async_exit() static to run_command.c as it is no
longer used from outside.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 run-command.c  | 17 +++++++++++++++--
 run-command.h  |  2 +-
 write_or_die.c | 13 -------------
 3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/run-command.c b/run-command.c
index 5a4dbb6..3269362 100644
--- a/run-command.c
+++ b/run-command.c
@@ -634,7 +634,7 @@ int in_async(void)
 	return !pthread_equal(main_thread, pthread_self());
 }
 
-void NORETURN async_exit(int code)
+static void NORETURN async_exit(int code)
 {
 	pthread_exit((void *)(intptr_t)code);
 }
@@ -684,13 +684,26 @@ int in_async(void)
 	return process_is_async;
 }
 
-void NORETURN async_exit(int code)
+static void NORETURN async_exit(int code)
 {
 	exit(code);
 }
 
 #endif
 
+void check_pipe(int err)
+{
+	if (err == EPIPE) {
+		if (in_async())
+			async_exit(141);
+
+		signal(SIGPIPE, SIG_DFL);
+		raise(SIGPIPE);
+		/* Should never happen, but just in case... */
+		exit(141);
+	}
+}
+
 int start_async(struct async *async)
 {
 	int need_in, need_out;
diff --git a/run-command.h b/run-command.h
index 5066649..cf29a31 100644
--- a/run-command.h
+++ b/run-command.h
@@ -139,7 +139,7 @@ struct async {
 int start_async(struct async *async);
 int finish_async(struct async *async);
 int in_async(void);
-void NORETURN async_exit(int code);
+void check_pipe(int err);
 
 /**
  * This callback should initialize the child process and preload the
diff --git a/write_or_die.c b/write_or_die.c
index 0734432..eab8c8d 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,19 +1,6 @@
 #include "cache.h"
 #include "run-command.h"
 
-static void check_pipe(int err)
-{
-	if (err == EPIPE) {
-		if (in_async())
-			async_exit(141);
-
-		signal(SIGPIPE, SIG_DFL);
-		raise(SIGPIPE);
-		/* Should never happen, but just in case... */
-		exit(141);
-	}
-}
-
 /*
  * Some cases use stdio, but want to flush after the write
  * to get error handling (and to get better interactive
-- 
2.10.0



* [PATCH v10 04/14] run-command: add clean_on_exit_handler
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (2 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 03/14] run-command: move check_pipe() from write_or_die to run_command larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-11 12:12   ` Johannes Schindelin
  2016-10-08 11:25 ` [PATCH v10 05/14] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Some processes might want to perform cleanup tasks before Git kills them
due to the 'clean_on_exit' flag. Let's give them an interface for doing
this. The feature is used in a subsequent patch.

Please note that the cleanup callback is not executed if Git dies of a
signal. The reason is that only "async-signal-safe" functions would be
allowed to be called in that case. Since we cannot control what functions
the callback will use, we will not support the case. See 507d7804 for
more details.
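
The rule described above (run the callback on a regular exit, skip it when
Git dies of a signal) can be sketched independently of run-command.c. This
is only an illustration under stated assumptions: `struct demo_process`,
`demo_handler()`, `demo_cleanup()`, and `demo_shutdown()` are hypothetical
names, and the handler merely counts its invocations instead of doing real
cleanup work.

```c
#include <stddef.h>

/* Hypothetical stand-in for Git's struct child_process. */
struct demo_process {
	int pid;
	int handler_calls;	/* bookkeeping for this demonstration only */
	void (*clean_on_exit_handler)(struct demo_process *process);
};

/* Example handler; a real one might close pipes and wait for the child. */
static void demo_handler(struct demo_process *process)
{
	process->handler_calls++;
}

/*
 * The handler runs on a normal exit but is skipped when the cleanup is
 * triggered from a signal handler, because the callback might use
 * functions that are not async-signal-safe.
 * Returns 1 if the handler was invoked, 0 otherwise.
 */
static int demo_cleanup(struct demo_process *process, int in_signal)
{
	if (process && !in_signal && process->clean_on_exit_handler) {
		process->clean_on_exit_handler(process);
		return 1;
	}
	return 0;
}

/* Usage: returns how often the handler ran for one simulated shutdown. */
static int demo_shutdown(int in_signal)
{
	struct demo_process p = { 42, 0, demo_handler };
	demo_cleanup(&p, in_signal);
	return p.handler_calls;
}
```

Passing a NULL process mirrors the start_async() call site in the patch,
where no child_process is available and therefore no handler can run.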

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 run-command.c | 19 +++++++++++++++----
 run-command.h |  2 ++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/run-command.c b/run-command.c
index 3269362..e5fd6ff 100644
--- a/run-command.c
+++ b/run-command.c
@@ -21,6 +21,7 @@ void child_process_clear(struct child_process *child)
 
 struct child_to_clean {
 	pid_t pid;
+	struct child_process *process;
 	struct child_to_clean *next;
 };
 static struct child_to_clean *children_to_clean;
@@ -31,6 +32,15 @@ static void cleanup_children(int sig, int in_signal)
 	while (children_to_clean) {
 		struct child_to_clean *p = children_to_clean;
 		children_to_clean = p->next;
+
+		if (p->process && !in_signal) {
+			struct child_process *process = p->process;
+			if (process->clean_on_exit_handler) {
+				trace_printf("trace: run_command: running exit handler for pid %d", p->pid);
+				process->clean_on_exit_handler(process);
+			}
+		}
+
 		kill(p->pid, sig);
 		if (!in_signal)
 			free(p);
@@ -49,10 +59,11 @@ static void cleanup_children_on_exit(void)
 	cleanup_children(SIGTERM, 0);
 }
 
-static void mark_child_for_cleanup(pid_t pid)
+static void mark_child_for_cleanup(pid_t pid, struct child_process *process)
 {
 	struct child_to_clean *p = xmalloc(sizeof(*p));
 	p->pid = pid;
+	p->process = process;
 	p->next = children_to_clean;
 	children_to_clean = p;
 
@@ -422,7 +433,7 @@ int start_command(struct child_process *cmd)
 	if (cmd->pid < 0)
 		error_errno("cannot fork() for %s", cmd->argv[0]);
 	else if (cmd->clean_on_exit)
-		mark_child_for_cleanup(cmd->pid);
+		mark_child_for_cleanup(cmd->pid, cmd);
 
 	/*
 	 * Wait for child's execvp. If the execvp succeeds (or if fork()
@@ -483,7 +494,7 @@ int start_command(struct child_process *cmd)
 	if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT))
 		error_errno("cannot spawn %s", cmd->argv[0]);
 	if (cmd->clean_on_exit && cmd->pid >= 0)
-		mark_child_for_cleanup(cmd->pid);
+		mark_child_for_cleanup(cmd->pid, cmd);
 
 	argv_array_clear(&nargv);
 	cmd->argv = sargv;
@@ -765,7 +776,7 @@ int start_async(struct async *async)
 		exit(!!async->proc(proc_in, proc_out, async->data));
 	}
 
-	mark_child_for_cleanup(async->pid);
+	mark_child_for_cleanup(async->pid, NULL);
 
 	if (need_in)
 		close(fdin[0]);
diff --git a/run-command.h b/run-command.h
index cf29a31..dd1c78c 100644
--- a/run-command.h
+++ b/run-command.h
@@ -43,6 +43,8 @@ struct child_process {
 	unsigned stdout_to_stderr:1;
 	unsigned use_shell:1;
 	unsigned clean_on_exit:1;
+	void (*clean_on_exit_handler)(struct child_process *process);
+	void *clean_on_exit_handler_cbdata;
 };
 
 #define CHILD_PROCESS_INIT { NULL, ARGV_ARRAY_INIT, ARGV_ARRAY_INIT }
-- 
2.10.0



* [PATCH v10 05/14] pkt-line: rename packet_write() to packet_write_fmt()
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (3 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 04/14] run-command: add clean_on_exit_handler larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 06/14] pkt-line: extract set_packet_header() larsxschneider
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_write() should be called packet_write_fmt() because it is a
printf-like function that takes a format string as its first parameter.

packet_write_fmt() should be used for text strings only. Arbitrary
binary data should use a new packet_write() function that is introduced
in a subsequent patch.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/archive.c        |  4 ++--
 builtin/receive-pack.c   |  4 ++--
 builtin/remote-ext.c     |  4 ++--
 builtin/upload-archive.c |  4 ++--
 connect.c                |  2 +-
 daemon.c                 |  2 +-
 http-backend.c           |  2 +-
 pkt-line.c               |  2 +-
 pkt-line.h               |  2 +-
 shallow.c                |  2 +-
 upload-pack.c            | 30 +++++++++++++++---------------
 11 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index a1e3b94..49f4914 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -47,10 +47,10 @@ static int run_remote_archiver(int argc, const char **argv,
 	if (name_hint) {
 		const char *format = archive_format_from_filename(name_hint);
 		if (format)
-			packet_write(fd[1], "argument --format=%s\n", format);
+			packet_write_fmt(fd[1], "argument --format=%s\n", format);
 	}
 	for (i = 1; i < argc; i++)
-		packet_write(fd[1], "argument %s\n", argv[i]);
+		packet_write_fmt(fd[1], "argument %s\n", argv[i]);
 	packet_flush(fd[1]);
 
 	buf = packet_read_line(fd[0], NULL);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 011db00..1ce7682 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -218,7 +218,7 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 static void show_ref(const char *path, const unsigned char *sha1)
 {
 	if (sent_capabilities) {
-		packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
+		packet_write_fmt(1, "%s %s\n", sha1_to_hex(sha1), path);
 	} else {
 		struct strbuf cap = STRBUF_INIT;
 
@@ -233,7 +233,7 @@ static void show_ref(const char *path, const unsigned char *sha1)
 		if (advertise_push_options)
 			strbuf_addstr(&cap, " push-options");
 		strbuf_addf(&cap, " agent=%s", git_user_agent_sanitized());
-		packet_write(1, "%s %s%c%s\n",
+		packet_write_fmt(1, "%s %s%c%s\n",
 			     sha1_to_hex(sha1), path, 0, cap.buf);
 		strbuf_release(&cap);
 		sent_capabilities = 1;
diff --git a/builtin/remote-ext.c b/builtin/remote-ext.c
index 88eb8f9..11b48bf 100644
--- a/builtin/remote-ext.c
+++ b/builtin/remote-ext.c
@@ -128,9 +128,9 @@ static void send_git_request(int stdin_fd, const char *serv, const char *repo,
 	const char *vhost)
 {
 	if (!vhost)
-		packet_write(stdin_fd, "%s %s%c", serv, repo, 0);
+		packet_write_fmt(stdin_fd, "%s %s%c", serv, repo, 0);
 	else
-		packet_write(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
+		packet_write_fmt(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
 			     vhost, 0);
 }
 
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 2caedf1..dc872f6 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -88,11 +88,11 @@ int cmd_upload_archive(int argc, const char **argv, const char *prefix)
 	writer.git_cmd = 1;
 	if (start_command(&writer)) {
 		int err = errno;
-		packet_write(1, "NACK unable to spawn subprocess\n");
+		packet_write_fmt(1, "NACK unable to spawn subprocess\n");
 		die("upload-archive: %s", strerror(err));
 	}
 
-	packet_write(1, "ACK\n");
+	packet_write_fmt(1, "ACK\n");
 	packet_flush(1);
 
 	while (1) {
diff --git a/connect.c b/connect.c
index 722dc3f..5330d9c 100644
--- a/connect.c
+++ b/connect.c
@@ -730,7 +730,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 		 * Note: Do not add any other headers here!  Doing so
 		 * will cause older git-daemon servers to crash.
 		 */
-		packet_write(fd[1],
+		packet_write_fmt(fd[1],
 			     "%s %s%chost=%s%c",
 			     prog, path, 0,
 			     target_host, 0);
diff --git a/daemon.c b/daemon.c
index 425aad0..afce1b9 100644
--- a/daemon.c
+++ b/daemon.c
@@ -281,7 +281,7 @@ static int daemon_error(const char *dir, const char *msg)
 {
 	if (!informative_errors)
 		msg = "access denied or repository not exported";
-	packet_write(1, "ERR %s: %s", msg, dir);
+	packet_write_fmt(1, "ERR %s: %s", msg, dir);
 	return -1;
 }
 
diff --git a/http-backend.c b/http-backend.c
index adc8c8c..eef0a36 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -464,7 +464,7 @@ static void get_info_refs(struct strbuf *hdr, char *arg)
 		hdr_str(hdr, content_type, buf.buf);
 		end_headers(hdr);
 
-		packet_write(1, "# service=git-%s\n", svc->name);
+		packet_write_fmt(1, "# service=git-%s\n", svc->name);
 		packet_flush(1);
 
 		argv[0] = svc->name;
diff --git a/pkt-line.c b/pkt-line.c
index 62fdb37..0a9b61c 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -118,7 +118,7 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args)
 	packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
-void packet_write(int fd, const char *fmt, ...)
+void packet_write_fmt(int fd, const char *fmt, ...)
 {
 	static struct strbuf buf = STRBUF_INIT;
 	va_list args;
diff --git a/pkt-line.h b/pkt-line.h
index 3cb9d91..1902fb3 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -20,7 +20,7 @@
  * side can't, we stay with pure read/write interfaces.
  */
 void packet_flush(int fd);
-void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 
diff --git a/shallow.c b/shallow.c
index 54e2db7..d666e24 100644
--- a/shallow.c
+++ b/shallow.c
@@ -260,7 +260,7 @@ static int advertise_shallow_grafts_cb(const struct commit_graft *graft, void *c
 {
 	int fd = *(int *)cb;
 	if (graft->nr_parent == -1)
-		packet_write(fd, "shallow %s\n", oid_to_hex(&graft->oid));
+		packet_write_fmt(fd, "shallow %s\n", oid_to_hex(&graft->oid));
 	return 0;
 }
 
diff --git a/upload-pack.c b/upload-pack.c
index ca7f941..cd47de6 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -393,13 +393,13 @@ static int get_common_commits(void)
 			if (multi_ack == 2 && got_common
 			    && !got_other && ok_to_give_up()) {
 				sent_ready = 1;
-				packet_write(1, "ACK %s ready\n", last_hex);
+				packet_write_fmt(1, "ACK %s ready\n", last_hex);
 			}
 			if (have_obj.nr == 0 || multi_ack)
-				packet_write(1, "NAK\n");
+				packet_write_fmt(1, "NAK\n");
 
 			if (no_done && sent_ready) {
-				packet_write(1, "ACK %s\n", last_hex);
+				packet_write_fmt(1, "ACK %s\n", last_hex);
 				return 0;
 			}
 			if (stateless_rpc)
@@ -416,20 +416,20 @@ static int get_common_commits(void)
 					const char *hex = sha1_to_hex(sha1);
 					if (multi_ack == 2) {
 						sent_ready = 1;
-						packet_write(1, "ACK %s ready\n", hex);
+						packet_write_fmt(1, "ACK %s ready\n", hex);
 					} else
-						packet_write(1, "ACK %s continue\n", hex);
+						packet_write_fmt(1, "ACK %s continue\n", hex);
 				}
 				break;
 			default:
 				got_common = 1;
 				memcpy(last_hex, sha1_to_hex(sha1), 41);
 				if (multi_ack == 2)
-					packet_write(1, "ACK %s common\n", last_hex);
+					packet_write_fmt(1, "ACK %s common\n", last_hex);
 				else if (multi_ack)
-					packet_write(1, "ACK %s continue\n", last_hex);
+					packet_write_fmt(1, "ACK %s continue\n", last_hex);
 				else if (have_obj.nr == 1)
-					packet_write(1, "ACK %s\n", last_hex);
+					packet_write_fmt(1, "ACK %s\n", last_hex);
 				break;
 			}
 			continue;
@@ -437,10 +437,10 @@ static int get_common_commits(void)
 		if (!strcmp(line, "done")) {
 			if (have_obj.nr > 0) {
 				if (multi_ack)
-					packet_write(1, "ACK %s\n", last_hex);
+					packet_write_fmt(1, "ACK %s\n", last_hex);
 				return 0;
 			}
-			packet_write(1, "NAK\n");
+			packet_write_fmt(1, "NAK\n");
 			return -1;
 		}
 		die("git upload-pack: expected SHA1 list, got '%s'", line);
@@ -650,7 +650,7 @@ static void receive_needs(void)
 		while (result) {
 			struct object *object = &result->item->object;
 			if (!(object->flags & (CLIENT_SHALLOW|NOT_SHALLOW))) {
-				packet_write(1, "shallow %s",
+				packet_write_fmt(1, "shallow %s",
 						oid_to_hex(&object->oid));
 				register_shallow(object->oid.hash);
 				shallow_nr++;
@@ -662,7 +662,7 @@ static void receive_needs(void)
 			struct object *object = shallows.objects[i].item;
 			if (object->flags & NOT_SHALLOW) {
 				struct commit_list *parents;
-				packet_write(1, "unshallow %s",
+				packet_write_fmt(1, "unshallow %s",
 					oid_to_hex(&object->oid));
 				object->flags &= ~CLIENT_SHALLOW;
 				/* make sure the real parents are parsed */
@@ -741,7 +741,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		struct strbuf symref_info = STRBUF_INIT;
 
 		format_symref_info(&symref_info, cb_data);
-		packet_write(1, "%s %s%c%s%s%s%s%s agent=%s\n",
+		packet_write_fmt(1, "%s %s%c%s%s%s%s%s agent=%s\n",
 			     oid_to_hex(oid), refname_nons,
 			     0, capabilities,
 			     (allow_unadvertised_object_request & ALLOW_TIP_SHA1) ?
@@ -753,11 +753,11 @@ static int send_ref(const char *refname, const struct object_id *oid,
 			     git_user_agent_sanitized());
 		strbuf_release(&symref_info);
 	} else {
-		packet_write(1, "%s %s\n", oid_to_hex(oid), refname_nons);
+		packet_write_fmt(1, "%s %s\n", oid_to_hex(oid), refname_nons);
 	}
 	capabilities = NULL;
 	if (!peel_ref(refname, peeled.hash))
-		packet_write(1, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons);
+		packet_write_fmt(1, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons);
 	return 0;
 }
 
-- 
2.10.0



* [PATCH v10 06/14] pkt-line: extract set_packet_header()
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (4 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 05/14] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 07/14] pkt-line: add packet_write_fmt_gently() larsxschneider
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Extract set_packet_header(), which converts an integer to a 4-byte hex
string, into a separate file-local function so that other pkt-line
functions can use it.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
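Note: a minimal stand-alone sketch of the header encoding described
above, assuming nothing beyond standard C. `encode_pkt_len()` is a
hypothetical name used only for this illustration; it mirrors the shifts
and masks of set_packet_header() but is not the function from the patch.

```c
#include <assert.h>

/*
 * Illustration only: write `size` as four lowercase hex digits into
 * buf[0..3]. No terminating NUL is written, matching the pkt-line header.
 */
static void encode_pkt_len(char *buf, unsigned int size)
{
	static const char hexchar[] = "0123456789abcdef";

	assert(size <= 0xffff);	/* four hex digits hold at most 0xffff */
	buf[0] = hexchar[(size >> 12) & 15];
	buf[1] = hexchar[(size >> 8) & 15];
	buf[2] = hexchar[(size >> 4) & 15];
	buf[3] = hexchar[size & 15];
}
```

A 6-byte payload has a total packet length of 10 (payload plus the
4-byte header), which encodes as "000a". LARGE_PACKET_MAX (65520)
encodes as "fff0".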
 pkt-line.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 0a9b61c..e8adc0f 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -97,10 +97,20 @@ void packet_buf_flush(struct strbuf *buf)
 	strbuf_add(buf, "0000", 4);
 }
 
-#define hex(a) (hexchar[(a) & 15])
-static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+static void set_packet_header(char *buf, const int size)
 {
 	static char hexchar[] = "0123456789abcdef";
+
+	#define hex(a) (hexchar[(a) & 15])
+	buf[0] = hex(size >> 12);
+	buf[1] = hex(size >> 8);
+	buf[2] = hex(size >> 4);
+	buf[3] = hex(size);
+	#undef hex
+}
+
+static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+{
 	size_t orig_len, n;
 
 	orig_len = out->len;
@@ -111,10 +121,7 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args)
 	if (n > LARGE_PACKET_MAX)
 		die("protocol error: impossibly long line");
 
-	out->buf[orig_len + 0] = hex(n >> 12);
-	out->buf[orig_len + 1] = hex(n >> 8);
-	out->buf[orig_len + 2] = hex(n >> 4);
-	out->buf[orig_len + 3] = hex(n);
+	set_packet_header(&out->buf[orig_len], n);
 	packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
-- 
2.10.0



* [PATCH v10 07/14] pkt-line: add packet_write_fmt_gently()
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (5 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 06/14] pkt-line: extract set_packet_header() larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 08/14] pkt-line: add packet_flush_gently() larsxschneider
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_write_fmt() dies on a write error even though an error would be
acceptable for some callers. Add packet_write_fmt_gently(), which writes
a formatted pkt-line like packet_write_fmt() but does not die on error.
The function is used in a subsequent patch.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 pkt-line.c | 34 ++++++++++++++++++++++++++++++----
 pkt-line.h |  1 +
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index e8adc0f..56915f0 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -125,16 +125,42 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args)
 	packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
+static int packet_write_fmt_1(int fd, int gently,
+			      const char *fmt, va_list args)
+{
+	struct strbuf buf = STRBUF_INIT;
+	ssize_t count;
+
+	format_packet(&buf, fmt, args);
+	count = write_in_full(fd, buf.buf, buf.len);
+	if (count == buf.len)
+		return 0;
+
+	if (!gently) {
+		check_pipe(errno);
+		die_errno("packet write with format failed");
+	}
+	return error("packet write with format failed");
+}
+
 void packet_write_fmt(int fd, const char *fmt, ...)
 {
-	static struct strbuf buf = STRBUF_INIT;
 	va_list args;
 
-	strbuf_reset(&buf);
 	va_start(args, fmt);
-	format_packet(&buf, fmt, args);
+	packet_write_fmt_1(fd, 0, fmt, args);
+	va_end(args);
+}
+
+int packet_write_fmt_gently(int fd, const char *fmt, ...)
+{
+	int status;
+	va_list args;
+
+	va_start(args, fmt);
+	status = packet_write_fmt_1(fd, 1, fmt, args);
 	va_end(args);
-	write_or_die(fd, buf.buf, buf.len);
+	return status;
 }
 
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
diff --git a/pkt-line.h b/pkt-line.h
index 1902fb3..3caea77 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,7 @@ void packet_flush(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 
 /*
  * Read a packetized line into the buffer, which must be at least size bytes
-- 
2.10.0



* [PATCH v10 08/14] pkt-line: add packet_flush_gently()
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (6 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 07/14] pkt-line: add packet_write_fmt_gently() larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 09/14] pkt-line: add packet_write_gently() larsxschneider
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_flush() dies on a write error even though an error would be
acceptable for some callers. Add packet_flush_gently(), which writes a
pkt-line flush packet like packet_flush() but does not die on error.
The function is used in a subsequent patch.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 pkt-line.c | 8 ++++++++
 pkt-line.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 56915f0..286eb09 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -91,6 +91,14 @@ void packet_flush(int fd)
 	write_or_die(fd, "0000", 4);
 }
 
+int packet_flush_gently(int fd)
+{
+	packet_trace("0000", 4, 1);
+	if (write_in_full(fd, "0000", 4) == 4)
+		return 0;
+	return error("flush packet write failed");
+}
+
 void packet_buf_flush(struct strbuf *buf)
 {
 	packet_trace("0000", 4, 1);
diff --git a/pkt-line.h b/pkt-line.h
index 3caea77..3fa0899 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,7 @@ void packet_flush(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 
 /*
-- 
2.10.0



* [PATCH v10 09/14] pkt-line: add packet_write_gently()
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (7 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 08/14] pkt-line: add packet_flush_gently() larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 10/14] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_write_fmt_gently() uses format_packet(), which only lets the
caller send string data via "%s". That means it cannot be used for
arbitrary data that may contain NULs.

Add packet_write_gently() which writes arbitrary data and does not die
in case of an error. The function is used by other pkt-line functions in
a subsequent patch.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
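Note: a rough sketch of why a length-prefixed copy handles data that a
"%s" format cannot. `frame_packet()` and `SKETCH_PACKET_MAX` are
hypothetical names for this illustration; the sketch frames into a
caller-supplied buffer and performs no I/O, so it is not the
packet_write_gently() from the patch.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PACKET_MAX 65520	/* assumed to mirror LARGE_PACKET_MAX */

/*
 * Illustration only: build one pkt-line in `out` from `size` raw bytes.
 * Returns the total packet length, or -1 if the payload is too large
 * for one packet or `out` is too small.
 */
static long frame_packet(char *out, size_t out_cap,
			 const char *data, size_t size)
{
	static const char hexchar[] = "0123456789abcdef";
	size_t total = size + 4;

	if (size > SKETCH_PACKET_MAX - 4 || total > out_cap)
		return -1;

	/* The header counts itself: payload length plus 4. */
	out[0] = hexchar[(total >> 12) & 15];
	out[1] = hexchar[(total >> 8) & 15];
	out[2] = hexchar[(total >> 4) & 15];
	out[3] = hexchar[total & 15];

	/* memcpy() copies by length, so embedded NULs survive. */
	memcpy(out + 4, data, size);
	return (long)total;
}
```

Framing the three bytes 'a', NUL, 'b' yields the header "0007" followed
by the same three bytes; the NUL does not truncate the payload.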
 pkt-line.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 286eb09..dca5a64 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -171,6 +171,23 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
 	return status;
 }
 
+static int packet_write_gently(const int fd_out, const char *buf, size_t size)
+{
+	static char packet_write_buffer[LARGE_PACKET_MAX];
+	size_t packet_size;
+
+	if (size > sizeof(packet_write_buffer) - 4)
+		return error("packet write failed - data exceeds max packet size");
+
+	packet_trace(buf, size, 1);
+	packet_size = size + 4;
+	set_packet_header(packet_write_buffer, packet_size);
+	memcpy(packet_write_buffer + 4, buf, size);
+	if (write_in_full(fd_out, packet_write_buffer, packet_size) == packet_size)
+		return 0;
+	return error("packet write failed");
+}
+
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 {
 	va_list args;
-- 
2.10.0



* [PATCH v10 10/14] pkt-line: add functions to read/write flush terminated packet streams
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (8 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 09/14] pkt-line: add packet_write_gently() larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 11/14] convert: make apply_filter() adhere to standard Git error handling larsxschneider
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

write_packetized_from_fd() and write_packetized_from_buf() write a
stream of packets. All content packets use the maximal packet size
except for the last one. After the last content packet a `flush` control
packet is written.

read_packetized_to_strbuf() reads arbitrarily sized packets until it
detects a `flush` packet.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
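Note: a minimal sketch of the chunking described above, assuming a
callback stands in for the per-packet write. `for_each_chunk()`,
`record_chunk()`, `struct chunk_stats` and `SKETCH_DATA_MAX` are
hypothetical names for this illustration only. The trailing flush
packet is left to the caller here.

```c
#include <stddef.h>

#define SKETCH_DATA_MAX (65520 - 4)	/* assumed to mirror LARGE_PACKET_DATA_MAX */

typedef int (*chunk_fn)(const char *chunk, size_t len, void *cb_data);

/*
 * Illustration only: hand `src` to `fn` in pieces of at most
 * SKETCH_DATA_MAX bytes. Every piece except possibly the last one has
 * the maximal size. Stops at the first non-zero return from `fn`.
 */
static int for_each_chunk(const char *src, size_t len,
			  chunk_fn fn, void *cb_data)
{
	size_t done = 0;

	while (done < len) {
		size_t n = len - done;
		int err;

		if (n > SKETCH_DATA_MAX)
			n = SKETCH_DATA_MAX;
		err = fn(src + done, n, cb_data);
		if (err)
			return err;
		done += n;
	}
	return 0;
}

struct chunk_stats {
	size_t packets;		/* number of content packets seen */
	size_t last_len;	/* payload size of the most recent packet */
};

/* Example callback: count packets instead of writing them. */
static int record_chunk(const char *chunk, size_t len, void *cb_data)
{
	struct chunk_stats *st = cb_data;

	(void)chunk;
	st->packets++;
	st->last_len = len;
	return 0;
}
```

With these assumptions a buffer of 2 * 65516 + 10 bytes is split into
three content packets, the last one carrying 10 bytes. An empty buffer
produces no content packet at all, so only the flush would be sent.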
 pkt-line.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pkt-line.h |  8 +++++++
 2 files changed, 80 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index dca5a64..0b5125f 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -197,6 +197,46 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 	va_end(args);
 }
 
+int write_packetized_from_fd(int fd_in, int fd_out)
+{
+	static char buf[LARGE_PACKET_DATA_MAX];
+	int err = 0;
+	ssize_t bytes_to_write;
+
+	while (!err) {
+		bytes_to_write = xread(fd_in, buf, sizeof(buf));
+		if (bytes_to_write < 0)
+			return COPY_READ_ERROR;
+		if (bytes_to_write == 0)
+			break;
+		err = packet_write_gently(fd_out, buf, bytes_to_write);
+	}
+	if (!err)
+		err = packet_flush_gently(fd_out);
+	return err;
+}
+
+int write_packetized_from_buf(const char *src_in, size_t len, int fd_out)
+{
+	int err = 0;
+	size_t bytes_written = 0;
+	size_t bytes_to_write;
+
+	while (!err) {
+		if ((len - bytes_written) > LARGE_PACKET_DATA_MAX)
+			bytes_to_write = LARGE_PACKET_DATA_MAX;
+		else
+			bytes_to_write = len - bytes_written;
+		if (bytes_to_write == 0)
+			break;
+		err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
+		bytes_written += bytes_to_write;
+	}
+	if (!err)
+		err = packet_flush_gently(fd_out);
+	return err;
+}
+
 static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 			   void *dst, unsigned size, int options)
 {
@@ -306,3 +346,35 @@ char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
 {
 	return packet_read_line_generic(-1, src, src_len, dst_len);
 }
+
+ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out)
+{
+	int packet_len;
+
+	size_t orig_len = sb_out->len;
+	size_t orig_alloc = sb_out->alloc;
+
+	for (;;) {
+		strbuf_grow(sb_out, LARGE_PACKET_DATA_MAX);
+		packet_len = packet_read(fd_in, NULL, NULL,
+			/* strbuf_grow() above always allocates one extra byte to
+			 * store a '\0' at the end of the string. packet_read()
+			 * writes a '\0' extra byte at the end, too. Let it know
+			 * that there is already room for the extra byte.
+			 */
+			sb_out->buf + sb_out->len, LARGE_PACKET_DATA_MAX+1,
+			PACKET_READ_GENTLE_ON_EOF);
+		if (packet_len <= 0)
+			break;
+		sb_out->len += packet_len;
+	}
+
+	if (packet_len < 0) {
+		if (orig_alloc == 0)
+			strbuf_release(sb_out);
+		else
+			strbuf_setlen(sb_out, orig_len);
+		return packet_len;
+	}
+	return sb_out->len - orig_len;
+}
diff --git a/pkt-line.h b/pkt-line.h
index 3fa0899..18eac64 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,6 +25,8 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int write_packetized_from_fd(int fd_in, int fd_out);
+int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
 
 /*
  * Read a packetized line into the buffer, which must be at least size bytes
@@ -77,8 +79,14 @@ char *packet_read_line(int fd, int *size);
  */
 char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
 
+/*
+ * Reads a stream of variable sized packets until a flush packet is detected.
+ */
+ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out);
+
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
+#define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)
 extern char packet_buffer[LARGE_PACKET_MAX];
 
 #endif
-- 
2.10.0



* [PATCH v10 11/14] convert: make apply_filter() adhere to standard Git error handling
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (9 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 10/14] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 12/14] convert: prepare filter.<driver>.process option larsxschneider
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

apply_filter() returns a boolean that tells the caller if it
"did convert or did not convert". The variable `ret` was used throughout
the function to track errors, where `1` denoted success and `0`
failure. This is unusual for the Git source, where `0` denotes success.

Rename the variable and flip its value to make the function easier to
read for Git developers.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
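Note: a small sketch of the convention described above, assuming a
helper that, like Git's error(), prints a message and returns -1.
`report_error()` and `did_convert()` are hypothetical names for this
illustration; the three flags stand in for the read, close and
finish_async steps.

```c
#include <stdio.h>

/* Stand-in for an error() style helper: report and return -1. */
static int report_error(const char *msg)
{
	fprintf(stderr, "error: %s\n", msg);
	return -1;
}

/*
 * Illustration only: `err` stays 0 on success and becomes non-zero on
 * any failure. The boolean handed back to the caller is the inverse.
 */
static int did_convert(int read_ok, int close_ok, int finish_ok)
{
	int err = 0;

	if (!read_ok)
		err = report_error("read from external filter failed");
	if (!close_ok)
		err = report_error("read from external filter failed");
	if (!finish_ok)
		err = report_error("external filter failed");

	return !err;	/* 1 = did convert, 0 = did not convert */
}
```

All three steps are still checked even after an earlier one failed, so
every problem is reported, and the caller keeps seeing the same boolean
as before the rename.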
 convert.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/convert.c b/convert.c
index 986c239..597f561 100644
--- a/convert.c
+++ b/convert.c
@@ -451,7 +451,7 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	 *
 	 * (child --> cmd) --> us
 	 */
-	int ret = 1;
+	int err = 0;
 	struct strbuf nbuf = STRBUF_INIT;
 	struct async async;
 	struct filter_params params;
@@ -477,23 +477,20 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 		return 0;	/* error was already reported */
 
 	if (strbuf_read(&nbuf, async.out, len) < 0) {
-		error("read from external filter '%s' failed", cmd);
-		ret = 0;
+		err = error("read from external filter '%s' failed", cmd);
 	}
 	if (close(async.out)) {
-		error("read from external filter '%s' failed", cmd);
-		ret = 0;
+		err = error("read from external filter '%s' failed", cmd);
 	}
 	if (finish_async(&async)) {
-		error("external filter '%s' failed", cmd);
-		ret = 0;
+		err = error("external filter '%s' failed", cmd);
 	}
 
-	if (ret) {
+	if (!err) {
 		strbuf_swap(dst, &nbuf);
 	}
 	strbuf_release(&nbuf);
-	return ret;
+	return !err;
 }
 
 static struct convert_driver {
-- 
2.10.0



* [PATCH v10 12/14] convert: prepare filter.<driver>.process option
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (10 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 11/14] convert: make apply_filter() adhere to standard Git error handling larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 11:25 ` [PATCH v10 13/14] convert: add " larsxschneider
  2016-10-08 11:25 ` [PATCH v10 14/14] contrib/long-running-filter: add long running filter example larsxschneider
  13 siblings, 0 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Refactor the existing 'single shot filter mechanism' and prepare the
new 'long running filter mechanism'.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
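Note: a minimal sketch of how a capability bit can select the command
to run, assuming a reduced driver struct. `struct sketch_driver` and
`pick_filter_cmd()` are hypothetical names for this illustration; the
CAP_* values are the ones defined in the patch.

```c
#include <stddef.h>

#define CAP_CLEAN    (1u<<0)
#define CAP_SMUDGE   (1u<<1)

/* Reduced stand-in for the driver: only the two command strings. */
struct sketch_driver {
	const char *clean;
	const char *smudge;
};

/*
 * Illustration only: return the command matching the wanted capability,
 * or NULL if there is no driver or the command is unset or empty.
 */
static const char *pick_filter_cmd(const struct sketch_driver *drv,
				   unsigned int wanted_capability)
{
	const char *cmd = NULL;

	if (!drv)
		return NULL;

	if ((CAP_CLEAN & wanted_capability) && drv->clean)
		cmd = drv->clean;
	else if ((CAP_SMUDGE & wanted_capability) && drv->smudge)
		cmd = drv->smudge;

	return (cmd && *cmd) ? cmd : NULL;
}
```

Passing the driver plus a capability, instead of a preselected command
string, keeps the choice in one place, which is what lets a later patch
add another dispatch target without touching the callers again.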
 convert.c | 60 ++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 34 insertions(+), 26 deletions(-)

diff --git a/convert.c b/convert.c
index 597f561..71e11ff 100644
--- a/convert.c
+++ b/convert.c
@@ -442,7 +442,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len, int fd,
+static int apply_single_file_filter(const char *path, const char *src, size_t len, int fd,
                         struct strbuf *dst, const char *cmd)
 {
 	/*
@@ -456,12 +456,6 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	struct async async;
 	struct filter_params params;
 
-	if (!cmd || !*cmd)
-		return 0;
-
-	if (!dst)
-		return 1;
-
 	memset(&async, 0, sizeof(async));
 	async.proc = filter_buffer_or_fd;
 	async.data = &params;
@@ -493,6 +487,9 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	return !err;
 }
 
+#define CAP_CLEAN    (1u<<0)
+#define CAP_SMUDGE   (1u<<1)
+
 static struct convert_driver {
 	const char *name;
 	struct convert_driver *next;
@@ -501,6 +498,29 @@ static struct convert_driver {
 	int required;
 } *user_convert, **user_convert_tail;
 
+static int apply_filter(const char *path, const char *src, size_t len,
+			int fd, struct strbuf *dst, struct convert_driver *drv,
+			const unsigned int wanted_capability)
+{
+	const char *cmd = NULL;
+
+	if (!drv)
+		return 0;
+
+	if (!dst)
+		return 1;
+
+	if ((CAP_CLEAN & wanted_capability) && drv->clean)
+		cmd = drv->clean;
+	else if ((CAP_SMUDGE & wanted_capability) && drv->smudge)
+		cmd = drv->smudge;
+
+	if (cmd && *cmd)
+		return apply_single_file_filter(path, src, len, fd, dst, cmd);
+
+	return 0;
+}
+
 static int read_convert_config(const char *var, const char *value, void *cb)
 {
 	const char *key, *name;
@@ -839,7 +859,7 @@ int would_convert_to_git_filter_fd(const char *path)
 	if (!ca.drv->required)
 		return 0;
 
-	return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN);
 }
 
 const char *get_convert_attr_ascii(const char *path)
@@ -872,18 +892,12 @@ int convert_to_git(const char *path, const char *src, size_t len,
                    struct strbuf *dst, enum safe_crlf checksafe)
 {
 	int ret = 0;
-	const char *filter = NULL;
-	int required = 0;
 	struct conv_attrs ca;
 
 	convert_attrs(&ca, path);
-	if (ca.drv) {
-		filter = ca.drv->clean;
-		required = ca.drv->required;
-	}
 
-	ret |= apply_filter(path, src, len, -1, dst, filter);
-	if (!ret && required)
+	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN);
+	if (!ret && ca.drv && ca.drv->required)
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	if (ret && dst) {
@@ -907,7 +921,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	assert(ca.drv);
 	assert(ca.drv->clean);
 
-	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
+	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
@@ -919,15 +933,9 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 					    int normalizing)
 {
 	int ret = 0, ret_filter = 0;
-	const char *filter = NULL;
-	int required = 0;
 	struct conv_attrs ca;
 
 	convert_attrs(&ca, path);
-	if (ca.drv) {
-		filter = ca.drv->smudge;
-		required = ca.drv->required;
-	}
 
 	ret |= ident_to_worktree(path, src, len, dst, ca.ident);
 	if (ret) {
@@ -938,7 +946,7 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 	 * CRLF conversion can be skipped if normalizing, unless there
 	 * is a smudge filter.  The filter might expect CRLFs.
 	 */
-	if (filter || !normalizing) {
+	if ((ca.drv && ca.drv->smudge) || !normalizing) {
 		ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action);
 		if (ret) {
 			src = dst->buf;
@@ -946,8 +954,8 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 		}
 	}
 
-	ret_filter = apply_filter(path, src, len, -1, dst, filter);
-	if (!ret_filter && required)
+	ret_filter = apply_filter(path, src, len, -1, dst, ca.drv, CAP_SMUDGE);
+	if (!ret_filter && ca.drv && ca.drv->required)
 		die("%s: smudge filter %s failed", path, ca.drv->name);
 
 	return ret | ret_filter;
-- 
2.10.0



* [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (11 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 12/14] convert: prepare filter.<driver>.process option larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-08 23:06   ` Jakub Narębski
  2016-10-10 19:58   ` Junio C Hamano
  2016-10-08 11:25 ` [PATCH v10 14/14] contrib/long-running-filter: add long running filter example larsxschneider
  13 siblings, 2 replies; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Git's clean/smudge mechanism invokes an external filter process for
every single blob that is affected by a filter. If Git filters a lot of
blobs then the startup time of the external filter processes can become
a significant part of the overall Git execution time.

In a preliminary performance test this developer used a clean/smudge
filter written in golang to filter 12,000 files. This process took 364s
with the existing filter mechanism and 5s with the new mechanism. See
details here: https://github.com/github/git-lfs/pull/1382

This patch adds the `filter.<driver>.process` string option which, if
used, keeps the external filter process running and processes all blobs
with the packet format (pkt-line) based protocol over standard input and
standard output. The full protocol is explained in detail in
`Documentation/gitattributes.txt`.

A few key decisions:

* The long running filter process is referred to as filter protocol
  version 2 because the existing single shot filter invocation is
  considered version 1.
* Git sends a welcome message and expects a response right after the
  external filter process has started. This ensures that Git will not
  hang if a version 1 filter is incorrectly used with the
  filter.<driver>.process option for version 2 filters. In addition,
  Git can detect this kind of error and warn the user.
* The status of a filter operation (e.g. "success" or "error") is set
  before the actual response and (if necessary!) re-set after the
  response. The advantage of this two step status response is that if
  the filter detects an error early, then the filter can communicate
  this and Git does not even need to create structures to read the
  response.
* All status responses are pkt-line lists terminated with a flush
  packet. This allows us to send other status fields with the same
  protocol in the future.

Helped-by: Martin-Louis Bright <mlbright@gmail.com>
Reviewed-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
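
For readers unfamiliar with pkt-line framing, the welcome message
described in the key decisions above can be sketched as follows. This
is only an illustration under stated assumptions: pkt_text(),
pkt_flush(), build_welcome() and PKT_DATA_MAX are hypothetical names
made up for this note, not functions from Git's pkt-line.c. The value
65516 is the maximal packet payload size used by the tests in this
patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed payload limit of one packet (same value as in t0021). */
#define PKT_DATA_MAX 65516

/*
 * Hypothetical helper: format one text packet into `out`.
 * The four hex digits hold the total length, header included,
 * and text packets end with a LF.
 * Returns the number of bytes written, or -1 if it does not fit.
 */
static int pkt_text(char *out, size_t outsz, const char *line)
{
	size_t len = strlen(line) + 1; /* payload plus trailing LF */

	if (len > PKT_DATA_MAX || outsz < len + 5)
		return -1;
	return snprintf(out, outsz, "%04zx%s\n", len + 4, line);
}

/* Hypothetical helper: write the "0000" flush packet. */
static int pkt_flush(char *out, size_t outsz)
{
	if (outsz < 5)
		return -1;
	memcpy(out, "0000", 5); /* four digits plus NUL */
	return 4;
}

/*
 * Build what Git sends right after starting the filter:
 * "git-filter-client", "version=2", flush packet.
 * Returns the total length, or -1 if the buffer is too small.
 */
static int build_welcome(char *out, size_t outsz)
{
	static const char *const lines[] = { "git-filter-client", "version=2" };
	size_t used = 0;
	size_t i;
	int n;

	assert(out != NULL);
	for (i = 0; i < sizeof(lines) / sizeof(lines[0]); i++) {
		n = pkt_text(out + used, outsz - used, lines[i]);
		if (n < 0)
			return -1;
		used += (size_t)n;
	}
	n = pkt_flush(out + used, outsz - used);
	if (n < 0)
		return -1;
	return (int)(used + (size_t)n);
}
```

Calling build_welcome() with a 64 byte buffer fills it with the two
text packets followed by the flush packet. A version 1 filter would
not answer this with "git-filter-server", which is how Git can detect
the misconfiguration instead of hanging.
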
 Documentation/gitattributes.txt | 157 +++++++++++++-
 convert.c                       | 297 +++++++++++++++++++++++++-
 t/t0021-conversion.sh           | 447 +++++++++++++++++++++++++++++++++++++++-
 t/t0021/rot13-filter.pl         | 191 +++++++++++++++++
 4 files changed, 1082 insertions(+), 10 deletions(-)
 create mode 100755 t/t0021/rot13-filter.pl

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 7aff940..5868f00 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -293,7 +293,13 @@ checkout, when the `smudge` command is specified, the command is
 fed the blob object from its standard input, and its standard
 output is used to update the worktree file.  Similarly, the
 `clean` command is used to convert the contents of worktree file
-upon checkin.
+upon checkin. By default these commands process only a single
+blob and terminate.  If a long running `process` filter is used
+in place of `clean` and/or `smudge` filters, then Git can process
+all blobs with a single filter command invocation for the entire
+life of a single Git command, for example `git add --all`.  See the
+section below for the description of the protocol used to
+communicate with a `process` filter.
 
 One use of the content filtering is to massage the content into a shape
 that is more convenient for the platform, filesystem, and the user to use.
@@ -373,6 +379,155 @@ not exist, or may have different contents. So, smudge and clean commands
 should not try to access the file on disk, but only act as filters on the
 content provided to them on standard input.
 
+Long Running Filter Process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If the filter command (a string value) is defined via
+`filter.<driver>.process` then Git can process all blobs with a
+single filter invocation for the entire life of a single Git
+command. This is achieved by using a packet format (pkt-line,
+see technical/protocol-common.txt) based protocol over standard
+input and standard output as follows. All packets, except for the
+"*CONTENT" packets and the "0000" flush packet, are considered
+text and therefore are terminated by a LF.
+
+Git starts the filter when it encounters the first file
+that needs to be cleaned or smudged. After the filter has started,
+Git sends a welcome message ("git-filter-client"), a list of
+supported protocol version numbers, and a flush packet. Git expects
+to read a welcome response message ("git-filter-server") and exactly
+one protocol version number from the previously sent list. All further
+communication will be based on the selected version. The remaining
+protocol description below documents "version=2". Please note that
+"version=42" in the example below does not exist and is only there
+to illustrate what the protocol would look like with more than one
+version.
+
+After the version negotiation Git sends a list of all capabilities that
+it supports and a flush packet. Git expects to read a list of desired
+capabilities, which must be a subset of the supported capabilities list,
+and a flush packet as response:
+------------------------
+packet:          git> git-filter-client
+packet:          git> version=2
+packet:          git> version=42
+packet:          git> 0000
+packet:          git< git-filter-server
+packet:          git< version=2
+packet:          git> clean=true
+packet:          git> smudge=true
+packet:          git> not-yet-invented=true
+packet:          git> 0000
+packet:          git< clean=true
+packet:          git< smudge=true
+packet:          git< 0000
+------------------------
+Supported filter capabilities in version 2 are "clean" and
+"smudge".
+
+Afterwards Git sends a list of "key=value" pairs terminated with
+a flush packet. The list will contain at least the filter command
+(based on the supported capabilities) and the pathname of the file
+to filter relative to the repository root. Right after these packets
+Git sends the content split in zero or more pkt-line packets and a
+flush packet to terminate content. Please note that the filter
+must not send any response before it has received the content and the
+final flush packet.
+------------------------
+packet:          git> command=smudge
+packet:          git> pathname=path/testfile.dat
+packet:          git> 0000
+packet:          git> CONTENT
+packet:          git> 0000
+------------------------
+
+The filter is expected to respond with a list of "key=value" pairs
+terminated with a flush packet. If the filter does not experience
+problems then the list must contain a "success" status. Right after
+these packets the filter is expected to send the content in zero
+or more pkt-line packets and a flush packet at the end. Finally, a
+second list of "key=value" pairs terminated with a flush packet
+is expected. The filter can change the status in the second list.
+------------------------
+packet:          git< status=success
+packet:          git< 0000
+packet:          git< SMUDGED_CONTENT
+packet:          git< 0000
+packet:          git< 0000  # empty list, keep "status=success" unchanged!
+------------------------
+
+If the result content is empty then the filter is expected to respond with a
+"success" status, a flush packet for the empty content, and an empty list.
+------------------------
+packet:          git< status=success
+packet:          git< 0000
+packet:          git< 0000  # empty content!
+packet:          git< 0000  # empty list, keep "status=success" unchanged!
+------------------------
+
+In case the filter cannot or does not want to process the content,
+it is expected to respond with an "error" status. Depending on the
+`filter.<driver>.required` flag Git will interpret that as an error
+but it will not stop or restart the filter process.
+------------------------
+packet:          git< status=error
+packet:          git< 0000
+------------------------
+
+If the filter experiences an error during processing, then it can
+send the status "error" after the content was (partially or
+completely) sent. Depending on the `filter.<driver>.required` flag
+Git will interpret that as an error but it will not stop or restart the
+filter process.
+------------------------
+packet:          git< status=success
+packet:          git< 0000
+packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
+packet:          git< 0000
+packet:          git< status=error
+packet:          git< 0000
+------------------------
+
+If the filter dies during the communication or does not adhere to
+the protocol then Git will stop the filter process and restart it
+with the next file that needs to be processed. Depending on the
+`filter.<driver>.required` flag Git will interpret that as an error.
+
+The error handling for all cases above mimics the behavior of
+the `filter.<driver>.clean` / `filter.<driver>.smudge` error
+handling.
+
+In case the filter cannot or does not want to process the content
+as well as any future content for the lifetime of the Git process,
+it is expected to respond with an "abort" status at any point in
+the protocol. Depending on the `filter.<driver>.required` flag Git
+will interpret that as an error for the content as well as any future
+content for the lifetime of the Git process but it will not stop or
+restart the filter process.
+------------------------
+packet:          git< status=abort
+packet:          git< 0000
+------------------------
+
+After the filter has processed a blob it is expected to wait for
+the next "key=value" list containing a command. Git will close
+the command pipe on exit. The filter is expected to detect EOF
+and exit gracefully on its own.
+
+If you develop your own long running filter
+process then the `GIT_TRACE_PACKET` environment variable can be
+very helpful for debugging (see linkgit:git[1]).
+
+If a `filter.<driver>.process` command is configured then it
+always takes precedence over a configured `filter.<driver>.clean`
+or `filter.<driver>.smudge` command.
+
+Please note that you cannot use an existing `filter.<driver>.clean`
+or `filter.<driver>.smudge` command with `filter.<driver>.process`
+because the former two use a different inter process communication
+protocol than the latter one.
+
+
 Interaction between checkin/checkout attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/convert.c b/convert.c
index 71e11ff..1d89632 100644
--- a/convert.c
+++ b/convert.c
@@ -3,6 +3,7 @@
 #include "run-command.h"
 #include "quote.h"
 #include "sigchain.h"
+#include "pkt-line.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -490,11 +491,289 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 #define CAP_CLEAN    (1u<<0)
 #define CAP_SMUDGE   (1u<<1)
 
+struct cmd2process {
+	struct hashmap_entry ent; /* must be the first member! */
+	unsigned int supported_capabilities;
+	const char *cmd;
+	struct child_process process;
+};
+
+static int cmd_process_map_initialized;
+static struct hashmap cmd_process_map;
+
+static int cmd2process_cmp(const struct cmd2process *e1,
+			   const struct cmd2process *e2,
+			   const void *unused)
+{
+	return strcmp(e1->cmd, e2->cmd);
+}
+
+static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
+{
+	struct cmd2process key;
+	hashmap_entry_init(&key, strhash(cmd));
+	key.cmd = cmd;
+	return hashmap_get(hashmap, &key, NULL);
+}
+
+static int packet_write_list(int fd, const char *line, ...)
+{
+	va_list args;
+	int err;
+	va_start(args, line);
+	for (;;) {
+		if (!line)
+			break;
+		if (strlen(line) > LARGE_PACKET_DATA_MAX)
+			return -1;
+		err = packet_write_fmt_gently(fd, "%s\n", line);
+		if (err)
+			return err;
+		line = va_arg(args, const char*);
+	}
+	va_end(args);
+	return packet_flush_gently(fd);
+}
+
+static void read_multi_file_filter_status(int fd, struct strbuf *status) {
+	struct strbuf **pair;
+	char *line;
+	for (;;) {
+		line = packet_read_line(fd, NULL);
+		if (!line)
+			break;
+		pair = strbuf_split_str(line, '=', 2);
+		if (pair[0] && pair[0]->len && pair[1]) {
+			/* the last "status=<foo>" line wins */
+			if (!strcmp(pair[0]->buf, "status=")) {
+				strbuf_reset(status);
+				strbuf_addbuf(status, pair[1]);
+			}
+		}
+		strbuf_list_free(pair);
+	}
+}
+
+static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry)
+{
+	if (!entry)
+		return;
+
+	entry->process.clean_on_exit = 0;
+	kill(entry->process.pid, SIGTERM);
+	finish_command(&entry->process);
+
+	hashmap_remove(hashmap, entry, NULL);
+	free(entry);
+}
+
+void stop_multi_file_filter(struct child_process *process)
+{
+	sigchain_push(SIGPIPE, SIG_IGN);
+	/* Closing the pipe signals the filter to initiate a shutdown. */
+	close(process->in);
+	close(process->out);
+	sigchain_pop(SIGPIPE);
+	/* Finish command will wait until the shutdown is complete. */
+	finish_command(process);
+}
+
+static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
+{
+	int err;
+	struct cmd2process *entry;
+	struct child_process *process;
+	const char *argv[] = { cmd, NULL };
+	struct string_list cap_list = STRING_LIST_INIT_NODUP;
+	char *cap_buf;
+	const char *cap_name;
+
+	entry = xmalloc(sizeof(*entry));
+	entry->cmd = cmd;
+	entry->supported_capabilities = 0;
+	process = &entry->process;
+
+	child_process_init(process);
+	process->argv = argv;
+	process->use_shell = 1;
+	process->in = -1;
+	process->out = -1;
+	process->clean_on_exit = 1;
+	process->clean_on_exit_handler = stop_multi_file_filter;
+
+	if (start_command(process)) {
+		error("cannot fork to run external filter '%s'", cmd);
+		return NULL;
+	}
+
+	hashmap_entry_init(entry, strhash(cmd));
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);
+	if (err)
+		goto done;
+
+	err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
+	if (err) {
+		error("external filter '%s' does not support filter protocol version 2", cmd);
+		goto done;
+	}
+	err = strcmp(packet_read_line(process->out, NULL), "version=2");
+	if (err)
+		goto done;
+
+	err = packet_write_list(process->in, "clean=true", "smudge=true", NULL);
+
+	for (;;) {
+		cap_buf = packet_read_line(process->out, NULL);
+		if (!cap_buf)
+			break;
+		string_list_split_in_place(&cap_list, cap_buf, '=', 1);
+
+		if (cap_list.nr != 2 || strcmp(cap_list.items[1].string, "true"))
+			continue;
+
+		cap_name = cap_list.items[0].string;
+		if (!strcmp(cap_name, "clean")) {
+			entry->supported_capabilities |= CAP_CLEAN;
+		} else if (!strcmp(cap_name, "smudge")) {
+			entry->supported_capabilities |= CAP_SMUDGE;
+		} else {
+			warning(
+				"external filter '%s' requested unsupported filter capability '%s'",
+				cmd, cap_name
+			);
+		}
+
+		string_list_clear(&cap_list, 0);
+	}
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err || errno == EPIPE) {
+		error("initialization for external filter '%s' failed", cmd);
+		kill_multi_file_filter(hashmap, entry);
+		return NULL;
+	}
+
+	hashmap_add(hashmap, entry);
+	return entry;
+}
+
+static int apply_multi_file_filter(const char *path, const char *src, size_t len,
+				   int fd, struct strbuf *dst, const char *cmd,
+				   const unsigned int wanted_capability)
+{
+	int err;
+	struct cmd2process *entry;
+	struct child_process *process;
+	struct strbuf nbuf = STRBUF_INIT;
+	struct strbuf filter_status = STRBUF_INIT;
+	const char *filter_type;
+
+	if (!cmd_process_map_initialized) {
+		cmd_process_map_initialized = 1;
+		hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
+		entry = NULL;
+	} else {
+		entry = find_multi_file_filter_entry(&cmd_process_map, cmd);
+	}
+
+	fflush(NULL);
+
+	if (!entry) {
+		entry = start_multi_file_filter(&cmd_process_map, cmd);
+		if (!entry)
+			return 0;
+	}
+	process = &entry->process;
+
+	if (!(wanted_capability & entry->supported_capabilities))
+		return 0;
+
+	if (CAP_CLEAN & wanted_capability)
+		filter_type = "clean";
+	else if (CAP_SMUDGE & wanted_capability)
+		filter_type = "smudge";
+	else
+		die("unexpected filter type");
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	assert(strlen(filter_type) < LARGE_PACKET_DATA_MAX - strlen("command=\n"));
+	err = packet_write_fmt_gently(process->in, "command=%s\n", filter_type);
+	if (err)
+		goto done;
+
+	err = strlen(path) > LARGE_PACKET_DATA_MAX - strlen("pathname=\n");
+	if (err) {
+		error("path name too long for external filter");
+		goto done;
+	}
+
+	err = packet_write_fmt_gently(process->in, "pathname=%s\n", path);
+	if (err)
+		goto done;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		goto done;
+
+	if (fd >= 0)
+		err = write_packetized_from_fd(fd, process->in);
+	else
+		err = write_packetized_from_buf(src, len, process->in);
+	if (err)
+		goto done;
+
+	read_multi_file_filter_status(process->out, &filter_status);
+	err = strcmp(filter_status.buf, "success");
+	if (err)
+		goto done;
+
+	err = read_packetized_to_strbuf(process->out, &nbuf) < 0;
+	if (err)
+		goto done;
+
+	read_multi_file_filter_status(process->out, &filter_status);
+	err = strcmp(filter_status.buf, "success");
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err || errno == EPIPE) {
+		if (!strcmp(filter_status.buf, "error")) {
+			/* The filter signaled a problem with the file. */
+		} else if (!strcmp(filter_status.buf, "abort")) {
+			/*
+			 * The filter signaled a permanent problem. Don't try to filter
+			 * files with the same command for the lifetime of the current
+			 * Git process.
+			 */
+			 entry->supported_capabilities &= ~wanted_capability;
+		} else {
+			/*
+			 * Something went wrong with the protocol filter.
+			 * Force shutdown and restart if another blob requires filtering.
+			 */
+			error("external filter '%s' failed", cmd);
+			kill_multi_file_filter(&cmd_process_map, entry);
+		}
+	} else {
+		strbuf_swap(dst, &nbuf);
+	}
+	strbuf_release(&nbuf);
+	return !err;
+}
+
 static struct convert_driver {
 	const char *name;
 	struct convert_driver *next;
 	const char *smudge;
 	const char *clean;
+	const char *process;
 	int required;
 } *user_convert, **user_convert_tail;
 
@@ -510,13 +789,15 @@ static int apply_filter(const char *path, const char *src, size_t len,
 	if (!dst)
 		return 1;
 
-	if ((CAP_CLEAN & wanted_capability) && drv->clean)
+	if ((CAP_CLEAN & wanted_capability) && !drv->process && drv->clean)
 		cmd = drv->clean;
-	else if ((CAP_SMUDGE & wanted_capability) && drv->smudge)
+	else if ((CAP_SMUDGE & wanted_capability) && !drv->process && drv->smudge)
 		cmd = drv->smudge;
 
 	if (cmd && *cmd)
 		return apply_single_file_filter(path, src, len, fd, dst, cmd);
+	else if (drv->process && *drv->process)
+		return apply_multi_file_filter(path, src, len, fd, dst, drv->process, wanted_capability);
 
 	return 0;
 }
@@ -558,6 +839,9 @@ static int read_convert_config(const char *var, const char *value, void *cb)
 	if (!strcmp("clean", key))
 		return git_config_string(&drv->clean, var, value);
 
+	if (!strcmp("process", key))
+		return git_config_string(&drv->process, var, value);
+
 	if (!strcmp("required", key)) {
 		drv->required = git_config_bool(var, value);
 		return 0;
@@ -919,7 +1203,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	convert_attrs(&ca, path);
 
 	assert(ca.drv);
-	assert(ca.drv->clean);
+	assert(ca.drv->clean || ca.drv->process);
 
 	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
@@ -944,9 +1228,10 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 	}
 	/*
 	 * CRLF conversion can be skipped if normalizing, unless there
-	 * is a smudge filter.  The filter might expect CRLFs.
+	 * is a smudge or process filter (even if the process filter doesn't
+	 * support smudge).  The filters might expect CRLFs.
 	 */
-	if ((ca.drv && ca.drv->smudge) || !normalizing) {
+	if ((ca.drv && (ca.drv->smudge || ca.drv->process)) || !normalizing) {
 		ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action);
 		if (ret) {
 			src = dst->buf;
@@ -1407,7 +1692,7 @@ struct stream_filter *get_stream_filter(const char *path, const unsigned char *s
 	struct stream_filter *filter = NULL;
 
 	convert_attrs(&ca, path);
-	if (ca.drv && (ca.drv->smudge || ca.drv->clean))
+	if (ca.drv && (ca.drv->process || ca.drv->smudge || ca.drv->clean))
 		return NULL;
 
 	if (ca.crlf_action == CRLF_AUTO || ca.crlf_action == CRLF_AUTO_CRLF)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index dc50938..9f892c0 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -4,13 +4,75 @@ test_description='blob conversion via gitattributes'
 
 . ./test-lib.sh
 
-cat <<EOF >rot13.sh
+TEST_ROOT="$(pwd)"
+
+cat <<EOF >"$TEST_ROOT/rot13.sh"
 #!$SHELL_PATH
 tr \
   'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' \
   'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM'
 EOF
-chmod +x rot13.sh
+chmod +x "$TEST_ROOT/rot13.sh"
+
+generate_random_characters () {
+	LEN=$1
+	NAME=$2
+	test-genrandom some-seed $LEN |
+		perl -pe "s/./chr((ord($&) % 26) + ord('a'))/sge" >"$TEST_ROOT/$NAME"
+}
+
+file_size () {
+	cat "$1" | wc -c | sed "s/^[ ]*//"
+}
+
+filter_git () {
+	rm -f rot13-filter.log &&
+	git "$@" 2>git-stderr.log &&
+	rm -f git-stderr.log
+}
+
+# Count unique lines in two files and compare them.
+test_cmp_count () {
+	for FILE in $@
+	do
+		sort $FILE | uniq -c | sed "s/^[ ]*//" >$FILE.tmp
+		cat $FILE.tmp >$FILE
+	done &&
+	test_cmp $@
+}
+
+# Count unique lines except clean invocations in two files and compare
+# them. Clean invocations are not counted because their number can vary.
+# c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
+test_cmp_count_except_clean () {
+	for FILE in $@
+	do
+		sort $FILE | uniq -c | sed "s/^[ ]*//" |
+			sed "s/^\([0-9]\) IN: clean/x IN: clean/" >$FILE.tmp
+		cat $FILE.tmp >$FILE
+	done &&
+	test_cmp $@
+}
+
+# Compare two files but exclude clean invocations because they can vary.
+# c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
+test_cmp_exclude_clean () {
+	for FILE in $@
+	do
+		grep -v "IN: clean" $FILE >$FILE.tmp
+		cat $FILE.tmp >$FILE
+	done &&
+	test_cmp $@
+}
+
+# Check that the contents of two files are equal and that their rot13 version
+# is equal to the committed content.
+test_cmp_committed_rot13 () {
+	test_cmp "$1" "$2" &&
+	"$TEST_ROOT/rot13.sh" <"$1" >expected &&
+	git cat-file blob :"$2" >actual &&
+	test_cmp expected actual
+}
 
 test_expect_success setup '
 	git config filter.rot13.smudge ./rot13.sh &&
@@ -31,7 +93,10 @@ test_expect_success setup '
 	cat test >test.i &&
 	git add test test.t test.i &&
 	rm -f test test.t test.i &&
-	git checkout -- test test.t test.i
+	git checkout -- test test.t test.i &&
+
+	echo "content-test2" >test2.o &&
+	echo "content-test3 - filename with special characters" >"test3 '\''sq'\'',\$x.o"
 '
 
 script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
@@ -279,4 +344,380 @@ test_expect_success 'diff does not reuse worktree files that need cleaning' '
 	test_line_count = 0 count
 '
 
+test_expect_success PERL 'required process filter should filter data' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "git-stderr.log" >.gitignore &&
+		echo "*.r filter=protocol" >.gitattributes &&
+		git add . &&
+		git commit . -m "test commit 1" &&
+		git branch empty-branch &&
+
+		cp "$TEST_ROOT/test.o" test.r &&
+		cp "$TEST_ROOT/test2.o" test2.r &&
+		mkdir testsubdir &&
+		cp "$TEST_ROOT/test3 '\''sq'\'',\$x.o" "testsubdir/test3 '\''sq'\'',\$x.r" &&
+		>test4-empty.r &&
+
+		S=$(file_size test.r) &&
+		S2=$(file_size test2.r) &&
+		S3=$(file_size "testsubdir/test3 '\''sq'\'',\$x.r") &&
+
+		filter_git add . &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: clean test.r $S [OK] -- OUT: $S . [OK]
+			IN: clean test2.r $S2 [OK] -- OUT: $S2 . [OK]
+			IN: clean test4-empty.r 0 [OK] -- OUT: 0  [OK]
+			IN: clean testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
+			STOP
+		EOF
+		test_cmp_count expected.log rot13-filter.log &&
+
+		filter_git commit . -m "test commit 2" &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: clean test.r $S [OK] -- OUT: $S . [OK]
+			IN: clean test2.r $S2 [OK] -- OUT: $S2 . [OK]
+			IN: clean test4-empty.r 0 [OK] -- OUT: 0  [OK]
+			IN: clean testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
+			IN: clean test.r $S [OK] -- OUT: $S . [OK]
+			IN: clean test2.r $S2 [OK] -- OUT: $S2 . [OK]
+			IN: clean test4-empty.r 0 [OK] -- OUT: 0  [OK]
+			IN: clean testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
+			STOP
+		EOF
+		test_cmp_count_except_clean expected.log rot13-filter.log &&
+
+		rm -f test2.r "testsubdir/test3 '\''sq'\'',\$x.r" &&
+
+		filter_git checkout --quiet --no-progress . &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
+			IN: smudge testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
+			STOP
+		EOF
+		test_cmp_exclude_clean expected.log rot13-filter.log &&
+
+		filter_git checkout --quiet --no-progress empty-branch &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: clean test.r $S [OK] -- OUT: $S . [OK]
+			STOP
+		EOF
+		test_cmp_exclude_clean expected.log rot13-filter.log &&
+
+		filter_git checkout --quiet --no-progress master &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
+			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
+			IN: smudge test4-empty.r 0 [OK] -- OUT: 0  [OK]
+			IN: smudge testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
+			STOP
+		EOF
+		test_cmp_exclude_clean expected.log rot13-filter.log &&
+
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test3 '\''sq'\'',\$x.o" "testsubdir/test3 '\''sq'\'',\$x.r"
+	)
+'
+
+test_expect_success PERL 'required process filter takes precedence' '
+	test_config_global filter.protocol.clean false &&
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+		cp "$TEST_ROOT/test.o" test.r &&
+		S=$(file_size test.r) &&
+
+		# Check that the process filter is invoked here
+		filter_git add . &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: clean test.r $S [OK] -- OUT: $S . [OK]
+			STOP
+		EOF
+		test_cmp_count expected.log rot13-filter.log
+	)
+'
+
+test_expect_success PERL 'required process filter should be used for "clean" operation only' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+		cp "$TEST_ROOT/test.o" test.r &&
+		S=$(file_size test.r) &&
+
+		filter_git add . &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: clean test.r $S [OK] -- OUT: $S . [OK]
+			STOP
+		EOF
+		test_cmp_count expected.log rot13-filter.log &&
+
+		rm test.r &&
+
+		filter_git checkout --quiet --no-progress . &&
+		# If the filter were used for "smudge", too, we would see
+		# "IN: smudge test.r 57 [OK] -- OUT: 57 . [OK]" here
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			STOP
+		EOF
+		test_cmp_exclude_clean expected.log rot13-filter.log
+	)
+'
+
+test_expect_success PERL 'required process filter should process multiple packets' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		# Generate data requiring 1, 2, 3 packets
+		S=65516 && # LARGE_PACKET_DATA_MAX -> Maximal size of a packet
+		generate_random_characters $(($S    )) 1pkt_1__.file &&
+		generate_random_characters $(($S  +1)) 2pkt_1+1.file &&
+		generate_random_characters $(($S*2-1)) 2pkt_2-1.file &&
+		generate_random_characters $(($S*2  )) 2pkt_2__.file &&
+		generate_random_characters $(($S*2+1)) 3pkt_2+1.file &&
+
+		for FILE in "$TEST_ROOT"/*.file
+		do
+			cp "$FILE" . &&
+			"$TEST_ROOT/rot13.sh" <"$FILE" >"$FILE.rot13"
+		done &&
+
+		echo "*.file filter=protocol" >.gitattributes &&
+		filter_git add *.file .gitattributes &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: clean 1pkt_1__.file $(($S    )) [OK] -- OUT: $(($S    )) . [OK]
+			IN: clean 2pkt_1+1.file $(($S  +1)) [OK] -- OUT: $(($S  +1)) .. [OK]
+			IN: clean 2pkt_2-1.file $(($S*2-1)) [OK] -- OUT: $(($S*2-1)) .. [OK]
+			IN: clean 2pkt_2__.file $(($S*2  )) [OK] -- OUT: $(($S*2  )) .. [OK]
+			IN: clean 3pkt_2+1.file $(($S*2+1)) [OK] -- OUT: $(($S*2+1)) ... [OK]
+			STOP
+		EOF
+		test_cmp_count expected.log rot13-filter.log &&
+
+		rm -f *.file &&
+
+		filter_git checkout --quiet --no-progress -- *.file &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: smudge 1pkt_1__.file $(($S    )) [OK] -- OUT: $(($S    )) . [OK]
+			IN: smudge 2pkt_1+1.file $(($S  +1)) [OK] -- OUT: $(($S  +1)) .. [OK]
+			IN: smudge 2pkt_2-1.file $(($S*2-1)) [OK] -- OUT: $(($S*2-1)) .. [OK]
+			IN: smudge 2pkt_2__.file $(($S*2  )) [OK] -- OUT: $(($S*2  )) .. [OK]
+			IN: smudge 3pkt_2+1.file $(($S*2+1)) [OK] -- OUT: $(($S*2+1)) ... [OK]
+			STOP
+		EOF
+		test_cmp_exclude_clean expected.log rot13-filter.log &&
+
+		for FILE in *.file
+		do
+			test_cmp_committed_rot13 "$TEST_ROOT/$FILE" $FILE
+		done
+	)
+'
+
+test_expect_success PERL 'required process filter with clean error should fail' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp "$TEST_ROOT/test.o" test.r &&
+		echo "this is going to fail" >clean-write-fail.r &&
+		echo "content-test3-subdir" >test3.r &&
+
+		test_must_fail git add .
+	)
+'
+
+test_expect_success PERL 'process filter should restart after unexpected write failure' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp "$TEST_ROOT/test.o" test.r &&
+		cp "$TEST_ROOT/test2.o" test2.r &&
+		echo "this is going to fail" >smudge-write-fail.o &&
+		cp smudge-write-fail.o smudge-write-fail.r &&
+
+		S=$(file_size test.r) &&
+		S2=$(file_size test2.r) &&
+		SF=$(file_size smudge-write-fail.r) &&
+
+		git add . &&
+		rm -f *.r &&
+
+		rm -f rot13-filter.log &&
+		git checkout --quiet --no-progress . 2>git-stderr.log &&
+
+		grep "smudge write error at" git-stderr.log &&
+		grep "error: external filter" git-stderr.log &&
+
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: smudge smudge-write-fail.r $SF [OK] -- OUT: $SF [WRITE FAIL]
+			START
+			init handshake complete
+			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
+			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
+			STOP
+		EOF
+		test_cmp_exclude_clean expected.log rot13-filter.log &&
+
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r &&
+
+		# Smudge failed
+		! test_cmp smudge-write-fail.o smudge-write-fail.r &&
+		"$TEST_ROOT/rot13.sh" <smudge-write-fail.o >expected &&
+		git cat-file blob :smudge-write-fail.r >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success PERL 'process filter should not be restarted if it signals an error' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp "$TEST_ROOT/test.o" test.r &&
+		cp "$TEST_ROOT/test2.o" test2.r &&
+		echo "this will cause an error" >error.o &&
+		cp error.o error.r &&
+
+		S=$(file_size test.r) &&
+		S2=$(file_size test2.r) &&
+		SE=$(file_size error.r) &&
+
+		git add . &&
+		rm -f *.r &&
+
+		filter_git checkout --quiet --no-progress . &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: smudge error.r $SE [OK] -- OUT: 0 [ERROR]
+			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
+			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
+			STOP
+		EOF
+		test_cmp_exclude_clean expected.log rot13-filter.log &&
+
+		test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r &&
+		test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r &&
+		test_cmp error.o error.r
+	)
+'
+
+test_expect_success PERL 'process filter abort stops processing of all further files' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp "$TEST_ROOT/test.o" test.r &&
+		cp "$TEST_ROOT/test2.o" test2.r &&
+		echo "error this blob and all future blobs" >abort.o &&
+		cp abort.o abort.r &&
+
+		SA=$(file_size abort.r) &&
+
+		git add . &&
+		rm -f *.r &&
+
+		# Note: This test assumes that Git filters files in alphabetical
+		# order ("abort.r" before "test.r").
+		filter_git checkout --quiet --no-progress . &&
+		cat >expected.log <<-EOF &&
+			START
+			init handshake complete
+			IN: smudge abort.r $SA [OK] -- OUT: 0 [ABORT]
+			STOP
+		EOF
+		test_cmp_exclude_clean expected.log rot13-filter.log &&
+
+		test_cmp "$TEST_ROOT/test.o" test.r &&
+		test_cmp "$TEST_ROOT/test2.o" test2.r &&
+		test_cmp abort.o abort.r
+	)
+'
+
+test_expect_success PERL 'invalid process filter must fail (and not hang!)' '
+	test_config_global filter.protocol.process cat &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp "$TEST_ROOT/test.o" test.r &&
+		test_must_fail git add . 2>git-stderr.log &&
+		grep "does not support filter protocol version" git-stderr.log
+	)
+'
+
 test_done
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
new file mode 100755
index 0000000..1a6959c
--- /dev/null
+++ b/t/t0021/rot13-filter.pl
@@ -0,0 +1,191 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git filter protocol version 2
+# See Documentation/gitattributes.txt, section "Filter Protocol"
+#
+# The script takes the list of supported protocol capabilities as
+# arguments ("clean", "smudge", etc).
+#
+# This implementation supports special test cases:
+# (1) If data with the pathname "clean-write-fail.r" is processed with
+#     a "clean" operation then the write operation will die.
+# (2) If data with the pathname "smudge-write-fail.r" is processed with
+#     a "smudge" operation then the write operation will die.
+# (3) If data with the pathname "error.r" is processed with any
+#     operation then the filter signals that it cannot or does not want
+#     to process the file.
+# (4) If data with the pathname "abort.r" is processed with any
+#     operation then the filter signals that it cannot or does not want
+#     to process the file and any file after that is processed with the
+#     same command.
+#
+
+use strict;
+use warnings;
+
+my $MAX_PACKET_CONTENT_SIZE = 65516;
+my @capabilities            = @ARGV;
+
+open my $debug, ">>", "rot13-filter.log" or die "cannot open log file: $!";
+
+sub rot13 {
+	my $str = shift;
+	$str =~ y/A-Za-z/N-ZA-Mn-za-m/;
+	return $str;
+}
+
+sub packet_bin_read {
+	my $buffer;
+	my $bytes_read = read STDIN, $buffer, 4;
+	if ( $bytes_read == 0 ) {
+		# EOF - Git stopped talking to us!
+		print $debug "STOP\n";
+		exit();
+	}
+	elsif ( $bytes_read != 4 ) {
+		die "invalid packet: '$buffer'";
+	}
+	my $pkt_size = hex($buffer);
+	if ( $pkt_size == 0 ) {
+		return ( 1, "" );
+	}
+	elsif ( $pkt_size > 4 ) {
+		my $content_size = $pkt_size - 4;
+		$bytes_read = read STDIN, $buffer, $content_size;
+		if ( $bytes_read != $content_size ) {
+			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
+		}
+		return ( 0, $buffer );
+	}
+	else {
+		die "invalid packet size: $pkt_size";
+	}
+}
+
+sub packet_txt_read {
+	my ( $res, $buf ) = packet_bin_read();
+	unless ( $buf =~ s/\n$// ) {
+		die "A non-binary line MUST be terminated by an LF.";
+	}
+	return ( $res, $buf );
+}
+
+sub packet_bin_write {
+	my $buf = shift;
+	print STDOUT sprintf( "%04x", length($buf) + 4 );
+	print STDOUT $buf;
+	STDOUT->flush();
+}
+
+sub packet_txt_write {
+	packet_bin_write( $_[0] . "\n" );
+}
+
+sub packet_flush {
+	print STDOUT sprintf( "%04x", 0 );
+	STDOUT->flush();
+}
+
+print $debug "START\n";
+$debug->flush();
+
+( packet_txt_read() eq ( 0, "git-filter-client" ) ) || die "bad initialize";
+( packet_txt_read() eq ( 0, "version=2" ) )         || die "bad version";
+( packet_bin_read() eq ( 1, "" ) )                  || die "bad version end";
+
+packet_txt_write("git-filter-server");
+packet_txt_write("version=2");
+
+( packet_txt_read() eq ( 0, "clean=true" ) )  || die "bad capability";
+( packet_txt_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
+( packet_bin_read() eq ( 1, "" ) )            || die "bad capability end";
+
+foreach (@capabilities) {
+	packet_txt_write( $_ . "=true" );
+}
+packet_flush();
+print $debug "init handshake complete\n";
+$debug->flush();
+
+while (1) {
+	my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
+	print $debug "IN: $command";
+	$debug->flush();
+
+	my ($pathname) = packet_txt_read() =~ /^pathname=([^=]+)$/;
+	print $debug " $pathname";
+	$debug->flush();
+
+	# Flush
+	packet_bin_read();
+
+	my $input = "";
+	{
+		binmode(STDIN);
+		my $buffer;
+		my $done = 0;
+		while ( !$done ) {
+			( $done, $buffer ) = packet_bin_read();
+			$input .= $buffer;
+		}
+		print $debug " " . length($input) . " [OK] -- ";
+		$debug->flush();
+	}
+
+	my $output;
+	if ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
+		$output = "";
+	}
+	elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
+		$output = rot13($input);
+	}
+	elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
+		$output = rot13($input);
+	}
+	else {
+		die "bad command '$command'";
+	}
+
+	print $debug "OUT: " . length($output) . " ";
+	$debug->flush();
+
+	if ( $pathname eq "error.r" ) {
+		print $debug "[ERROR]\n";
+		$debug->flush();
+		packet_txt_write("status=error");
+		packet_flush();
+	}
+	elsif ( $pathname eq "abort.r" ) {
+		print $debug "[ABORT]\n";
+		$debug->flush();
+		packet_txt_write("status=abort");
+		packet_flush();
+	}
+	else {
+		packet_txt_write("status=success");
+		packet_flush();
+
+		if ( $pathname eq "${command}-write-fail.r" ) {
+			print $debug "[WRITE FAIL]\n";
+			$debug->flush();
+			die "${command} write error";
+		}
+
+		while ( length($output) > 0 ) {
+			my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
+			packet_bin_write($packet);
+			# dots represent the number of packets
+			print $debug ".";
+			if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
+				$output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+			}
+			else {
+				$output = "";
+			}
+		}
+		packet_flush();
+		print $debug " [OK]\n";
+		$debug->flush();
+		packet_flush();
+	}
+}
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v10 14/14] contrib/long-running-filter: add long running filter example
  2016-10-08 11:25 [PATCH v10 00/14] Git filter protocol larsxschneider
                   ` (12 preceding siblings ...)
  2016-10-08 11:25 ` [PATCH v10 13/14] convert: add " larsxschneider
@ 2016-10-08 11:25 ` larsxschneider
  2016-10-09  5:42   ` Torsten Bögershausen
  13 siblings, 1 reply; 34+ messages in thread
From: larsxschneider @ 2016-10-08 11:25 UTC (permalink / raw)
  To: git; +Cc: gitster, jnareb, peff, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Add a simple pass-thru filter as example implementation for the Git
filter protocol version 2. See Documentation/gitattributes.txt, section
"Filter Protocol" for more info.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/gitattributes.txt        |   4 +-
 contrib/long-running-filter/example.pl | 127 +++++++++++++++++++++++++++++++++
 2 files changed, 130 insertions(+), 1 deletion(-)
 create mode 100755 contrib/long-running-filter/example.pl

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 5868f00..a182ef2 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -514,7 +514,9 @@ the next "key=value" list containing a command. Git will close
 the command pipe on exit. The filter is expected to detect EOF
 and exit gracefully on its own.
 
-If you develop your own long running filter
+A long running filter demo implementation can be found in
+`contrib/long-running-filter/example.pl` located in the Git
+core repository. If you develop your own long running filter
 process then the `GIT_TRACE_PACKET` environment variables can be
 very helpful for debugging (see linkgit:git[1]).
 
diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
new file mode 100755
index 0000000..f4102d2
--- /dev/null
+++ b/contrib/long-running-filter/example.pl
@@ -0,0 +1,127 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git filter protocol version 2
+# See Documentation/gitattributes.txt, section "Filter Protocol"
+#
+# Please note, this pass-thru filter is a minimal skeleton. No proper
+# error handling was implemented.
+#
+
+use strict;
+use warnings;
+
+my $MAX_PACKET_CONTENT_SIZE = 65516;
+
+sub packet_bin_read {
+	my $buffer;
+	my $bytes_read = read STDIN, $buffer, 4;
+	if ( $bytes_read == 0 ) {
+
+		# EOF - Git stopped talking to us!
+		exit();
+	}
+	elsif ( $bytes_read != 4 ) {
+		die "invalid packet: '$buffer'";
+	}
+	my $pkt_size = hex($buffer);
+	if ( $pkt_size == 0 ) {
+		return ( 1, "" );
+	}
+	elsif ( $pkt_size > 4 ) {
+		my $content_size = $pkt_size - 4;
+		$bytes_read = read STDIN, $buffer, $content_size;
+		if ( $bytes_read != $content_size ) {
+			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
+		}
+		return ( 0, $buffer );
+	}
+	else {
+		die "invalid packet size: $pkt_size";
+	}
+}
+
+sub packet_txt_read {
+	my ( $res, $buf ) = packet_bin_read();
+	unless ( $buf =~ s/\n$// ) {
+		die "A non-binary line MUST be terminated by an LF.";
+	}
+	return ( $res, $buf );
+}
+
+sub packet_bin_write {
+	my $buf = shift;
+	print STDOUT sprintf( "%04x", length($buf) + 4 );
+	print STDOUT $buf;
+	STDOUT->flush();
+}
+
+sub packet_txt_write {
+	packet_bin_write( $_[0] . "\n" );
+}
+
+sub packet_flush {
+	print STDOUT sprintf( "%04x", 0 );
+	STDOUT->flush();
+}
+
+( packet_txt_read() eq ( 0, "git-filter-client" ) ) || die "bad initialize";
+( packet_txt_read() eq ( 0, "version=2" ) )         || die "bad version";
+( packet_bin_read() eq ( 1, "" ) )                  || die "bad version end";
+
+packet_txt_write("git-filter-server");
+packet_txt_write("version=2");
+
+( packet_txt_read() eq ( 0, "clean=true" ) )  || die "bad capability";
+( packet_txt_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
+( packet_bin_read() eq ( 1, "" ) )            || die "bad capability end";
+
+packet_txt_write("clean=true");
+packet_txt_write("smudge=true");
+packet_flush();
+
+while (1) {
+	my ($command)  = packet_txt_read() =~ /^command=([^=]+)$/;
+	my ($pathname) = packet_txt_read() =~ /^pathname=([^=]+)$/;
+
+	packet_bin_read();
+
+	my $input = "";
+	{
+		binmode(STDIN);
+		my $buffer;
+		my $done = 0;
+		while ( !$done ) {
+			( $done, $buffer ) = packet_bin_read();
+			$input .= $buffer;
+		}
+	}
+
+	my $output;
+	if ( $command eq "clean" ) {
+		### Perform clean here ###
+		$output = $input;
+	}
+	elsif ( $command eq "smudge" ) {
+		### Perform smudge here ###
+		$output = $input;
+	}
+	else {
+		die "bad command '$command'";
+	}
+
+	packet_txt_write("status=success");
+	packet_flush();
+	while ( length($output) > 0 ) {
+		my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
+		packet_bin_write($packet);
+		if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
+			$output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+		}
+		else {
+			$output = "";
+		}
+	}
+	packet_flush();    # flush content!
+	packet_flush();    # empty list, keep "status=success" unchanged!
+
+}
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-08 11:25 ` [PATCH v10 13/14] convert: add " larsxschneider
@ 2016-10-08 23:06   ` Jakub Narębski
  2016-10-09  5:32     ` Torsten Bögershausen
  2016-10-11 22:26     ` Lars Schneider
  2016-10-10 19:58   ` Junio C Hamano
  1 sibling, 2 replies; 34+ messages in thread
From: Jakub Narębski @ 2016-10-08 23:06 UTC (permalink / raw)
  To: Lars Schneider, git; +Cc: Junio C Hamano, Jeff King

Part 1 of review, starting with the protocol v2 itself.

W dniu 08.10.2016 o 13:25, larsxschneider@gmail.com pisze:
> From: Lars Schneider <larsxschneider@gmail.com>
> 
> Git's clean/smudge mechanism invokes an external filter process for
> every single blob that is affected by a filter. If Git filters a lot of
> blobs then the startup time of the external filter processes can become
> a significant part of the overall Git execution time.
> 
> In a preliminary performance test this developer used a clean/smudge
> filter written in golang to filter 12,000 files. This process took 364s
> with the existing filter mechanism and 5s with the new mechanism. See
> details here: https://github.com/github/git-lfs/pull/1382
> 
> This patch adds the `filter.<driver>.process` string option which, if
> used, keeps the external filter process running and processes all blobs
> with the packet format (pkt-line) based protocol over standard input and
> standard output. The full protocol is explained in detail in
> `Documentation/gitattributes.txt`.
> 
> A few key decisions:
> 
> * The long running filter process is referred to as filter protocol
>   version 2 because the existing single shot filter invocation is
>   considered version 1.
> * Git sends a welcome message and expects a response right after the
>   external filter process has started. This ensures that Git will not
>   hang if a version 1 filter is incorrectly used with the
>   filter.<driver>.process option for version 2 filters. In addition,
>   Git can detect this kind of error and warn the user.
> * The status of a filter operation (e.g. "success" or "error) is set
>   before the actual response and (if necessary!) re-set after the
>   response. The advantage of this two step status response is that if
>   the filter detects an error early, then the filter can communicate
>   this and Git does not even need to create structures to read the
>   response.
> * All status responses are pkt-line lists terminated with a flush
>   packet. This allows us to send other status fields with the same
>   protocol in the future.

Looks good to me.

> 
> Helped-by: Martin-Louis Bright <mlbright@gmail.com>
> Reviewed-by: Jakub Narebski <jnareb@gmail.com>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/gitattributes.txt | 157 +++++++++++++-
>  convert.c                       | 297 +++++++++++++++++++++++++-
>  t/t0021-conversion.sh           | 447 +++++++++++++++++++++++++++++++++++++++-
>  t/t0021/rot13-filter.pl         | 191 +++++++++++++++++
>  4 files changed, 1082 insertions(+), 10 deletions(-)
>  create mode 100755 t/t0021/rot13-filter.pl
> 
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 7aff940..5868f00 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -293,7 +293,13 @@ checkout, when the `smudge` command is specified, the command is
>  fed the blob object from its standard input, and its standard
>  output is used to update the worktree file.  Similarly, the
>  `clean` command is used to convert the contents of worktree file
> -upon checkin.
> +upon checkin. By default these commands process only a single
> +blob and terminate.  If a long running `process` filter is used
> +in place of `clean` and/or `smudge` filters, then Git can process
> +all blobs with a single filter command invocation for the entire
> +life of a single Git command, for example `git add --all`.  See
> +section below for the description of the protocol used to
> +communicate with a `process` filter.

I don't remember what this part looked like in previous versions
of this patch series, but "... is used in place of `clean` ..."
does not state explicitly the precedence of those
configuration variables.  I think it should be stated explicitly
that `process` takes precedence over any `clean` and/or `smudge`
settings for the same `filter.<driver>` (regardless of whether
the long running `process` filter supports "clean" and/or "smudge"
operations or not).

>  
>  One use of the content filtering is to massage the content into a shape
>  that is more convenient for the platform, filesystem, and the user to use.
> @@ -373,6 +379,155 @@ not exist, or may have different contents. So, smudge and clean commands
>  should not try to access the file on disk, but only act as filters on the
>  content provided to them on standard input.
>  
> +Long Running Filter Process
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +If the filter command (a string value) is defined via
> +`filter.<driver>.process` then Git can process all blobs with a
> +single filter invocation for the entire life of a single Git
> +command. This is achieved by using a packet format (pkt-line,
> +see technical/protocol-common.txt) based protocol over standard
> +input and standard output as follows. All packets, except for the
> +"*CONTENT" packets and the "0000" flush packet, are considered
> +text and therefore are terminated by a LF.

Maybe s/standard input and output/\& of filter process,/ (that is,
add "... of filter process," to the third sentence in the above
paragraph).

I guess what LF (line-feed character, "\n") is should be obvious
for anybody who will be reading this part.  All right.

> +
> +Git starts the filter when it encounters the first file
> +that needs to be cleaned or smudged. After the filter started
> +Git sends a welcome message ("git-filter-client"), a list of
> +supported protocol version numbers, and a flush packet.

I guess there is no need to be more explicit in the description here,
as the exact format should be obvious from the example below.
We could add that the list of supported protocol version numbers
is sent as a series of "version=<integer number>" text-packet
lines.
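
To make that concrete, here is a rough shell sketch of the bytes Git
would put on the wire for the welcome message and the version list.
The helper names `pkt_txt`, `pkt_flush` and `client_hello` are made up
for illustration (they are not part of this series), and the length
computation assumes ASCII-only payloads:

```shell
# Hypothetical helpers, not part of the patch series.

# Emit one text packet: 4 hex digits of length, the payload, and an LF.
# The length counts the 4 header bytes, the payload, and the LF itself.
# Assumes an ASCII-only payload so that ${#1} equals the byte count.
pkt_txt () {
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}

# Emit a flush packet, which terminates a list.
pkt_flush () {
	printf '0000'
}

# What Git sends first, going by the description quoted above.
client_hello () {
	pkt_txt "git-filter-client"
	pkt_txt "version=2"
	pkt_flush
}
```

Piping `client_hello` through `od -c` is an easy way to check the
length prefixes by eye.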

>                                                          Git expects
> +to read a welcome response message ("git-filter-server") and exactly
> +one protocol version number from the previously sent list. All further
> +communication will be based on the selected version. The remaining
> +protocol description below documents "version=2". Please note that
> +"version=42" in the example below does not exist and is only there
> +to illustrate how the protocol would look like with more than one
> +version.
> +
> +After the version negotiation Git sends a list of all capabilities that
> +it supports and a flush packet. Git expects to read a list of desired
> +capabilities, which must be a subset of the supported capabilities list,
> +and a flush packet as response:
> +------------------------
> +packet:          git> git-filter-client
> +packet:          git> version=2
> +packet:          git> version=42
> +packet:          git> 0000
> +packet:          git< git-filter-server
> +packet:          git< version=2
> +packet:          git> clean=true
> +packet:          git> smudge=true
> +packet:          git> not-yet-invented=true
> +packet:          git> 0000
> +packet:          git< clean=true
> +packet:          git< smudge=true
> +packet:          git< 0000

WARNING: This example is different from description!!!

In the example you have Git sending "git-filter-client" and a list of
supported protocol versions, terminated with a flush packet, then the
filter driver process sends "git-filter-server", exactly one version,
*AND* a list of supported capabilities in "<capability>=true" format,
terminated with a flush packet.

In the description above the example you have a 4-part handshake, not
3-part; the filter is described to send the list of supported
capabilities last (a subset of what the Git command supports).
Moreover, in the example in previous versions, at least as far as v8
of this series, the response from the filter driver was a fixed-length
list of two lines: the magic string "git-filter-server" and exactly one
line with the protocol version; this part was *not* terminated with a
flush packet (complicating the code of the filter driver program a bit,
I think).

I think this version of the protocol is *better*, just the text needs to
be updated to match.  I wanted to propose something like this in v9,...

By the way, now I look at it, the argument for using the
"<capability>=true" format instead of "capability=<capability>"
(or "supported-command=<capability>") is weak.  The argument for
using "<variable>=<value>" to make it easier to implement parsing
is sound, but the argument for "<capability>=true" is weak.

The argument was that with "<capability>=true" one can simply
parse metadata into hash / dictionary / hashmap, and choose
response based on that.  Hash / hashmap / associative array
needs different keys, so the reasoning went for "<capability>=true"
over "capability=<capability>"... but the filter process still
needs to handle lines with repeating keys, namely "version=<N>"
lines!

So the argument doesn't hold water IMVHO, and we can choose
the version which reads better / is more natural.

> +------------------------
> +Supported filter capabilities in version 2 are "clean" and
> +"smudge".

I think it would be good to have something here separating the
handshake part of protocol from the description of the working
part.  The latter loops over each file to be affected by given
filter driver; handshake is done only once per filter.

Maybe subsections?

> +
> +Afterwards Git sends a list of "key=value" pairs terminated with
> +a flush packet. The list will contain at least the filter command
> +(based on the supported capabilities) and the pathname of the file
> +to filter relative to the repository root. Right after these packets

I think you meant "right after the flush packet" here, didn't you?
It would be more explicit.

> +Git sends the content split in zero or more pkt-line packets and a
> +flush packet to terminate content. Please note, that the filter
> +must not send any response before it received the content and the
> +final flush packet.

It's good to have this information in the documentation.

BTW. I hope that in the future this restriction could be lifted with
"stream" capability / option, by having Git read response from filter
driver in an asynchronous process, like for one-shot v1 filters.
But that is certainly for later.  Let's polish this series and have
it accepted first.

> +------------------------
> +packet:          git> command=smudge
> +packet:          git> pathname=path/testfile.dat
> +packet:          git> 0000
> +packet:          git> CONTENT
> +packet:          git> 0000
> +------------------------
> +
> +The filter is expected to respond with a list of "key=value" pairs
> +terminated with a flush packet.

I wonder if we could be more explicit that it is about "status"
response.  But I don't have good idea how to improve this sentence;
not that it is really needed, I don't think.

>                                  If the filter does not experience
> +problems then the list must contain a "success" status. Right after
> +these packets the filter is expected to send the content in zero
> +or more pkt-line packets and a flush packet at the end.

Perhaps "terminating it with a flush packet"?  But it is quite all
right as it is now.

>                                                       Finally, a
> +second list of "key=value" pairs terminated with a flush packet
> +is expected. The filter can change the status in the second list.

I would add here, to be more explicit:

  This second list of "key=value" pairs may be empty, and usually
  would be if there is nothing wrong with response or filter; the
  terminating flush packet must be here regardless.

Or something like that.  The above proposal could certainly be
improved.
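
For illustration, the complete answer of a filter for a successful run
could be sketched in shell roughly like this.  `pkt_txt`, `pkt_bin`,
`pkt_flush` and `respond_success` are hypothetical names, and the
sketch assumes ASCII content that fits into a single packet (at most
65516 bytes, the limit used in rot13-filter.pl):

```shell
# Hypothetical helpers, not part of the patch series.

# Text packet: length header, payload, LF (ASCII payload assumed).
pkt_txt () {
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}

# Content packet: length header and payload, no LF appended.
pkt_bin () {
	printf '%04x%s' $(( ${#1} + 4 )) "$1"
}

pkt_flush () {
	printf '0000'
}

# Send the response for already filtered content passed as $1.
respond_success () {
	pkt_txt "status=success"
	pkt_flush                       # end of the first status list
	test -z "$1" || pkt_bin "$1"    # zero or one content packet
	pkt_flush                       # end of content
	pkt_flush                       # empty second list, status unchanged
}
```

With empty content the output ends in three consecutive flush packets,
which matches the "empty content" example in the patch.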

> +------------------------
> +packet:          git< status=success
> +packet:          git< 0000
> +packet:          git< SMUDGED_CONTENT
> +packet:          git< 0000
> +packet:          git< 0000  # empty list, keep "status=success" unchanged!

All right, looks good.  Is this exclamation mark "!" necessary / wanted?

> +------------------------
> +
> +If the result content is empty then the filter is expected to respond
> +with a "success" status and an empty list.

Actually, it is empty content, not an empty list; that is, a response
(filter output) consisting solely of a flush packet.

Sidenote: I first thought that "empty list" here was about the post-content
information, which may be empty, and for empty content it would almost
certainly be an empty list - there is nothing, I think, that could change
the status of the filter...

> +------------------------
> +packet:          git< status=success
> +packet:          git< 0000
> +packet:          git< 0000  # empty content!
> +packet:          git< 0000  # empty list, keep "status=success" unchanged!
> +------------------------
> +
> +In case the filter cannot or does not want to process the content,
> +it is expected to respond with an "error" status. Depending on the
> +`filter.<driver>.required` flag Git will interpret that as error
> +but it will not stop or restart the filter process.

I think those two parts of the last sentence, the part about the
'required' flag and the part about restarting the process, would be
better either split or put in reverse order: first, say that Git would
not restart the filter; second, that it would continue with the next
file without the 'required' flag, and error out if the filter has the
'required' flag.  Though perhaps this should be known at this point of
the documentation, and doesn't need repeating...

Ugh.  Well, it's quite good as it is now.

> +------------------------
> +packet:          git< status=error
> +packet:          git< 0000
> +------------------------
> +
> +If the filter experiences an error during processing, then it can
> +send the status "error" after the content was (partially or
> +completely) sent. Depending on the `filter.<driver>.required` flag
> +Git will interpret that as error but it will not stop or restart the
> +filter process.

Errr... this is literal repetition.  You need to decide whether to
put it before example, or after example.  Or maybe split it.

> +------------------------
> +packet:          git< status=success
> +packet:          git< 0000
> +packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
> +packet:          git< 0000
> +packet:          git< status=error
> +packet:          git< 0000
> +------------------------
> +
> +If the filter dies during the communication or does not adhere to
> +the protocol then Git will stop the filter process and restart it
> +with the next file that needs to be processed. Depending on the
> +`filter.<driver>.required` flag Git will interpret that as error.

Uhh... until now the order was explanation, then example.  From the
duplicated description above, it is now first example, then
description.  Consistency would be good.

> +
> +The error handling for all cases above mimic the behavior of
> +the `filter.<driver>.clean` / `filter.<driver>.smudge` error
> +handling.

You have "error handling" repeated here.

> +
> +In case the filter cannot or does not want to process the content
> +as well as any future content for the lifetime of the Git process,
> +it is expected to respond with an "abort" status at any point in
> +the protocol. Depending on the `filter.<driver>.required` flag Git
> +will interpret that as error for the content as well as any future
> +content for the lifetime of the Git process but it will not stop or
> +restart the filter process.

Here the order, first about `required` then without, looks all right
to me: the result wrt process is the same, only the error or not
changes.

And here we have description first, example second.

> +------------------------
> +packet:          git< status=abort
> +packet:          git< 0000
> +------------------------
> +
> +After the filter has processed a blob it is expected to wait for
> +the next "key=value" list containing a command. Git will close
> +the command pipe on exit. The filter is expected to detect EOF
> +and exit gracefully on its own.

Any "kill filter" solutions should probably be put here.  I guess
that the filter exiting means EOF on its standard output when read
by the Git command, doesn't it?

> +
> +If you develop your own long running filter
> +process then the `GIT_TRACE_PACKET` environment variables can be
> +very helpful for debugging (see linkgit:git[1]).

s/environment variables/environment variable/  - there is only
one GIT_TRACE_PACKET.  Unless you wanted to write about GIT_TRACE?

> +
> +If a `filter.<driver>.process` command is configured then it
> +always takes precedence over a configured `filter.<driver>.clean`
> +or `filter.<driver>.smudge` command.

Ah, it is here! I think it would be better to put it upfront; you
don't need information about the protocol to *use* the existing
filter, but you need this info.

Or maybe we can repeat this information.

> +
> +Please note that you cannot use an existing `filter.<driver>.clean`
> +or `filter.<driver>.smudge` command with `filter.<driver>.process`
> +because the former two use a different inter process communication
> +protocol than the latter one.

I'm not sure where this should be (but it is needed)

> +
> +
>  Interaction between checkin/checkout attributes
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-- 
Jakub Narębski


* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-08 23:06   ` Jakub Narębski
@ 2016-10-09  5:32     ` Torsten Bögershausen
  2016-10-11 15:29       ` Lars Schneider
  2016-10-11 22:26     ` Lars Schneider
  1 sibling, 1 reply; 34+ messages in thread
From: Torsten Bögershausen @ 2016-10-09  5:32 UTC (permalink / raw)
  To: Jakub Narębski, Lars Schneider, git; +Cc: Junio C Hamano, Jeff King

On 09.10.16 01:06, Jakub Narębski wrote:
>> +------------------------
>> +packet:          git< status=abort
>> +packet:          git< 0000
>> +------------------------
>> +
>> +After the filter has processed a blob it is expected to wait for
>> +the next "key=value" list containing a command. Git will close
>> +the command pipe on exit. The filter is expected to detect EOF
>> +and exit gracefully on its own.
> Any "kill filter" solutions should probably be put here.  I guess
> that filter exiting means EOF on its standard output when read
> by Git command, isn't it?
>
Isn't it that Git closes the command pipe, then the filter sees EOF on its stdin
and does a graceful exit?
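
A minimal sketch of that shutdown path, under stated assumptions: the filter reads pkt-line headers from its stdin and treats a zero-byte read as "Git closed the pipe". The function names are made up for illustration, and because shell variables cannot hold NUL bytes this only works for text packets, not for arbitrary blob content:

```sh
# Read exactly one 4-byte pkt-line length header from stdin.
# Prints nothing once stdin has reached EOF.
read_pkt_header () {
	dd bs=1 count=4 2>/dev/null
}

# Hypothetical main loop: consume packets until Git closes the pipe.
filter_loop () {
	while :
	do
		header=$(read_pkt_header)
		if test -z "$header"
		then
			# EOF on stdin: Git is gone, leave gracefully.
			echo "EOF on stdin, exiting" >&2
			return 0
		fi
		# A truncated header means the peer broke the protocol.
		test "${#header}" -eq 4 || return 1
		if test "$header" = "0000"
		then
			echo "flush"
			continue
		fi
		# The header counts its own 4 bytes, so subtract them.
		len=$(( 0x$header - 4 ))
		payload=$(dd bs=1 count="$len" 2>/dev/null)
		echo "packet: $payload"
	done
}
```

The point is only that the filter needs no explicit "quit" command: the loop ends by itself as soon as a header read returns nothing.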





* Re: [PATCH v10 14/14] contrib/long-running-filter: add long running filter example
  2016-10-08 11:25 ` [PATCH v10 14/14] contrib/long-running-filter: add long running filter example larsxschneider
@ 2016-10-09  5:42   ` Torsten Bögershausen
  2016-10-15 14:47     ` Lars Schneider
  0 siblings, 1 reply; 34+ messages in thread
From: Torsten Bögershausen @ 2016-10-09  5:42 UTC (permalink / raw)
  To: larsxschneider, git; +Cc: gitster, jnareb, peff

On 08.10.16 13:25, larsxschneider@gmail.com wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
> 
> Add a simple pass-thru filter as example implementation for the Git
> filter protocol version 2. See Documentation/gitattributes.txt, section
> "Filter Protocol" for more info.
> 

Nothing wrong with code in contrib.
I may have missed parts of the discussion, was there a good reason to
drop the test case completely?

>When adding a new feature, make sure that you have new tests to show
>the feature triggers the new behavior when it should, and to show the
>feature does not trigger when it shouldn't.  After any code change, make
>sure that the entire test suite passes.

Or is there a plan to add them later?



* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-08 11:25 ` [PATCH v10 13/14] convert: add " larsxschneider
  2016-10-08 23:06   ` Jakub Narębski
@ 2016-10-10 19:58   ` Junio C Hamano
  2016-10-11  8:11     ` Lars Schneider
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2016-10-10 19:58 UTC (permalink / raw)
  To: larsxschneider; +Cc: git, jnareb, peff

larsxschneider@gmail.com writes:

> +# Count unique lines in two files and compare them.
> +test_cmp_count () {
> +	for FILE in $@
> +	do
> +		sort $FILE | uniq -c | sed "s/^[ ]*//" >$FILE.tmp
> +		cat $FILE.tmp >$FILE

Unquoted references to $FILE bother me.  Are you relying on them
getting split at IFS boundaries?  Otherwise write this (and other
similar ones) like so:

	for FILE in "$@"
	do
		do-this-to "$FILE" | ... >"$FILE.tmp" &&
		cat "$FILE.tmp" >"$FILE" &&
		rm -f "$FILE.tmp"

> +	done &&
> +	test_cmp $@

The use of "$@" here is quite pointless, as you _know_ all of them
are filenames, and you _know_ that test_cmp takes only two
filenames.  Be explicit and say

	test_cmp "$1" "$2"

or even

	test_cmp_count () {
	expect=$1 actual=$2
	for FILE in "$expect" "$actual"
	do
		...
	done &&
	test_cmp "$expect" "$actual"
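
A minimal sketch of why the quoting matters, assuming the default IFS. The two function names are made up for illustration and are not part of the test suite:

```sh
# Count loop iterations over an unquoted $@ (subject to word splitting).
count_unquoted () {
	n=0
	for f in $@
	do
		n=$((n + 1))
	done
	echo "$n"
}

# Count loop iterations over a quoted "$@" (one per argument).
count_quoted () {
	n=0
	for f in "$@"
	do
		n=$((n + 1))
	done
	echo "$n"
}
```

Called as `count_unquoted "a b" c`, the first helper sees three words because the unquoted `$@` is split at the space, whereas `count_quoted "a b" c` sees the two arguments that were actually passed.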

> +# Count unique lines except clean invocations in two files and compare
> +# them. Clean invocations are not counted because their number can vary.
> +# c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
> +test_cmp_count_except_clean () {
> +	for FILE in $@
> +	do
> +		sort $FILE | uniq -c | sed "s/^[ ]*//" |
> +			sed "s/^\([0-9]\) IN: clean/x IN: clean/" >$FILE.tmp
> +		cat $FILE.tmp >$FILE
> +	done &&
> +	test_cmp $@
> +}

Why do you even _care_ about the number of invocations?  While I
told you why "clean" could be called multiple times under racy Git
as an example, that was not meant to be an exhaustive example.  I
wouldn't be surprised if we needed to run smudge twice, for example,
in some weirdly racy cases in the future.

Can we just have the correctness (i.e. "we expect that the working
tree file gets this as the result of checking it out, and we made
sure that is the case") test without getting into such an
implementation detail?

Thanks.


* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-10 19:58   ` Junio C Hamano
@ 2016-10-11  8:11     ` Lars Schneider
  2016-10-11 10:09       ` Torsten Bögershausen
  0 siblings, 1 reply; 34+ messages in thread
From: Lars Schneider @ 2016-10-11  8:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jakub Narębski, peff


> On 10 Oct 2016, at 21:58, Junio C Hamano <gitster@pobox.com> wrote:
> 
> larsxschneider@gmail.com writes:
> 
> [...]
>> +# Count unique lines except clean invocations in two files and compare
>> +# them. Clean invocations are not counted because their number can vary.
>> +# c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
>> +test_cmp_count_except_clean () {
>> +	for FILE in $@
>> +	do
>> +		sort $FILE | uniq -c | sed "s/^[ ]*//" |
>> +			sed "s/^\([0-9]\) IN: clean/x IN: clean/" >$FILE.tmp
>> +		cat $FILE.tmp >$FILE
>> +	done &&
>> +	test_cmp $@
>> +}
> 
> Why do you even _care_ about the number of invocations?  While I
> told you why "clean" could be called multiple times under racy Git
> as an example, that was not meant to be an exhaustive example.  I
> wouldn't be surprised if we needed to run smudge twice, for example,
> in some weirdly racy cases in the future.
> 
> Can we just have the correctness (i.e. "we expect that the working
> tree file gets this as the result of checking it out, and we made
> sure that is the case") test without getting into such an
> implementation detail?

My goal is to check that clean/smudge is invoked at least once. I could
just run `uniq` to achieve that but then all other filter commands could
happen multiple times and the test would not detect that.

I also prefer to check the filter commands to ensure the filter is 
working as expected (e.g. no multiple start ups etc) in addition to 
checking the working tree.
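
A minimal sketch of that normalization, not the helper from the patch: the function name and the sample log lines are hypothetical, and unlike the patch it masks counts of any width (`[0-9][0-9]*`) rather than a single digit:

```sh
# Hypothetical helper: collapse a filter log read from stdin into
# "<count> <line>" pairs, then mask the count of clean/smudge lines
# so that only "at least once" is checked for them.
normalize_filter_log () {
	LC_ALL=C sort | uniq -c | sed "s/^[ ]*//" |
		sed "s/^[0-9][0-9]* IN: clean/x IN: clean/" |
		sed "s/^[0-9][0-9]* IN: smudge/x IN: smudge/"
}
```

With a log containing `START`, two identical `IN: clean a` lines, one `IN: smudge a` line, and `STOP`, the clean and smudge lines come out with an `x` in place of the count, while `START` and `STOP` keep their real count, so a second start-up of the filter would still make the comparison fail.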

Would the patch below work for you? If yes, then please squash it into
"convert: add filter.<driver>.process option".

Thank you,
Lars



diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 9f892c0..714f706 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -31,38 +31,33 @@ filter_git () {
 	rm -f git-stderr.log
 }
 
-# Count unique lines in two files and compare them.
-test_cmp_count () {
-	for FILE in $@
-	do
-		sort $FILE | uniq -c | sed "s/^[ ]*//" >$FILE.tmp
-		cat $FILE.tmp >$FILE
-	done &&
-	test_cmp $@
-}
-
-# Count unique lines except clean invocations in two files and compare
-# them. Clean invocations are not counted because their number can vary.
+# Compare two files and ensure that `clean` and `smudge` respectively are
+# called at least once if specified in the `expect` file. The actual
+# invocation count is not relevant because their number can vary.
 # c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
-test_cmp_count_except_clean () {
-	for FILE in $@
+test_cmp_count () {
+	expect=$1 actual=$2
+	for FILE in "$expect" "$actual"
 	do
-		sort $FILE | uniq -c | sed "s/^[ ]*//" |
-			sed "s/^\([0-9]\) IN: clean/x IN: clean/" >$FILE.tmp
-		cat $FILE.tmp >$FILE
+		sort "$FILE" | uniq -c | sed "s/^[ ]*//" |
+			sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
+			sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" >"$FILE.tmp" &&
+		cat "$FILE.tmp" >"$FILE"
 	done &&
-	test_cmp $@
+	test_cmp "$expect" "$actual"
 }
 
-# Compare two files but exclude clean invocations because they can vary.
+# Compare two files but exclude all `clean` invocations because Git can
+# call `clean` zero or more times.
 # c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
 test_cmp_exclude_clean () {
-	for FILE in $@
+	expect=$1 actual=$2
+	for FILE in "$expect" "$actual"
 	do
-		grep -v "IN: clean" $FILE >$FILE.tmp
-		cat $FILE.tmp >$FILE
+		grep -v "IN: clean" "$FILE" >"$FILE.tmp" &&
+		cat "$FILE.tmp" >"$FILE"
 	done &&
-	test_cmp $@
+	test_cmp "$expect" "$actual"
 }
 
 # Check that the contents of two files are equal and that their rot13 version
@@ -395,7 +390,7 @@ test_expect_success PERL 'required process filter should filter data' '
 			IN: clean testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- OUT: $S3 . [OK]
 			STOP
 		EOF
-		test_cmp_count_except_clean expected.log rot13-filter.log &&
+		test_cmp_count expected.log rot13-filter.log &&
 
 		rm -f test2.r "testsubdir/test3 '\''sq'\'',\$x.r" &&



* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-11  8:11     ` Lars Schneider
@ 2016-10-11 10:09       ` Torsten Bögershausen
  2016-10-16 23:13         ` Lars Schneider
  2016-10-17 17:05         ` Junio C Hamano
  0 siblings, 2 replies; 34+ messages in thread
From: Torsten Bögershausen @ 2016-10-11 10:09 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Junio C Hamano, git, Jakub Narębski, peff

On Tue, Oct 11, 2016 at 10:11:22AM +0200, Lars Schneider wrote:
> 
> > On 10 Oct 2016, at 21:58, Junio C Hamano <gitster@pobox.com> wrote:
> > 
> > larsxschneider@gmail.com writes:
> > 
> > [...]
> >> +# Count unique lines except clean invocations in two files and compare
> >> +# them. Clean invocations are not counted because their number can vary.
> >> +# c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
> >> +test_cmp_count_except_clean () {
> >> +	for FILE in $@
> >> +	do
> >> +		sort $FILE | uniq -c | sed "s/^[ ]*//" |
> >> +			sed "s/^\([0-9]\) IN: clean/x IN: clean/" >$FILE.tmp
> >> +		cat $FILE.tmp >$FILE
> >> +	done &&
> >> +	test_cmp $@
> >> +}
> > 
> > Why do you even _care_ about the number of invocations?  While I
> > told you why "clean" could be called multiple times under racy Git
> > as an example, that was not meant to be an exhaustive example.  I
> > wouldn't be surprised if we needed to run smudge twice, for example,
> > in some weirdly racy cases in the future.
> > 
> > Can we just have the correctness (i.e. "we expect that the working
> > tree file gets this as the result of checking it out, and we made
> > sure that is the case") test without getting into such an
> > implementation detail?
> 
> My goal is to check that clean/smudge is invoked at least once. I could
> just run `uniq` to achieve that but then all other filter commands could
> happen multiple times and the test would not detect that.
> 
> I also prefer to check the filter commands to ensure the filter is 
> working as expected (e.g. no multiple start ups etc) in addition to 
> checking the working tree.
> 
> Would the patch below work for you? If yes, then please squash it into
> "convert: add filter.<driver>.process option".
> 
> Thank you,
> Lars
> 
> 
> 
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index 9f892c0..714f706 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -31,38 +31,33 @@ filter_git () {
>  	rm -f git-stderr.log
>  }
>  
> -# Count unique lines in two files and compare them.
> -test_cmp_count () {
> -	for FILE in $@
> -	do
> -		sort $FILE | uniq -c | sed "s/^[ ]*//" >$FILE.tmp
> -		cat $FILE.tmp >$FILE
> -	done &&
> -	test_cmp $@
> -}
> -
> -# Count unique lines except clean invocations in two files and compare
> -# them. Clean invocations are not counted because their number can vary.
> +# Compare two files and ensure that `clean` and `smudge` respectively are
> +# called at least once if specified in the `expect` file. The actual
> +# invocation count is not relevant because their number can vary.
>  # c.f. http://public-inbox.org/git/xmqqshv18i8i.fsf@gitster.mtv.corp.google.com/
> -test_cmp_count_except_clean () {
> -	for FILE in $@

> +test_cmp_count () {
> +	expect=$1 actual=$2

That could be 
expect="$1"
actual="$2"

> +	for FILE in "$expect" "$actual"
>  	do

> +		sort "$FILE" | uniq -c | sed "s/^[ ]*//" |
> +			sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
> +			sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" >"$FILE.tmp" &&
> +		cat "$FILE.tmp" >"$FILE"

How about 
		cp "$FILE.tmp" "$FILE"



* Re: [PATCH v10 04/14] run-command: add clean_on_exit_handler
  2016-10-08 11:25 ` [PATCH v10 04/14] run-command: add clean_on_exit_handler larsxschneider
@ 2016-10-11 12:12   ` Johannes Schindelin
  2016-10-15 15:02     ` Lars Schneider
  0 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2016-10-11 12:12 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, gitster, jnareb, peff

Hi Lars,

On Sat, 8 Oct 2016, larsxschneider@gmail.com wrote:

> @@ -31,6 +32,15 @@ static void cleanup_children(int sig, int in_signal)
>  	while (children_to_clean) {
>  		struct child_to_clean *p = children_to_clean;
>  		children_to_clean = p->next;
> +
> +		if (p->process && !in_signal) {
> +			struct child_process *process = p->process;
> +			if (process->clean_on_exit_handler) {
> +				trace_printf("trace: run_command: running exit handler for pid %d", p->pid);

On Windows, pid_t translates to long long int, resulting in this build
error:

-- snip --
 In file included from cache.h:10:0,
                  from run-command.c:1:
 run-command.c: In function 'cleanup_children':
 run-command.c:39:18: error: format '%d' expects argument of type 'int', but argument 5 has type 'pid_t {aka long long int}' [-Werror=format=]
      trace_printf("trace: run_command: running exit handler for pid %d", p->pid);
                   ^
 trace.h:81:53: note: in definition of macro 'trace_printf'
   trace_printf_key_fl(TRACE_CONTEXT, __LINE__, NULL, __VA_ARGS__)
                                                      ^~~~~~~~~~~
 cc1.exe: all warnings being treated as errors
 make: *** [Makefile:1987: run-command.o] Error 1
-- snap --

Maybe use PRIuMAX as we do elsewhere (see output of `git grep
printf.*pid`):

	trace_printf("trace: run_command: running exit handler for pid %"
		     PRIuMAX, (uintmax_t)p->pid);

Ciao,
Dscho


* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-09  5:32     ` Torsten Bögershausen
@ 2016-10-11 15:29       ` Lars Schneider
  0 siblings, 0 replies; 34+ messages in thread
From: Lars Schneider @ 2016-10-11 15:29 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Jakub Narębski, git, Junio C Hamano, Jeff King


> On 09 Oct 2016, at 07:32, Torsten Bögershausen <tboegi@web.de> wrote:
> 
> On 09.10.16 01:06, Jakub Narębski wrote:
>>> +------------------------
>>> +packet:          git< status=abort
>>> +packet:          git< 0000
>>> +------------------------
>>> +
>>> +After the filter has processed a blob it is expected to wait for
>>> +the next "key=value" list containing a command. Git will close
>>> +the command pipe on exit. The filter is expected to detect EOF
>>> +and exit gracefully on its own.
>> Any "kill filter" solutions should probably be put here.  I guess
>> that filter exiting means EOF on its standard output when read
>> by Git command, isn't it?
>> 
> Isn't it that Git closes the command pipe, then the filter sees EOF on its stdin
> and does a graceful exit?

Correct!

- Lars


* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-08 23:06   ` Jakub Narębski
  2016-10-09  5:32     ` Torsten Bögershausen
@ 2016-10-11 22:26     ` Lars Schneider
  2016-10-12 10:54       ` Jakub Narębski
  1 sibling, 1 reply; 34+ messages in thread
From: Lars Schneider @ 2016-10-11 22:26 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: git, Junio C Hamano, Jeff King


> On 09 Oct 2016, at 01:06, Jakub Narębski <jnareb@gmail.com> wrote:
> 
> Part 1 of review, starting with the protocol v2 itself.
> 
> W dniu 08.10.2016 o 13:25, larsxschneider@gmail.com pisze:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> +upon checkin. By default these commands process only a single
>> +blob and terminate.  If a long running `process` filter is used
>> +in place of `clean` and/or `smudge` filters, then Git can process
>> +all blobs with a single filter command invocation for the entire
>> +life of a single Git command, for example `git add --all`.  See
>> +section below for the description of the protocol used to
>> +communicate with a `process` filter.
> 
> I don't remember how this part looked like in previous versions
> of this patch series, but "... is used in place of `clean` ..."
> does not tell explicitly about the precedence of those 
> configuration variables.  I think it should be stated explicitly
> that `process` takes precedence over any `clean` and/or `smudge`
> settings for the same `filter.<driver>` (regardless of whether
> the long running `process` filter support "clean" and/or "smudge"
> operations or not).

This is stated explicitly later on. I moved it up here:

"If a long running `process` filter is used
in place of `clean` and/or `smudge` filters, then Git can process
all blobs with a single filter command invocation for the entire
life of a single Git command, for example `git add --all`. If a 
long running `process` filter is configured then it always takes 
precedence over a configured single blob filter. "

OK?


>> +If the filter command (a string value) is defined via
>> +`filter.<driver>.process` then Git can process all blobs with a
>> +single filter invocation for the entire life of a single Git
>> +command. This is achieved by using a packet format (pkt-line,
>> +see technical/protocol-common.txt) based protocol over standard
>> +input and standard output as follows. All packets, except for the
>> +"*CONTENT" packets and the "0000" flush packet, are considered
>> +text and therefore are terminated by a LF.
> 
> Maybe s/standard input and output/\& of filter process,/ (that is,
> add "... of filter process," to the third sentence in the above
> paragraph).

You mean "This is achieved by using a packet format (pkt-line,
see technical/protocol-common.txt) based protocol over standard
input and standard output of filter process as follows." ?

I think I like the original version better.


>> After the filter started
>> Git sends a welcome message ("git-filter-client"), a list of
>> supported protocol version numbers, and a flush packet. Git expects
>> +to read a welcome response message ("git-filter-server") and exactly
>> +one protocol version number from the previously sent list. All further
>> +communication will be based on the selected version. The remaining
>> +protocol description below documents "version=2". Please note that
>> +"version=42" in the example below does not exist and is only there
>> +to illustrate how the protocol would look like with more than one
>> +version.
>> +
>> +After the version negotiation Git sends a list of all capabilities that
>> +it supports and a flush packet. Git expects to read a list of desired
>> +capabilities, which must be a subset of the supported capabilities list,
>> +and a flush packet as response:
>> +------------------------
>> +packet:          git> git-filter-client
>> +packet:          git> version=2
>> +packet:          git> version=42
>> +packet:          git> 0000
>> +packet:          git< git-filter-server
>> +packet:          git< version=2
>> +packet:          git> clean=true
>> +packet:          git> smudge=true
>> +packet:          git> not-yet-invented=true
>> +packet:          git> 0000
>> +packet:          git< clean=true
>> +packet:          git< smudge=true
>> +packet:          git< 0000
> 
> WARNING: This example is different from description!!!

Can you try to explain the difference more clearly? I read it multiple
times and I think this is sound.


> In example you have Git sending "git-filter-client" and list of supported
> protocol versions, terminated with flush packet,

Correct.


> then filter driver
> process sends "git-filter-server", exactly one version, *AND* list of
> supported capabilities in "<capability>=true" format, terminated with
> flush packet.

Correct. That's what I read in the text and in the example.

> 
> In description above the example you have 4-part handshake, not 3-part;
> the filter is described to send list of supported capabilities last
> (a subset of what Git command supports).

Part 1: Git sends a welcome message...
Part 2: Git expects to read a welcome response message...
Part 3: After the version negotiation Git sends a list of all capabilities...
Part 4: Git expects to read a list of desired capabilities...

I think example and text match, no?
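
For readers unfamiliar with pkt-line, part 1 on the wire can be sketched as follows. This is only an illustration under the assumption of ASCII payloads; `pkt_line`, `pkt_flush`, and `send_welcome` are hypothetical helper names, not functions from Git or from this series:

```sh
# Write one text packet: 4 hex digits giving the total length
# (4 header bytes + payload + trailing LF), then payload and LF.
# Assumes an ASCII payload so that ${#1} equals the byte count.
pkt_line () {
	printf '%04x%s\n' "$(( ${#1} + 5 ))" "$1"
}

# Write a flush packet.
pkt_flush () {
	printf '0000'
}

# Part 1 of the handshake: what Git sends right after starting
# the filter (only one version offered here for brevity).
send_welcome () {
	pkt_line "git-filter-client" &&
	pkt_line "version=2" &&
	pkt_flush
}
```

Parts 3 and 4 have the same shape: a run of text packets closed by one flush packet.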


> Moreover in the example in
> previous version at least as far as v8 of this series, the response
> from filter driver was fixed length list of two lines: magic string
> "git-filter-server" and exactly one line with protocol version; this
> part was *not* terminated with a flush packet (complicating code of
> filter driver program a bit, I think).
> 
> I think this version of protocol is *better*, just the text needs to
> be updated to match.  I wanted to propose something like this in v9,...

I didn't change that behavior since v8:
packet:          git< git-filter-server
packet:          git< version=2


> By the way, now I look at it, the argument for using the
> "<capability>=true" format instead of "capability=<capability>"
> (or "supported-command=<capability>") is weak.  The argument for
> using "<variable>=<value>" to make it easier to implement parsing
> is sound, but the argument for "<capability>=true" is weak.
> 
> The argument was that with "<capability>=true" one can simply
> parse metadata into hash / dictionary / hashmap, and choose
> response based on that.  Hash / hashmap / associative array
> needs different keys, so the reasoning went for "<capability>=true"
> over "capability=<capability>"... but the filter process still
> needs to handle lines with repeating keys, namely "version=<N>"
> lines!
> 
> So the argument doesn't hold water IMVHO, and we can choose
> version which reads better / is more natural.

I have to agree that "capability=<capability>" might read a
little bit nicer. However, Peff suggested "<capability>=true" 
as his preference and this is absolutely OK with me.

I am happy to change that if a second reviewer shares your
opinion.


>> +Afterwards Git sends a list of "key=value" pairs terminated with
>> +a flush packet. The list will contain at least the filter command
>> +(based on the supported capabilities) and the pathname of the file
>> +to filter relative to the repository root. Right after these packets
> 
> I think you meant here "right after the flush packet", isn't it?
> It would be more explicit.

I feel "right after these packets" reads better, but I agree that your
version is more explicit. I will change it.


>>                                                     Finally, a
>> +second list of "key=value" pairs terminated with a flush packet
>> +is expected. The filter can change the status in the second list.
> 
> I would add here, to be more explicit:
> 
> This second list of "key=value" pairs may be empty, and usually
> would be if there is nothing wrong with response or filter; the
> terminating flush packet must be here regardless.
> 
> Or something like that.  The above proposal could be certainly
> improved.

How about this:

"Finally, a
second list of "key=value" pairs terminated with a flush packet
is expected. The filter can change the status in the second list
or keep the status as is with an empty list. Please note that the
empty list must be terminated with a flush packet regardless."

TBH I like the original version and I wonder if the new version
is redundant?!


>> +------------------------
>> +packet:          git< status=success
>> +packet:          git< 0000
>> +packet:          git< SMUDGED_CONTENT
>> +packet:          git< 0000
>> +packet:          git< 0000  # empty list, keep "status=success" unchanged!
> 
> All right, looks good.  Is this exclamation mark "!" necessary / wanted?

Yes, to draw the attention towards the two flushes.


>> +------------------------
>> +
>> +If the result content is empty then the filter is expected to respond
>> +with a "success" status and an empty list.
> 
> Actually, it is empty content, not empty list; that is response (filter
> output) composed entirely of flush packet.

Correct!

"If the result content is empty then the filter is expected to respond
with a "success" status and a flush packet to signal the empty content."

Better?

> 
>> +------------------------
>> +packet:          git< status=error
>> +packet:          git< 0000
>> +------------------------
>> +
>> +If the filter experiences an error during processing, then it can
>> +send the status "error" after the content was (partially or
>> +completely) sent. Depending on the `filter.<driver>.required` flag
>> +Git will interpret that as error but it will not stop or restart the
>> +filter process.
> 
> Errr... this is literal repetition.  You need to decide whether to
> put it before example, or after example.  Or maybe split it.

Agreed. I removed the repetition and changed the previous paragraph
to:

"In case the filter cannot or does not want to process the content,
it is expected to respond with an "error" status. Git will handle 
the "error" status according to the `filter.<driver>.required` flag
but it will not stop or restart the filter process."


>> +------------------------
>> +packet:          git< status=success
>> +packet:          git< 0000
>> +packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
>> +packet:          git< 0000
>> +packet:          git< status=error
>> +packet:          git< 0000
>> +------------------------
>> +
>> +If the filter dies during the communication or does not adhere to
>> +the protocol then Git will stop the filter process and restart it
>> +with the next file that needs to be processed. Depending on the
>> +`filter.<driver>.required` flag Git will interpret that as error.
> 
> Uhh... until now the order was explanation, then example.  From the
> duplicated description above, it is now first example, then
> description.  Consistency would be good.

OK, I moved that down after the EOF exit explanation.


>> +The error handling for all cases above mimic the behavior of
>> +the `filter.<driver>.clean` / `filter.<driver>.smudge` error
>> +handling.
> 
> You have "error handling" repeated here.

True. That might not be nice from a stylistic point of view but it is
precise, no?


>> +------------------------
>> +packet:          git< status=abort
>> +packet:          git< 0000
>> +------------------------
>> +
>> +After the filter has processed a blob it is expected to wait for
>> +the next "key=value" list containing a command. Git will close
>> +the command pipe on exit. The filter is expected to detect EOF
>> +and exit gracefully on its own.
> 
> Any "kill filter" solutions should probably be put here.

Agreed.


> I guess
> that filter exiting means EOF on its standard output when read
> by Git command, isn't it?

Yes, but at this point Git is not listening anymore.


>> +If you develop your own long running filter
>> +process then the `GIT_TRACE_PACKET` environment variables can be
>> +very helpful for debugging (see linkgit:git[1]).
> 
> s/environment variables/environment variable/  - there is only
> one GIT_TRACE_PACKET.  Unless you wanted to write about GIT_TRACE?

Agreed.


Thanks for the review,
Lars


* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-11 22:26     ` Lars Schneider
@ 2016-10-12 10:54       ` Jakub Narębski
  2016-10-15 14:45         ` Lars Schneider
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Narębski @ 2016-10-12 10:54 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, Junio C Hamano, Jeff King

W dniu 12.10.2016 o 00:26, Lars Schneider pisze: 
>> On 09 Oct 2016, at 01:06, Jakub Narębski <jnareb@gmail.com> wrote:
>>
>> Part 1 of review, starting with the protocol v2 itself.
>>
>> W dniu 08.10.2016 o 13:25, larsxschneider@gmail.com pisze:
>>> From: Lars Schneider <larsxschneider@gmail.com>
>>>
>>> +upon checkin. By default these commands process only a single
>>> +blob and terminate.  If a long running `process` filter is used
>>> +in place of `clean` and/or `smudge` filters, then Git can process
>>> +all blobs with a single filter command invocation for the entire
>>> +life of a single Git command, for example `git add --all`.  See
>>> +section below for the description of the protocol used to
>>> +communicate with a `process` filter.
>>
>> I don't remember how this part looked like in previous versions
>> of this patch series, but "... is used in place of `clean` ..."
>> does not tell explicitly about the precedence of those 
>> configuration variables.  I think it should be stated explicitly
>> that `process` takes precedence over any `clean` and/or `smudge`
>> settings for the same `filter.<driver>` (regardless of whether
>> the long running `process` filter support "clean" and/or "smudge"
>> operations or not).
> 
> This is stated explicitly later on. I moved it up here:
> 
> "If a long running `process` filter is used
> in place of `clean` and/or `smudge` filters, then Git can process
> all blobs with a single filter command invocation for the entire
> life of a single Git command, for example `git add --all`. If a 
> long running `process` filter is configured then it always takes 
> precedence over a configured single blob filter. "
> 
> OK?

Looks good to me.

I think this information about precedence between the one-shot `clean`
and `smudge` filter driver configuration and the multi-file `process`
filter driver should be here for two reasons.

First, if one is interested in running a filter but does not want to
write one (because he or she uses an existing tool, for example one of
the existing LFS solutions), one can skip the "Long Running Filter
Process" section.  But one still needs to know whether to remove or
comment out the old `clean` and `smudge` config, or how to provide a
fallback for older Git (if one uses the same configuration with
pre-process Git and Git including support for this feature).

Second, the configuration belongs, in my opinion, here.  It is
not a part of long running filter protocol.

>>> +If the filter command (a string value) is defined via
>>> +`filter.<driver>.process` then Git can process all blobs with a
>>> +single filter invocation for the entire life of a single Git
>>> +command. This is achieved by using a packet format (pkt-line,
>>> +see technical/protocol-common.txt) based protocol over standard
>>> +input and standard output as follows. All packets, except for the
>>> +"*CONTENT" packets and the "0000" flush packet, are considered
>>> +text and therefore are terminated by a LF.
>>
>> Maybe s/standard input and output/\& of filter process,/ (that is,
>> add "... of filter process," to the third sentence in the above
>> paragraph).
> 
> You mean "This is achieved by using a packet format (pkt-line,
> see technical/protocol-common.txt) based protocol over standard
> input and standard output of filter process as follows." ?

Yes.

> I think I like the original version better.

Well, I think it is better to err on the side of being more
explicit.

> 
>>> After the filter started
>>> Git sends a welcome message ("git-filter-client"), a list of
>>> supported protocol version numbers, and a flush packet. Git expects
>>> +to read a welcome response message ("git-filter-server") and exactly
>>> +one protocol version number from the previously sent list. All further
>>> +communication will be based on the selected version. The remaining
>>> +protocol description below documents "version=2". Please note that
>>> +"version=42" in the example below does not exist and is only there
>>> +to illustrate how the protocol would look like with more than one
>>> +version.
>>> +
>>> +After the version negotiation Git sends a list of all capabilities that
>>> +it supports and a flush packet. Git expects to read a list of desired
>>> +capabilities, which must be a subset of the supported capabilities list,
>>> +and a flush packet as response:
>>> +------------------------
>>> +packet:          git> git-filter-client
>>> +packet:          git> version=2
>>> +packet:          git> version=42
>>> +packet:          git> 0000
>>> +packet:          git< git-filter-server
>>> +packet:          git< version=2
>>> +packet:          git> clean=true
>>> +packet:          git> smudge=true
>>> +packet:          git> not-yet-invented=true
>>> +packet:          git> 0000
>>> +packet:          git< clean=true
>>> +packet:          git< smudge=true
>>> +packet:          git< 0000
>>
>> WARNING: This example is different from description!!!
> 
> Can you try to explain the difference more clearly? I read it multiple
> times and I think this is sound.

I'm sorry, it was *my mistake*.  I read the example exchange wrong.

On the other hand, that means I have another comment, which I thought
was already addressed in v10, namely that not all exchanges end with a
flush packet (an inconsistency, and I think a bit of a lack of
extensibility).

>> In example you have Git sending "git-filter-client" and list of supported
>> protocol versions, terminated with flush packet,
> 
> Correct.

[thinking out loud]

And this serves as a 'canary' to detect a single-shot driver
misconfigured to serve as a multi-file filter driver.
 
>> then filter driver
>> process sends "git-filter-server", exactly one version, *AND* list of
>> supported capabilities in "<capability>=true" format, terminated with
>> flush packet.
> 
> Correct. That's what I read in the text and in the example.

Actually, the text says that the filter driver sends two lines: a line
with the magic signature "git-filter-server", and exactly one line with
the protocol version "version=2", *WITHOUT* a terminating flush packet.

The example says the same; I just missed the change of prefix from
"git<" to "git>" (that is, from "<" marking a response from the filter
to ">" marking a message from Git).

So the text and the example agree; it is just that I (and now you)
misread the example ;-/


IMHO this exchange should also be terminated with a flush packet,
even if in protocol version 2 it is a fixed-length list and doesn't
strictly need it.

First, it would make it easier to implement the filter driver process.
You would need only one 'read until flush' helper function, and two
higher-level functions: one for handling metadata, one for handling
contents (where handling = sending or receiving).  Currently the first
data sent from the filter is a bit of a special case: you need to send
exactly two pkt-lines, rather than a list of lines terminated with a
flush.
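
To illustrate what I mean by a single 'read until flush' helper, here is
a minimal Python sketch.  It is not part of the patch series; the names
`read_packet` and `read_until_flush` are made up for illustration, and
it assumes only the pkt-line framing (4 hex digits of total length, then
the payload, with "0000" as the flush packet).

```python
import io

FLUSH = None  # sentinel returned for the "0000" flush packet


def read_packet(stream):
    """Read one pkt-line; return its payload bytes, or FLUSH for "0000"."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream ended inside a packet header")
    length = int(header.decode("ascii"), 16)
    if length == 0:
        return FLUSH
    if length < 4:
        raise ValueError("invalid pkt-line length: %d" % length)
    payload = stream.read(length - 4)
    if len(payload) != length - 4:
        raise EOFError("stream ended inside a packet payload")
    return payload


def read_until_flush(stream):
    """Collect text lines (trailing LF stripped) up to the next flush."""
    lines = []
    while True:
        payload = read_packet(stream)
        if payload is FLUSH:
            return lines
        lines.append(payload.decode("utf-8").rstrip("\n"))


if __name__ == "__main__":
    # One "version=2" line followed by a flush packet.
    demo = io.BytesIO(b"000eversion=2\n0000")
    print(read_until_flush(demo))
```

With every list terminated by a flush, the same helper would serve for
the welcome message, the capabilities and the per-file metadata alike.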

Second, it would allow for additional possibilities in new versions
and protocol extensions, either a 3-part handshake (but now I think that
4-part is better, at least in some cases), or some other "early start"
extension.  OTOH we could put this data in an additional exchange
(assuming a new protocol version), and unless the exchanged data goes
through a slow channel (e.g. a network), one more exchange shouldn't
matter for the latency.

Third, as we can see first from my error, then from yours, it would
make it easier to debug the protocol...

>>
>> In description above the example you have 4-part handshake, not 3-part;
>> the filter is described to send list of supported capabilities last
>> (a subset of what Git command supports).
> 
> Part 1: Git sends a welcome message...
> Part 2: Git expects to read a welcome response message...
> Part 3: After the version negotiation Git sends a list of all capabilities...
> Part 4: Git expects to read a list of desired capabilities...
> 
> I think example and text match, no?

Yes, they do; as I have said already, I misread the example.

Anyway, in some cases a 4-way handshake, where Git sends its list of
supported capabilities first, is better.  If the protocol has
to prepare something for each of the capabilities, and perhaps check
the status of those preparations, it can do so after Git sends what it
could need, and before it sends what it does support.

Though it looks a bit strange that the client (as Git is the client
here) sends its capabilities first...

>> Moreover in the example in
>> previous version at least as far as v8 of this series, the response
>> from filter driver was fixed length list of two lines: magic string
>> "git-filter-server" and exactly one line with protocol version; this
>> part was *not* terminated with a flush packet (complicating code of
>> filter driver program a bit, I think).
>>
>> I think this version of protocol is *better*, just the text needs to
>> be updated to match.  I wanted to propose something like this in v9,...
> 
> I didn't change that behavior since v8:
> packet:          git< git-filter-server
> packet:          git< version=2

Right. 

>> By the way, now I look at it, the argument for using the
>> "<capability>=true" format instead of "capability=<capability>"
>> (or "supported-command=<capability>") is weak.  The argument for
>> using "<variable>=<value>" to make it easier to implement parsing
>> is sound, but the argument for "<capability>=true" is weak.
>>
>> The argument was that with "<capability>=true" one can simply
>> parse metadata into hash / dictionary / hashmap, and choose
>> response based on that.  Hash / hashmap / associative array
>> needs different keys, so the reasoning went for "<capability>=true"
>> over "capability=<capability>"... but the filter process still
>> needs to handle lines with repeating keys, namely "version=<N>"
>> lines!
>>
>> So the argument doesn't hold water IMVHO, and we can choose
>> version which reads better / is more natural.
> 
> I have to agree that "capability=<capability>" might read a
> little bit nicer. However, Peff suggested "<capability>=true" 
> as his preference and this is absolutely OK with me.

From what I remember, it was Peff stating that he thinks "<foo>=true"
is easier to parse (it is, but we still need to support the harder
way of parsing anyway), and offering that "<foo>" is good enough (if
less consistent).
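
To make the "repeating keys" point concrete, here is a small Python
sketch (purely illustrative; `parse_pairs` is a made-up name, not
anything in the series).  A plain dictionary would keep only the last
"version=<N>" line, so the parser has to collect a list of values per
key anyway:

```python
def parse_pairs(lines):
    """Parse "key=value" lines, keeping every value of a repeated key."""
    pairs = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a key=value line: %r" % line)
        # A list per key, so "version=2" and "version=42" both survive.
        pairs.setdefault(key, []).append(value)
    return pairs


if __name__ == "__main__":
    print(parse_pairs(["version=2", "version=42"]))
    print(parse_pairs(["capability=clean", "capability=smudge"]))
```

Once the parser handles repeated keys for "version", the same code
handles "capability=<foo>" lines at no extra cost.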

> I am happy to change that if a second reviewer shares your
> opinion.

Also, with "capability=<foo>" we can be more self-descriptive,
for example "supported-command=<foo>"; though "capability" is good
enough for me.

For example

 packet:          git> wants=clean
 packet:          git> wants=smudge
 packet:          git> wants=size
 packet:          git> 0000
 packet:          git< supports=clean
 packet:          git< supports=smudge
 packet:          git< 0000

Though coming up with good names is hard; and as I said "capability"
is good enough; OTOH with "smudge=true" etc. we don't need to come
up with a good name at all... though I wonder if it is a good thing `\_o,_/

>>> +Afterwards Git sends a list of "key=value" pairs terminated with
>>> +a flush packet. The list will contain at least the filter command
>>> +(based on the supported capabilities) and the pathname of the file
>>> +to filter relative to the repository root. Right after these packets
>>
>> I think you meant here "right after the flush packet", isn't it?
>> It would be more explicit.
> 
> I feel "right after these packets" reads better, but I agree that your
> version is more explicit. I will change it.

Thanks.  That doesn't matter much, but it matters.

Though it could go either way.

>>>                                                     Finally, a
>>> +second list of "key=value" pairs terminated with a flush packet
>>> +is expected. The filter can change the status in the second list.
>>
>> I would add here, to be more explicit:
>>
>> This second list of "key=value" pairs may be empty, and usually
>> would be if there is nothing wrong with response or filter; the
>> terminating flush packet must be here regardless.
>>
>> Or something like that.  The above proposal could be certainly
>> improved.
> 
> How about this:
> 
> "Finally, a
> second list of "key=value" pairs terminated with a flush packet
> is expected. The filter can change the status in the second list
> or keep the status as is with an empty list. Please note that the
> empty list must be terminated with a flush packet regardless."
> 
> TBH I like the original version and I wonder if the new version
> is redundant?!

I'm a bit unsure.  Original reads better and is shorter; the new
proposal is more explicit, but also more repetitive and longer.

>>> +------------------------
>>> +packet:          git< status=success
>>> +packet:          git< 0000
>>> +packet:          git< SMUDGED_CONTENT
>>> +packet:          git< 0000
>>> +packet:          git< 0000  # empty list, keep "status=success" unchanged!
>>
>> All right, looks good.  Is this exclamation mark "!" necessary / wanted?
> 
> Yes, to draw the attention towards the two flushes.

O.K. though shouldn't it be after "empty list", then?

>>> +------------------------
>>> +
>>> +If the result content is empty then the filter is expected to respond
>>> +with a "success" status and an empty list.
>>
>> Actually, it is empty content, not empty list; that is response (filter
>> output) composed entirely of flush packet.
> 
> Correct!
> 
> "If the result content is empty then the filter is expected to respond
> with a "success" status and a flush packet to signal the empty content."
> 
> Better?

Better, I think.

>>
>>> +------------------------
>>> +packet:          git< status=error
>>> +packet:          git< 0000
>>> +------------------------
>>> +
>>> +If the filter experiences an error during processing, then it can
>>> +send the status "error" after the content was (partially or
>>> +completely) sent. Depending on the `filter.<driver>.required` flag
>>> +Git will interpret that as error but it will not stop or restart the
>>> +filter process.
>>
>> Errr... this is literal repetition.  You need to decide whether to
>> put it before example, or after example.  Or maybe split it.
> 
> Agreed. I removed the repetition and changed the previous paragraph
> to:
> 
> "In case the filter cannot or does not want to process the content,
> it is expected to respond with an "error" status. Git will handle 
> the "error" status according to the `filter.<driver>.required` flag
> but it will not stop or restart the filter process."

All right, I think. 

>>> +------------------------
>>> +packet:          git< status=success
>>> +packet:          git< 0000
>>> +packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
>>> +packet:          git< 0000
>>> +packet:          git< status=error
>>> +packet:          git< 0000
>>> +------------------------
>>> +
>>> +If the filter dies during the communication or does not adhere to
>>> +the protocol then Git will stop the filter process and restart it
>>> +with the next file that needs to be processed. Depending on the
>>> +`filter.<driver>.required` flag Git will interpret that as error.
>>
>> Uhh... until now the order was explanation, then example.  From the
>> duplicated description above, it is now first example, then
>> description.  Consistency would be good.
> 
> OK, I moved that down after the EOF exit explanation.

Good. 
 
>>> +The error handling for all cases above mimic the behavior of
>>> +the `filter.<driver>.clean` / `filter.<driver>.smudge` error
>>> +handling.
>>
>> You have "error handling" repeated here.
> 
> True. That might not be nice from a stylistic point of view but it is
> precise, no?
 
All right, though you could also write it as "mimic what the ...
do in those cases"; I'm not sure if it's better or worse.

>>> +------------------------
>>> +packet:          git< status=abort
>>> +packet:          git< 0000
>>> +------------------------
>>> +
>>> +After the filter has processed a blob it is expected to wait for
>>> +the next "key=value" list containing a command. Git will close
>>> +the command pipe on exit. The filter is expected to detect EOF
>>> +and exit gracefully on its own.
>>
>> Any "kill filter" solutions should probably be put here.
> 
> Agreed. 
> 
>> I guess
>> that filter exiting means EOF on its standard output when read
>> by Git command, isn't it?
> 
> Yes, but at this point Git is not listening anymore.

I think it might be a good idea to describe here what the filter
process should do if it needs a possibly lengthy shutdown, so that it
neither holds up the Git command nor gets killed.

>>> +If you develop your own long running filter
>>> +process then the `GIT_TRACE_PACKET` environment variables can be
>>> +very helpful for debugging (see linkgit:git[1]).
>>
>> s/environment variables/environment variable/  - there is only
>> one GIT_TRACE_PACKET.  Unless you wanted to write about GIT_TRACE?
> 
> Agreed.
> 
> 
> Thanks for the review,

You are welcome.

Thanks for working on this series,
-- 
Jakub Narębski



* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-12 10:54       ` Jakub Narębski
@ 2016-10-15 14:45         ` Lars Schneider
  2016-10-15 17:41           ` Jeff King
  2016-10-15 19:42           ` Jakub Narębski
  0 siblings, 2 replies; 34+ messages in thread
From: Lars Schneider @ 2016-10-15 14:45 UTC (permalink / raw)
  To: Jakub Narębski, Jeff King; +Cc: git, Junio C Hamano

@Peff: If you have time, it would be great if you could comment on
one question below prefixed with "@Peff". Thanks!


> On 12 Oct 2016, at 03:54, Jakub Narębski <jnareb@gmail.com> wrote:
> 
> W dniu 12.10.2016 o 00:26, Lars Schneider pisze: 
>>> On 09 Oct 2016, at 01:06, Jakub Narębski <jnareb@gmail.com> wrote:
>>>> 
>> 
>>>> After the filter started
>>>> Git sends a welcome message ("git-filter-client"), a list of
>>>> supported protocol version numbers, and a flush packet. Git expects
>>>> +to read a welcome response message ("git-filter-server") and exactly
>>>> +one protocol version number from the previously sent list. All further
>>>> +communication will be based on the selected version. The remaining
>>>> +protocol description below documents "version=2". Please note that
>>>> +"version=42" in the example below does not exist and is only there
>>>> +to illustrate how the protocol would look like with more than one
>>>> +version.
>>>> +
>>>> +After the version negotiation Git sends a list of all capabilities that
>>>> +it supports and a flush packet. Git expects to read a list of desired
>>>> +capabilities, which must be a subset of the supported capabilities list,
>>>> +and a flush packet as response:
>>>> +------------------------
>>>> +packet:          git> git-filter-client
>>>> +packet:          git> version=2
>>>> +packet:          git> version=42
>>>> +packet:          git> 0000
>>>> +packet:          git< git-filter-server
>>>> +packet:          git< version=2
>>>> +packet:          git> clean=true
>>>> +packet:          git> smudge=true
>>>> +packet:          git> not-yet-invented=true
>>>> +packet:          git> 0000
>>>> +packet:          git< clean=true
>>>> +packet:          git< smudge=true
>>>> +packet:          git< 0000
>>> 
>>> WARNING: This example is different from description!!!
>> 
>> Can you try to explain the difference more clearly? I read it multiple
>> times and I think this is sound.
> 
> I'm sorry it was *my mistake*.  I have read the example exchange wrong.
> 
> On the other hand that means that I have other comment, which I though
> was addressed already in v10, namely that not all exchanges ends with
> flush packet (inconsistency, and I think a bit of lack of extendability).

Well, this part of the protocol is not supposed to be extensible because
it is supposed to deal *only* with the version number. It needs to keep 
the same structure to ensure forward and backward compatibility.

However, for consistency's sake I will add a flush packet.


>>> In description above the example you have 4-part handshake, not 3-part;
>>> the filter is described to send list of supported capabilities last
>>> (a subset of what Git command supports).
>> 
>> Part 1: Git sends a welcome message...
>> Part 2: Git expects to read a welcome response message...
>> Part 3: After the version negotiation Git sends a list of all capabilities...
>> Part 4: Git expects to read a list of desired capabilities...
>> 
>> I think example and text match, no?
> 
> Yes, it does; as I have said already, I have misread the example. 
> 
> Anyway, in some cases 4-way handshake, where Git sends list of
> supported capabilities first, is better.  If the protocol has
> to prepare something for each of capabilities, and perhaps check
> those preparation status, it can do it after Git sends what it
> could need, and before it sends what it does support.
> 
> Though it looks a bit strange that client (as Git is client here)
> sends its capabilities first...

Git tells the filter what it can do. Then the filter decides what
features it supports. I would prefer to keep it that way as I don't
see a strong advantage for the other way around.
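
A minimal Python sketch of that order, as seen from the filter (the
names `SUPPORTED` and `choose_capabilities` are hypothetical and only
for illustration; the "<capability>=true" format is the one from the
current patch):

```python
SUPPORTED = ("clean", "smudge")  # hypothetical: what this filter implements

SUFFIX = "=true"


def choose_capabilities(offered_lines, supported=SUPPORTED):
    """Return the lines the filter answers with: a subset of Git's offer."""
    offered = {
        line[:-len(SUFFIX)]
        for line in offered_lines
        if line.endswith(SUFFIX)
    }
    # Keep only capabilities that Git offered and the filter implements.
    return [cap + SUFFIX for cap in supported if cap in offered]


if __name__ == "__main__":
    print(choose_capabilities(
        ["clean=true", "smudge=true", "not-yet-invented=true"]))
```

The filter never needs to know about "not-yet-invented"; it simply does
not echo what it does not implement.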


>>> By the way, now I look at it, the argument for using the
>>> "<capability>=true" format instead of "capability=<capability>"
>>> (or "supported-command=<capability>") is weak.  The argument for
>>> using "<variable>=<value>" to make it easier to implement parsing
>>> is sound, but the argument for "<capability>=true" is weak.
>>> 
>>> The argument was that with "<capability>=true" one can simply
>>> parse metadata into hash / dictionary / hashmap, and choose
>>> response based on that.  Hash / hashmap / associative array
>>> needs different keys, so the reasoning went for "<capability>=true"
>>> over "capability=<capability>"... but the filter process still
>>> needs to handle lines with repeating keys, namely "version=<N>"
>>> lines!
>>> 
>>> So the argument doesn't hold water IMVHO, and we can choose
>>> version which reads better / is more natural.
>> 
>> I have to agree that "capability=<capability>" might read a
>> little bit nicer. However, Peff suggested "<capability>=true" 
>> as his preference and this is absolutely OK with me.
> 
> From what I remember it was Peff stating that he thinks "<foo>=true"
> is easier for parsing (it is, but we still need to support the harder
> way parsing anyway), and offered that "<foo>" is good enough (if less
> consistent).
> 
>> I am happy to change that if a second reviewer shares your
>> opinion.
> 
> Also, with "capability=<foo>" we can be more self descriptive,
> for example "supported-command=<foo>"; though "capability" is good
> enough for me.
> 
> For example
> 
> packet:          git> wants=clean
> packet:          git> wants=smudge
> packet:          git> wants=size
> packet:          git> 0000
> packet:          git< supports=clean
> packet:          git< supports=smudge
> packet:          git< 0000
> 
> Though coming up with good names is hard; and as I said "capability"
> is good enough; OTOH with "smudge=true" etc. we don't need to come
> up with good name at all... though I wonder if it is a good thing `\_o,_/

How about this (I borrowed these terms from contract negotiation)?

packet:          git> offers=clean
packet:          git> offers=smudge
packet:          git> offers=size
packet:          git> 0000
packet:          git< accepts=clean
packet:          git< accepts=smudge
packet:          git< 0000

@Peff: Would that be OK for you?


>>>> +------------------------
>>>> +packet:          git< status=abort
>>>> +packet:          git< 0000
>>>> +------------------------
>>>> +
>>>> +After the filter has processed a blob it is expected to wait for
>>>> +the next "key=value" list containing a command. Git will close
>>>> +the command pipe on exit. The filter is expected to detect EOF
>>>> +and exit gracefully on its own.
>>> 
>>> Any "kill filter" solutions should probably be put here.
>> 
>> Agreed. 
>> 
>>> I guess
>>> that filter exiting means EOF on its standard output when read
>>> by Git command, isn't it?
>> 
>> Yes, but at this point Git is not listening anymore.
> 
> I think it might be good idea to have here the information about
> what filter process should do if it needs maybe lengthy closing
> process, to not hold/stop Git command or to not be killed.

I've added:

"Git will wait until the filter process has stopped."
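
On the filter side, the "detect EOF and exit gracefully" part could look
roughly like the following Python sketch.  This is only an illustration
with made-up names (`read_command`, `serve`); the `handle` callback is
assumed to read the content packets and write the response itself:

```python
import io


def read_command(stream):
    """Return the next command's "key=value" lines, or None on clean EOF."""
    lines = []
    while True:
        header = stream.read(4)
        if header == b"" and not lines:
            return None  # Git closed the pipe between two commands
        if len(header) < 4:
            raise EOFError("pipe closed in the middle of a packet")
        length = int(header.decode("ascii"), 16)
        if length == 0:
            return lines  # flush packet ends the list
        if length < 4:
            raise ValueError("invalid pkt-line length: %d" % length)
        payload = stream.read(length - 4)
        lines.append(payload.decode("utf-8").rstrip("\n"))


def serve(stream, handle):
    """Handle commands until EOF; return how many were processed."""
    count = 0
    while True:
        command = read_command(stream)
        if command is None:
            return count  # exit gracefully, nothing left to do
        handle(command, stream)
        count += 1


if __name__ == "__main__":
    pipe = io.BytesIO(b"0012command=clean\n0000")
    print(serve(pipe, lambda command, stream: print(command)))
```

An EOF in the middle of a list is treated as an error here, while an EOF
between commands is the normal shutdown signal.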


Thanks,
Lars


* Re: [PATCH v10 14/14] contrib/long-running-filter: add long running filter example
  2016-10-09  5:42   ` Torsten Bögershausen
@ 2016-10-15 14:47     ` Lars Schneider
  0 siblings, 0 replies; 34+ messages in thread
From: Lars Schneider @ 2016-10-15 14:47 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, gitster, jnareb, peff


> On 08 Oct 2016, at 22:42, Torsten Bögershausen <tboegi@web.de> wrote:
> 
> On 08.10.16 13:25, larsxschneider@gmail.com wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> Add a simple pass-thru filter as example implementation for the Git
>> filter protocol version 2. See Documentation/gitattributes.txt, section
>> "Filter Protocol" for more info.
>> 
> 
> Nothing wrong with code in contrib.
> I may have missed parts of the discussion, was there a good reason to
> drop the test case completely?
> 
>> When adding a new feature, make sure that you have new tests to show
>> the feature triggers the new behavior when it should, and to show the
>> feature does not trigger when it shouldn't.  After any code change, make
>> sure that the entire test suite passes.
> 
> Or is there a plan to add them later ?

The test is part of the "main feature patch" 13/14:
http://public-inbox.org/git/20161008112530.15506-14-larsxschneider@gmail.com/

Cheers,
Lars


* Re: [PATCH v10 04/14] run-command: add clean_on_exit_handler
  2016-10-11 12:12   ` Johannes Schindelin
@ 2016-10-15 15:02     ` Lars Schneider
  2016-10-16  8:03       ` Johannes Schindelin
  0 siblings, 1 reply; 34+ messages in thread
From: Lars Schneider @ 2016-10-15 15:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, gitster, jnareb, peff


> On 11 Oct 2016, at 05:12, Johannes Schindelin <johannes.schindelin@gmx.de> wrote:
> 
> Hi Lars,
> 
> On Sat, 8 Oct 2016, larsxschneider@gmail.com wrote:
> 
>> @@ -31,6 +32,15 @@ static void cleanup_children(int sig, int in_signal)
>> 	while (children_to_clean) {
>> 		struct child_to_clean *p = children_to_clean;
>> 		children_to_clean = p->next;
>> +
>> +		if (p->process && !in_signal) {
>> +			struct child_process *process = p->process;
>> +			if (process->clean_on_exit_handler) {
>> +				trace_printf("trace: run_command: running exit handler for pid %d", p->pid);
> 
> On Windows, pid_t translates to long long int, resulting in this build
> error:
> 
> -- snip --
> In file included from cache.h:10:0,
>                  from run-command.c:1:
> run-command.c: In function 'cleanup_children':
> run-command.c:39:18: error: format '%d' expects argument of type 'int', but argument 5 has type 'pid_t {aka long long int}' [-Werror=format=]
>      trace_printf("trace: run_command: running exit handler for pid %d", p->pid);
>                   ^
> trace.h:81:53: note: in definition of macro 'trace_printf'
>   trace_printf_key_fl(TRACE_CONTEXT, __LINE__, NULL, __VA_ARGS__)
>                                                      ^~~~~~~~~~~
> cc1.exe: all warnings being treated as errors
> make: *** [Makefile:1987: run-command.o] Error 1
> -- snap --
> 
> Maybe use PRIuMAX as we do elsewhere (see output of `git grep
> printf.*pid`):
> 
> 	trace_printf("trace: run_command: running exit handler for pid %"
> 		     PRIuMAX, (uintmax_t)p->pid);

Thanks for the hint! I'll change it!

However, I am building on Win 8.1 with your latest SDK and I cannot
reproduce the error. Any idea why that might be the case?

Thanks,
Lars



* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-15 14:45         ` Lars Schneider
@ 2016-10-15 17:41           ` Jeff King
  2016-10-15 19:42           ` Jakub Narębski
  1 sibling, 0 replies; 34+ messages in thread
From: Jeff King @ 2016-10-15 17:41 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Jakub Narębski, git, Junio C Hamano

On Sat, Oct 15, 2016 at 07:45:48AM -0700, Lars Schneider wrote:

> >> I have to agree that "capability=<capability>" might read a
> >> little bit nicer. However, Peff suggested "<capability>=true" 
> >> as his preference and this is absolutely OK with me.
> > 
> > From what I remember it was Peff stating that he thinks "<foo>=true"
> > is easier for parsing (it is, but we still need to support the harder
> > way parsing anyway), and offered that "<foo>" is good enough (if less
> > consistent).

I don't mind that much if you want to do it the other way. You are the
one writing the parsing/use code.

> > Also, with "capability=<foo>" we can be more self descriptive,
> > for example "supported-command=<foo>"; though "capability" is good
> > enough for me.
> > 
> > For example
> > 
> > packet:          git> wants=clean
> > packet:          git> wants=smudge
> > packet:          git> wants=size
> > packet:          git> 0000
> > packet:          git< supports=clean
> > packet:          git< supports=smudge
> > packet:          git< 0000
> > 
> > Though coming up with good names is hard; and as I said "capability"
> > is good enough; OTOH with "smudge=true" etc. we don't need to come
> > up with good name at all... though I wonder if it is a good thing `\_o,_/
> 
> How about this (I borrowed these terms from contract negotiation)?
> 
> packet:          git> offers=clean
> packet:          git> offers=smudge
> packet:          git> offers=size
> packet:          git> 0000
> packet:          git< accepts=clean
> packet:          git< accepts=smudge
> packet:          git< 0000
> 
> @Peff: Would that be OK for you?

Is it always an offers/accepts relationship? Can the response say "you
did not ask about <foo>, but just so you know I support it"?

I cannot think offhand of an example, but at the same time, if you leave
the terms as generic as possible, you do not end up later with words
that do not make sense. It is trading off one problem now (vagueness of
the protocol terms) for a potential one later (words that have a
specific meaning, but one that is not accurate).

I don't have a strong preference, though.

-Peff


* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-15 14:45         ` Lars Schneider
  2016-10-15 17:41           ` Jeff King
@ 2016-10-15 19:42           ` Jakub Narębski
  1 sibling, 0 replies; 34+ messages in thread
From: Jakub Narębski @ 2016-10-15 19:42 UTC (permalink / raw)
  To: Lars Schneider, Jeff King; +Cc: git, Junio C Hamano

W dniu 15.10.2016 o 16:45, Lars Schneider pisze:
>> On 12 Oct 2016, at 03:54, Jakub Narębski <jnareb@gmail.com> wrote:
>> W dniu 12.10.2016 o 00:26, Lars Schneider pisze: 
>>>> On 09 Oct 2016, at 01:06, Jakub Narębski <jnareb@gmail.com> wrote:
>>>>>
>>>
>>>>> After the filter started
>>>>> Git sends a welcome message ("git-filter-client"), a list of
>>>>> supported protocol version numbers, and a flush packet. Git expects
>>>>> +to read a welcome response message ("git-filter-server") and exactly
>>>>> +one protocol version number from the previously sent list. All further
>>>>> +communication will be based on the selected version. The remaining
>>>>> +protocol description below documents "version=2". Please note that
>>>>> +"version=42" in the example below does not exist and is only there
>>>>> +to illustrate how the protocol would look like with more than one
>>>>> +version.
>>>>> +
>>>>> +After the version negotiation Git sends a list of all capabilities that
>>>>> +it supports and a flush packet. Git expects to read a list of desired
>>>>> +capabilities, which must be a subset of the supported capabilities list,
>>>>> +and a flush packet as response:
>>>>> +------------------------
>>>>> +packet:          git> git-filter-client
>>>>> +packet:          git> version=2
>>>>> +packet:          git> version=42
>>>>> +packet:          git> 0000
>>>>> +packet:          git< git-filter-server
>>>>> +packet:          git< version=2
>>>>> +packet:          git> clean=true
>>>>> +packet:          git> smudge=true
>>>>> +packet:          git> not-yet-invented=true
>>>>> +packet:          git> 0000
>>>>> +packet:          git< clean=true
>>>>> +packet:          git< smudge=true
>>>>> +packet:          git< 0000
>>>>
>>>> WARNING: This example is different from description!!!
>>>
>>> Can you try to explain the difference more clearly? I read it multiple
>>> times and I think this is sound.
>>
>> I'm sorry it was *my mistake*.  I have read the example exchange wrong.
>>
>> On the other hand that means that I have other comment, which I though
>> was addressed already in v10, namely that not all exchanges ends with
>> flush packet (inconsistency, and I think a bit of lack of extendability).
> 
> Well, this part of the protocol is not supposed to be extensible because
> it is supposed to deal *only* with the version number. It needs to keep 
> the same structure to ensure forward and backward compatibility.
> 
> However, for consistency sake I will add a flush packet.

Thanks.  That is one thing I feel quite strongly about.

I can agree that extensibility does not matter much here: we can always
change the version number.  But there might be some additional
information that the filter process wants to send to Git in the first
exchange, and using a flush-terminated list here means that we don't
need to change the version number, assuming that this additional
information is advisory.

Consistency, in my opinion, should make it easier to implement
filter scripts.

>>>> In description above the example you have 4-part handshake, not 3-part;
>>>> the filter is described to send list of supported capabilities last
>>>> (a subset of what Git command supports).
>>>
>>> Part 1: Git sends a welcome message...
>>> Part 2: Git expects to read a welcome response message...
>>> Part 3: After the version negotiation Git sends a list of all capabilities...
>>> Part 4: Git expects to read a list of desired capabilities...
>>>
>>> I think example and text match, no?
>>
>> Yes, it does; as I have said already, I have misread the example. 
>>
>> Anyway, in some cases 4-way handshake, where Git sends list of
>> supported capabilities first, is better.  If the protocol has
>> to prepare something for each of capabilities, and perhaps check
>> those preparation status, it can do it after Git sends what it
>> could need, and before it sends what it does support.
>>
>> Though it looks a bit strange that client (as Git is client here)
>> sends its capabilities first...
> 
> Git tells the filter what it can do. Then the filter decides what
> features it supports. I would prefer to keep it that way as I don't
> see a strong advantage for the other way around.

I think the current order is good, no need to change it.
As I said it is better for Git to send capabilities first.
 

>>>> By the way, now I look at it, the argument for using the
>>>> "<capability>=true" format instead of "capability=<capability>"
>>>> (or "supported-command=<capability>") is weak.  The argument for
>>>> using "<variable>=<value>" to make it easier to implement parsing
>>>> is sound, but the argument for "<capability>=true" is weak.
>>>>
>>>> The argument was that with "<capability>=true" one can simply
>>>> parse metadata into hash / dictionary / hashmap, and choose
>>>> response based on that.  Hash / hashmap / associative array
>>>> needs different keys, so the reasoning went for "<capability>=true"
>>>> over "capability=<capability>"... but the filter process still
>>>> needs to handle lines with repeating keys, namely "version=<N>"
>>>> lines!
>>>>
>>>> So the argument doesn't hold water IMVHO, and we can choose
>>>> version which reads better / is more natural.
>>>
>>> I have to agree that "capability=<capability>" might read a
>>> little bit nicer. However, Peff suggested "<capability>=true" 
>>> as his preference and this is absolutely OK with me.
>>
>> From what I remember it was Peff stating that he thinks "<foo>=true"
>> is easier for parsing (it is, but we still need to support the harder
>> way parsing anyway), and offered that "<foo>" is good enough (if less
>> consistent).
>>
>>> I am happy to change that if a second reviewer shares your
>>> opinion.
>>
>> Also, with "capability=<foo>" we can be more self descriptive,
>> for example "supported-command=<foo>"; though "capability" is good
>> enough for me.
>>
>> For example
>>
>> packet:          git> wants=clean
>> packet:          git> wants=smudge
>> packet:          git> wants=size
>> packet:          git> 0000
>> packet:          git< supports=clean
>> packet:          git< supports=smudge
>> packet:          git< 0000
>>
>> Though coming up with good names is hard; and as I said "capability"
>> is good enough; OTOH with "smudge=true" etc. we don't need to come
>> up with good name at all... though I wonder if it is a good thing `\_o,_/
> 
> How about this (I borrowed these terms from contract negotiation)?
> 
> packet:          git> offers=clean
> packet:          git> offers=smudge
> packet:          git> offers=size
> packet:          git> 0000
> packet:          git< accepts=clean
> packet:          git< accepts=smudge
> packet:          git< 0000
> 
> @Peff: Would that be OK for you?

I don't feel strongly about it.  It could be "<capability>=true", it could
be "<capability>", it could be "capability=<capability>", it could be
something more descriptive.
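
For what it is worth, a parser that copes with repeated keys handles all of
the "<key>=<value>" spellings equally well.  A minimal Python sketch
(parse_pairs is a made-up name, and the second "version" value below is only
there to show a repeated key):

```python
from collections import defaultdict


def parse_pairs(lines):
    """Parse "key=value" lines into a dict mapping each key to a list of values."""
    result = defaultdict(list)
    for line in lines:
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError("not a key=value line: %r" % line)
        # Repeated keys accumulate instead of overwriting each other.
        result[key].append(value)
    return dict(result)


# Repeated "version" lines and either capability spelling go through the same code.
versions = parse_pairs(["version=2", "version=3"])
by_name = parse_pairs(["capability=clean", "capability=smudge"])
by_flag = parse_pairs(["clean=true", "smudge=true"])
```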

I guess "<capability>=true" looks a bit strange (would there ever be a
"<capability>=false"?), but it is good enough for me.


One thing we can all agree on is that each capability is to be sent as
a separate packet, and not as a space- or comma-separated list in a single
packet (like for fetch / push).
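
As a sketch of what that means on the wire (Python, helper names invented
for this mail, "<capability>=true" spelling assumed): every capability gets
its own pkt-line, and a flush packet terminates the list.

```python
MAX_PACKET_LENGTH = 65520  # largest pkt-line, including its 4-byte header


def pkt_line(text):
    """Encode one text line as a pkt-line: 4 hex digits of length, then payload."""
    payload = text.encode("utf-8") + b"\n"
    length = len(payload) + 4  # the length field counts the header too
    if length > MAX_PACKET_LENGTH:
        raise ValueError("line too long for a single packet")
    return ("%04x" % length).encode("ascii") + payload


def encode_capabilities(capabilities):
    """Encode each capability as its own packet, terminated by a flush packet."""
    packets = [pkt_line(name + "=true") for name in capabilities]
    return b"".join(packets) + b"0000"
```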

>>>>> +------------------------
>>>>> +packet:          git< status=abort
>>>>> +packet:          git< 0000
>>>>> +------------------------
>>>>> +
>>>>> +After the filter has processed a blob it is expected to wait for
>>>>> +the next "key=value" list containing a command. Git will close
>>>>> +the command pipe on exit. The filter is expected to detect EOF
>>>>> +and exit gracefully on its own.
>>>>
>>>> Any "kill filter" solutions should probably be put here.
>>>
>>> Agreed. 
>>>
>>>> I guess
>>>> that filter exiting means EOF on its standard output when read
>>>> by Git command, isn't it?
>>>
>>> Yes, but at this point Git is not listening anymore.
>>
>> I think it might be good idea to have here the information about
>> what filter process should do if it needs maybe lengthy closing
>> process, to not hold/stop Git command or to not be killed.
> 
> I've added:
> 
> "Git will wait until the filter process has stopped."

Thanks.  Looks good to me.

I think any advice (like how to handle shutdown in the filter without
blocking Git) could be added later, when we have some experience
making and using long-running multi-file filter drivers.
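
For reference, the basic behaviour the documentation asks for (wait for the
next command list, exit on EOF) can be sketched in Python roughly as below.
This is only an illustration under simplifying assumptions: the names are
made up, and the blob content that follows each command list in the real
protocol is left out entirely.

```python
def read_command(stream):
    """Read one flush-terminated list; return None if Git closed the pipe."""
    lines = []
    while True:
        header = stream.read(4)
        if header == b"" and not lines:
            return None  # clean EOF between commands
        if len(header) != 4:
            raise EOFError("truncated packet")
        length = int(header.decode("ascii"), 16)
        if length == 0:
            return lines  # flush packet ends the list
        if length < 4:
            raise ValueError("invalid packet length: %d" % length)
        lines.append(stream.read(length - 4).decode("utf-8").rstrip("\n"))


def run_filter(stream, handle):
    """Serve command lists until EOF; return how many were handled."""
    handled = 0
    while True:
        command = read_command(stream)
        if command is None:
            return handled  # exit gracefully instead of waiting to be killed
        handle(command)
        handled += 1
```

A filter with a lengthy closing step would do that work after run_filter()
returns, keeping in mind that Git waits for the process to stop.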

Thank you for your work on this series.
-- 
Jakub Narębski



* Re: [PATCH v10 04/14] run-command: add clean_on_exit_handler
  2016-10-15 15:02     ` Lars Schneider
@ 2016-10-16  8:03       ` Johannes Schindelin
  2016-10-16 21:57         ` Lars Schneider
  0 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2016-10-16  8:03 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, gitster, jnareb, peff

Hi Lars,

On Sat, 15 Oct 2016, Lars Schneider wrote:

> 
> > On 11 Oct 2016, at 05:12, Johannes Schindelin <johannes.schindelin@gmx.de> wrote:
> > 
> > Hi Lars,
> > 
> > On Sat, 8 Oct 2016, larsxschneider@gmail.com wrote:
> > 
> >> @@ -31,6 +32,15 @@ static void cleanup_children(int sig, int in_signal)
> >> 	while (children_to_clean) {
> >> 		struct child_to_clean *p = children_to_clean;
> >> 		children_to_clean = p->next;
> >> +
> >> +		if (p->process && !in_signal) {
> >> +			struct child_process *process = p->process;
> >> +			if (process->clean_on_exit_handler) {
> >> +				trace_printf("trace: run_command: running exit handler for pid %d", p->pid);
> > 
> > On Windows, pid_t translates to long long int, resulting in this build
> > error:
> > 
> > -- snip --
> > In file included from cache.h:10:0,
> >                  from run-command.c:1:
> > run-command.c: In function 'cleanup_children':
> > run-command.c:39:18: error: format '%d' expects argument of type 'int', but argument 5 has type 'pid_t {aka long long int}' [-Werror=format=]
> >      trace_printf("trace: run_command: running exit handler for pid %d", p->pid);
> >                   ^
> > trace.h:81:53: note: in definition of macro 'trace_printf'
> >   trace_printf_key_fl(TRACE_CONTEXT, __LINE__, NULL, __VA_ARGS__)
> >                                                      ^~~~~~~~~~~
> > cc1.exe: all warnings being treated as errors
> > make: *** [Makefile:1987: run-command.o] Error 1
> > -- snap --
> > 
> > Maybe use PRIuMAX as we do elsewhere (see output of `git grep
> > printf.*pid`):
> > 
> > 	trace_printf("trace: run_command: running exit handler for pid %"
> > 		     PRIuMAX, (uintmax_t)p->pid);
> 
> Thanks for hint! I'll change it!
> 
> However, I am building on Win 8.1 with your latest SDK and I cannot
> reproduce the error. Any idea why that might be the case?

Are you building with DEVELOPER=1?

Ciao,
Dscho


* Re: [PATCH v10 04/14] run-command: add clean_on_exit_handler
  2016-10-16  8:03       ` Johannes Schindelin
@ 2016-10-16 21:57         ` Lars Schneider
  0 siblings, 0 replies; 34+ messages in thread
From: Lars Schneider @ 2016-10-16 21:57 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, gitster, jnareb, peff


> On 16 Oct 2016, at 01:03, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> Hi Lars,
> 
> On Sat, 15 Oct 2016, Lars Schneider wrote:
> 
>> 
>>> On 11 Oct 2016, at 05:12, Johannes Schindelin <johannes.schindelin@gmx.de> wrote:
>>> 
>>> On Windows, pid_t translates to long long int, resulting in this build
>>> error:
>>> 
>> 
>> Thanks for hint! I'll change it!
>> 
>> However, I am building on Win 8.1 with your latest SDK and I cannot
>> reproduce the error. Any idea why that might be the case?
> 
> Are you building with DEVELOPER=1?

Argh! Of course ... I forgot to add this flag to my config.mak on Windows.

Thanks,
Lars


* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-11 10:09       ` Torsten Bögershausen
@ 2016-10-16 23:13         ` Lars Schneider
  2016-10-17 17:05         ` Junio C Hamano
  1 sibling, 0 replies; 34+ messages in thread
From: Lars Schneider @ 2016-10-16 23:13 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Junio C Hamano, git, Jakub Narębski, peff


> On 11 Oct 2016, at 03:09, Torsten Bögershausen <tboegi@web.de> wrote:
> 
> On Tue, Oct 11, 2016 at 10:11:22AM +0200, Lars Schneider wrote:
>> 
>>> On 10 Oct 2016, at 21:58, Junio C Hamano <gitster@pobox.com> wrote:
>>> 
>>> larsxschneider@gmail.com writes:
>>> 
>>> [...]
>>>> 
>> -test_cmp_count_except_clean () {
>> -	for FILE in $@
> 
>> +test_cmp_count () {
>> +	expect=$1 actual=$2
> 
> That could be 
> expect="$1"
> actual="$2"

Sure!


>> +	for FILE in "$expect" "$actual"
>> 	do
> 
>> +		sort "$FILE" | uniq -c | sed "s/^[ ]*//" |
>> +			sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
>> +			sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" >"$FILE.tmp" &&
>> +		cat "$FILE.tmp" >"$FILE"
> 
> How about 
> 		cp "$FILE.tmp" "$FILE"

OK, I'll use "mv".

Thanks,
Lars


* Re: [PATCH v10 13/14] convert: add filter.<driver>.process option
  2016-10-11 10:09       ` Torsten Bögershausen
  2016-10-16 23:13         ` Lars Schneider
@ 2016-10-17 17:05         ` Junio C Hamano
  1 sibling, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2016-10-17 17:05 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Lars Schneider, git, Jakub Narębski, peff

Torsten Bögershausen <tboegi@web.de> writes:

>> +test_cmp_count () {
>> +	expect=$1 actual=$2
>
> That could be 
> expect="$1"
> actual="$2"

Yes, but it does not have to ;-).


