git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/2] Refactor the filter process code into a reusable module
@ 2017-03-24 15:27 Ben Peart
  2017-03-24 15:27 ` [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently() Ben Peart
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Ben Peart @ 2017-03-24 15:27 UTC (permalink / raw)
  To: git; +Cc: benpeart, christian.couder, larsxschneider

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 1463 bytes --]

We have a couple of patch series we’re working on (ObjectDB/Read-Object,
Watchman integration) where we need the ability to have a background
process running that can accept multiple commands thus avoiding the
overhead of spawning a new process for every command.

The ability to do this was added in:

	Commit edcc85814c ("convert: add filter.<driver>.process option", 2016-10-16)
	keeps the external process running and processes all commands

but it is integrated into the convert code.  This patch series takes the
code from convert.c and refactors it into a separate "sub-process"
module so that we can centralize and reuse this logic in other areas.

Ben Peart (2):
  pkt-line: add packet_writel() and packet_read_line_gently()
  sub-process: refactor the filter process code into a reusable module

 Documentation/technical/api-sub-process.txt |  54 ++++++++++
 Makefile                                    |   1 +
 convert.c                                   | 160 ++++++----------------------
 pkt-line.c                                  |  31 ++++++
 pkt-line.h                                  |  11 ++
 sub-process.c                               | 117 ++++++++++++++++++++
 sub-process.h                               |  46 ++++++++
 7 files changed, 290 insertions(+), 130 deletions(-)
 create mode 100644 Documentation/technical/api-sub-process.txt
 create mode 100644 sub-process.c
 create mode 100644 sub-process.h

-- 
2.12.0.gvfs.1.43.g876ba2a


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently()
  2017-03-24 15:27 [PATCH v2 0/2] Refactor the filter process code into a reusable module Ben Peart
@ 2017-03-24 15:27 ` Ben Peart
  2017-03-25  5:47   ` Torsten Bögershausen
  2017-03-24 15:27 ` [PATCH v2 1/4] t7800: remove whitespace before redirect Ben Peart
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Ben Peart @ 2017-03-24 15:27 UTC (permalink / raw)
  To: git; +Cc: benpeart, christian.couder, larsxschneider

Add packet_writel() which writes multiple lines in a single call and
then calls packet_flush_gently(). Add packet_read_line_gently() to
enable reading a line without dying on EOF.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 pkt-line.c | 31 +++++++++++++++++++++++++++++++
 pkt-line.h | 11 +++++++++++
 2 files changed, 42 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index d4b6bfe076..2788aa1af6 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
 	return status;
 }
 
+int packet_writel(int fd, const char *line, ...)
+{
+	va_list args;
+	int err;
+	va_start(args, line);
+	for (;;) {
+		if (!line)
+			break;
+		if (strlen(line) > LARGE_PACKET_DATA_MAX)
+			return -1;
+		err = packet_write_fmt_gently(fd, "%s\n", line);
+		if (err)
+			return err;
+		line = va_arg(args, const char*);
+	}
+	va_end(args);
+	return packet_flush_gently(fd);
+}
+
 static int packet_write_gently(const int fd_out, const char *buf, size_t size)
 {
 	static char packet_write_buffer[LARGE_PACKET_MAX];
@@ -323,6 +342,18 @@ char *packet_read_line(int fd, int *len_p)
 	return packet_read_line_generic(fd, NULL, NULL, len_p);
 }
 
+int packet_read_line_gently(int fd, int *dst_len, char** dst_line)
+{
+	int len = packet_read(fd, NULL, NULL,
+		packet_buffer, sizeof(packet_buffer),
+		PACKET_READ_CHOMP_NEWLINE|PACKET_READ_GENTLE_ON_EOF);
+	if (dst_len)
+		*dst_len = len;
+	if (dst_line)
+		*dst_line = (len > 0) ? packet_buffer : NULL;
+	return len;
+}
+
 char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
 {
 	return packet_read_line_generic(-1, src, src_len, dst_len);
diff --git a/pkt-line.h b/pkt-line.h
index 18eac64830..cb3eda9695 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,6 +25,7 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_writel(int fd, const char *line, ...);
 int write_packetized_from_fd(int fd_in, int fd_out);
 int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
 
@@ -74,6 +75,16 @@ int packet_read(int fd, char **src_buffer, size_t *src_len, char
 char *packet_read_line(int fd, int *size);
 
 /*
+ * Convenience wrapper for packet_read that sets the PACKET_READ_GENTLE_ON_EOF
+ * and CHOMP_NEWLINE options. The return value specifies the number of bytes
+ * read into the buffer or -1 on truncated input. if the *dst_line parameter
+ * is not NULL it will return NULL for a flush packet and otherwise points to
+ * a static buffer (that may be overwritten by subsequent calls). If the size
+ * parameter is not NULL, the length of the packet is written to it.
+ */
+int packet_read_line_gently(int fd, int *size, char** dst_line);
+
+/*
  * Same as packet_read_line, but read from a buf rather than a descriptor;
  * see packet_read for details on how src_* is used.
  */
-- 
2.12.0.gvfs.1.43.g876ba2a


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 1/4] t7800: remove whitespace before redirect
  2017-03-24 15:27 [PATCH v2 0/2] Refactor the filter process code into a reusable module Ben Peart
  2017-03-24 15:27 ` [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently() Ben Peart
@ 2017-03-24 15:27 ` Ben Peart
  2017-03-24 16:21   ` Ben Peart
  2017-03-24 15:27 ` [PATCH v2 2/2] sub-process: refactor the filter process code into a reusable module Ben Peart
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Ben Peart @ 2017-03-24 15:27 UTC (permalink / raw)
  To: git
  Cc: benpeart, christian.couder, larsxschneider, David Aguilar,
	Junio C Hamano

From: David Aguilar <davvid@gmail.com>

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 t/t7800-difftool.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
index 25241f4096..e1ec292718 100755
--- a/t/t7800-difftool.sh
+++ b/t/t7800-difftool.sh
@@ -428,7 +428,7 @@ run_dir_diff_test 'difftool --dir-diff branch from subdirectory' '
 		git difftool --dir-diff $symlinks --extcmd ls branch >output &&
 		# "sub" must only exist in "right"
 		# "file" and "file2" must be listed in both "left" and "right"
-		grep sub output > sub-output &&
+		grep sub output >sub-output &&
 		test_line_count = 1 sub-output &&
 		grep file"$" output >file-output &&
 		test_line_count = 2 file-output &&
-- 
2.12.0.gvfs.1.43.g876ba2a


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/2] sub-process: refactor the filter process code into a reusable module
  2017-03-24 15:27 [PATCH v2 0/2] Refactor the filter process code into a reusable module Ben Peart
  2017-03-24 15:27 ` [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently() Ben Peart
  2017-03-24 15:27 ` [PATCH v2 1/4] t7800: remove whitespace before redirect Ben Peart
@ 2017-03-24 15:27 ` Ben Peart
  2017-03-27 18:59   ` Jonathan Tan
  2017-03-24 15:27 ` [PATCH v2 2/4] t7800: cleanup cruft left behind by tests Ben Peart
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Ben Peart @ 2017-03-24 15:27 UTC (permalink / raw)
  To: git; +Cc: benpeart, christian.couder, larsxschneider

Refactor the filter.<driver>.process code into a separate sub-process
module that can be used to reduce the cost of starting up a sub-process
for multiple commands.  It does this by keeping the external process
running and processing all commands by communicating over standard input
and standard output using the packet format (pkt-line) based protocol.
Full documentation is in Documentation/technical/api-sub-process.txt.

This code is refactored from:

	Commit edcc85814c ("convert: add filter.<driver>.process option", 2016-10-16)
	keeps the external process running and processes all commands

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 Documentation/technical/api-sub-process.txt |  54 ++++++++++
 Makefile                                    |   1 +
 convert.c                                   | 160 ++++++----------------------
 sub-process.c                               | 117 ++++++++++++++++++++
 sub-process.h                               |  46 ++++++++
 5 files changed, 248 insertions(+), 130 deletions(-)
 create mode 100644 Documentation/technical/api-sub-process.txt
 create mode 100644 sub-process.c
 create mode 100644 sub-process.h

diff --git a/Documentation/technical/api-sub-process.txt b/Documentation/technical/api-sub-process.txt
new file mode 100644
index 0000000000..eb5005aa72
--- /dev/null
+++ b/Documentation/technical/api-sub-process.txt
@@ -0,0 +1,54 @@
+sub-process API
+===============
+
+The sub-process API makes it possible to run background sub-processes
+that should run until the git command exits and communicate with it
+through stdin and stdout.  This reduces the overhead of having to fork
+a new process each time it needs to be communicated with.
+
+The sub-processes are kept in a hashmap by command name and looked up
+via the subprocess_find_entry function.  If an existing instance can not
+be found then a new process should be created and started.  When the
+parent git command terminates, all sub-processes are also terminated.
+
+This API is based on the run-command API.
+
+Data structures
+---------------
+
+* `struct subprocess_entry`
+
+The sub-process structure.  Members should not be accessed directly.
+
+Types
+-----
+
+'int(*subprocess_start_fn)(struct subprocess_entry *entry)'::
+
+	User-supplied function to initialize the sub-process.  This is
+	typically used to negoiate the interface version and capabilities.
+
+
+Functions
+---------
+
+`subprocess_start`::
+
+	Start a subprocess and add it to the subprocess hashmap.
+
+`subprocess_stop`::
+
+	Kill a subprocess and remove it from the subprocess hashmap.
+
+`subprocess_find_entry`::
+
+	Find a subprocess in the subprocess hashmap.
+
+`subprocess_get_child_process`::
+
+	Get the underlying `struct child_process` from a subprocess.
+
+`subprocess_read_status`::
+
+	Helper function to read packets looking for the last "status=<foo>"
+	key/value pair.
diff --git a/Makefile b/Makefile
index a5a11e721a..8afe733092 100644
--- a/Makefile
+++ b/Makefile
@@ -830,6 +830,7 @@ LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
 LIB_OBJS += submodule.o
 LIB_OBJS += submodule-config.o
+LIB_OBJS += sub-process.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += tempfile.o
diff --git a/convert.c b/convert.c
index 8d652bf27c..565a1d0d8c 100644
--- a/convert.c
+++ b/convert.c
@@ -4,6 +4,7 @@
 #include "quote.h"
 #include "sigchain.h"
 #include "pkt-line.h"
+#include "sub-process.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -497,132 +498,30 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 #define CAP_SMUDGE   (1u<<1)
 
 struct cmd2process {
-	struct hashmap_entry ent; /* must be the first member! */
+	struct subprocess_entry subprocess;
 	unsigned int supported_capabilities;
-	const char *cmd;
-	struct child_process process;
 };
 
-static int cmd_process_map_initialized;
-static struct hashmap cmd_process_map;
-
-static int cmd2process_cmp(const struct cmd2process *e1,
-			   const struct cmd2process *e2,
-			   const void *unused)
+static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 {
-	return strcmp(e1->cmd, e2->cmd);
-}
-
-static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
-{
-	struct cmd2process key;
-	hashmap_entry_init(&key, strhash(cmd));
-	key.cmd = cmd;
-	return hashmap_get(hashmap, &key, NULL);
-}
-
-static int packet_write_list(int fd, const char *line, ...)
-{
-	va_list args;
 	int err;
-	va_start(args, line);
-	for (;;) {
-		if (!line)
-			break;
-		if (strlen(line) > LARGE_PACKET_DATA_MAX)
-			return -1;
-		err = packet_write_fmt_gently(fd, "%s\n", line);
-		if (err)
-			return err;
-		line = va_arg(args, const char*);
-	}
-	va_end(args);
-	return packet_flush_gently(fd);
-}
-
-static void read_multi_file_filter_status(int fd, struct strbuf *status)
-{
-	struct strbuf **pair;
-	char *line;
-	for (;;) {
-		line = packet_read_line(fd, NULL);
-		if (!line)
-			break;
-		pair = strbuf_split_str(line, '=', 2);
-		if (pair[0] && pair[0]->len && pair[1]) {
-			/* the last "status=<foo>" line wins */
-			if (!strcmp(pair[0]->buf, "status=")) {
-				strbuf_reset(status);
-				strbuf_addbuf(status, pair[1]);
-			}
-		}
-		strbuf_list_free(pair);
-	}
-}
-
-static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry)
-{
-	if (!entry)
-		return;
-
-	entry->process.clean_on_exit = 0;
-	kill(entry->process.pid, SIGTERM);
-	finish_command(&entry->process);
-
-	hashmap_remove(hashmap, entry, NULL);
-	free(entry);
-}
-
-static void stop_multi_file_filter(struct child_process *process)
-{
-	sigchain_push(SIGPIPE, SIG_IGN);
-	/* Closing the pipe signals the filter to initiate a shutdown. */
-	close(process->in);
-	close(process->out);
-	sigchain_pop(SIGPIPE);
-	/* Finish command will wait until the shutdown is complete. */
-	finish_command(process);
-}
-
-static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
-{
-	int err;
-	struct cmd2process *entry;
+	struct cmd2process *entry = (struct cmd2process *)subprocess;
 	struct child_process *process;
-	const char *argv[] = { cmd, NULL };
 	struct string_list cap_list = STRING_LIST_INIT_NODUP;
 	char *cap_buf;
 	const char *cap_name;
 
-	entry = xmalloc(sizeof(*entry));
-	entry->cmd = cmd;
-	entry->supported_capabilities = 0;
-	process = &entry->process;
-
-	child_process_init(process);
-	process->argv = argv;
-	process->use_shell = 1;
-	process->in = -1;
-	process->out = -1;
-	process->clean_on_exit = 1;
-	process->clean_on_exit_handler = stop_multi_file_filter;
-
-	if (start_command(process)) {
-		error("cannot fork to run external filter '%s'", cmd);
-		return NULL;
-	}
-
-	hashmap_entry_init(entry, strhash(cmd));
+	process = subprocess_get_child_process(&entry->subprocess);
 
 	sigchain_push(SIGPIPE, SIG_IGN);
 
-	err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);
+	err = packet_writel(process->in, "git-filter-client", "version=2", NULL);
 	if (err)
 		goto done;
 
 	err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
 	if (err) {
-		error("external filter '%s' does not support filter protocol version 2", cmd);
+		error("external filter '%s' does not support filter protocol version 2", subprocess->cmd);
 		goto done;
 	}
 	err = strcmp(packet_read_line(process->out, NULL), "version=2");
@@ -632,7 +531,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 	if (err)
 		goto done;
 
-	err = packet_write_list(process->in, "capability=clean", "capability=smudge", NULL);
+	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
 
 	for (;;) {
 		cap_buf = packet_read_line(process->out, NULL);
@@ -651,7 +550,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 		} else {
 			warning(
 				"external filter '%s' requested unsupported filter capability '%s'",
-				cmd, cap_name
+				subprocess->cmd, cap_name
 			);
 		}
 
@@ -661,14 +560,10 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 done:
 	sigchain_pop(SIGPIPE);
 
-	if (err || errno == EPIPE) {
-		error("initialization for external filter '%s' failed", cmd);
-		kill_multi_file_filter(hashmap, entry);
-		return NULL;
-	}
+	if (err || errno == EPIPE)
+		err = err ? err : errno;
 
-	hashmap_add(hashmap, entry);
-	return entry;
+	return err;
 }
 
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
@@ -682,22 +577,20 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	struct strbuf filter_status = STRBUF_INIT;
 	const char *filter_type;
 
-	if (!cmd_process_map_initialized) {
-		cmd_process_map_initialized = 1;
-		hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
-		entry = NULL;
-	} else {
-		entry = find_multi_file_filter_entry(&cmd_process_map, cmd);
-	}
+	entry = (struct cmd2process *)subprocess_find_entry(cmd);
 
 	fflush(NULL);
 
 	if (!entry) {
-		entry = start_multi_file_filter(&cmd_process_map, cmd);
-		if (!entry)
+		entry = xmalloc(sizeof(*entry));
+		entry->supported_capabilities = 0;
+
+		if (subprocess_start(&entry->subprocess, cmd, start_multi_file_filter_fn)) {
+			free(entry);
 			return 0;
+		}
 	}
-	process = &entry->process;
+	process = subprocess_get_child_process(&entry->subprocess);
 
 	if (!(wanted_capability & entry->supported_capabilities))
 		return 0;
@@ -737,7 +630,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	read_multi_file_filter_status(process->out, &filter_status);
+	err = subprocess_read_status(process->out, &filter_status);
+	if (err)
+		goto done;
+
 	err = strcmp(filter_status.buf, "success");
 	if (err)
 		goto done;
@@ -746,7 +642,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	read_multi_file_filter_status(process->out, &filter_status);
+	err = subprocess_read_status(process->out, &filter_status);
+	if (err)
+		goto done;
+
 	err = strcmp(filter_status.buf, "success");
 
 done:
@@ -768,7 +667,8 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 			 * Force shutdown and restart if another blob requires filtering.
 			 */
 			error("external filter '%s' failed", cmd);
-			kill_multi_file_filter(&cmd_process_map, entry);
+			subprocess_stop((struct subprocess_entry *)entry);
+			free(entry);
 		}
 	} else {
 		strbuf_swap(dst, &nbuf);
diff --git a/sub-process.c b/sub-process.c
new file mode 100644
index 0000000000..5a06a9fd1e
--- /dev/null
+++ b/sub-process.c
@@ -0,0 +1,117 @@
+/*
+ * Generic implementation of background process infrastructure.
+ */
+#include "sub-process.h"
+#include "sigchain.h"
+#include "pkt-line.h"
+
+static int subprocess_map_initialized;
+static struct hashmap subprocess_map;
+
+static int name2process_cmp(const struct subprocess_entry *e1,
+		const struct subprocess_entry *e2, const void *unused)
+{
+	return strcmp(e1->cmd, e2->cmd);
+}
+
+static void subprocess_exit_handler(struct child_process *process)
+{
+	sigchain_push(SIGPIPE, SIG_IGN);
+	/* Closing the pipe signals the filter to initiate a shutdown. */
+	close(process->in);
+	close(process->out);
+	sigchain_pop(SIGPIPE);
+	/* Finish command will wait until the shutdown is complete. */
+	finish_command(process);
+}
+
+int subprocess_start(struct subprocess_entry *entry, const char *cmd,
+		subprocess_start_fn startfn)
+{
+	int err;
+	const char *argv[] = { cmd, NULL };
+
+	if (!subprocess_map_initialized) {
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)name2process_cmp, 0);
+		subprocess_map_initialized = 1;
+	}
+
+	entry->cmd = cmd;
+
+	child_process_init(&entry->process);
+	entry->process.argv = argv;
+	entry->process.use_shell = 1;
+	entry->process.in = -1;
+	entry->process.out = -1;
+	entry->process.clean_on_exit = 1;
+	entry->process.clean_on_exit_handler = subprocess_exit_handler;
+
+	err = start_command(&entry->process);
+	if (err) {
+		error("cannot fork to run sub-process '%s'", entry->cmd);
+		return err;
+	}
+
+	err = startfn(entry);
+	if (err) {
+		error("initialization for sub-process '%s' failed", entry->cmd);
+		subprocess_stop(entry);
+		return err;
+	}
+
+	hashmap_entry_init(entry, strhash(entry->cmd));
+	hashmap_add(&subprocess_map, entry);
+
+	return 0;
+}
+
+void subprocess_stop(struct subprocess_entry *entry)
+{
+	if (!entry)
+		return;
+
+	entry->process.clean_on_exit = 0;
+	kill(entry->process.pid, SIGTERM);
+	finish_command(&entry->process);
+
+	hashmap_remove(&subprocess_map, entry, NULL);
+}
+
+struct subprocess_entry *subprocess_find_entry(const char *cmd)
+{
+	struct subprocess_entry key;
+
+	if (!subprocess_map_initialized) {
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)name2process_cmp, 0);
+		subprocess_map_initialized = 1;
+		return NULL;
+	}
+
+	hashmap_entry_init(&key, strhash(cmd));
+	key.cmd = cmd;
+	return hashmap_get(&subprocess_map, &key, NULL);
+}
+
+int subprocess_read_status(int fd, struct strbuf *status)
+{
+	struct strbuf **pair;
+	char *line;
+	int len;
+
+	for (;;) {
+		len = packet_read_line_gently(fd, NULL, &line);
+		if ((len == -1) || !line)
+			break;
+		pair = strbuf_split_str(line, '=', 2);
+		if (pair[0] && pair[0]->len && pair[1]) {
+			/* the last "status=<foo>" line wins */
+			if (!strcmp(pair[0]->buf, "status=")) {
+				strbuf_reset(status);
+				strbuf_addbuf(status, pair[1]);
+			}
+		}
+		strbuf_list_free(pair);
+	}
+
+	return len == -1 ? len : 0;
+}
diff --git a/sub-process.h b/sub-process.h
new file mode 100644
index 0000000000..d1492f476d
--- /dev/null
+++ b/sub-process.h
@@ -0,0 +1,46 @@
+#ifndef SUBPROCESS_H
+#define SUBPROCESS_H
+
+#include "git-compat-util.h"
+#include "hashmap.h"
+#include "run-command.h"
+
+/*
+ * Generic implementation of background process infrastructure.
+ * See Documentation/technical/api-background-process.txt.
+ */
+
+ /* data structures */
+
+struct subprocess_entry {
+	struct hashmap_entry ent; /* must be the first member! */
+	struct child_process process;
+	const char *cmd;
+};
+
+/* subprocess functions */
+
+typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
+int subprocess_start(struct subprocess_entry *entry, const char *cmd,
+		subprocess_start_fn startfn);
+
+void subprocess_stop(struct subprocess_entry *entry);
+
+struct subprocess_entry *subprocess_find_entry(const char *cmd);
+
+/* subprocess helper functions */
+
+static inline struct child_process *subprocess_get_child_process(
+		struct subprocess_entry *entry)
+{
+	return &entry->process;
+}
+
+/*
+ * Helper function that will read packets looking for "status=<foo>"
+ * key/value pairs and return the value from the last "status" packet
+ */
+
+int subprocess_read_status(int fd, struct strbuf *status);
+
+#endif
-- 
2.12.0.gvfs.1.43.g876ba2a


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/4] t7800: cleanup cruft left behind by tests
  2017-03-24 15:27 [PATCH v2 0/2] Refactor the filter process code into a reusable module Ben Peart
                   ` (2 preceding siblings ...)
  2017-03-24 15:27 ` [PATCH v2 2/2] sub-process: refactor the filter process code into a reusable module Ben Peart
@ 2017-03-24 15:27 ` Ben Peart
  2017-03-24 15:27 ` [PATCH v2 3/4] difftool: handle modified symlinks in dir-diff mode Ben Peart
  2017-03-24 15:27 ` [PATCH v2 4/4] Git 2.12.1 Ben Peart
  5 siblings, 0 replies; 15+ messages in thread
From: Ben Peart @ 2017-03-24 15:27 UTC (permalink / raw)
  To: git
  Cc: benpeart, christian.couder, larsxschneider, David Aguilar,
	Junio C Hamano

From: David Aguilar <davvid@gmail.com>

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 t/t7800-difftool.sh | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
index e1ec292718..e0e65df8de 100755
--- a/t/t7800-difftool.sh
+++ b/t/t7800-difftool.sh
@@ -591,6 +591,7 @@ test_expect_success 'difftool --no-symlinks detects conflict ' '
 '
 
 test_expect_success 'difftool properly honors gitlink and core.worktree' '
+	test_when_finished rm -rf submod/ule &&
 	git submodule add ./. submod/ule &&
 	test_config -C submod/ule diff.tool checktrees &&
 	test_config -C submod/ule difftool.checktrees.cmd '\''
@@ -600,11 +601,13 @@ test_expect_success 'difftool properly honors gitlink and core.worktree' '
 		cd submod/ule &&
 		echo good >expect &&
 		git difftool --tool=checktrees --dir-diff HEAD~ >actual &&
-		test_cmp expect actual
+		test_cmp expect actual &&
+		rm -f expect actual
 	)
 '
 
 test_expect_success SYMLINKS 'difftool --dir-diff symlinked directories' '
+	test_when_finished git reset --hard &&
 	git init dirlinks &&
 	(
 		cd dirlinks &&
-- 
2.12.0.gvfs.1.43.g876ba2a


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/4] difftool: handle modified symlinks in dir-diff mode
  2017-03-24 15:27 [PATCH v2 0/2] Refactor the filter process code into a reusable module Ben Peart
                   ` (3 preceding siblings ...)
  2017-03-24 15:27 ` [PATCH v2 2/4] t7800: cleanup cruft left behind by tests Ben Peart
@ 2017-03-24 15:27 ` Ben Peart
  2017-03-24 15:27 ` [PATCH v2 4/4] Git 2.12.1 Ben Peart
  5 siblings, 0 replies; 15+ messages in thread
From: Ben Peart @ 2017-03-24 15:27 UTC (permalink / raw)
  To: git
  Cc: benpeart, christian.couder, larsxschneider, David Aguilar,
	Junio C Hamano

From: David Aguilar <davvid@gmail.com>

Detect the null object ID for symlinks in dir-diff so that difftool can
detect when symlinks are modified in the worktree.

Previously, a null symlink object ID would crash difftool.
Handle null object IDs as unknown content that must be read from
the worktree.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 builtin/difftool.c  | 51 ++++++++++++++++++++++++++++++++++++++++-----
 t/t7800-difftool.sh | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/builtin/difftool.c b/builtin/difftool.c
index d13350ce83..25e54ad3ed 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -254,6 +254,49 @@ static int ensure_leading_directories(char *path)
 	}
 }
 
+/*
+ * Unconditional writing of a plain regular file is what
+ * "git difftool --dir-diff" wants to do for symlinks.  We are preparing two
+ * temporary directories to be fed to a Git-unaware tool that knows how to
+ * show a diff of two directories (e.g. "diff -r A B").
+ *
+ * Because the tool is Git-unaware, if a symbolic link appears in either of
+ * these temporary directories, it will try to dereference and show the
+ * difference of the target of the symbolic link, which is not what we want,
+ * as the goal of the dir-diff mode is to produce an output that is logically
+ * equivalent to what "git diff" produces.
+ *
+ * Most importantly, we want to get textual comparison of the result of the
+ * readlink(2).  get_symlink() provides that---it returns the contents of
+ * the symlink that gets written to a regular file to force the external tool
+ * to compare the readlink(2) result as text, even on a filesystem that is
+ * capable of doing a symbolic link.
+ */
+static char *get_symlink(const struct object_id *oid, const char *path)
+{
+	char *data;
+	if (is_null_oid(oid)) {
+		/* The symlink is unknown to Git so read from the filesystem */
+		struct strbuf link = STRBUF_INIT;
+		if (has_symlinks) {
+			if (strbuf_readlink(&link, path, strlen(path)))
+				die(_("could not read symlink %s"), path);
+		} else if (strbuf_read_file(&link, path, 128))
+			die(_("could not read symlink file %s"), path);
+
+		data = strbuf_detach(&link, NULL);
+	} else {
+		enum object_type type;
+		unsigned long size;
+		data = read_sha1_file(oid->hash, &type, &size);
+		if (!data)
+			die(_("could not read object %s for symlink %s"),
+				oid_to_hex(oid), path);
+	}
+
+	return data;
+}
+
 static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 			int argc, const char **argv)
 {
@@ -270,8 +313,6 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 	struct hashmap working_tree_dups, submodules, symlinks2;
 	struct hashmap_iter iter;
 	struct pair_entry *entry;
-	enum object_type type;
-	unsigned long size;
 	struct index_state wtindex;
 	struct checkout lstate, rstate;
 	int rc, flags = RUN_GIT_CMD, err = 0;
@@ -377,13 +418,13 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 		}
 
 		if (S_ISLNK(lmode)) {
-			char *content = read_sha1_file(loid.hash, &type, &size);
+			char *content = get_symlink(&loid, src_path);
 			add_left_or_right(&symlinks2, src_path, content, 0);
 			free(content);
 		}
 
 		if (S_ISLNK(rmode)) {
-			char *content = read_sha1_file(roid.hash, &type, &size);
+			char *content = get_symlink(&roid, dst_path);
 			add_left_or_right(&symlinks2, dst_path, content, 1);
 			free(content);
 		}
@@ -397,7 +438,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 				return error("could not write '%s'", src_path);
 		}
 
-		if (rmode) {
+		if (rmode && !S_ISLNK(rmode)) {
 			struct working_tree_entry *entry;
 
 			/* Avoid duplicate working_tree entries */
diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
index e0e65df8de..0e7f30db2d 100755
--- a/t/t7800-difftool.sh
+++ b/t/t7800-difftool.sh
@@ -626,4 +626,64 @@ test_expect_success SYMLINKS 'difftool --dir-diff symlinked directories' '
 	)
 '
 
+test_expect_success SYMLINKS 'difftool --dir-diff handles modified symlinks' '
+	test_when_finished git reset --hard &&
+	touch b &&
+	ln -s b c &&
+	git add b c &&
+	test_tick &&
+	git commit -m initial &&
+	touch d &&
+	rm c &&
+	ln -s d c &&
+	cat >expect <<-EOF &&
+		b
+		c
+
+		c
+	EOF
+	git difftool --symlinks --dir-diff --extcmd ls >output &&
+	grep -v ^/ output >actual &&
+	test_cmp expect actual &&
+
+	git difftool --no-symlinks --dir-diff --extcmd ls >output &&
+	grep -v ^/ output >actual &&
+	test_cmp expect actual &&
+
+	# The left side contains symlink "c" that points to "b"
+	test_config difftool.cat.cmd "cat \$LOCAL/c" &&
+	printf "%s\n" b >expect &&
+
+	git difftool --symlinks --dir-diff --tool cat >actual &&
+	test_cmp expect actual &&
+
+	git difftool --symlinks --no-symlinks --dir-diff --tool cat >actual &&
+	test_cmp expect actual &&
+
+	# The right side contains symlink "c" that points to "d"
+	test_config difftool.cat.cmd "cat \$REMOTE/c" &&
+	printf "%s\n" d >expect &&
+
+	git difftool --symlinks --dir-diff --tool cat >actual &&
+	test_cmp expect actual &&
+
+	git difftool --no-symlinks --dir-diff --tool cat >actual &&
+	test_cmp expect actual &&
+
+	# Deleted symlinks
+	rm -f c &&
+	cat >expect <<-EOF &&
+		b
+		c
+
+	EOF
+	git difftool --symlinks --dir-diff --extcmd ls >output &&
+	grep -v ^/ output >actual &&
+	test_cmp expect actual &&
+
+	git difftool --no-symlinks --dir-diff --extcmd ls >output &&
+	grep -v ^/ output >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.12.0.gvfs.1.43.g876ba2a


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/4] Git 2.12.1
  2017-03-24 15:27 [PATCH v2 0/2] Refactor the filter process code into a reusable module Ben Peart
                   ` (4 preceding siblings ...)
  2017-03-24 15:27 ` [PATCH v2 3/4] difftool: handle modified symlinks in dir-diff mode Ben Peart
@ 2017-03-24 15:27 ` Ben Peart
  5 siblings, 0 replies; 15+ messages in thread
From: Ben Peart @ 2017-03-24 15:27 UTC (permalink / raw)
  To: git; +Cc: benpeart, christian.couder, larsxschneider, Junio C Hamano

From: Junio C Hamano <gitster@pobox.com>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 Documentation/git.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index aa895da4a5..25560f69ff 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -44,9 +44,10 @@ unreleased) version of Git, that is available from the 'master'
 branch of the `git.git` repository.
 Documentation for older releases are available here:
 
-* link:v2.12.0/git.html[documentation for release 2.12.0]
+* link:v2.12.1/git.html[documentation for release 2.12.1]
 
 * release notes for
+  link:RelNotes/2.12.1.txt[2.12.1].
   link:RelNotes/2.12.0.txt[2.12].
 
 * link:v2.11.1/git.html[documentation for release 2.11.1]
-- 
2.12.0.gvfs.1.43.g876ba2a


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* RE: [PATCH v2 1/4] t7800: remove whitespace before redirect
  2017-03-24 15:27 ` [PATCH v2 1/4] t7800: remove whitespace before redirect Ben Peart
@ 2017-03-24 16:21   ` Ben Peart
  0 siblings, 0 replies; 15+ messages in thread
From: Ben Peart @ 2017-03-24 16:21 UTC (permalink / raw)
  To: Ben Peart, git@vger.kernel.org
  Cc: christian.couder@gmail.com, larsxschneider@gmail.com,
	David Aguilar, Junio C Hamano

Please disregard.  I apologize - somehow my format-patch/send-email went awry. 

Thanks,

Ben

> -----Original Message-----
> From: Ben Peart [mailto:peartben@gmail.com]
> Sent: Friday, March 24, 2017 11:27 AM
> To: git@vger.kernel.org
> Cc: Ben Peart <Ben.Peart@microsoft.com>; christian.couder@gmail.com;
> larsxschneider@gmail.com; David Aguilar <davvid@gmail.com>; Junio C
> Hamano <gitster@pobox.com>
> Subject: [PATCH v2 1/4] t7800: remove whitespace before redirect
> 
> From: David Aguilar <davvid@gmail.com>
> 
> Signed-off-by: David Aguilar <davvid@gmail.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
>  t/t7800-difftool.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh index
> 25241f4096..e1ec292718 100755
> --- a/t/t7800-difftool.sh
> +++ b/t/t7800-difftool.sh
> @@ -428,7 +428,7 @@ run_dir_diff_test 'difftool --dir-diff branch from
> subdirectory' '
>  		git difftool --dir-diff $symlinks --extcmd ls branch >output &&
>  		# "sub" must only exist in "right"
>  		# "file" and "file2" must be listed in both "left" and "right"
> -		grep sub output > sub-output &&
> +		grep sub output >sub-output &&
>  		test_line_count = 1 sub-output &&
>  		grep file"$" output >file-output &&
>  		test_line_count = 2 file-output &&
> --
> 2.12.0.gvfs.1.43.g876ba2a


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently()
  2017-03-24 15:27 ` [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently() Ben Peart
@ 2017-03-25  5:47   ` Torsten Bögershausen
  2017-03-27 22:19     ` Ben Peart
  0 siblings, 1 reply; 15+ messages in thread
From: Torsten Bögershausen @ 2017-03-25  5:47 UTC (permalink / raw)
  To: Ben Peart, git; +Cc: benpeart, christian.couder, larsxschneider

On 2017-03-24 16:27, Ben Peart wrote:
> Add packet_writel() which writes multiple lines in a single call and
> then calls packet_flush_gently(). Add packet_read_line_gently() to
> enable reading a line without dying on EOF.
> 
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
>  pkt-line.c | 31 +++++++++++++++++++++++++++++++
>  pkt-line.h | 11 +++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/pkt-line.c b/pkt-line.c
> index d4b6bfe076..2788aa1af6 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
>  	return status;
>  }
>  
> +int packet_writel(int fd, const char *line, ...)
The name packet_writel is hard to distinguish from packet_write.
Would packet_write_lines make more sense ?

> +{
> +	va_list args;
> +	int err;
> +	va_start(args, line);
> +	for (;;) {
> +		if (!line)
> +			break;
> +		if (strlen(line) > LARGE_PACKET_DATA_MAX)
> +			return -1;
> +		err = packet_write_fmt_gently(fd, "%s\n", line);
> +		if (err)
> +			return err;
> +		line = va_arg(args, const char*);
> +	}
> +	va_end(args);
> +	return packet_flush_gently(fd);
> +}
> +
I don't think that this va_start() is needed, even if it works.

int packet_write_line(int fd, const char *lines[])
|
	const char *line = *lines;
	int err;
	while (line) {
		if (strlen(line) > LARGE_PACKET_DATA_MAX)
			return -1;
		err = packet_write_fmt_gently(fd, "%s\n", line);
		if (err)
			return err;
		lines++;
		line = *lines;
	}
	return packet_flush_gently(fd);
]



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/2] sub-process: refactor the filter process code into a reusable module
  2017-03-24 15:27 ` [PATCH v2 2/2] sub-process: refactor the filter process code into a reusable module Ben Peart
@ 2017-03-27 18:59   ` Jonathan Tan
  2017-03-27 23:54     ` Ben Peart
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Tan @ 2017-03-27 18:59 UTC (permalink / raw)
  To: Ben Peart, git; +Cc: benpeart

On 03/24/2017 08:27 AM, Ben Peart wrote:
> Refactor the filter.<driver>.process code into a separate sub-process
> module that can be used to reduce the cost of starting up a sub-process
> for multiple commands.  It does this by keeping the external process
> running and processing all commands by communicating over standard input
> and standard output using the packet format (pkt-line) based protocol.
> Full documentation is in Documentation/technical/api-sub-process.txt.

Thanks - this looks like something useful to have.

When you create a "struct subprocess_entry" to be entered into the 
system, it is not a true "struct subprocess_entry" - it is a "struct 
subprocess_entry" plus some extra variables at the end. Since the 
sub-process hashmap is keyed solely on the command, what happens if 
another component uses the same trick (but with different extra 
variables) when using a sub-process with the same command?

I can think of at least two ways to solve this: (i) each component can 
have its own sub-process hashmap, or (ii) add a component key to the 
hashmap. (i) seems more elegant to me, but I'm not sure what the code 
will look like.

Also, I saw some minor code improvements possible (e.g. using 
"starts_with" when you're checking for the "status=<foo>" line) but I'll 
comment on those and look into the code more thoroughly once the 
questions in this e-mail are resolved.

> diff --git a/sub-process.h b/sub-process.h
> new file mode 100644
> index 0000000000..d1492f476d
> --- /dev/null
> +++ b/sub-process.h
> @@ -0,0 +1,46 @@
> +#ifndef SUBPROCESS_H
> +#define SUBPROCESS_H
> +
> +#include "git-compat-util.h"
> +#include "hashmap.h"
> +#include "run-command.h"
> +
> +/*
> + * Generic implementation of background process infrastructure.
> + * See Documentation/technical/api-background-process.txt.
> + */
> +
> + /* data structures */
> +
> +struct subprocess_entry {
> +	struct hashmap_entry ent; /* must be the first member! */
> +	struct child_process process;
> +	const char *cmd;
> +};

I notice from the documentation (and from "subprocess_get_child_process" 
below) that this is meant to be opaque, but I think this can be 
non-opaque (like "run-command").

Also, I would prefer adding a "util" pointer here instead of using it as 
an embedded struct. There is no clue here that it is embeddable or meant 
to be embedded.

> +
> +/* subprocess functions */
> +
> +typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
> +int subprocess_start(struct subprocess_entry *entry, const char *cmd,
> +		subprocess_start_fn startfn);

I'm not sure if it is useful to take a callback here - I think the 
caller of this function can just run whatever it wants after a 
successful subprocess_start.

Alternatively, if you add the "util" pointer (as I described above), 
then it makes sense to add a subprocess_get_or_start() function (and now 
it makes sense to take the callback). This way, the data structure will 
own, create, and destroy all the "struct subprocess_entry" that it 
needs, creating a nice separation of concerns.

> +
> +void subprocess_stop(struct subprocess_entry *entry);

(continued from above) And it would be clear that this would free 
"entry", for example.

> +
> +struct subprocess_entry *subprocess_find_entry(const char *cmd);
> +
> +/* subprocess helper functions */
> +
> +static inline struct child_process *subprocess_get_child_process(
> +		struct subprocess_entry *entry)
> +{
> +	return &entry->process;
> +}
> +
> +/*
> + * Helper function that will read packets looking for "status=<foo>"
> + * key/value pairs and return the value from the last "status" packet
> + */
> +
> +int subprocess_read_status(int fd, struct strbuf *status);
> +
> +#endif
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently()
  2017-03-25  5:47   ` Torsten Bögershausen
@ 2017-03-27 22:19     ` Ben Peart
  2017-03-30 14:04       ` Torsten Bögershausen
  0 siblings, 1 reply; 15+ messages in thread
From: Ben Peart @ 2017-03-27 22:19 UTC (permalink / raw)
  To: Torsten Bögershausen, Ben Peart, git@vger.kernel.org
  Cc: christian.couder@gmail.com, larsxschneider@gmail.com

> -----Original Message-----
> From: Torsten Bögershausen [mailto:tboegi@web.de]
> Sent: Saturday, March 25, 2017 1:47 AM
> To: Ben Peart <peartben@gmail.com>; git@vger.kernel.org
> Cc: Ben Peart <Ben.Peart@microsoft.com>; christian.couder@gmail.com;
> larsxschneider@gmail.com
> Subject: Re: [PATCH v2 1/2] pkt-line: add packet_writel() and
> packet_read_line_gently()
> 
> On 2017-03-24 16:27, Ben Peart wrote:
> > Add packet_writel() which writes multiple lines in a single call and
> > then calls packet_flush_gently(). Add packet_read_line_gently() to
> > enable reading a line without dying on EOF.
> >
> > Signed-off-by: Ben Peart <benpeart@microsoft.com>
> > ---
> >  pkt-line.c | 31 +++++++++++++++++++++++++++++++  pkt-line.h | 11
> > +++++++++++
> >  2 files changed, 42 insertions(+)
> >
> > diff --git a/pkt-line.c b/pkt-line.c
> > index d4b6bfe076..2788aa1af6 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char
> *fmt, ...)
> >  	return status;
> >  }
> >
> > +int packet_writel(int fd, const char *line, ...)
> The name packet_writel is hard to distinguish from packet_write.
> Would packet_write_lines make more sense ?
> 

Just goes to prove that there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors. :)  The feedback on V1 was:

"I am hesitant to take a function that does not take any "list"
type argument and yet calls itself "write_list".  IOW, I'd expect something like these

	write_list(int fd, const char **lines);
	write_list(int fd, struct string_list *lines);

given that name, and not "varargs, each of which is a line".  I am tempted to suggest

	packet_writel(int fd, const char *line, ...);"

> > +{
> > +	va_list args;
> > +	int err;
> > +	va_start(args, line);
> > +	for (;;) {
> > +		if (!line)
> > +			break;
> > +		if (strlen(line) > LARGE_PACKET_DATA_MAX)
> > +			return -1;
> > +		err = packet_write_fmt_gently(fd, "%s\n", line);
> > +		if (err)
> > +			return err;
> > +		line = va_arg(args, const char*);
> > +	}
> > +	va_end(args);
> > +	return packet_flush_gently(fd);
> > +}
> > +
> I don't think that this va_start() is needed, even if it works.
> 
> int packet_write_line(int fd, const char *lines[])
> |
> 	const char *line = *lines;
> 	int err;
> 	while (line) {
> 		if (strlen(line) > LARGE_PACKET_DATA_MAX)
> 			return -1;
> 		err = packet_write_fmt_gently(fd, "%s\n", line);
> 		if (err)
> 			return err;
> 		lines++;
> 		line = *lines;
> 	}
> 	return packet_flush_gently(fd);
> ]
> 

This is a helper function to simply the common pattern of writing out a variable number of lines followed by a flush.  The usage of the function as currently implemented is:

packet_writel(fd, "line one", "line two", "line n");

which requires the use of variable number of arguments.  With your proposal that convenience is lost as you have to create an array of strings and pass that instead.  The usage just isn't as simple as the current model.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v2 2/2] sub-process: refactor the filter process code into a reusable module
  2017-03-27 18:59   ` Jonathan Tan
@ 2017-03-27 23:54     ` Ben Peart
  0 siblings, 0 replies; 15+ messages in thread
From: Ben Peart @ 2017-03-27 23:54 UTC (permalink / raw)
  To: Jonathan Tan, Ben Peart, git@vger.kernel.org

> -----Original Message-----
> From: Jonathan Tan [mailto:jonathantanmy@google.com]
> Sent: Monday, March 27, 2017 3:00 PM
> To: Ben Peart <peartben@gmail.com>; git@vger.kernel.org
> Cc: Ben Peart <Ben.Peart@microsoft.com>
> Subject: Re: [PATCH v2 2/2] sub-process: refactor the filter process code into
> a reusable module
> 
> On 03/24/2017 08:27 AM, Ben Peart wrote:
> > Refactor the filter.<driver>.process code into a separate sub-process
> > module that can be used to reduce the cost of starting up a
> > sub-process for multiple commands.  It does this by keeping the
> > external process running and processing all commands by communicating
> > over standard input and standard output using the packet format (pkt-line)
> based protocol.
> > Full documentation is in Documentation/technical/api-sub-process.txt.
> 
> Thanks - this looks like something useful to have.

Thanks for the review and feedback.

> 
> When you create a "struct subprocess_entry" to be entered into the system,
> it is not a true "struct subprocess_entry" - it is a "struct subprocess_entry"
> plus some extra variables at the end. Since the sub-process hashmap is
> keyed solely on the command, what happens if another component uses the
> same trick (but with different extra
> variables) when using a sub-process with the same command?

Having the command be the unique key is sufficient because it gets executed as a process by run_command and there can't be multiple different processes by the same name. 

> 
> I can think of at least two ways to solve this: (i) each component can have its
> own sub-process hashmap, or (ii) add a component key to the hashmap. (i)
> seems more elegant to me, but I'm not sure what the code will look like.
> 
> Also, I saw some minor code improvements possible (e.g. using "starts_with"
> when you're checking for the "status=<foo>" line) but I'll comment on those
> and look into the code more thoroughly once the questions in this e-mail are
> resolved.
> 
> > diff --git a/sub-process.h b/sub-process.h new file mode 100644 index
> > 0000000000..d1492f476d
> > --- /dev/null
> > +++ b/sub-process.h
> > @@ -0,0 +1,46 @@
> > +#ifndef SUBPROCESS_H
> > +#define SUBPROCESS_H
> > +
> > +#include "git-compat-util.h"
> > +#include "hashmap.h"
> > +#include "run-command.h"
> > +
> > +/*
> > + * Generic implementation of background process infrastructure.
> > + * See Documentation/technical/api-background-process.txt.
> > + */
> > +
> > + /* data structures */
> > +
> > +struct subprocess_entry {
> > +	struct hashmap_entry ent; /* must be the first member! */
> > +	struct child_process process;
> > +	const char *cmd;
> > +};
> 
> I notice from the documentation (and from "subprocess_get_child_process"
> below) that this is meant to be opaque, but I think this can be non-opaque
> (like "run-command").
> 
> Also, I would prefer adding a "util" pointer here instead of using it as an
> embedded struct. There is no clue here that it is embeddable or meant to be
> embedded.
> 

The structure is intentionally opaque to provide the benefits of encapsulation.  Obviously, the "C" language doesn't provide any enforcement of that design principal but we do what we can.  

The embedded struct is following the same design pattern as elsewhere in git (hashmap for example) simply for consistency.

> > +
> > +/* subprocess functions */
> > +
> > +typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
> > +int subprocess_start(struct subprocess_entry *entry, const char *cmd,
> > +		subprocess_start_fn startfn);
> 
> I'm not sure if it is useful to take a callback here - I think the caller of this
> function can just run whatever it wants after a successful subprocess_start.

The purpose of doing the subprocess specific initialization via a callback is so that if it encounters an error (for example, it can't negotiate a common interface version) the subprocess_start function can detect that and ensure the hashmap does not contain the invalid/unusable subprocess. 

> 
> Alternatively, if you add the "util" pointer (as I described above), then it
> makes sense to add a subprocess_get_or_start() function (and now it makes
> sense to take the callback). This way, the data structure will own, create, and
> destroy all the "struct subprocess_entry" that it needs, creating a nice
> separation of concerns.
> 
> > +
> > +void subprocess_stop(struct subprocess_entry *entry);
> 
> (continued from above) And it would be clear that this would free
> "entry", for example.
> 
> > +
> > +struct subprocess_entry *subprocess_find_entry(const char *cmd);
> > +
> > +/* subprocess helper functions */
> > +
> > +static inline struct child_process *subprocess_get_child_process(
> > +		struct subprocess_entry *entry)
> > +{
> > +	return &entry->process;
> > +}
> > +
> > +/*
> > + * Helper function that will read packets looking for "status=<foo>"
> > + * key/value pairs and return the value from the last "status" packet
> > + */
> > +
> > +int subprocess_read_status(int fd, struct strbuf *status);
> > +
> > +#endif
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently()
  2017-03-27 22:19     ` Ben Peart
@ 2017-03-30 14:04       ` Torsten Bögershausen
  2017-03-30 16:01         ` Ben Peart
  0 siblings, 1 reply; 15+ messages in thread
From: Torsten Bögershausen @ 2017-03-30 14:04 UTC (permalink / raw)
  To: Ben Peart, Ben Peart, git@vger.kernel.org
  Cc: christian.couder@gmail.com, larsxschneider@gmail.com


[snip]
>> Would packet_write_lines make more sense ?
>>
>
> Just goes to prove that there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors. :)  The feedback on V1 was:
>
> "I am hesitant to take a function that does not take any "list"
> type argument and yet calls itself "write_list".  IOW, I'd expect something like these
>
> 	write_list(int fd, const char **lines);
> 	write_list(int fd, struct string_list *lines);
>
> given that name, and not "varargs, each of which is a line".  I am tempted to suggest
>
> 	packet_writel(int fd, const char *line, ...);"
>
>>> +{
>>> +	va_list args;
>>> +	int err;
>>> +	va_start(args, line);
>>> +	for (;;) {
>>> +		if (!line)
>>> +			break;
>>> +		if (strlen(line) > LARGE_PACKET_DATA_MAX)
>>> +			return -1;
>>> +		err = packet_write_fmt_gently(fd, "%s\n", line);
>>> +		if (err)
>>> +			return err;
>>> +		line = va_arg(args, const char*);
>>> +	}
>>> +	va_end(args);
>>> +	return packet_flush_gently(fd);
>>> +}
>>> +
>> I don't think that this va_start() is needed, even if it works.
>>
>> int packet_write_line(int fd, const char *lines[])
>> |
>> 	const char *line = *lines;
>> 	int err;
>> 	while (line) {
>> 		if (strlen(line) > LARGE_PACKET_DATA_MAX)
>> 			return -1;
>> 		err = packet_write_fmt_gently(fd, "%s\n", line);
>> 		if (err)
>> 			return err;
>> 		lines++;
>> 		line = *lines;
>> 	}
>> 	return packet_flush_gently(fd);
>> ]
>>
>
> This is a helper function to simply the common pattern of writing out a variable number of lines followed by a flush.  The usage of the function as currently implemented is:
>
> packet_writel(fd, "line one", "line two", "line n");


Does this work ?
I would have expected
packet_writel(fd, "line one", "line two", "line n"), NULL;

>
> which requires the use of variable number of arguments.  With your proposal that convenience is lost as you have to create an array of strings and pass that instead.  The usage just isn't as simple as the current model.
>
What is wrong with

int packet_write_fmt_gently(int fd, const char *fmt, ...)
and we use it like this:
if packet_write_fmt_gently(fd, "%s%s%s", "line one", "line two", "line n")



(Or do we need another one like)
int packet_write_fmt_gently_flush(int fd, const char *fmt, ...)

Sorry if I am lost here

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently()
  2017-03-30 14:04       ` Torsten Bögershausen
@ 2017-03-30 16:01         ` Ben Peart
  2017-03-30 17:01           ` Torsten Bögershausen
  0 siblings, 1 reply; 15+ messages in thread
From: Ben Peart @ 2017-03-30 16:01 UTC (permalink / raw)
  To: Torsten Bögershausen, Ben Peart, git@vger.kernel.org
  Cc: christian.couder@gmail.com, larsxschneider@gmail.com

> From: Torsten Bögershausen [mailto:tboegi@web.de]
> 
> 
> Does this work ?
> I would have expected
> packet_writel(fd, "line one", "line two", "line n"), NULL;
> 

No, that's actually not valid C syntax.

> >
> > which requires the use of variable number of arguments.  With your
> proposal that convenience is lost as you have to create an array of strings
> and pass that instead.  The usage just isn't as simple as the current model.
> >
> What is wrong with
> 
> int packet_write_fmt_gently(int fd, const char *fmt, ...) and we use it like
> this:
> if packet_write_fmt_gently(fd, "%s%s%s", "line one", "line two", "line n")
> 

Packets are not just strings; see pkt-line.h for more details but basically they are a length packet, followed by the data (in this particular case a string).  The packet_writel function is a convenience function to write out a variable number of packetized strings followed by a flush packet.  You're sample above would simply concatenate the three strings and then write a single packet.  A very different outcome. :)

> 
> 
> (Or do we need another one like)
> int packet_write_fmt_gently_flush(int fd, const char *fmt, ...)
> 
> Sorry if I am lost here

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently()
  2017-03-30 16:01         ` Ben Peart
@ 2017-03-30 17:01           ` Torsten Bögershausen
  0 siblings, 0 replies; 15+ messages in thread
From: Torsten Bögershausen @ 2017-03-30 17:01 UTC (permalink / raw)
  To: Ben Peart, Ben Peart, git@vger.kernel.org
  Cc: christian.couder@gmail.com, larsxschneider@gmail.com

On 30.03.17 18:01, Ben Peart wrote:
>> From: Torsten Bögershausen [mailto:tboegi@web.de]
>>
>>
>> Does this work ?
>> I would have expected
>> packet_writel(fd, "line one", "line two", "line n"), NULL; 
Typo.
Should have been:
packet_writel(fd, "line one", "line two", "line n", NULL);
>>
> 
> No, that's actually not valid C syntax.
> 
>>>
>>> which requires the use of variable number of arguments.  With your
>> proposal that convenience is lost as you have to create an array of strings
>> and pass that instead.  The usage just isn't as simple as the current model.
>>>
>> What is wrong with
>>
>> int packet_write_fmt_gently(int fd, const char *fmt, ...) and we use it like
>> this:
>> if packet_write_fmt_gently(fd, "%s%s%s", "line one", "line two", "line n")
>>
> 
> Packets are not just strings; see pkt-line.h for more details-
>  but basically they are a length packet, followed by the data (in this particular case a string). 
> The packet_writel function is a convenience function to write out a variable number of
> packetized strings followed by a flush packet.
> You're sample above would simply concatenate the three strings and then write a single packet. 
> A very different outcome. :)

Got it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-03-30 17:01 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-24 15:27 [PATCH v2 0/2] Refactor the filter process code into a reusable module Ben Peart
2017-03-24 15:27 ` [PATCH v2 1/2] pkt-line: add packet_writel() and packet_read_line_gently() Ben Peart
2017-03-25  5:47   ` Torsten Bögershausen
2017-03-27 22:19     ` Ben Peart
2017-03-30 14:04       ` Torsten Bögershausen
2017-03-30 16:01         ` Ben Peart
2017-03-30 17:01           ` Torsten Bögershausen
2017-03-24 15:27 ` [PATCH v2 1/4] t7800: remove whitespace before redirect Ben Peart
2017-03-24 16:21   ` Ben Peart
2017-03-24 15:27 ` [PATCH v2 2/2] sub-process: refactor the filter process code into a reusable module Ben Peart
2017-03-27 18:59   ` Jonathan Tan
2017-03-27 23:54     ` Ben Peart
2017-03-24 15:27 ` [PATCH v2 2/4] t7800: cleanup cruft left behind by tests Ben Peart
2017-03-24 15:27 ` [PATCH v2 3/4] difftool: handle modified symlinks in dir-diff mode Ben Peart
2017-03-24 15:27 ` [PATCH v2 4/4] Git 2.12.1 Ben Peart

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).