git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [PATCH v5 0/8] refactor the filter process code into a reusable module
@ 2017-04-07 12:03 Ben Peart
  2017-04-07 12:03 ` [PATCH v5 1/8] pkt-line: add packet_read_line_gently() Ben Peart
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Ben Peart @ 2017-04-07 12:03 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, christian.couder, larsxschneider

Hopefully final spin.  Updates patch 6 to rename additional instances
of "filter" in comment and error messages.

Refactor the filter.<driver>.process code into a separate sub-process
module that can be used to reduce the cost of starting up a sub-process
for multiple commands.  It does this by keeping the external process
running and processing all commands by communicating over standard input
and standard output using the packet format (pkt-line) based protocol.
Full documentation is in Documentation/technical/api-sub-process.txt.

This code is refactored from:

	Commit edcc85814c ("convert: add filter.<driver>.process option", 2016-10-16)
	keeps the external process running and processes all commands

Ben Peart (8):
  pkt-line: add packet_read_line_gently()
  convert: move packet_write_list() into pkt-line as packet_writel()
  convert: Split start_multi_file_filter into two separate functions
  convert: Separate generic structures and variables from the filter
    specific ones
  convert: Update generic functions to only use generic data structures
  convert: rename reusable sub-process functions
  sub-process: move sub-process functions into separate files
  convert: Update subprocess_read_status to not die on EOF

 Documentation/technical/api-sub-process.txt |  54 ++++++++++
 Makefile                                    |   1 +
 convert.c                                   | 159 +++++-----------------------
 pkt-line.c                                  |  31 ++++++
 pkt-line.h                                  |  11 ++
 sub-process.c                               | 120 +++++++++++++++++++++
 sub-process.h                               |  46 ++++++++
 7 files changed, 292 insertions(+), 130 deletions(-)
 create mode 100644 Documentation/technical/api-sub-process.txt
 create mode 100644 sub-process.c
 create mode 100644 sub-process.h

-- 
2.12.0.windows.1.31.g1548525701.dirty


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* [PATCH v5 1/8] pkt-line: add packet_read_line_gently()
  2017-04-07 12:03 [PATCH v5 0/8] refactor the filter process code into a reusable module Ben Peart
@ 2017-04-07 12:03 ` Ben Peart
  2017-04-09 19:34   ` Lars Schneider
  2017-04-07 12:03 ` [PATCH v5 2/8] convert: move packet_write_list() into pkt-line as packet_writel() Ben Peart
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Ben Peart @ 2017-04-07 12:03 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, christian.couder, larsxschneider

Add packet_read_line_gently() to enable reading a line without dying on
EOF.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 pkt-line.c | 12 ++++++++++++
 pkt-line.h | 10 ++++++++++
 2 files changed, 22 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index d4b6bfe076..58842544b4 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -323,6 +323,18 @@ char *packet_read_line(int fd, int *len_p)
 	return packet_read_line_generic(fd, NULL, NULL, len_p);
 }
 
+int packet_read_line_gently(int fd, int *dst_len, char** dst_line)
+{
+	int len = packet_read(fd, NULL, NULL,
+		packet_buffer, sizeof(packet_buffer),
+		PACKET_READ_CHOMP_NEWLINE|PACKET_READ_GENTLE_ON_EOF);
+	if (dst_len)
+		*dst_len = len;
+	if (dst_line)
+		*dst_line = (len > 0) ? packet_buffer : NULL;
+	return len;
+}
+
 char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
 {
 	return packet_read_line_generic(-1, src, src_len, dst_len);
diff --git a/pkt-line.h b/pkt-line.h
index 18eac64830..12b18991f6 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -74,6 +74,16 @@ int packet_read(int fd, char **src_buffer, size_t *src_len, char
 char *packet_read_line(int fd, int *size);
 
 /*
+ * Convenience wrapper for packet_read that sets the PACKET_READ_GENTLE_ON_EOF
+ * and CHOMP_NEWLINE options. The return value specifies the number of bytes
+ * read into the buffer or -1 on truncated input. if the *dst_line parameter
+ * is not NULL it will return NULL for a flush packet and otherwise points to
+ * a static buffer (that may be overwritten by subsequent calls). If the size
+ * parameter is not NULL, the length of the packet is written to it.
+ */
+int packet_read_line_gently(int fd, int *size, char** dst_line);
+
+/*
  * Same as packet_read_line, but read from a buf rather than a descriptor;
  * see packet_read for details on how src_* is used.
  */
-- 
2.12.0.windows.1.31.g1548525701.dirty


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* [PATCH v5 2/8] convert: move packet_write_list() into pkt-line as packet_writel()
  2017-04-07 12:03 [PATCH v5 0/8] refactor the filter process code into a reusable module Ben Peart
  2017-04-07 12:03 ` [PATCH v5 1/8] pkt-line: add packet_read_line_gently() Ben Peart
@ 2017-04-07 12:03 ` Ben Peart
  2017-04-09 19:43   ` Lars Schneider
  2017-04-07 12:03 ` [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions Ben Peart
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Ben Peart @ 2017-04-07 12:03 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, christian.couder, larsxschneider

Add packet_writel() which writes multiple lines in a single call and
then calls packet_flush_gently(). Update convert.c to use the new
packet_writel() function from pkt-line.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 convert.c  | 23 ++---------------------
 pkt-line.c | 19 +++++++++++++++++++
 pkt-line.h |  1 +
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/convert.c b/convert.c
index 8d652bf27c..793c29ebfd 100644
--- a/convert.c
+++ b/convert.c
@@ -521,25 +521,6 @@ static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap,
 	return hashmap_get(hashmap, &key, NULL);
 }
 
-static int packet_write_list(int fd, const char *line, ...)
-{
-	va_list args;
-	int err;
-	va_start(args, line);
-	for (;;) {
-		if (!line)
-			break;
-		if (strlen(line) > LARGE_PACKET_DATA_MAX)
-			return -1;
-		err = packet_write_fmt_gently(fd, "%s\n", line);
-		if (err)
-			return err;
-		line = va_arg(args, const char*);
-	}
-	va_end(args);
-	return packet_flush_gently(fd);
-}
-
 static void read_multi_file_filter_status(int fd, struct strbuf *status)
 {
 	struct strbuf **pair;
@@ -616,7 +597,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 
 	sigchain_push(SIGPIPE, SIG_IGN);
 
-	err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);
+	err = packet_writel(process->in, "git-filter-client", "version=2", NULL);
 	if (err)
 		goto done;
 
@@ -632,7 +613,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 	if (err)
 		goto done;
 
-	err = packet_write_list(process->in, "capability=clean", "capability=smudge", NULL);
+	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
 
 	for (;;) {
 		cap_buf = packet_read_line(process->out, NULL);
diff --git a/pkt-line.c b/pkt-line.c
index 58842544b4..2788aa1af6 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
 	return status;
 }
 
+int packet_writel(int fd, const char *line, ...)
+{
+	va_list args;
+	int err;
+	va_start(args, line);
+	for (;;) {
+		if (!line)
+			break;
+		if (strlen(line) > LARGE_PACKET_DATA_MAX)
+			return -1;
+		err = packet_write_fmt_gently(fd, "%s\n", line);
+		if (err)
+			return err;
+		line = va_arg(args, const char*);
+	}
+	va_end(args);
+	return packet_flush_gently(fd);
+}
+
 static int packet_write_gently(const int fd_out, const char *buf, size_t size)
 {
 	static char packet_write_buffer[LARGE_PACKET_MAX];
diff --git a/pkt-line.h b/pkt-line.h
index 12b18991f6..cb3eda9695 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,6 +25,7 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_writel(int fd, const char *line, ...);
 int write_packetized_from_fd(int fd_in, int fd_out);
 int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
 
-- 
2.12.0.windows.1.31.g1548525701.dirty


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions
  2017-04-07 12:03 [PATCH v5 0/8] refactor the filter process code into a reusable module Ben Peart
  2017-04-07 12:03 ` [PATCH v5 1/8] pkt-line: add packet_read_line_gently() Ben Peart
  2017-04-07 12:03 ` [PATCH v5 2/8] convert: move packet_write_list() into pkt-line as packet_writel() Ben Peart
@ 2017-04-07 12:03 ` Ben Peart
  2017-04-09 19:56   ` Lars Schneider
  2017-04-11 16:16   ` Jeff King
  2017-04-07 12:03 ` [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones Ben Peart
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 27+ messages in thread
From: Ben Peart @ 2017-04-07 12:03 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, christian.couder, larsxschneider

To enable future reuse of the filter.<driver>.process infrastructure,
split start_multi_file_filter into two separate parts.

start_multi_file_filter will now only contain the generic logic to
manage the creation and tracking of the child process in a hashmap.

start_multi_file_filter_fn is a protocol specific initialization
function that will negotiate the multi-file-filter interface version
and capabilities.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 convert.c | 63 ++++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 38 insertions(+), 25 deletions(-)

diff --git a/convert.c b/convert.c
index 793c29ebfd..404757eac9 100644
--- a/convert.c
+++ b/convert.c
@@ -565,35 +565,14 @@ static void stop_multi_file_filter(struct child_process *process)
 	finish_command(process);
 }
 
-static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
+static int start_multi_file_filter_fn(struct cmd2process *entry)
 {
 	int err;
-	struct cmd2process *entry;
-	struct child_process *process;
-	const char *argv[] = { cmd, NULL };
 	struct string_list cap_list = STRING_LIST_INIT_NODUP;
 	char *cap_buf;
 	const char *cap_name;
-
-	entry = xmalloc(sizeof(*entry));
-	entry->cmd = cmd;
-	entry->supported_capabilities = 0;
-	process = &entry->process;
-
-	child_process_init(process);
-	process->argv = argv;
-	process->use_shell = 1;
-	process->in = -1;
-	process->out = -1;
-	process->clean_on_exit = 1;
-	process->clean_on_exit_handler = stop_multi_file_filter;
-
-	if (start_command(process)) {
-		error("cannot fork to run external filter '%s'", cmd);
-		return NULL;
-	}
-
-	hashmap_entry_init(entry, strhash(cmd));
+	struct child_process *process = &entry->process;
+	const char *cmd = entry->cmd;
 
 	sigchain_push(SIGPIPE, SIG_IGN);
 
@@ -642,7 +621,41 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 done:
 	sigchain_pop(SIGPIPE);
 
-	if (err || errno == EPIPE) {
+	if (err || errno == EPIPE)
+		err = err ? err : errno;
+
+	return err;
+}
+
+static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
+{
+	int err;
+	struct cmd2process *entry;
+	struct child_process *process;
+	const char *argv[] = { cmd, NULL };
+
+	entry = xmalloc(sizeof(*entry));
+	entry->cmd = cmd;
+	entry->supported_capabilities = 0;
+	process = &entry->process;
+
+	child_process_init(process);
+	process->argv = argv;
+	process->use_shell = 1;
+	process->in = -1;
+	process->out = -1;
+	process->clean_on_exit = 1;
+	process->clean_on_exit_handler = stop_multi_file_filter;
+
+	if (start_command(process)) {
+		error("cannot fork to run external filter '%s'", cmd);
+		return NULL;
+	}
+
+	hashmap_entry_init(entry, strhash(cmd));
+
+	err = start_multi_file_filter_fn(entry);
+	if (err) {
 		error("initialization for external filter '%s' failed", cmd);
 		kill_multi_file_filter(hashmap, entry);
 		return NULL;
-- 
2.12.0.windows.1.31.g1548525701.dirty


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones
  2017-04-07 12:03 [PATCH v5 0/8] refactor the filter process code into a reusable module Ben Peart
                   ` (2 preceding siblings ...)
  2017-04-07 12:03 ` [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions Ben Peart
@ 2017-04-07 12:03 ` Ben Peart
  2017-04-10 10:18   ` Lars Schneider
  2017-04-07 12:03 ` [PATCH v5 5/8] convert: Update generic functions to only use generic data structures Ben Peart
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Ben Peart @ 2017-04-07 12:03 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, christian.couder, larsxschneider

To enable future reuse of the filter.<driver>.process infrastructure,
split the cmd2process structure into two separate parts.

subprocess_entry will now contain the generic data required to manage
the creation and tracking of the child process in a hashmap. Also move
all knowledge of the hashmap into the generic functions.

cmd2process is a filter protocol specific structure that is used to
track the negotiated capabilities of the filter.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 convert.c | 57 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/convert.c b/convert.c
index 404757eac9..f569026511 100644
--- a/convert.c
+++ b/convert.c
@@ -496,29 +496,40 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 #define CAP_CLEAN    (1u<<0)
 #define CAP_SMUDGE   (1u<<1)
 
-struct cmd2process {
+struct subprocess_entry {
 	struct hashmap_entry ent; /* must be the first member! */
-	unsigned int supported_capabilities;
 	const char *cmd;
 	struct child_process process;
 };
 
+struct cmd2process {
+	struct subprocess_entry subprocess; /* must be the first member! */
+	unsigned int supported_capabilities;
+};
+
 static int cmd_process_map_initialized;
 static struct hashmap cmd_process_map;
 
-static int cmd2process_cmp(const struct cmd2process *e1,
-			   const struct cmd2process *e2,
+static int cmd2process_cmp(const struct subprocess_entry *e1,
+			   const struct subprocess_entry *e2,
 			   const void *unused)
 {
 	return strcmp(e1->cmd, e2->cmd);
 }
 
-static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
+static struct subprocess_entry *find_multi_file_filter_entry(const char *cmd)
 {
-	struct cmd2process key;
+	struct subprocess_entry key;
+
+	if (!cmd_process_map_initialized) {
+		cmd_process_map_initialized = 1;
+		hashmap_init(&cmd_process_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
+		return NULL;
+	}
+
 	hashmap_entry_init(&key, strhash(cmd));
 	key.cmd = cmd;
-	return hashmap_get(hashmap, &key, NULL);
+	return hashmap_get(&cmd_process_map, &key, NULL);
 }
 
 static void read_multi_file_filter_status(int fd, struct strbuf *status)
@@ -541,7 +552,7 @@ static void read_multi_file_filter_status(int fd, struct strbuf *status)
 	}
 }
 
-static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry)
+static void kill_multi_file_filter(struct subprocess_entry *entry)
 {
 	if (!entry)
 		return;
@@ -550,7 +561,7 @@ static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *
 	kill(entry->process.pid, SIGTERM);
 	finish_command(&entry->process);
 
-	hashmap_remove(hashmap, entry, NULL);
+	hashmap_remove(&cmd_process_map, entry, NULL);
 	free(entry);
 }
 
@@ -571,8 +582,8 @@ static int start_multi_file_filter_fn(struct cmd2process *entry)
 	struct string_list cap_list = STRING_LIST_INIT_NODUP;
 	char *cap_buf;
 	const char *cap_name;
-	struct child_process *process = &entry->process;
-	const char *cmd = entry->cmd;
+	struct child_process *process = &entry->subprocess.process;
+	const char *cmd = entry->subprocess.cmd;
 
 	sigchain_push(SIGPIPE, SIG_IGN);
 
@@ -627,7 +638,7 @@ static int start_multi_file_filter_fn(struct cmd2process *entry)
 	return err;
 }
 
-static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
+static struct cmd2process *start_multi_file_filter(const char *cmd)
 {
 	int err;
 	struct cmd2process *entry;
@@ -635,9 +646,9 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 	const char *argv[] = { cmd, NULL };
 
 	entry = xmalloc(sizeof(*entry));
-	entry->cmd = cmd;
+	entry->subprocess.cmd = cmd;
 	entry->supported_capabilities = 0;
-	process = &entry->process;
+	process = &entry->subprocess.process;
 
 	child_process_init(process);
 	process->argv = argv;
@@ -657,11 +668,11 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
 	err = start_multi_file_filter_fn(entry);
 	if (err) {
 		error("initialization for external filter '%s' failed", cmd);
-		kill_multi_file_filter(hashmap, entry);
+		kill_multi_file_filter(&entry->subprocess);
 		return NULL;
 	}
 
-	hashmap_add(hashmap, entry);
+	hashmap_add(&cmd_process_map, entry);
 	return entry;
 }
 
@@ -676,22 +687,16 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	struct strbuf filter_status = STRBUF_INIT;
 	const char *filter_type;
 
-	if (!cmd_process_map_initialized) {
-		cmd_process_map_initialized = 1;
-		hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
-		entry = NULL;
-	} else {
-		entry = find_multi_file_filter_entry(&cmd_process_map, cmd);
-	}
+	entry = (struct cmd2process *)find_multi_file_filter_entry(cmd);
 
 	fflush(NULL);
 
 	if (!entry) {
-		entry = start_multi_file_filter(&cmd_process_map, cmd);
+		entry = start_multi_file_filter(cmd);
 		if (!entry)
 			return 0;
 	}
-	process = &entry->process;
+	process = &entry->subprocess.process;
 
 	if (!(wanted_capability & entry->supported_capabilities))
 		return 0;
@@ -762,7 +767,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 			 * Force shutdown and restart if another blob requires filtering.
 			 */
 			error("external filter '%s' failed", cmd);
-			kill_multi_file_filter(&cmd_process_map, entry);
+			kill_multi_file_filter(&entry->subprocess);
 		}
 	} else {
 		strbuf_swap(dst, &nbuf);
-- 
2.12.0.windows.1.31.g1548525701.dirty


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* [PATCH v5 5/8] convert: Update generic functions to only use generic data structures
  2017-04-07 12:03 [PATCH v5 0/8] refactor the filter process code into a reusable module Ben Peart
                   ` (3 preceding siblings ...)
  2017-04-07 12:03 ` [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones Ben Peart
@ 2017-04-07 12:03 ` Ben Peart
  2017-04-10 12:05   ` Lars Schneider
  2017-04-07 12:03 ` [PATCH v5 6/8] convert: rename reusable sub-process functions Ben Peart
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Ben Peart @ 2017-04-07 12:03 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, christian.couder, larsxschneider

Update all functions that are going to be moved into a reusable module
so that they only work with the reusable data structures.  Move code
that is specific to the filter out into the filter specific functions.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 convert.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/convert.c b/convert.c
index f569026511..747c0c363b 100644
--- a/convert.c
+++ b/convert.c
@@ -576,14 +576,15 @@ static void stop_multi_file_filter(struct child_process *process)
 	finish_command(process);
 }
 
-static int start_multi_file_filter_fn(struct cmd2process *entry)
+static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 {
 	int err;
+	struct cmd2process *entry = (struct cmd2process *)subprocess;
 	struct string_list cap_list = STRING_LIST_INIT_NODUP;
 	char *cap_buf;
 	const char *cap_name;
-	struct child_process *process = &entry->subprocess.process;
-	const char *cmd = entry->subprocess.cmd;
+	struct child_process *process = &subprocess->process;
+	const char *cmd = subprocess->cmd;
 
 	sigchain_push(SIGPIPE, SIG_IGN);
 
@@ -638,17 +639,21 @@ static int start_multi_file_filter_fn(struct cmd2process *entry)
 	return err;
 }
 
-static struct cmd2process *start_multi_file_filter(const char *cmd)
+typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
+int start_multi_file_filter(struct subprocess_entry *entry, const char *cmd,
+	subprocess_start_fn startfn)
 {
 	int err;
-	struct cmd2process *entry;
 	struct child_process *process;
 	const char *argv[] = { cmd, NULL };
 
-	entry = xmalloc(sizeof(*entry));
-	entry->subprocess.cmd = cmd;
-	entry->supported_capabilities = 0;
-	process = &entry->subprocess.process;
+	if (!cmd_process_map_initialized) {
+		cmd_process_map_initialized = 1;
+		hashmap_init(&cmd_process_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
+	}
+
+	entry->cmd = cmd;
+	process = &entry->process;
 
 	child_process_init(process);
 	process->argv = argv;
@@ -658,22 +663,23 @@ static struct cmd2process *start_multi_file_filter(const char *cmd)
 	process->clean_on_exit = 1;
 	process->clean_on_exit_handler = stop_multi_file_filter;
 
-	if (start_command(process)) {
+	err = start_command(process);
+	if (err) {
 		error("cannot fork to run external filter '%s'", cmd);
-		return NULL;
+		return err;
 	}
 
 	hashmap_entry_init(entry, strhash(cmd));
 
-	err = start_multi_file_filter_fn(entry);
+	err = startfn(entry);
 	if (err) {
 		error("initialization for external filter '%s' failed", cmd);
-		kill_multi_file_filter(&entry->subprocess);
-		return NULL;
+		kill_multi_file_filter(entry);
+		return err;
 	}
 
 	hashmap_add(&cmd_process_map, entry);
-	return entry;
+	return 0;
 }
 
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
@@ -692,9 +698,13 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	fflush(NULL);
 
 	if (!entry) {
-		entry = start_multi_file_filter(cmd);
-		if (!entry)
+		entry = xmalloc(sizeof(*entry));
+		entry->supported_capabilities = 0;
+
+		if (start_multi_file_filter(&entry->subprocess, cmd, start_multi_file_filter_fn)) {
+			free(entry);
 			return 0;
+		}
 	}
 	process = &entry->subprocess.process;
 
@@ -767,7 +777,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 			 * Force shutdown and restart if another blob requires filtering.
 			 */
 			error("external filter '%s' failed", cmd);
-			kill_multi_file_filter(&entry->subprocess);
+			kill_multi_file_filter((struct subprocess_entry *)entry);
 		}
 	} else {
 		strbuf_swap(dst, &nbuf);
-- 
2.12.0.windows.1.31.g1548525701.dirty


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* [PATCH v5 6/8] convert: rename reusable sub-process functions
  2017-04-07 12:03 [PATCH v5 0/8] refactor the filter process code into a reusable module Ben Peart
                   ` (4 preceding siblings ...)
  2017-04-07 12:03 ` [PATCH v5 5/8] convert: Update generic functions to only use generic data structures Ben Peart
@ 2017-04-07 12:03 ` Ben Peart
  2017-04-10 12:11   ` Lars Schneider
  2017-04-07 12:03 ` [PATCH v5 7/8] sub-process: move sub-process functions into separate files Ben Peart
  2017-04-07 12:03 ` [PATCH v5 8/8] convert: Update subprocess_read_status to not die on EOF Ben Peart
  7 siblings, 1 reply; 27+ messages in thread
From: Ben Peart @ 2017-04-07 12:03 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, christian.couder, larsxschneider

Do a mechanical rename of the functions that will become the reusable
sub-process module.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 convert.c | 53 +++++++++++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 26 deletions(-)

diff --git a/convert.c b/convert.c
index 747c0c363b..235a6a5279 100644
--- a/convert.c
+++ b/convert.c
@@ -507,8 +507,8 @@ struct cmd2process {
 	unsigned int supported_capabilities;
 };
 
-static int cmd_process_map_initialized;
-static struct hashmap cmd_process_map;
+static int subprocess_map_initialized;
+static struct hashmap subprocess_map;
 
 static int cmd2process_cmp(const struct subprocess_entry *e1,
 			   const struct subprocess_entry *e2,
@@ -517,22 +517,22 @@ static int cmd2process_cmp(const struct subprocess_entry *e1,
 	return strcmp(e1->cmd, e2->cmd);
 }
 
-static struct subprocess_entry *find_multi_file_filter_entry(const char *cmd)
+static struct subprocess_entry *subprocess_find_entry(const char *cmd)
 {
 	struct subprocess_entry key;
 
-	if (!cmd_process_map_initialized) {
-		cmd_process_map_initialized = 1;
-		hashmap_init(&cmd_process_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
+	if (!subprocess_map_initialized) {
+		subprocess_map_initialized = 1;
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
 		return NULL;
 	}
 
 	hashmap_entry_init(&key, strhash(cmd));
 	key.cmd = cmd;
-	return hashmap_get(&cmd_process_map, &key, NULL);
+	return hashmap_get(&subprocess_map, &key, NULL);
 }
 
-static void read_multi_file_filter_status(int fd, struct strbuf *status)
+static void subprocess_read_status(int fd, struct strbuf *status)
 {
 	struct strbuf **pair;
 	char *line;
@@ -552,7 +552,7 @@ static void read_multi_file_filter_status(int fd, struct strbuf *status)
 	}
 }
 
-static void kill_multi_file_filter(struct subprocess_entry *entry)
+static void subprocess_stop(struct subprocess_entry *entry)
 {
 	if (!entry)
 		return;
@@ -561,14 +561,14 @@ static void kill_multi_file_filter(struct subprocess_entry *entry)
 	kill(entry->process.pid, SIGTERM);
 	finish_command(&entry->process);
 
-	hashmap_remove(&cmd_process_map, entry, NULL);
+	hashmap_remove(&subprocess_map, entry, NULL);
 	free(entry);
 }
 
-static void stop_multi_file_filter(struct child_process *process)
+static void subprocess_exit_handler(struct child_process *process)
 {
 	sigchain_push(SIGPIPE, SIG_IGN);
-	/* Closing the pipe signals the filter to initiate a shutdown. */
+	/* Closing the pipe signals the subprocess to initiate a shutdown. */
 	close(process->in);
 	close(process->out);
 	sigchain_pop(SIGPIPE);
@@ -640,16 +640,16 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 }
 
 typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
-int start_multi_file_filter(struct subprocess_entry *entry, const char *cmd,
+int subprocess_start(struct subprocess_entry *entry, const char *cmd,
 	subprocess_start_fn startfn)
 {
 	int err;
 	struct child_process *process;
 	const char *argv[] = { cmd, NULL };
 
-	if (!cmd_process_map_initialized) {
-		cmd_process_map_initialized = 1;
-		hashmap_init(&cmd_process_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
+	if (!subprocess_map_initialized) {
+		subprocess_map_initialized = 1;
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
 	}
 
 	entry->cmd = cmd;
@@ -661,11 +661,11 @@ int start_multi_file_filter(struct subprocess_entry *entry, const char *cmd,
 	process->in = -1;
 	process->out = -1;
 	process->clean_on_exit = 1;
-	process->clean_on_exit_handler = stop_multi_file_filter;
+	process->clean_on_exit_handler = subprocess_exit_handler;
 
 	err = start_command(process);
 	if (err) {
-		error("cannot fork to run external filter '%s'", cmd);
+		error("cannot fork to run subprocess '%s'", cmd);
 		return err;
 	}
 
@@ -673,12 +673,12 @@ int start_multi_file_filter(struct subprocess_entry *entry, const char *cmd,
 
 	err = startfn(entry);
 	if (err) {
-		error("initialization for external filter '%s' failed", cmd);
-		kill_multi_file_filter(entry);
+		error("initialization for subprocess '%s' failed", cmd);
+		subprocess_stop(entry);
 		return err;
 	}
 
-	hashmap_add(&cmd_process_map, entry);
+	hashmap_add(&subprocess_map, entry);
 	return 0;
 }
 
@@ -693,7 +693,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	struct strbuf filter_status = STRBUF_INIT;
 	const char *filter_type;
 
-	entry = (struct cmd2process *)find_multi_file_filter_entry(cmd);
+	entry = (struct cmd2process *)subprocess_find_entry(cmd);
 
 	fflush(NULL);
 
@@ -701,7 +701,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 		entry = xmalloc(sizeof(*entry));
 		entry->supported_capabilities = 0;
 
-		if (start_multi_file_filter(&entry->subprocess, cmd, start_multi_file_filter_fn)) {
+		if (subprocess_start(&entry->subprocess, cmd, start_multi_file_filter_fn)) {
 			free(entry);
 			return 0;
 		}
@@ -746,7 +746,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	read_multi_file_filter_status(process->out, &filter_status);
+	subprocess_read_status(process->out, &filter_status);
 	err = strcmp(filter_status.buf, "success");
 	if (err)
 		goto done;
@@ -755,7 +755,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	read_multi_file_filter_status(process->out, &filter_status);
+	subprocess_read_status(process->out, &filter_status);
 	err = strcmp(filter_status.buf, "success");
 
 done:
@@ -777,7 +777,8 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 			 * Force shutdown and restart if another blob requires filtering.
 			 */
 			error("external filter '%s' failed", cmd);
-			kill_multi_file_filter((struct subprocess_entry *)entry);
+			subprocess_stop((struct subprocess_entry *)entry);
+			free(entry);
 		}
 	} else {
 		strbuf_swap(dst, &nbuf);
-- 
2.12.0.windows.1.31.g1548525701.dirty


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* [PATCH v5 7/8] sub-process: move sub-process functions into separate files
  2017-04-07 12:03 [PATCH v5 0/8] refactor the filter process code into a reusable module Ben Peart
                   ` (5 preceding siblings ...)
  2017-04-07 12:03 ` [PATCH v5 6/8] convert: rename reusable sub-process functions Ben Peart
@ 2017-04-07 12:03 ` Ben Peart
  2017-04-10 12:41   ` Lars Schneider
  2017-04-07 12:03 ` [PATCH v5 8/8] convert: Update subprocess_read_status to not die on EOF Ben Peart
  7 siblings, 1 reply; 27+ messages in thread
From: Ben Peart @ 2017-04-07 12:03 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, christian.couder, larsxschneider

Move the sub-proces functions into sub-process.h/c.  Add documentation
for the new module in Documentation/technical/api-sub-process.txt

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 Documentation/technical/api-sub-process.txt |  54 +++++++++++++
 Makefile                                    |   1 +
 convert.c                                   | 119 +---------------------------
 sub-process.c                               | 116 +++++++++++++++++++++++++++
 sub-process.h                               |  46 +++++++++++
 5 files changed, 218 insertions(+), 118 deletions(-)
 create mode 100644 Documentation/technical/api-sub-process.txt
 create mode 100644 sub-process.c
 create mode 100644 sub-process.h

diff --git a/Documentation/technical/api-sub-process.txt b/Documentation/technical/api-sub-process.txt
new file mode 100644
index 0000000000..eb5005aa72
--- /dev/null
+++ b/Documentation/technical/api-sub-process.txt
@@ -0,0 +1,54 @@
+sub-process API
+===============
+
+The sub-process API makes it possible to run background sub-processes
+that should run until the git command exits and communicate with it
+through stdin and stdout.  This reduces the overhead of having to fork
+a new process each time it needs to be communicated with.
+
+The sub-processes are kept in a hashmap by command name and looked up
+via the subprocess_find_entry function.  If an existing instance can not
+be found then a new process should be created and started.  When the
+parent git command terminates, all sub-processes are also terminated.
+
+This API is based on the run-command API.
+
+Data structures
+---------------
+
+* `struct subprocess_entry`
+
+The sub-process structure.  Members should not be accessed directly.
+
+Types
+-----
+
+'int(*subprocess_start_fn)(struct subprocess_entry *entry)'::
+
+	User-supplied function to initialize the sub-process.  This is
+	typically used to negoiate the interface version and capabilities.
+
+
+Functions
+---------
+
+`subprocess_start`::
+
+	Start a subprocess and add it to the subprocess hashmap.
+
+`subprocess_stop`::
+
+	Kill a subprocess and remove it from the subprocess hashmap.
+
+`subprocess_find_entry`::
+
+	Find a subprocess in the subprocess hashmap.
+
+`subprocess_get_child_process`::
+
+	Get the underlying `struct child_process` from a subprocess.
+
+`subprocess_read_status`::
+
+	Helper function to read packets looking for the last "status=<foo>"
+	key/value pair.
diff --git a/Makefile b/Makefile
index 9f8b35ad41..add945b560 100644
--- a/Makefile
+++ b/Makefile
@@ -838,6 +838,7 @@ LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
 LIB_OBJS += submodule.o
 LIB_OBJS += submodule-config.o
+LIB_OBJS += sub-process.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += tempfile.o
diff --git a/convert.c b/convert.c
index 235a6a5279..baa41da760 100644
--- a/convert.c
+++ b/convert.c
@@ -4,6 +4,7 @@
 #include "quote.h"
 #include "sigchain.h"
 #include "pkt-line.h"
+#include "sub-process.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -496,86 +497,11 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 #define CAP_CLEAN    (1u<<0)
 #define CAP_SMUDGE   (1u<<1)
 
-struct subprocess_entry {
-	struct hashmap_entry ent; /* must be the first member! */
-	const char *cmd;
-	struct child_process process;
-};
-
 struct cmd2process {
 	struct subprocess_entry subprocess; /* must be the first member! */
 	unsigned int supported_capabilities;
 };
 
-static int subprocess_map_initialized;
-static struct hashmap subprocess_map;
-
-static int cmd2process_cmp(const struct subprocess_entry *e1,
-			   const struct subprocess_entry *e2,
-			   const void *unused)
-{
-	return strcmp(e1->cmd, e2->cmd);
-}
-
-static struct subprocess_entry *subprocess_find_entry(const char *cmd)
-{
-	struct subprocess_entry key;
-
-	if (!subprocess_map_initialized) {
-		subprocess_map_initialized = 1;
-		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
-		return NULL;
-	}
-
-	hashmap_entry_init(&key, strhash(cmd));
-	key.cmd = cmd;
-	return hashmap_get(&subprocess_map, &key, NULL);
-}
-
-static void subprocess_read_status(int fd, struct strbuf *status)
-{
-	struct strbuf **pair;
-	char *line;
-	for (;;) {
-		line = packet_read_line(fd, NULL);
-		if (!line)
-			break;
-		pair = strbuf_split_str(line, '=', 2);
-		if (pair[0] && pair[0]->len && pair[1]) {
-			/* the last "status=<foo>" line wins */
-			if (!strcmp(pair[0]->buf, "status=")) {
-				strbuf_reset(status);
-				strbuf_addbuf(status, pair[1]);
-			}
-		}
-		strbuf_list_free(pair);
-	}
-}
-
-static void subprocess_stop(struct subprocess_entry *entry)
-{
-	if (!entry)
-		return;
-
-	entry->process.clean_on_exit = 0;
-	kill(entry->process.pid, SIGTERM);
-	finish_command(&entry->process);
-
-	hashmap_remove(&subprocess_map, entry, NULL);
-	free(entry);
-}
-
-static void subprocess_exit_handler(struct child_process *process)
-{
-	sigchain_push(SIGPIPE, SIG_IGN);
-	/* Closing the pipe signals the subprocess to initiate a shutdown. */
-	close(process->in);
-	close(process->out);
-	sigchain_pop(SIGPIPE);
-	/* Finish command will wait until the shutdown is complete. */
-	finish_command(process);
-}
-
 static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 {
 	int err;
@@ -639,49 +565,6 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 	return err;
 }
 
-typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
-int subprocess_start(struct subprocess_entry *entry, const char *cmd,
-	subprocess_start_fn startfn)
-{
-	int err;
-	struct child_process *process;
-	const char *argv[] = { cmd, NULL };
-
-	if (!subprocess_map_initialized) {
-		subprocess_map_initialized = 1;
-		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
-	}
-
-	entry->cmd = cmd;
-	process = &entry->process;
-
-	child_process_init(process);
-	process->argv = argv;
-	process->use_shell = 1;
-	process->in = -1;
-	process->out = -1;
-	process->clean_on_exit = 1;
-	process->clean_on_exit_handler = subprocess_exit_handler;
-
-	err = start_command(process);
-	if (err) {
-		error("cannot fork to run subprocess '%s'", cmd);
-		return err;
-	}
-
-	hashmap_entry_init(entry, strhash(cmd));
-
-	err = startfn(entry);
-	if (err) {
-		error("initialization for subprocess '%s' failed", cmd);
-		subprocess_stop(entry);
-		return err;
-	}
-
-	hashmap_add(&subprocess_map, entry);
-	return 0;
-}
-
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
 				   int fd, struct strbuf *dst, const char *cmd,
 				   const unsigned int wanted_capability)
diff --git a/sub-process.c b/sub-process.c
new file mode 100644
index 0000000000..60bb650012
--- /dev/null
+++ b/sub-process.c
@@ -0,0 +1,116 @@
+/*
+ * Generic implementation of background process infrastructure.
+ */
+#include "sub-process.h"
+#include "sigchain.h"
+#include "pkt-line.h"
+
+static int subprocess_map_initialized;
+static struct hashmap subprocess_map;
+
+static int cmd2process_cmp(const struct subprocess_entry *e1,
+			   const struct subprocess_entry *e2,
+			   const void *unused)
+{
+	return strcmp(e1->cmd, e2->cmd);
+}
+
+struct subprocess_entry *subprocess_find_entry(const char *cmd)
+{
+	struct subprocess_entry key;
+
+	if (!subprocess_map_initialized) {
+		subprocess_map_initialized = 1;
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
+		return NULL;
+	}
+
+	hashmap_entry_init(&key, strhash(cmd));
+	key.cmd = cmd;
+	return hashmap_get(&subprocess_map, &key, NULL);
+}
+
+void subprocess_read_status(int fd, struct strbuf *status)
+{
+	struct strbuf **pair;
+	char *line;
+	for (;;) {
+		line = packet_read_line(fd, NULL);
+		if (!line)
+			break;
+		pair = strbuf_split_str(line, '=', 2);
+		if (pair[0] && pair[0]->len && pair[1]) {
+			/* the last "status=<foo>" line wins */
+			if (!strcmp(pair[0]->buf, "status=")) {
+				strbuf_reset(status);
+				strbuf_addbuf(status, pair[1]);
+			}
+		}
+		strbuf_list_free(pair);
+	}
+}
+
+void subprocess_stop(struct subprocess_entry *entry)
+{
+	if (!entry)
+		return;
+
+	entry->process.clean_on_exit = 0;
+	kill(entry->process.pid, SIGTERM);
+	finish_command(&entry->process);
+
+	hashmap_remove(&subprocess_map, entry, NULL);
+}
+
+static void subprocess_exit_handler(struct child_process *process)
+{
+	sigchain_push(SIGPIPE, SIG_IGN);
+	/* Closing the pipe signals the subprocess to initiate a shutdown. */
+	close(process->in);
+	close(process->out);
+	sigchain_pop(SIGPIPE);
+	/* Finish command will wait until the shutdown is complete. */
+	finish_command(process);
+}
+
+int subprocess_start(struct subprocess_entry *entry, const char *cmd,
+	subprocess_start_fn startfn)
+{
+	int err;
+	struct child_process *process;
+	const char *argv[] = { cmd, NULL };
+
+	if (!subprocess_map_initialized) {
+		subprocess_map_initialized = 1;
+		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
+	}
+
+	entry->cmd = cmd;
+	process = &entry->process;
+
+	child_process_init(process);
+	process->argv = argv;
+	process->use_shell = 1;
+	process->in = -1;
+	process->out = -1;
+	process->clean_on_exit = 1;
+	process->clean_on_exit_handler = subprocess_exit_handler;
+
+	err = start_command(process);
+	if (err) {
+		error("cannot fork to run subprocess '%s'", cmd);
+		return err;
+	}
+
+	hashmap_entry_init(entry, strhash(cmd));
+
+	err = startfn(entry);
+	if (err) {
+		error("initialization for subprocess '%s' failed", cmd);
+		subprocess_stop(entry);
+		return err;
+	}
+
+	hashmap_add(&subprocess_map, entry);
+	return 0;
+}
diff --git a/sub-process.h b/sub-process.h
new file mode 100644
index 0000000000..0cf1760a0a
--- /dev/null
+++ b/sub-process.h
@@ -0,0 +1,46 @@
+#ifndef SUBPROCESS_H
+#define SUBPROCESS_H
+
+#include "git-compat-util.h"
+#include "hashmap.h"
+#include "run-command.h"
+
+/*
+ * Generic implementation of background process infrastructure.
+ * See Documentation/technical/api-background-process.txt.
+ */
+
+ /* data structures */
+
+struct subprocess_entry {
+	struct hashmap_entry ent; /* must be the first member! */
+	const char *cmd;
+	struct child_process process;
+};
+
+/* subprocess functions */
+
+typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
+int subprocess_start(struct subprocess_entry *entry, const char *cmd,
+		subprocess_start_fn startfn);
+
+void subprocess_stop(struct subprocess_entry *entry);
+
+struct subprocess_entry *subprocess_find_entry(const char *cmd);
+
+/* subprocess helper functions */
+
+static inline struct child_process *subprocess_get_child_process(
+		struct subprocess_entry *entry)
+{
+	return &entry->process;
+}
+
+/*
+ * Helper function that will read packets looking for "status=<foo>"
+ * key/value pairs and return the value from the last "status" packet
+ */
+
+void subprocess_read_status(int fd, struct strbuf *status);
+
+#endif
-- 
2.12.0.windows.1.31.g1548525701.dirty


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* [PATCH v5 8/8] convert: Update subprocess_read_status to not die on EOF
  2017-04-07 12:03 [PATCH v5 0/8] refactor the filter process code into a reusable module Ben Peart
                   ` (6 preceding siblings ...)
  2017-04-07 12:03 ` [PATCH v5 7/8] sub-process: move sub-process functions into separate files Ben Peart
@ 2017-04-07 12:03 ` Ben Peart
  2017-04-10 12:48   ` Lars Schneider
  7 siblings, 1 reply; 27+ messages in thread
From: Ben Peart @ 2017-04-07 12:03 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, christian.couder, larsxschneider

Enable sub-processes to gracefully handle when the process dies by
updating subprocess_read_status to return an error on EOF instead of
dying.

Update apply_multi_file_filter to take advantage of the revised
subprocess_read_status.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 convert.c     | 10 ++++++++--
 sub-process.c | 10 +++++++---
 sub-process.h |  2 +-
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index baa41da760..9e181e27ad 100644
--- a/convert.c
+++ b/convert.c
@@ -629,7 +629,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	subprocess_read_status(process->out, &filter_status);
+	err = subprocess_read_status(process->out, &filter_status);
+	if (err)
+		goto done;
+
 	err = strcmp(filter_status.buf, "success");
 	if (err)
 		goto done;
@@ -638,7 +641,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	subprocess_read_status(process->out, &filter_status);
+	err = subprocess_read_status(process->out, &filter_status);
+	if (err)
+		goto done;
+
 	err = strcmp(filter_status.buf, "success");
 
 done:
diff --git a/sub-process.c b/sub-process.c
index 60bb650012..c5057cafcd 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -30,13 +30,15 @@ struct subprocess_entry *subprocess_find_entry(const char *cmd)
 	return hashmap_get(&subprocess_map, &key, NULL);
 }
 
-void subprocess_read_status(int fd, struct strbuf *status)
+int subprocess_read_status(int fd, struct strbuf *status)
 {
 	struct strbuf **pair;
 	char *line;
+	int len;
+
 	for (;;) {
-		line = packet_read_line(fd, NULL);
-		if (!line)
+		len = packet_read_line_gently(fd, NULL, &line);
+		if ((len == -1) || !line)
 			break;
 		pair = strbuf_split_str(line, '=', 2);
 		if (pair[0] && pair[0]->len && pair[1]) {
@@ -48,6 +50,8 @@ void subprocess_read_status(int fd, struct strbuf *status)
 		}
 		strbuf_list_free(pair);
 	}
+
+	return len == -1 ? len : 0;
 }
 
 void subprocess_stop(struct subprocess_entry *entry)
diff --git a/sub-process.h b/sub-process.h
index 0cf1760a0a..5a1eeeece0 100644
--- a/sub-process.h
+++ b/sub-process.h
@@ -41,6 +41,6 @@ static inline struct child_process *subprocess_get_child_process(
  * key/value pairs and return the value from the last "status" packet
  */
 
-void subprocess_read_status(int fd, struct strbuf *status);
+int subprocess_read_status(int fd, struct strbuf *status);
 
 #endif
-- 
2.12.0.windows.1.31.g1548525701.dirty


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 1/8] pkt-line: add packet_read_line_gently()
  2017-04-07 12:03 ` [PATCH v5 1/8] pkt-line: add packet_read_line_gently() Ben Peart
@ 2017-04-09 19:34   ` Lars Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Lars Schneider @ 2017-04-09 19:34 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, gitster, benpeart, christian.couder


> On 07 Apr 2017, at 14:03, Ben Peart <peartben@gmail.com> wrote:
> 
> Add packet_read_line_gently() to enable reading a line without dying on
> EOF.
> 
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
> pkt-line.c | 12 ++++++++++++
> pkt-line.h | 10 ++++++++++
> 2 files changed, 22 insertions(+)
> 
> diff --git a/pkt-line.c b/pkt-line.c
> index d4b6bfe076..58842544b4 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -323,6 +323,18 @@ char *packet_read_line(int fd, int *len_p)
> 	return packet_read_line_generic(fd, NULL, NULL, len_p);
> }
> 
> +int packet_read_line_gently(int fd, int *dst_len, char** dst_line)
> +{
> +	int len = packet_read(fd, NULL, NULL,
> +		packet_buffer, sizeof(packet_buffer),
> +		PACKET_READ_CHOMP_NEWLINE|PACKET_READ_GENTLE_ON_EOF);

Really minor nit: you could align the parameters according to fd
similar to the "packet_read_line_generic" code (tab width 8).

> +	if (dst_len)
> +		*dst_len = len;
> +	if (dst_line)
> +		*dst_line = (len > 0) ? packet_buffer : NULL;

Minor nit: The explicit check "len > 0" is necessary here as len can
be "-1". The original "packet_read_line_generic" just checks for 
"len". I think it would be nice if the code would be consistent and both
would check "len > 0".

> +	return len;
> +}
> +
> char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
> {
> 	return packet_read_line_generic(-1, src, src_len, dst_len);
> diff --git a/pkt-line.h b/pkt-line.h
> index 18eac64830..12b18991f6 100644
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -74,6 +74,16 @@ int packet_read(int fd, char **src_buffer, size_t *src_len, char
> char *packet_read_line(int fd, int *size);
> 
> /*
> + * Convenience wrapper for packet_read that sets the PACKET_READ_GENTLE_ON_EOF
> + * and CHOMP_NEWLINE options. The return value specifies the number of bytes
> + * read into the buffer or -1 on truncated input. if the *dst_line parameter

s/if/If/


> + * is not NULL it will return NULL for a flush packet and otherwise points to

This sentences is a bit confusing to me. Maybe:

If the *dst_line parameter is not NULL, then it will point to a static buffer
(that may be overwritten by subsequent calls) or it will return NULL for a flush 
packet.

... feel free to completely ignore this as I am no native speaker.

> + * a static buffer (that may be overwritten by subsequent calls). If the size
> + * parameter is not NULL, the length of the packet is written to it.
> + */
> +int packet_read_line_gently(int fd, int *size, char** dst_line);
> +
> +/*
>  * Same as packet_read_line, but read from a buf rather than a descriptor;
>  * see packet_read for details on how src_* is used.
>  */

If you send another round then you could address the minor nits.
If not, then this patch as it is looks good to me.

Thanks,
Lars

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 2/8] convert: move packet_write_list() into pkt-line as packet_writel()
  2017-04-07 12:03 ` [PATCH v5 2/8] convert: move packet_write_list() into pkt-line as packet_writel() Ben Peart
@ 2017-04-09 19:43   ` Lars Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Lars Schneider @ 2017-04-09 19:43 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, gitster, benpeart, christian.couder


> On 07 Apr 2017, at 14:03, Ben Peart <peartben@gmail.com> wrote:
> 
> Add packet_writel() which writes multiple lines in a single call and
> then calls packet_flush_gently(). Update convert.c to use the new
> packet_writel() function from pkt-line.
> 
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
> convert.c  | 23 ++---------------------
> pkt-line.c | 19 +++++++++++++++++++
> pkt-line.h |  1 +
> 3 files changed, 22 insertions(+), 21 deletions(-)
> 
> diff --git a/convert.c b/convert.c
> index 8d652bf27c..793c29ebfd 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -521,25 +521,6 @@ static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap,
> 	return hashmap_get(hashmap, &key, NULL);
> }
> 
> -static int packet_write_list(int fd, const char *line, ...)
> -{
> -	va_list args;
> -	int err;
> -	va_start(args, line);
> -	for (;;) {
> -		if (!line)
> -			break;
> -		if (strlen(line) > LARGE_PACKET_DATA_MAX)
> -			return -1;
> -		err = packet_write_fmt_gently(fd, "%s\n", line);
> -		if (err)
> -			return err;
> -		line = va_arg(args, const char*);
> -	}
> -	va_end(args);
> -	return packet_flush_gently(fd);
> -}
> -
> static void read_multi_file_filter_status(int fd, struct strbuf *status)
> {
> 	struct strbuf **pair;
> @@ -616,7 +597,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
> 
> 	sigchain_push(SIGPIPE, SIG_IGN);
> 
> -	err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);
> +	err = packet_writel(process->in, "git-filter-client", "version=2", NULL);
> 	if (err)
> 		goto done;
> 
> @@ -632,7 +613,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
> 	if (err)
> 		goto done;
> 
> -	err = packet_write_list(process->in, "capability=clean", "capability=smudge", NULL);
> +	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
> 
> 	for (;;) {
> 		cap_buf = packet_read_line(process->out, NULL);
> diff --git a/pkt-line.c b/pkt-line.c
> index 58842544b4..2788aa1af6 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
> 	return status;
> }
> 
> +int packet_writel(int fd, const char *line, ...)
> +{
> +	va_list args;
> +	int err;
> +	va_start(args, line);
> +	for (;;) {
> +		if (!line)
> +			break;
> +		if (strlen(line) > LARGE_PACKET_DATA_MAX)
> +			return -1;
> +		err = packet_write_fmt_gently(fd, "%s\n", line);
> +		if (err)
> +			return err;
> +		line = va_arg(args, const char*);
> +	}
> +	va_end(args);
> +	return packet_flush_gently(fd);
> +}
> +
> static int packet_write_gently(const int fd_out, const char *buf, size_t size)
> {
> 	static char packet_write_buffer[LARGE_PACKET_MAX];
> diff --git a/pkt-line.h b/pkt-line.h
> index 12b18991f6..cb3eda9695 100644
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -25,6 +25,7 @@ void packet_buf_flush(struct strbuf *buf);
> void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
> int packet_flush_gently(int fd);
> int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
> +int packet_writel(int fd, const char *line, ...);
> int write_packetized_from_fd(int fd_in, int fd_out);
> int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
> 
> -- 
> 2.12.0.windows.1.31.g1548525701.dirty

Looks good to me.

Thanks,
Lars

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions
  2017-04-07 12:03 ` [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions Ben Peart
@ 2017-04-09 19:56   ` Lars Schneider
  2017-04-11 16:16   ` Jeff King
  1 sibling, 0 replies; 27+ messages in thread
From: Lars Schneider @ 2017-04-09 19:56 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, gitster, benpeart, christian.couder


> On 07 Apr 2017, at 14:03, Ben Peart <peartben@gmail.com> wrote:
> 
> To enable future reuse of the filter.<driver>.process infrastructure,
> split start_multi_file_filter into two separate parts.
> 
> start_multi_file_filter will now only contain the generic logic to
> manage the creation and tracking of the child process in a hashmap.
> 
> start_multi_file_filter_fn is a protocol specific initialization
> function that will negotiate the multi-file-filter interface version
> and capabilities.
> 
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
> convert.c | 63 ++++++++++++++++++++++++++++++++++++++-------------------------
> 1 file changed, 38 insertions(+), 25 deletions(-)
> 
> diff --git a/convert.c b/convert.c
> index 793c29ebfd..404757eac9 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -565,35 +565,14 @@ static void stop_multi_file_filter(struct child_process *process)
> 	finish_command(process);
> }
> 
> -static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
> +static int start_multi_file_filter_fn(struct cmd2process *entry)
> {
> 	int err;
> -	struct cmd2process *entry;
> -	struct child_process *process;
> -	const char *argv[] = { cmd, NULL };
> 	struct string_list cap_list = STRING_LIST_INIT_NODUP;
> 	char *cap_buf;
> 	const char *cap_name;
> -
> -	entry = xmalloc(sizeof(*entry));
> -	entry->cmd = cmd;
> -	entry->supported_capabilities = 0;
> -	process = &entry->process;
> -
> -	child_process_init(process);
> -	process->argv = argv;
> -	process->use_shell = 1;
> -	process->in = -1;
> -	process->out = -1;
> -	process->clean_on_exit = 1;
> -	process->clean_on_exit_handler = stop_multi_file_filter;
> -
> -	if (start_command(process)) {
> -		error("cannot fork to run external filter '%s'", cmd);
> -		return NULL;
> -	}
> -
> -	hashmap_entry_init(entry, strhash(cmd));
> +	struct child_process *process = &entry->process;
> +	const char *cmd = entry->cmd;
> 
> 	sigchain_push(SIGPIPE, SIG_IGN);
> 
> @@ -642,7 +621,41 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
> done:
> 	sigchain_pop(SIGPIPE);
> 
> -	if (err || errno == EPIPE) {
> +	if (err || errno == EPIPE)
> +		err = err ? err : errno;

Nice! I should have done this here, too:
https://github.com/git/git/blob/b14f27f91770e0f99f64135348977a0ce1c7993a/convert.c#L755

This is clearly a bug in my code. I'll send a patch shortly.


> +
> +	return err;
> +}
> +
> +static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
> +{
> +	int err;
> +	struct cmd2process *entry;
> +	struct child_process *process;
> +	const char *argv[] = { cmd, NULL };
> +
> +	entry = xmalloc(sizeof(*entry));
> +	entry->cmd = cmd;
> +	entry->supported_capabilities = 0;
> +	process = &entry->process;
> +
> +	child_process_init(process);
> +	process->argv = argv;
> +	process->use_shell = 1;
> +	process->in = -1;
> +	process->out = -1;
> +	process->clean_on_exit = 1;
> +	process->clean_on_exit_handler = stop_multi_file_filter;
> +
> +	if (start_command(process)) {
> +		error("cannot fork to run external filter '%s'", cmd);
> +		return NULL;
> +	}
> +
> +	hashmap_entry_init(entry, strhash(cmd));
> +
> +	err = start_multi_file_filter_fn(entry);
> +	if (err) {
> 		error("initialization for external filter '%s' failed", cmd);
> 		kill_multi_file_filter(hashmap, entry);
> 		return NULL;
> -- 
> 2.12.0.windows.1.31.g1548525701.dirty

Looks good to me.

Thanks,
Lars


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones
  2017-04-07 12:03 ` [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones Ben Peart
@ 2017-04-10 10:18   ` Lars Schneider
  2017-04-17  3:31     ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Lars Schneider @ 2017-04-10 10:18 UTC (permalink / raw)
  To: Ben Peart; +Cc: Git Mailing List, gitster, benpeart, christian.couder


> On 07 Apr 2017, at 14:03, Ben Peart <peartben@gmail.com> wrote:
> 
> To enable future reuse of the filter.<driver>.process infrastructure,
> split the cmd2process structure into two separate parts.
> 
> subprocess_entry will now contain the generic data required to manage
> the creation and tracking of the child process in a hashmap. Also move
> all knowledge of the hashmap into the generic functions.
> 
> cmd2process is a filter protocol specific structure that is used to
> track the negotiated capabilities of the filter.
> 
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
> convert.c | 57 +++++++++++++++++++++++++++++++--------------------------
> 1 file changed, 31 insertions(+), 26 deletions(-)
> 
> diff --git a/convert.c b/convert.c
> index 404757eac9..f569026511 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -496,29 +496,40 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
> #define CAP_CLEAN    (1u<<0)
> #define CAP_SMUDGE   (1u<<1)
> 
> -struct cmd2process {
> +struct subprocess_entry {
> 	struct hashmap_entry ent; /* must be the first member! */
> -	unsigned int supported_capabilities;
> 	const char *cmd;
> 	struct child_process process;
> };
> 
> +struct cmd2process {
> +	struct subprocess_entry subprocess; /* must be the first member! */
> +	unsigned int supported_capabilities;
> +};
> +
> static int cmd_process_map_initialized;
> static struct hashmap cmd_process_map;
> 
> -static int cmd2process_cmp(const struct cmd2process *e1,
> -			   const struct cmd2process *e2,
> +static int cmd2process_cmp(const struct subprocess_entry *e1,
> +			   const struct subprocess_entry *e2,
> 			   const void *unused)
> {
> 	return strcmp(e1->cmd, e2->cmd);
> }
> 
> -static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
> +static struct subprocess_entry *find_multi_file_filter_entry(const char *cmd)

I am curious why you removed the hashmap parameter (here and in other pars of this patch). 
I know the parameter is not strictly necessary as the hashmap is a global variable anyways. 
However, I think it eases code maintainability in the long run if a function is "as pure as 
possible" (IOW does rely on global state as less as possible).

As I consider this personal preference I think either way is fine.


> {
> -	struct cmd2process key;
> +	struct subprocess_entry key;
> +
> +	if (!cmd_process_map_initialized) {
> +		cmd_process_map_initialized = 1;
> +		hashmap_init(&cmd_process_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
> +		return NULL;
> +	}
> +

I am a bit in doubt about this one. If the process map was not initialized yet, then
I would expect "find_multi_file_filter_entry" to return NULL. Creating the hash
map as side effect seems a bit unexpected to me. 

The rest of the patch looks good to me.

Thanks,
Lars

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 5/8] convert: Update generic functions to only use generic data structures
  2017-04-07 12:03 ` [PATCH v5 5/8] convert: Update generic functions to only use generic data structures Ben Peart
@ 2017-04-10 12:05   ` Lars Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Lars Schneider @ 2017-04-10 12:05 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, gitster, benpeart, christian.couder


> On 07 Apr 2017, at 14:03, Ben Peart <peartben@gmail.com> wrote:
> 
> Update all functions that are going to be moved into a reusable module
> so that they only work with the reusable data structures.  Move code
> that is specific to the filter out into the filter specific functions.
> 
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
> convert.c | 46 ++++++++++++++++++++++++++++------------------
> 1 file changed, 28 insertions(+), 18 deletions(-)
> 
> diff --git a/convert.c b/convert.c
> index f569026511..747c0c363b 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -576,14 +576,15 @@ static void stop_multi_file_filter(struct child_process *process)
> 	finish_command(process);
> }
> 
> -static int start_multi_file_filter_fn(struct cmd2process *entry)
> +static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
> {
> 	int err;
> +	struct cmd2process *entry = (struct cmd2process *)subprocess;

I wonder if a "subprocess_entry" should rather have a generic data pointer
for any extra information. However, this is probably bikeshedding and I think
this solution is equally good.


> 	struct string_list cap_list = STRING_LIST_INIT_NODUP;
> 	char *cap_buf;
> 	const char *cap_name;
> -	struct child_process *process = &entry->subprocess.process;
> -	const char *cmd = entry->subprocess.cmd;
> +	struct child_process *process = &subprocess->process;
> +	const char *cmd = subprocess->cmd;
> 
> 	sigchain_push(SIGPIPE, SIG_IGN);
> 
> @@ -638,17 +639,21 @@ static int start_multi_file_filter_fn(struct cmd2process *entry)
> 	return err;
> }

> ...

> static int apply_multi_file_filter(const char *path, const char *src, size_t len,
> @@ -692,9 +698,13 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
> 	fflush(NULL);
> 
> 	if (!entry) {
> -		entry = start_multi_file_filter(cmd);
> -		if (!entry)
> +		entry = xmalloc(sizeof(*entry));
> +		entry->supported_capabilities = 0;

If we would use a generic data pointer then we could initialize
supported_capabilities in "start_multi_file_filter_fn".


Apart from the bike shedding above this patch looks good to me.
Minor nit: It breaks t0021 "invalid process filter must fail (and not hang!)"
test case but I assume this is corrected in a later patch.


Cheers,
Lars



^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 6/8] convert: rename reusable sub-process functions
  2017-04-07 12:03 ` [PATCH v5 6/8] convert: rename reusable sub-process functions Ben Peart
@ 2017-04-10 12:11   ` Lars Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Lars Schneider @ 2017-04-10 12:11 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, gitster, benpeart, christian.couder


> On 07 Apr 2017, at 14:03, Ben Peart <peartben@gmail.com> wrote:
> 
> Do a mechanical rename of the functions that will become the reusable
> sub-process module.
> 
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
> convert.c | 53 +++++++++++++++++++++++++++--------------------------
> 1 file changed, 27 insertions(+), 26 deletions(-)
> 
> diff --git a/convert.c b/convert.c
> index 747c0c363b..235a6a5279 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -507,8 +507,8 @@ struct cmd2process {
> 	unsigned int supported_capabilities;
> };
> 

> ...

> 
> -static void kill_multi_file_filter(struct subprocess_entry *entry)
> +static void subprocess_stop(struct subprocess_entry *entry)
> {
> 	if (!entry)
> 		return;
> @@ -561,14 +561,14 @@ static void kill_multi_file_filter(struct subprocess_entry *entry)
> 	kill(entry->process.pid, SIGTERM);
> 	finish_command(&entry->process);
> 
> -	hashmap_remove(&cmd_process_map, entry, NULL);
> +	hashmap_remove(&subprocess_map, entry, NULL);
> 	free(entry);
> }

> ...

> @@ -777,7 +777,8 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
> 			 * Force shutdown and restart if another blob requires filtering.
> 			 */
> 			error("external filter '%s' failed", cmd);
> -			kill_multi_file_filter((struct subprocess_entry *)entry);
> +			subprocess_stop((struct subprocess_entry *)entry);
> +			free(entry);

I think we perform a double "free" here and in subprocess_stop(), no?
Everything else looks good to me!

Thanks,
Lars

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 7/8] sub-process: move sub-process functions into separate files
  2017-04-07 12:03 ` [PATCH v5 7/8] sub-process: move sub-process functions into separate files Ben Peart
@ 2017-04-10 12:41   ` Lars Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Lars Schneider @ 2017-04-10 12:41 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, gitster, benpeart, christian.couder


> On 07 Apr 2017, at 14:03, Ben Peart <peartben@gmail.com> wrote:
> 
> Move the sub-proces functions into sub-process.h/c.  Add documentation
> for the new module in Documentation/technical/api-sub-process.txt
> 
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
> Documentation/technical/api-sub-process.txt |  54 +++++++++++++
> Makefile                                    |   1 +
> convert.c                                   | 119 +---------------------------
> sub-process.c                               | 116 +++++++++++++++++++++++++++
> sub-process.h                               |  46 +++++++++++
> 5 files changed, 218 insertions(+), 118 deletions(-)
> create mode 100644 Documentation/technical/api-sub-process.txt
> create mode 100644 sub-process.c
> create mode 100644 sub-process.h
> 
> diff --git a/Documentation/technical/api-sub-process.txt b/Documentation/technical/api-sub-process.txt
> new file mode 100644
> index 0000000000..eb5005aa72
> --- /dev/null
> +++ b/Documentation/technical/api-sub-process.txt
> @@ -0,0 +1,54 @@
> +sub-process API
> +===============
> +
> +The sub-process API makes it possible to run background sub-processes
> +that should run until the git command exits and communicate with it
> +through stdin and stdout.  This reduces the overhead of having to fork
> +a new process each time it needs to be communicated with.

Minor nit, maybe:
The sub-process API makes it possible to run background sub-processes
for the entire lifetime of a Git invocation. If Git needs to communicate
with an external process multiple times, then this can reduces the process
invocation overhead. Git and the sub-process communicate through stdin and 
stdout. 

Feel free to ignore as this is bike-shedding.


> +
> +The sub-processes are kept in a hashmap by command name and looked up
> +via the subprocess_find_entry function.  If an existing instance can not
> +be found then a new process should be created and started.  When the
> +parent git command terminates, all sub-processes are also terminated.
> +
> +This API is based on the run-command API.
> +
> +Data structures
> +---------------
> +
> +* `struct subprocess_entry`
> +
> +The sub-process structure.  Members should not be accessed directly.
> +
> +Types
> +-----
> +
> +'int(*subprocess_start_fn)(struct subprocess_entry *entry)'::
> +
> +	User-supplied function to initialize the sub-process.  This is
> +	typically used to negoiate the interface version and capabilities.
s/negoiate/negotiate/


> +
> +
> +Functions
> +---------
> +
> +`subprocess_start`::
> +
> +	Start a subprocess and add it to the subprocess hashmap.
> +
> +`subprocess_stop`::
> +
> +	Kill a subprocess and remove it from the subprocess hashmap.
> +
> +`subprocess_find_entry`::
> +
> +	Find a subprocess in the subprocess hashmap.

As mentioned in an earlier review, this function also initializes the
hashmap as (maybe to the user unexpected?) side effect.

http://public-inbox.org/git/48FA4601-0819-4DE2-943A-7A791BA7C583@gmail.com/


> +
> +`subprocess_get_child_process`::
> +
> +	Get the underlying `struct child_process` from a subprocess.
> +
> +`subprocess_read_status`::
> +
> +	Helper function to read packets looking for the last "status=<foo>"
> +	key/value pair.
> diff --git a/Makefile b/Makefile
> index 9f8b35ad41..add945b560 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -838,6 +838,7 @@ LIB_OBJS += streaming.o
> LIB_OBJS += string-list.o
> LIB_OBJS += submodule.o
> LIB_OBJS += submodule-config.o
> +LIB_OBJS += sub-process.o
> LIB_OBJS += symlinks.o
> LIB_OBJS += tag.o
> LIB_OBJS += tempfile.o
> diff --git a/convert.c b/convert.c
> index 235a6a5279..baa41da760 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -4,6 +4,7 @@
> #include "quote.h"
> #include "sigchain.h"
> #include "pkt-line.h"
> +#include "sub-process.h"
> 
> /*
>  * convert.c - convert a file when checking it out and checking it in.
> @@ -496,86 +497,11 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
> #define CAP_CLEAN    (1u<<0)
> #define CAP_SMUDGE   (1u<<1)
> 
> -struct subprocess_entry {
> -	struct hashmap_entry ent; /* must be the first member! */
> -	const char *cmd;
> -	struct child_process process;
> -};
> -
> struct cmd2process {
> 	struct subprocess_entry subprocess; /* must be the first member! */
> 	unsigned int supported_capabilities;
> };
> 
> -static int subprocess_map_initialized;
> -static struct hashmap subprocess_map;
> -
> -static int cmd2process_cmp(const struct subprocess_entry *e1,
> -			   const struct subprocess_entry *e2,
> -			   const void *unused)
> -{
> -	return strcmp(e1->cmd, e2->cmd);
> -}
> -
> -static struct subprocess_entry *subprocess_find_entry(const char *cmd)
> -{
> -	struct subprocess_entry key;
> -
> -	if (!subprocess_map_initialized) {
> -		subprocess_map_initialized = 1;
> -		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
> -		return NULL;
> -	}
> -
> -	hashmap_entry_init(&key, strhash(cmd));
> -	key.cmd = cmd;
> -	return hashmap_get(&subprocess_map, &key, NULL);
> -}
> -
> -static void subprocess_read_status(int fd, struct strbuf *status)
> -{
> -	struct strbuf **pair;
> -	char *line;
> -	for (;;) {
> -		line = packet_read_line(fd, NULL);
> -		if (!line)
> -			break;
> -		pair = strbuf_split_str(line, '=', 2);
> -		if (pair[0] && pair[0]->len && pair[1]) {
> -			/* the last "status=<foo>" line wins */
> -			if (!strcmp(pair[0]->buf, "status=")) {
> -				strbuf_reset(status);
> -				strbuf_addbuf(status, pair[1]);
> -			}
> -		}
> -		strbuf_list_free(pair);
> -	}
> -}
> -
> -static void subprocess_stop(struct subprocess_entry *entry)
> -{
> -	if (!entry)
> -		return;
> -
> -	entry->process.clean_on_exit = 0;
> -	kill(entry->process.pid, SIGTERM);
> -	finish_command(&entry->process);
> -
> -	hashmap_remove(&subprocess_map, entry, NULL);
> -	free(entry);
> -}
> -
> -static void subprocess_exit_handler(struct child_process *process)
> -{
> -	sigchain_push(SIGPIPE, SIG_IGN);
> -	/* Closing the pipe signals the subprocess to initiate a shutdown. */
> -	close(process->in);
> -	close(process->out);
> -	sigchain_pop(SIGPIPE);
> -	/* Finish command will wait until the shutdown is complete. */
> -	finish_command(process);
> -}
> -
> static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
> {
> 	int err;
> @@ -639,49 +565,6 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
> 	return err;
> }
> 
> -typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
> -int subprocess_start(struct subprocess_entry *entry, const char *cmd,
> -	subprocess_start_fn startfn)
> -{
> -	int err;
> -	struct child_process *process;
> -	const char *argv[] = { cmd, NULL };
> -
> -	if (!subprocess_map_initialized) {
> -		subprocess_map_initialized = 1;
> -		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
> -	}
> -
> -	entry->cmd = cmd;
> -	process = &entry->process;
> -
> -	child_process_init(process);
> -	process->argv = argv;
> -	process->use_shell = 1;
> -	process->in = -1;
> -	process->out = -1;
> -	process->clean_on_exit = 1;
> -	process->clean_on_exit_handler = subprocess_exit_handler;
> -
> -	err = start_command(process);
> -	if (err) {
> -		error("cannot fork to run subprocess '%s'", cmd);
> -		return err;
> -	}
> -
> -	hashmap_entry_init(entry, strhash(cmd));
> -
> -	err = startfn(entry);
> -	if (err) {
> -		error("initialization for subprocess '%s' failed", cmd);
> -		subprocess_stop(entry);
> -		return err;
> -	}
> -
> -	hashmap_add(&subprocess_map, entry);
> -	return 0;
> -}
> -
> static int apply_multi_file_filter(const char *path, const char *src, size_t len,
> 				   int fd, struct strbuf *dst, const char *cmd,
> 				   const unsigned int wanted_capability)
> diff --git a/sub-process.c b/sub-process.c
> new file mode 100644
> index 0000000000..60bb650012
> --- /dev/null
> +++ b/sub-process.c
> @@ -0,0 +1,116 @@
> +/*
> + * Generic implementation of background process infrastructure.
> + */
> +#include "sub-process.h"
> +#include "sigchain.h"
> +#include "pkt-line.h"
> +
> +static int subprocess_map_initialized;
> +static struct hashmap subprocess_map;
> +
> +static int cmd2process_cmp(const struct subprocess_entry *e1,
> +			   const struct subprocess_entry *e2,
> +			   const void *unused)
> +{
> +	return strcmp(e1->cmd, e2->cmd);
> +}
> +
> +struct subprocess_entry *subprocess_find_entry(const char *cmd)
> +{
> +	struct subprocess_entry key;
> +
> +	if (!subprocess_map_initialized) {
> +		subprocess_map_initialized = 1;
> +		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
> +		return NULL;
> +	}
> +
> +	hashmap_entry_init(&key, strhash(cmd));
> +	key.cmd = cmd;
> +	return hashmap_get(&subprocess_map, &key, NULL);
> +}
> +
> +void subprocess_read_status(int fd, struct strbuf *status)
> +{
> +	struct strbuf **pair;
> +	char *line;
> +	for (;;) {
> +		line = packet_read_line(fd, NULL);
> +		if (!line)
> +			break;
> +		pair = strbuf_split_str(line, '=', 2);
> +		if (pair[0] && pair[0]->len && pair[1]) {
> +			/* the last "status=<foo>" line wins */
> +			if (!strcmp(pair[0]->buf, "status=")) {
> +				strbuf_reset(status);
> +				strbuf_addbuf(status, pair[1]);
> +			}
> +		}
> +		strbuf_list_free(pair);
> +	}
> +}
> +
> +void subprocess_stop(struct subprocess_entry *entry)
> +{
> +	if (!entry)
> +		return;
> +
> +	entry->process.clean_on_exit = 0;
> +	kill(entry->process.pid, SIGTERM);
> +	finish_command(&entry->process);
> +
> +	hashmap_remove(&subprocess_map, entry, NULL);
> +}
> +
> +static void subprocess_exit_handler(struct child_process *process)
> +{
> +	sigchain_push(SIGPIPE, SIG_IGN);
> +	/* Closing the pipe signals the subprocess to initiate a shutdown. */
> +	close(process->in);
> +	close(process->out);
> +	sigchain_pop(SIGPIPE);
> +	/* Finish command will wait until the shutdown is complete. */
> +	finish_command(process);
> +}
> +
> +int subprocess_start(struct subprocess_entry *entry, const char *cmd,
> +	subprocess_start_fn startfn)
> +{
> +	int err;
> +	struct child_process *process;
> +	const char *argv[] = { cmd, NULL };
> +
> +	if (!subprocess_map_initialized) {
> +		subprocess_map_initialized = 1;
> +		hashmap_init(&subprocess_map, (hashmap_cmp_fn)cmd2process_cmp, 0);
> +	}
> +
> +	entry->cmd = cmd;
> +	process = &entry->process;
> +
> +	child_process_init(process);
> +	process->argv = argv;
> +	process->use_shell = 1;
> +	process->in = -1;
> +	process->out = -1;
> +	process->clean_on_exit = 1;
> +	process->clean_on_exit_handler = subprocess_exit_handler;
> +
> +	err = start_command(process);
> +	if (err) {
> +		error("cannot fork to run subprocess '%s'", cmd);
> +		return err;
> +	}
> +
> +	hashmap_entry_init(entry, strhash(cmd));
> +
> +	err = startfn(entry);
> +	if (err) {
> +		error("initialization for subprocess '%s' failed", cmd);
> +		subprocess_stop(entry);
> +		return err;
> +	}
> +
> +	hashmap_add(&subprocess_map, entry);
> +	return 0;
> +}
> diff --git a/sub-process.h b/sub-process.h
> new file mode 100644
> index 0000000000..0cf1760a0a
> --- /dev/null
> +++ b/sub-process.h
> @@ -0,0 +1,46 @@
> +#ifndef SUBPROCESS_H
> +#define SUBPROCESS_H
> +
> +#include "git-compat-util.h"
> +#include "hashmap.h"
> +#include "run-command.h"
> +
> +/*
> + * Generic implementation of background process infrastructure.
> + * See Documentation/technical/api-background-process.txt.
> + */
> +
> + /* data structures */
> +
> +struct subprocess_entry {
> +	struct hashmap_entry ent; /* must be the first member! */
> +	const char *cmd;
> +	struct child_process process;
> +};
> +
> +/* subprocess functions */
> +
> +typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
> +int subprocess_start(struct subprocess_entry *entry, const char *cmd,
> +		subprocess_start_fn startfn);
> +
> +void subprocess_stop(struct subprocess_entry *entry);
> +
> +struct subprocess_entry *subprocess_find_entry(const char *cmd);
> +
> +/* subprocess helper functions */
> +
> +static inline struct child_process *subprocess_get_child_process(
> +		struct subprocess_entry *entry)
> +{
> +	return &entry->process;
> +}
> +
> +/*
> + * Helper function that will read packets looking for "status=<foo>"
> + * key/value pairs and return the value from the last "status" packet
> + */
> +
> +void subprocess_read_status(int fd, struct strbuf *status);
> +
> +#endif
> -- 
> 2.12.0.windows.1.31.g1548525701.dirty

Looks all good to me. This patch also fixes the failing t0021 "invalid 
process filter must fail (and not hang!)" introduced here:
http://public-inbox.org/git/5900F7F1-89D9-433E-A6C3-0AB27C815BE6@gmail.com/

TBH, I don't see why this fixes the test, though.

Thanks,
Lars

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 8/8] convert: Update subprocess_read_status to not die on EOF
  2017-04-07 12:03 ` [PATCH v5 8/8] convert: Update subprocess_read_status to not die on EOF Ben Peart
@ 2017-04-10 12:48   ` Lars Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Lars Schneider @ 2017-04-10 12:48 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, gitster, benpeart, christian.couder


> On 07 Apr 2017, at 14:03, Ben Peart <peartben@gmail.com> wrote:
> 
> Enable sub-processes to gracefully handle when the process dies by
> updating subprocess_read_status to return an error on EOF instead of
> dying.
> 
> Update apply_multi_file_filter to take advantage of the revised
> subprocess_read_status.
> 
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
> convert.c     | 10 ++++++++--
> sub-process.c | 10 +++++++---
> sub-process.h |  2 +-
> 3 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/convert.c b/convert.c
> index baa41da760..9e181e27ad 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -629,7 +629,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
> 	if (err)
> 		goto done;
> 
> -	subprocess_read_status(process->out, &filter_status);
> +	err = subprocess_read_status(process->out, &filter_status);
> +	if (err)
> +		goto done;
> +
> 	err = strcmp(filter_status.buf, "success");
> 	if (err)
> 		goto done;
> @@ -638,7 +641,10 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
> 	if (err)
> 		goto done;
> 
> -	subprocess_read_status(process->out, &filter_status);
> +	err = subprocess_read_status(process->out, &filter_status);
> +	if (err)
> +		goto done;
> +
> 	err = strcmp(filter_status.buf, "success");
> 
> done:
> diff --git a/sub-process.c b/sub-process.c
> index 60bb650012..c5057cafcd 100644
> --- a/sub-process.c
> +++ b/sub-process.c
> @@ -30,13 +30,15 @@ struct subprocess_entry *subprocess_find_entry(const char *cmd)
> 	return hashmap_get(&subprocess_map, &key, NULL);
> }
> 
> -void subprocess_read_status(int fd, struct strbuf *status)
> +int subprocess_read_status(int fd, struct strbuf *status)
> {
> 	struct strbuf **pair;
> 	char *line;
> +	int len;
> +
> 	for (;;) {
> -		line = packet_read_line(fd, NULL);
> -		if (!line)
> +		len = packet_read_line_gently(fd, NULL, &line);
> +		if ((len == -1) || !line)

Minor nit: Maybe "if ((len < 0) || !line)" ?


> 			break;
> 		pair = strbuf_split_str(line, '=', 2);
> 		if (pair[0] && pair[0]->len && pair[1]) {
> @@ -48,6 +50,8 @@ void subprocess_read_status(int fd, struct strbuf *status)
> 		}
> 		strbuf_list_free(pair);
> 	}
> +
> +	return len == -1 ? len : 0;

Minor nit: Maybe "return len < 0 ? len : 0;

> }
> 
> void subprocess_stop(struct subprocess_entry *entry)
> diff --git a/sub-process.h b/sub-process.h
> index 0cf1760a0a..5a1eeeece0 100644
> --- a/sub-process.h
> +++ b/sub-process.h
> @@ -41,6 +41,6 @@ static inline struct child_process *subprocess_get_child_process(
>  * key/value pairs and return the value from the last "status" packet
>  */
> 
> -void subprocess_read_status(int fd, struct strbuf *status);
> +int subprocess_read_status(int fd, struct strbuf *status);
> 
> #endif
> -- 
> 2.12.0.windows.1.31.g1548525701.dirty
> 

Looks good to me.

Thanks,
Lars


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions
  2017-04-07 12:03 ` [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions Ben Peart
  2017-04-09 19:56   ` Lars Schneider
@ 2017-04-11 16:16   ` Jeff King
  2017-04-11 19:29     ` Lars Schneider
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff King @ 2017-04-11 16:16 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, gitster, benpeart, christian.couder, larsxschneider

On Fri, Apr 07, 2017 at 08:03:49AM -0400, Ben Peart wrote:

> @@ -642,7 +621,41 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
>  done:
>  	sigchain_pop(SIGPIPE);
>  
> -	if (err || errno == EPIPE) {
> +	if (err || errno == EPIPE)
> +		err = err ? err : errno;
> +
> +	return err;
> +}

This isn't a new problem introduced by your patch, but this use of errno
seems funny to me. Specifically:

  1. Do we need to save errno before calling sigchain_pop()? It's making
     syscalls (though admittedly they are unlikely to fail).

  2. If err is 0, then nothing failed. Who would have set errno? Aren't
     we reading whatever cruft happened to be in errno before the
     function started?

     So I'm confused about what case would trigger on this errno check
     at all.

Also, I'm not quite sure what the return value of the function is
supposed to be; usually we'd use 0 for success and negative for errors.
But here we might return a negative value that we got from the packet_*
functions, or we might return EPIPE, which is positive. I don't think
it's a huge deal because the caller checks "if (err)", but perhaps it
should be "-errno" for consistency.

-Peff

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions
  2017-04-11 16:16   ` Jeff King
@ 2017-04-11 19:29     ` Lars Schneider
  2017-04-11 19:37       ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Lars Schneider @ 2017-04-11 19:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Ben Peart, git, gitster, benpeart, christian.couder


> On 11 Apr 2017, at 18:16, Jeff King <peff@peff.net> wrote:
> 
> On Fri, Apr 07, 2017 at 08:03:49AM -0400, Ben Peart wrote:
> 
>> @@ -642,7 +621,41 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
>> done:
>> 	sigchain_pop(SIGPIPE);
>> 
>> -	if (err || errno == EPIPE) {
>> +	if (err || errno == EPIPE)
>> +		err = err ? err : errno;
>> +
>> +	return err;
>> +}
> 
> This isn't a new problem introduced by your patch, but this use of errno
> seems funny to me. Specifically:

I introduced these lines, therefore I try to answer :-)


>  1. Do we need to save errno before calling sigchain_pop()? It's making
>     syscalls (though admittedly they are unlikely to fail).

What if we add the following right before sigchain_pop() ?

	if (errno == EPIPE)
		err = -1;


>  2. If err is 0, then nothing failed. Who would have set errno? Aren't
>     we reading whatever cruft happened to be in errno before the
>     function started?

Yeah, looks like you're right:
https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=6619179

According to this article we shouldn't even check *only* for errno. 
At least we should add
	errno = 0;
at the beginning of the function, no?

This means we have many areas in Git where we don't handle errno
correctly. E.g. right in convert.c where I stole code from:
https://github.com/git/git/commit/0c4dd67a048b39470b9b95912e4912fecc405a85#diff-7949b716ab0a83e8c422a0d6336f19d6R361

Should that be addressed?

- Lars

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions
  2017-04-11 19:29     ` Lars Schneider
@ 2017-04-11 19:37       ` Jeff King
  2017-04-11 20:01         ` Lars Schneider
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2017-04-11 19:37 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Ben Peart, git, gitster, benpeart, christian.couder

On Tue, Apr 11, 2017 at 09:29:36PM +0200, Lars Schneider wrote:

> >  1. Do we need to save errno before calling sigchain_pop()? It's making
> >     syscalls (though admittedly they are unlikely to fail).
> 
> What if we add the following right before sigchain_pop() ?
> 
> 	if (errno == EPIPE)
> 		err = -1;

Yes, that would be fine (though again, this runs against point 2 below).

> >  2. If err is 0, then nothing failed. Who would have set errno? Aren't
> >     we reading whatever cruft happened to be in errno before the
> >     function started?
> 
> Yeah, looks like you're right:
> https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=6619179
> 
> According to this article we shouldn't even check *only* for errno. 
> At least we should add
> 	errno = 0;
> at the beginning of the function, no?

If you initialize errno to 0 right before a syscall, then yes, you can
trust it without checking the return value of the syscall. I wouldn't
trust it before calling more complicated functions, though. Not even
xwrite(), which may see EINTR and keep going (which is OK for checking
for EPIPE, but not checking generally for errno values).

> This means we have many areas in Git where we don't handle errno
> correctly. E.g. right in convert.c where I stole code from:
> https://github.com/git/git/commit/0c4dd67a048b39470b9b95912e4912fecc405a85#diff-7949b716ab0a83e8c422a0d6336f19d6R361
> 
> Should that be addressed?

That one is questionable code, but I don't think it behaves incorrectly.
After the write_in_full() call finishes, then either:

  1. write_err is 0, and conditional is a noop

  2. write_err is non-zero, and errno is relevant

I do think it would be more clear as:

  if (write_err && errno == EPIPE)
	write_err = 0;

similar to the code right below it.

-Peff

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions
  2017-04-11 19:37       ` Jeff King
@ 2017-04-11 20:01         ` Lars Schneider
  2017-04-11 20:05           ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Lars Schneider @ 2017-04-11 20:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Ben Peart, git, gitster, benpeart, christian.couder


> On 11 Apr 2017, at 21:37, Jeff King <peff@peff.net> wrote:
> 
> On Tue, Apr 11, 2017 at 09:29:36PM +0200, Lars Schneider wrote:
> 
>>> 1. Do we need to save errno before calling sigchain_pop()? It's making
>>>    syscalls (though admittedly they are unlikely to fail).
>> 
>> What if we add the following right before sigchain_pop() ?
>> 
>> 	if (errno == EPIPE)
>> 		err = -1;
> 
> Yes, that would be fine (though again, this runs against point 2 below).
> 
>>> 2. If err is 0, then nothing failed. Who would have set errno? Aren't
>>>    we reading whatever cruft happened to be in errno before the
>>>    function started?
>> 
>> Yeah, looks like you're right:
>> https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=6619179
>> 
>> According to this article we shouldn't even check *only* for errno. 
>> At least we should add
>> 	errno = 0;
>> at the beginning of the function, no?
> 
> If you initialize errno to 0 right before a syscall, then yes, you can
> trust it without checking the return value of the syscall. I wouldn't
> trust it before calling more complicated functions, though. Not even
> xwrite(), which may see EINTR and keep going (which is OK for checking
> for EPIPE, but not checking generally for errno values).

Should we remove all the errno checks here as we don't have any direct 
"write" etc syscalls anyways then?


- Lars

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions
  2017-04-11 20:01         ` Lars Schneider
@ 2017-04-11 20:05           ` Jeff King
  2017-04-20 17:27             ` Ben Peart
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2017-04-11 20:05 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Ben Peart, git, gitster, benpeart, christian.couder

On Tue, Apr 11, 2017 at 10:01:02PM +0200, Lars Schneider wrote:

> > If you initialize errno to 0 right before a syscall, then yes, you can
> > trust it without checking the return value of the syscall. I wouldn't
> > trust it before calling more complicated functions, though. Not even
> > xwrite(), which may see EINTR and keep going (which is OK for checking
> > for EPIPE, but not checking generally for errno values).
> 
> Should we remove all the errno checks here as we don't have any direct 
> "write" etc syscalls anyways then?

Yeah, you should be trusting the return value from the various
sub-functions. Usually you'd check "errno == EPIPE" only when you saw an
error return but you want to _ignore_ EPIPE. This is what
filter_buffer_or_fd() is doing.

But the code here is the opposite case. It definitely wants to treat
EPIPE as an error. But that should be happening already because any
EPIPE we get would come with an error-return from one of the
packet_write() functions.

So I would say that "err || errno == EPIPE" here can just become "err",
and ditto in apply_multi_file_filter().

-Peff

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones
  2017-04-10 10:18   ` Lars Schneider
@ 2017-04-17  3:31     ` Junio C Hamano
  2017-04-18 16:38       ` Ben Peart
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2017-04-17  3:31 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Ben Peart, Git Mailing List, benpeart, christian.couder

Lars Schneider <larsxschneider@gmail.com> writes:

>> -static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
>> +static struct subprocess_entry *find_multi_file_filter_entry(const char *cmd)
>
> I am curious why you removed the hashmap parameter (here and in other pars of this patch). 
> I know the parameter is not strictly necessary as the hashmap is a global variable anyways. 
> However, I think it eases code maintainability in the long run if a function is "as pure as 
> possible" (IOW does rely on global state as less as possible).

If the original relied on a global hashmap and this update kept the
code like so, I wouldn't mind the end result of this series
(i.e. rely on it being global).  But that is not the case.  It is
making the code worse by stopping passing the hashmap through the
callchain.

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones
  2017-04-17  3:31     ` Junio C Hamano
@ 2017-04-18 16:38       ` Ben Peart
  2017-04-19  1:23         ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Ben Peart @ 2017-04-18 16:38 UTC (permalink / raw)
  To: Junio C Hamano, Lars Schneider; +Cc: Git Mailing List, benpeart, christian.couder

On 4/16/2017 11:31 PM, Junio C Hamano wrote:
> Lars Schneider <larsxschneider@gmail.com> writes:
>
>>> -static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
>>> +static struct subprocess_entry *find_multi_file_filter_entry(const char *cmd)
>> I am curious why you removed the hashmap parameter (here and in other pars of this patch).
>> I know the parameter is not strictly necessary as the hashmap is a global variable anyways.
>> However, I think it eases code maintainability in the long run if a function is "as pure as
>> possible" (IOW does rely on global state as less as possible).
> If the original relied on a global hashmap and this update kept the
> code like so, I wouldn't mind the end result of this series
> (i.e. rely on it being global).  But that is not the case.  It is
> making the code worse by stopping passing the hashmap through the
> callchain.
I debated this myself but there were a couple of things that pushed me 
to this simpler model.  First, it simplified the interface a little as 
you don't need an explicit init call with state to detect if it has 
already been initialized and you don't have to pass the hashmap into the 
various functions.  Since it was already a global, I didn't feel like 
this made it any worse.

Second, since the hashmap key is the exact command that was executed if 
you ever had two hashmaps with the same key, you'd end up with two 
copies of the same background process running.  I couldn't come up with 
any scenario where you would want two copies of the exact same command 
running at the same time so again went with the simpler model.

While I was typing this, I realized that since the background process is 
a versioned interface, if there were two different parts of the code 
using background processes and if they both wanted to run the same 
background process and if they wanted to use different versions of the 
interface, then you could want two different copies.

That said, I tend to build for things that are needed now rather than 
those that might be needed in the future.  If I've missed a scenario, 
I'm happy to change the interface.

^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones
  2017-04-18 16:38       ` Ben Peart
@ 2017-04-19  1:23         ` Junio C Hamano
  2017-04-20 17:24           ` Ben Peart
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2017-04-19  1:23 UTC (permalink / raw)
  To: Ben Peart; +Cc: Lars Schneider, Git Mailing List, benpeart, christian.couder

Ben Peart <peartben@gmail.com> writes:

> On 4/16/2017 11:31 PM, Junio C Hamano wrote:
>> Lars Schneider <larsxschneider@gmail.com> writes:
>>
>>> However, I think it eases code maintainability in the long run if a function is "as pure as
>>> possible" (IOW does rely on global state as less as possible).
>>
>> If the original relied on a global hashmap and this update kept the
>> code like so, I wouldn't mind the end result of this series
>> (i.e. rely on it being global).  But that is not the case.  It is
>> making the code worse by stopping passing the hashmap through the
>> callchain.
>
> ...  Since it was already a global, I didn't feel
> like this made it any worse.

The code before your series can easily lose the globals with the
attached patch, _exactly_ because it is prepared to be reusable by a
new caller that supplies its own hashmap by passing it through the
callchain.  The child-process machinery Lars made for his conversion
thing, which you are trying to split out to make it reusable, can be
used by somebody other than apply_multi_file_filter() if you do not
lose the hashmap; what the new caller needs to do is to supply its
own hashmap so that they do not interact with the set of processes
used by Lars's conversion machinery.

If we want to lose the global _after_ applying this patch 4/8, don't
we have to essentially _undo_ 4/8?  How can it not be seen as making
it worse?


 convert.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/convert.c b/convert.c
index 8d652bf27c..ff831e85b8 100644
--- a/convert.c
+++ b/convert.c
@@ -503,9 +503,6 @@ struct cmd2process {
 	struct child_process process;
 };
 
-static int cmd_process_map_initialized;
-static struct hashmap cmd_process_map;
-
 static int cmd2process_cmp(const struct cmd2process *e1,
 			   const struct cmd2process *e2,
 			   const void *unused)
@@ -682,6 +679,9 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	struct strbuf filter_status = STRBUF_INIT;
 	const char *filter_type;
 
+static int cmd_process_map_initialized;
+static struct hashmap cmd_process_map;
+
 	if (!cmd_process_map_initialized) {
 		cmd_process_map_initialized = 1;
 		hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones
  2017-04-19  1:23         ` Junio C Hamano
@ 2017-04-20 17:24           ` Ben Peart
  0 siblings, 0 replies; 27+ messages in thread
From: Ben Peart @ 2017-04-20 17:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Lars Schneider, Git Mailing List, benpeart, christian.couder

On 4/18/2017 9:23 PM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
>
>> On 4/16/2017 11:31 PM, Junio C Hamano wrote:
>>> Lars Schneider <larsxschneider@gmail.com> writes:
>>>
>>>> However, I think it eases code maintainability in the long run if a function is "as pure as
>>>> possible" (IOW does rely on global state as less as possible).
>>> If the original relied on a global hashmap and this update kept the
>>> code like so, I wouldn't mind the end result of this series
>>> (i.e. rely on it being global).  But that is not the case.  It is
>>> making the code worse by stopping passing the hashmap through the
>>> callchain.
>> ...  Since it was already a global, I didn't feel
>> like this made it any worse.
> The code before your series can easily lose the globals with the
> attached patch, _exactly_ because it is prepared to be reusable by a
> new caller that supplies its own hashmap by passing it through the
> callchain.  The child-process machinery Lars made for his conversion
> thing, which you are trying to split out to make it reusable, can be
> used by somebody other than apply_multi_file_filter() if you do not
> lose the hashmap; what the new caller needs to do is to supply its
> own hashmap so that they do not interact with the set of processes
> used by Lars's conversion machinery.
>
> If we want to lose the global _after_ applying this patch 4/8, don't
> we have to essentially _undo_ 4/8?  How can it not be seen as making
> it worse?

That's fine.  I'll flip it back in the next spin and enable multiple 
pools of sub-processes.  We'll still need 4/8 but it will be a smaller 
change.

>
>   convert.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/convert.c b/convert.c
> index 8d652bf27c..ff831e85b8 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -503,9 +503,6 @@ struct cmd2process {
>   	struct child_process process;
>   };
>   
> -static int cmd_process_map_initialized;
> -static struct hashmap cmd_process_map;
> -
>   static int cmd2process_cmp(const struct cmd2process *e1,
>   			   const struct cmd2process *e2,
>   			   const void *unused)
> @@ -682,6 +679,9 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
>   	struct strbuf filter_status = STRBUF_INIT;
>   	const char *filter_type;
>   
> +static int cmd_process_map_initialized;
> +static struct hashmap cmd_process_map;
> +
>   	if (!cmd_process_map_initialized) {
>   		cmd_process_map_initialized = 1;
>   		hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
>


^ permalink raw reply	[flat|threaded] 27+ messages in thread

* Re: [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions
  2017-04-11 20:05           ` Jeff King
@ 2017-04-20 17:27             ` Ben Peart
  0 siblings, 0 replies; 27+ messages in thread
From: Ben Peart @ 2017-04-20 17:27 UTC (permalink / raw)
  To: Jeff King, Lars Schneider; +Cc: git, gitster, benpeart, christian.couder

On 4/11/2017 4:05 PM, Jeff King wrote:
> On Tue, Apr 11, 2017 at 10:01:02PM +0200, Lars Schneider wrote:
>
>>> If you initialize errno to 0 right before a syscall, then yes, you can
>>> trust it without checking the return value of the syscall. I wouldn't
>>> trust it before calling more complicated functions, though. Not even
>>> xwrite(), which may see EINTR and keep going (which is OK for checking
>>> for EPIPE, but not checking generally for errno values).
>> Should we remove all the errno checks here as we don't have any direct
>> "write" etc syscalls anyways then?
> Yeah, you should be trusting the return value from the various
> sub-functions. Usually you'd check "errno == EPIPE" only when you saw an
> error return but you want to _ignore_ EPIPE. This is what
> filter_buffer_or_fd() is doing.
>
> But the code here is the opposite case. It definitely wants to treat
> EPIPE as an error. But that should be happening already because any
> EPIPE we get would come with an error-return from one of the
> packet_write() functions.
>
> So I would say that "err || errno == EPIPE" here can just become "err",
> and ditto in apply_multi_file_filter().
I'll update it this way in the next spin of the patch series.
> -Peff


^ permalink raw reply	[flat|threaded] 27+ messages in thread

end of thread, back to index

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-07 12:03 [PATCH v5 0/8] refactor the filter process code into a reusable module Ben Peart
2017-04-07 12:03 ` [PATCH v5 1/8] pkt-line: add packet_read_line_gently() Ben Peart
2017-04-09 19:34   ` Lars Schneider
2017-04-07 12:03 ` [PATCH v5 2/8] convert: move packet_write_list() into pkt-line as packet_writel() Ben Peart
2017-04-09 19:43   ` Lars Schneider
2017-04-07 12:03 ` [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions Ben Peart
2017-04-09 19:56   ` Lars Schneider
2017-04-11 16:16   ` Jeff King
2017-04-11 19:29     ` Lars Schneider
2017-04-11 19:37       ` Jeff King
2017-04-11 20:01         ` Lars Schneider
2017-04-11 20:05           ` Jeff King
2017-04-20 17:27             ` Ben Peart
2017-04-07 12:03 ` [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones Ben Peart
2017-04-10 10:18   ` Lars Schneider
2017-04-17  3:31     ` Junio C Hamano
2017-04-18 16:38       ` Ben Peart
2017-04-19  1:23         ` Junio C Hamano
2017-04-20 17:24           ` Ben Peart
2017-04-07 12:03 ` [PATCH v5 5/8] convert: Update generic functions to only use generic data structures Ben Peart
2017-04-10 12:05   ` Lars Schneider
2017-04-07 12:03 ` [PATCH v5 6/8] convert: rename reusable sub-process functions Ben Peart
2017-04-10 12:11   ` Lars Schneider
2017-04-07 12:03 ` [PATCH v5 7/8] sub-process: move sub-process functions into separate files Ben Peart
2017-04-10 12:41   ` Lars Schneider
2017-04-07 12:03 ` [PATCH v5 8/8] convert: Update subprocess_read_status to not die on EOF Ben Peart
2017-04-10 12:48   ` Lars Schneider

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox